FP compares and TARGET_SSE_MATH

Jan Hubicka hubicka@ucw.cz
Thu Dec 23 16:59:00 GMT 2004


> Jan Hubicka wrote:
> 
> >Do you have any benchmarks that suggest that avoiding the mixture of
> >both is a loss?  I briefly benchmarked this on SPECfp at the time I was
> >implementing this and using both sets was a win that time, but times
> >might've changed.
> > 
> >
> I have tried some FP benchmark with -mno-sse, but -mno-sse also disabled 
> cvtt* patterns, so I don't trust the results. But consider this simple 
> testcase:
> 
> double test (double a, double b) {
>        if (a > b)
>                return a;
>        else
>                return b;
> }
> 
> with '-O2 -S -march=pentium4 -mfpmath=sse -ffast-math':
> 
> test:
>        pushl   %ebp
>        movl    %esp, %ebp
>        subl    $8, %esp
>        fldl    8(%ebp)
>        movsd   16(%ebp), %xmm0
>        movsd   %xmm0, -8(%ebp)
>        fldl    -8(%ebp)
>        fcomi   %st(1), %st
>        fcmovb  %st(1), %st
>        fstp    %st(1)
>        leave
>        ret

This testcase doesn't look terribly bad to my eyes.  An SSE-only equivalent
would need either the horrible SSE math on logicals or a branch, and that
is not much better than the conditional move (assuming that the cmov
hardware implementation is not slow enough to be a loss in all cases - I
don't know what P4 performance is on this; K8 is pretty slow but wins
when the branch is badly predictable).
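
To make "SSE math on logicals" concrete, here is a rough portable C model of
the mask-and-select sequence an SSE-only version would use in place of the
fcmov.  It is only a sketch: max_by_mask is a made-up name, the integer
operations stand in for a compare-to-mask followed by and/andnot/or on the
XMM registers, and it assumes a 64-bit double.  It is not what GCC emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: select the larger of a and b without a branch
   or a conditional move, using only a mask and bitwise logic.  */
double max_by_mask(double a, double b)
{
    uint64_t ia, ib, mask, ir;
    double r;

    /* Assumption: double is a 64-bit type on this target.  */
    assert(sizeof(double) == sizeof(uint64_t));

    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);

    /* All-ones when a > b, all-zeros otherwise; this models the mask
       that an SSE compare instruction would leave in a register.  */
    mask = (uint64_t)0 - (uint64_t)(a > b);

    /* Keep the bits of a where the mask is set and the bits of b
       elsewhere; this models the and/andnot/or sequence.  */
    ir = (ia & mask) | (ib & ~mask);

    memcpy(&r, &ir, sizeof r);
    return r;
}
```

Like the original testcase, this returns b whenever a > b is false, so equal
inputs and unordered (NaN) comparisons both fall through to b.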
> 
> >I also sent a patch to teach regclass to discover the dependencies (i.e. to
> >avoid putting register X in x87 when it is used in a comparison operator
> >with register Y that needs to live in SSE).  Perhaps we might think
> >about a solution in this direction.  (I originally gave up mostly because
> >new-RA seemed to make progress.)
> >
> > 
> >
> Would this patch solve these situations, when -mfpmath=387 is chosen:
> 
> 80a0f0a:       d9 03                   flds   (%ebx)
> 80a0f0c:       d9 5c 24 4c             fstps  0x4c(%esp,1)
> 80a0f10:       f3 0f 10 44 24 4c       movss  0x4c(%esp,1),%xmm0
> 80a0f16:       0f 2f 00                comiss (%eax),%xmm0
> 80a0f19:       0f 43 c3                cmovae %ebx,%eax
> 
> And similar for -mfpmath=sse:
> 
> 804d112:       dd 43 28                fldl   0x28(%ebx)
> 804d115:       f2 0f 11 04 24          movsd  %xmm0,(%esp,1)
> 804d11a:       dd 04 24                fldl   (%esp,1)
> 804d11d:       31 c0                   xor    %eax,%eax
> 804d11f:       df f1                   fcomip %st(1),%st
> 804d121:       0f 95 c0                setne  %al

Yes, such scenarios were basically what I had in mind - xmm0 having an SSE
preference would force the comparison as a whole to get an SSE
preference.
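
As a toy illustration of that rule (this is not the regclass code or the
patch; every name below is hypothetical), the choice of unit for a
comparison could be modelled like this.  Letting an SSE requirement win over
an x87 preference is an assumption of the sketch.

```c
/* Hypothetical model of where an FP value prefers to live.  */
enum fp_unit { UNIT_ANY, UNIT_X87, UNIT_SSE };

/* Pick the unit for a comparison of two operands.  If either operand
   needs SSE, the whole comparison gets SSE (assumed tie-break), so no
   value has to travel between the units through memory.  Otherwise an
   x87 preference is honoured, and with no preference at all the
   -mfpmath default (dflt) is used.  */
enum fp_unit compare_unit(enum fp_unit x, enum fp_unit y, enum fp_unit dflt)
{
    if (x == UNIT_SSE || y == UNIT_SSE)
        return UNIT_SSE;
    if (x == UNIT_X87 || y == UNIT_X87)
        return UNIT_X87;
    return dflt;
}
```

In the -mfpmath=sse dump above, one operand is already in xmm0, so this
model would keep the whole comparison in SSE rather than spilling xmm0 to
the stack for fcomip.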

Honza


