This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: FP compares and TARGET_SSE_MATH
> Jan Hubicka wrote:
>
> >Do you have any benchmarks that suggest that mixing both register
> >sets is a loss? I briefly benchmarked this on SPECfp at the time I was
> >implementing it, and using both sets was a win then, but times
> >might have changed.
>
> I have tried some FP benchmarks with -mno-sse, but -mno-sse also disables
> the cvtt* patterns, so I don't trust the results. But consider this simple
> testcase:
>
> double test (double a, double b) {
> if (a > b)
> return a;
> else
> return b;
> }
>
> with '-O2 -S -march=pentium4 -mfpmath=sse -ffast-math':
>
> test:
> pushl %ebp
> movl %esp, %ebp
> subl $8, %esp
> fldl 8(%ebp)
> movsd 16(%ebp), %xmm0
> movsd %xmm0, -8(%ebp)
> fldl -8(%ebp)
> fcomi %st(1), %st
> fcmovb %st(1), %st
> fstp %st(1)
> leave
> ret
This testcase doesn't seem terribly bad to my eyes. An SSE-only equivalent
would need either the horrible SSE math on logicals or a branch, and that
is not much better than the conditional move (assuming the cmov
hardware implementation is not slow enough to be a loss in all cases; I
don't know what P4 performance is here, and on K8 cmov is pretty slow but
still wins when the branch is badly predictable).
>
> >I also sent a patch to teach regclass to discover these dependencies (i.e. to
> >avoid putting register X in x87 when it is used in a comparison operator
> >with register Y that needs to live in SSE). Perhaps we might think
> >about a solution in this direction. (I originally gave up mostly because
> >new-RA seemed to be making progress.)
>
> Would this patch solve situations like these, when -mfpmath=387 is chosen:
>
> 80a0f0a: d9 03 flds (%ebx)
> 80a0f0c: d9 5c 24 4c fstps 0x4c(%esp,1)
> 80a0f10: f3 0f 10 44 24 4c movss 0x4c(%esp,1),%xmm0
> 80a0f16: 0f 2f 00 comiss (%eax),%xmm0
> 80a0f19: 0f 43 c3 cmovae %ebx,%eax
>
> And similar for -mfpmath=sse:
>
> 804d112: dd 43 28 fldl 0x28(%ebx)
> 804d115: f2 0f 11 04 24 movsd %xmm0,(%esp,1)
> 804d11a: dd 04 24 fldl (%esp,1)
> 804d11d: 31 c0 xor %eax,%eax
> 804d11f: df f1 fcomip %st(1),%st
> 804d121: 0f 95 c0 setne %al
Yes, such scenarios were basically what I had in mind: xmm0 having an SSE
preference would force the comparison as a whole to get an SSE
preference.
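The idea of a comparison getting one preference "as a whole" can be pictured as grouping the operands of each FP comparison and letting an SSE preference anywhere in the group win. The following is a hypothetical sketch using union-find, not the regclass patch mentioned in the thread: the pseudo numbering, the `fp_pref` enum and every function name are invented for illustration.

```c
/* Hypothetical sketch, NOT the regclass patch discussed in the thread.
   Pseudo registers are numbered 0..MAX_PSEUDOS-1. */
#define MAX_PSEUDOS 64

enum fp_pref { PREF_NONE, PREF_X87, PREF_SSE };

static int uf_parent[MAX_PSEUDOS];
static enum fp_pref uf_pref[MAX_PSEUDOS];

/* Start with every pseudo in its own group and no preference. */
void init_prefs(int n)
{
    for (int i = 0; i < n; i++) {
        uf_parent[i] = i;
        uf_pref[i] = PREF_NONE;
    }
}

/* Find the group representative, using path halving. */
static int uf_find(int r)
{
    while (uf_parent[r] != r) {
        uf_parent[r] = uf_parent[uf_parent[r]];
        r = uf_parent[r];
    }
    return r;
}

/* SSE dominates x87, which dominates "no preference". */
static enum fp_pref combine(enum fp_pref x, enum fp_pref y)
{
    if (x == PREF_SSE || y == PREF_SSE)
        return PREF_SSE;
    if (x == PREF_X87 || y == PREF_X87)
        return PREF_X87;
    return PREF_NONE;
}

/* Record a preference observed for pseudo R (e.g. an SSE-only use). */
void set_pref(int r, enum fp_pref p)
{
    int root = uf_find(r);
    uf_pref[root] = combine(uf_pref[root], p);
}

/* A and B are operands of the same FP comparison: merge their groups
   so that the comparison ends up with a single preference. */
void tie_compare_operands(int a, int b)
{
    int ra = uf_find(a), rb = uf_find(b);
    if (ra == rb)
        return;
    uf_parent[rb] = ra;
    uf_pref[ra] = combine(uf_pref[ra], uf_pref[rb]);
}

/* Preference of the whole group that pseudo R belongs to. */
enum fp_pref group_pref(int r)
{
    return uf_pref[uf_find(r)];
}
```

With both comparison operands in one group, they would get the same register file, so the value would not have to be stored to the stack and reloaded into the other unit as in the quoted objdump output.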
Honza