-mfpmath=387 used on x86_64 is supposed to force gcc to use 387 for floating-point math. However, even with the option, gcc generates cvtts{s,d}2* instead of fistp* for floating-point to integer conversion. This makes a difference if/when the extra precision of 387 makes a difference to the conversion, which -mfpmath=387 is supposed to prevent.
(In reply to comment #0)
> This makes a difference if/when the extra precision of 387 makes a difference

For extra precision, try using a `long double'.

$ cat fp.cpp
int convert( long double x )
{
    return (int)x;
}

int convert( double x )
{
    return (int)x;
}

convert(long double):
        fldt    8(%rsp)
        fnstcw  -10(%rsp)
        movzwl  -10(%rsp), %eax
        orb     $12, %ah
        movw    %ax, -12(%rsp)
        fldcw   -12(%rsp)
        fistpl  -16(%rsp)
        fldcw   -10(%rsp)
        movl    -16(%rsp), %eax
        ret

convert(double):
        cvttsd2si       %xmm0, %eax
        ret
(In reply to comment #1)
> (In reply to comment #0)
> > This makes a difference if/when the extra precision of 387 makes a difference
>
> For extra precision, try using a `long double'.

I'm afraid you're missing my point.
The problem is that for 64-bit and 32-bit floating-point to integer conversion, the x86 (32-bit) target uses fistp* whereas the x86_64 (64-bit) target uses cvt* WHEN -mfpmath=387 is given.
This defeats the purpose of the option -mfpmath=387, which is supposed to make floating-point computations use 387 instead of SSE2.
> I'm afraid you're missing my point.
> The problem is that for 64-bit and 32-bit floating-point to integer conversion,
> x86 (32bit) target uses fistp* whereas x86_64 (64-bit) target uses cvt* WHEN
> -mfpmath=387.
> This defeats the purpose of the option -mfpmath=387 which is supposed to make
> floating-point computations to use 387, instead of SSE2.

If SSE is available, then SSE cvt* is used in order to avoid long control-word setting sequences. This is cheaper even if we have to move the value out of an x87 register, as cvt* can handle mem->reg transformations.

If you really need the fistp* sequence, you can try -mno-sse2 (you can't just disable sse on the x86_64 target) or perhaps use -msse3, where the fisttp insn will be generated.

That said, I wonder where excess precision effects come into play here. We are talking about a truncate-to-integer instruction, so I would really like to see an example of this effect.
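A minimal sketch of the kind of example asked for above, not taken from the thread. It assumes a target where `long double` is the 80-bit x87 format (64-bit significand) and where plain `double` arithmetic is rounded to 53 bits, as on x86_64 with SSE. The function names are hypothetical; `long double` is used only to imitate what an x87 register holds before fistp*.

```c
#include <float.h>

/* Hypothetical helper: add in double precision, then truncate.
   Imitates addsd followed by cvttsd2si. */
int trunc_sum_double(double a, double b)
{
    volatile double sum = a + b;   /* stored as a 53-bit double */
    return (int)sum;
}

/* Hypothetical helper: add in long double, then truncate directly.
   Imitates faddl followed by fistpl, ASSUMING long double is the
   80-bit x87 extended format (LDBL_MANT_DIG == 64). */
int trunc_sum_extended(double a, double b)
{
    volatile long double sum = (long double)a + (long double)b;
    return (int)sum;
}
```

Under those assumptions, `trunc_sum_double(1.0, -0x1p-60)` gives 1, because 1 - 2^-60 rounds up to 1.0 in double, while `trunc_sum_extended(1.0, -0x1p-60)` gives 0, because 1 - 2^-60 is exactly representable with a 64-bit significand and truncates to 0.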
Actually, the reason it uses cvttsd2si is twofold. First, cvttsd2si does not need to act on an SSE register, which is where the argument is passed in. Second, we in fact use cvttsd2si for 32-bit also, which simplifies the code a lot. Compare:

convert:
        cvttsd2si       4(%esp), %eax
        ret

To:

convert:
        subl    $8, %esp
        fnstcw  6(%esp)
        fldl    12(%esp)
        movzwl  6(%esp), %eax
        movb    $12, %ah
        movw    %ax, 4(%esp)
        fldcw   4(%esp)
        fistpl  (%esp)
        fldcw   6(%esp)
        movl    (%esp), %eax
        addl    $8, %esp
        ret
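For readers unfamiliar with the fnstcw/fldcw dance in the longer sequence: it temporarily switches the x87 rounding-control field to truncation, since fistp* rounds according to the current mode while a C cast must truncate. The sketch below models only the integer bit manipulation; the function and macro names are hypothetical, and 0x037F is used as the sample input because it is the control word that FNINIT loads.

```c
#include <stdint.h>

/* Rounding-control (RC) field of the x87 control word: bits 10-11.
   RC = 11b selects round-toward-zero (truncation). */
#define X87_RC_TRUNCATE 0x0C00u   /* == 3072 == 12 << 8 */

/* Models `orb $12, %ah` / `orw $3072, %ax`: all other bits are kept. */
uint16_t cw_or_truncate(uint16_t cw)
{
    return (uint16_t)(cw | X87_RC_TRUNCATE);
}

/* Models `movb $12, %ah`: the whole high byte is replaced. */
uint16_t cw_mov_truncate(uint16_t cw)
{
    return (uint16_t)((cw & 0x00FFu) | X87_RC_TRUNCATE);
}

/* Extracts the two RC bits (0 = nearest, 3 = truncate). */
unsigned cw_rounding_mode(uint16_t cw)
{
    return (cw >> 10) & 3u;
}
```

With the sample input 0x037F, `cw_or_truncate` yields 0x0F7F and `cw_mov_truncate` yields 0x0C7F; both report rounding mode 3. The original word is saved and reloaded afterwards, which is why the sequence needs two fldcw instructions.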
(In reply to comment #0)
> This makes a difference if/when the extra precision of 387 makes a difference to
> the conversion - which -mfpmath=387 is supposed to prevent.

That is only done when doing lots of math, so you then round at the end before using cvttsd2si, which is OK and the correct thing. The cases where rounding is "wrong" are really undefined anyway.
For example:

# cat m.c
int todouble(double a, double b)
{
    return (int)(a+b);
}
#

With 4.1.0 i686-unknown-linux-gnu target:

# gcc -O2 m.c -S
# cat m.s
...
        .type   todouble, @function
todouble:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        fnstcw  -2(%ebp)
        fldl    16(%ebp)
        faddl   8(%ebp)
        movzwl  -2(%ebp), %eax
        orw     $3072, %ax
        movw    %ax, -4(%ebp)
        fldcw   -4(%ebp)
        fistpl  -8(%ebp)
        fldcw   -2(%ebp)
        movl    -8(%ebp), %eax
        leave
        ret
...

With x86_64-unknown-linux-gnu (without -mfpmath=387):

# gcc -O2 m.c -S
# cat m.s
...
todouble:
.LFB2:
        addsd   %xmm1, %xmm0
        cvttsd2si       %xmm0, %eax
        ret

With x86_64-unknown-linux-gnu with -mfpmath=387:

# gcc -O2 m.c -mfpmath=387 -S
# cat m.s
...
todouble:
.LFB2:
        movsd   %xmm0, -8(%rsp)
        fldl    -8(%rsp)
        movsd   %xmm1, -8(%rsp)
        fldl    -8(%rsp)
        faddp   %st, %st(1)
        fstpl   -8(%rsp)
        movlpd  -8(%rsp), %xmm0
        cvttsd2si       %xmm0, %eax
        ret
#

All three code sequences can behave differently.

There's no doubt that using cvt* is faster, but that's not the point either. I'm arguing that the purpose of -mfpmath=387 is to be compatible with 387 behavior, hence it should imply -mno-sse. The fact that -mfpmath=sse exists implies that -mfpmath=387 turns off sse (and that's what the description of -mfpmath=387 says).

Clearly this is not the current behavior of -mfpmath=387, so if this behavior is not going to be fixed, at the least the documentation should be updated to reflect that.

Having said that, -mno-sse is an acceptable workaround, so I won't pursue the bug anymore.
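To make the "all three can behave differently" claim concrete, here is a hedged C model of the three sequences. It is a sketch, not compiler output: it assumes `long double` is the 80-bit x87 format and that `double` arithmetic is done in SSE (as on x86_64), and the function names are hypothetical.

```c
#include <float.h>

/* i686 sequence: faddl, then fistpl on the extended-precision sum.
   ASSUMES long double is x87 extended (LDBL_MANT_DIG == 64). */
int path_x87(double a, double b)
{
    volatile long double sum = (long double)a + (long double)b;
    return (int)sum;
}

/* x86_64 default: addsd, then cvttsd2si. */
int path_sse(double a, double b)
{
    volatile double sum = a + b;
    return (int)sum;
}

/* x86_64 with -mfpmath=387: faddp in extended precision, fstpl
   rounds the sum to double, then cvttsd2si truncates. */
int path_mixed(double a, double b)
{
    volatile long double wide = (long double)a + (long double)b;
    volatile double narrowed = (double)wide;   /* second rounding */
    return (int)narrowed;
}
```

Under those assumptions, two inputs separate the paths. For `a = 1.0, b = -0x1p-60`, `path_x87` returns 0 while `path_sse` and `path_mixed` return 1. For `a = 1.0, b = -(0x1p-54 + 0x1p-78)`, `path_x87` and `path_sse` return 0 while `path_mixed` returns 1, because the extended sum rounds to 1 - 2^-54, which is an exact tie in double and rounds to even, i.e. to 1.0.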
Subject: Re: -mfpmath=387 doesn't use fistp for double-to-integer conversion

On Thu, 2006-10-05 at 05:00 +0000, seongbae dot park at gmail dot com wrote:
> With 4.1.0 i686-unknown-linux-gnu target:
>
> # gcc -O2 m.c -S

Try -O2 -msse2; you get:

_Z8todoubledd:
        subl    $12, %esp
        fldl    24(%esp)
        faddl   16(%esp)
        fstpl   (%esp)
        movsd   (%esp), %xmm0
        addl    $12, %esp
        cvttsd2si       %xmm0, %eax
        ret

Though I think the movsd should not be there, but that is a different issue.

-- Pinski
> try -O2 -msse2, you get:
> _Z8todoubledd:
>         subl $12, %esp
>         fldl 24(%esp)
>         faddl 16(%esp)
>         fstpl (%esp)
>         movsd (%esp), %xmm0
>         addl $12, %esp
>         cvttsd2si %xmm0, %eax
>         ret
>
> Though I think the movsd should not be there but that is a different
> issue.

This is PR 19398. I have a patch that adds a bunch of peephole2 patterns to address this particular issue. The patch is already approved and is waiting for stage1.