Bug 29337 - -mfpmath=387 doesn't use fistp for double-to-integer conversion
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.1.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2006-10-03 21:40 UTC by Seongbae Park
Modified: 2006-10-05 07:08 UTC (History)
Host: i686-unknown-linux-gnu
Target: x86_64-unknown-linux-gnu
Description Seongbae Park 2006-10-03 21:40:06 UTC
-mfpmath=387 used on x86_64 is supposed to force gcc to use 387 for floating point math. However, even with the option, gcc generates cvtts{s,d}2* instead of fistp* for floating-point to integer conversion.

This makes a difference if/when the extra precision of 387 makes a difference to the conversion - which -mfpmath=387 is supposed to prevent.
Comment 1 Pawel Sikora 2006-10-03 21:57:51 UTC
(In reply to comment #0)

> This makes a difference if/when the extra precision of 387 makes difference 

For extra precision, try using a `long double'.

$ cat fp.cpp
int convert( long double x ) { return (int)x; }
int convert( double x ) { return (int)x; }

convert(long double):
        fldt    8(%rsp)
        fnstcw  -10(%rsp)
        movzwl  -10(%rsp), %eax
        orb     $12, %ah
        movw    %ax, -12(%rsp)
        fldcw   -12(%rsp)
        fistpl  -16(%rsp)
        fldcw   -10(%rsp)
        movl    -16(%rsp), %eax
        ret

convert(double):
        cvttsd2si       %xmm0, %eax
        ret
Comment 2 Seongbae Park 2006-10-03 23:37:54 UTC
(In reply to comment #1)
> (In reply to comment #0)
> 
> > This makes a difference if/when the extra precision of 387 makes difference 
> 
> For extra precision, try using a `long double'.

I'm afraid you're missing my point.
The problem is that for 64-bit and 32-bit floating-point to integer conversion,
the x86 (32-bit) target uses fistp*, whereas the x86_64 (64-bit) target uses cvt* even when -mfpmath=387 is given.
This defeats the purpose of the option -mfpmath=387, which is supposed to make floating-point computations use the 387 instead of SSE2.

Comment 3 Uroš Bizjak 2006-10-04 06:46:49 UTC
> I'm afraid you're missing my point.
> The problem is that for 64-bit and 32-bit floating-point to integer conversion,
> x86 (32bit) target uses fistp* whereas x86_64 (64-bit) target uses cvt* WHEN
> -mfpmath=387.
> This defeats the purpose of the option -mfpmath=387 which is supposed to make
> floating-point computations to use 387, instead of SSE2.

If SSE is available, then SSE cvt* is used in order to avoid long control-word setting sequences. This is cheaper even if we have to move the value from an x87 register, as cvt* can handle mem->reg transformations.

If you really need the fistp* sequence, you can try -mno-sse2 (you can't just disable sse on x86_64 target) or perhaps use -msse3, where the fisttp insn will be generated.

That said, I wonder where excess precision effects come into play here. We are talking about a truncate-to-integer instruction, so I would really like to see an example of this effect.

Comment 4 Andrew Pinski 2006-10-05 04:38:38 UTC
Actually, the reason why it uses cvttsd2si is two-fold. First, cvttsd2si does not need to act on an SSE register, which is where the argument is passed in.
In fact, we also use cvttsd2si for 32-bit, which simplifies the code a lot.
Compare:
convert:
        cvttsd2si       4(%esp), %eax
        ret

To:
convert:
        subl    $8, %esp
        fnstcw  6(%esp)
        fldl    12(%esp)
        movzwl  6(%esp), %eax
        movb    $12, %ah
        movw    %ax, 4(%esp)
        fldcw   4(%esp)
        fistpl  (%esp)
        fldcw   6(%esp)
        movl    (%esp), %eax
        addl    $8, %esp
        ret
Comment 5 Andrew Pinski 2006-10-05 04:45:57 UTC
(In reply to comment #0)
> This makes a difference if/when the extra precision of 387 makes difference to
> the conversion - which -mfpmath=387 is supposed to prevent.
That is only done when doing lots of math; you then round at the end before using cvttsd2si, which is OK and the correct thing to do. The cases where rounding is "wrong" are really undefined anyway.
Comment 6 Seongbae Park 2006-10-05 05:00:23 UTC
For example:

# cat m.c
int todouble(double a, double b) {
  return (int)(a+b);
}
#

With 4.1.0 i686-unknown-linux-gnu target:

# gcc -O2 m.c -S
# cat m.s
...
        .type   todouble, @function
todouble:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        fnstcw  -2(%ebp)
        fldl    16(%ebp)
        faddl   8(%ebp)
        movzwl  -2(%ebp), %eax
        orw     $3072, %ax
        movw    %ax, -4(%ebp)
        fldcw   -4(%ebp)
        fistpl  -8(%ebp)
        fldcw   -2(%ebp)
        movl    -8(%ebp), %eax
        leave
        ret
...

With x86_64-unknown-linux-gnu (without -mfpmath=387):

# gcc -O2 m.c -S
# cat m.s
...
todouble:
.LFB2:
        addsd   %xmm1, %xmm0
        cvttsd2si       %xmm0, %eax
        ret


With x86_64-unknown-linux-gnu with -mfpmath=387:
# gcc -O2 m.c -mfpmath=387 -S
# cat m.s
...
todouble:
.LFB2:
        movsd   %xmm0, -8(%rsp)
        fldl    -8(%rsp)
        movsd   %xmm1, -8(%rsp)
        fldl    -8(%rsp)
        faddp   %st, %st(1)
        fstpl   -8(%rsp)
        movlpd  -8(%rsp), %xmm0
        cvttsd2si       %xmm0, %eax
        ret
#

All three code sequences can behave differently.
There's no doubt that using cvt* is faster, but that's not the point either.
I'm arguing that the purpose of -mfpmath=387 is to be compatible with 387 behavior, hence it should imply -mno-sse. 
The fact that -mfpmath=sse exists implies that -mfpmath=387 turns off sse
(and that's what the description of -mfpmath=387 says).
Clearly this is not the current behavior of -mfpmath=387 - so if this behavior is not going to be fixed, at the least,
the documentation should be updated to reflect that.

Having said that,
-mno-sse is an acceptable workaround so I won't pursue the bug anymore.
Comment 7 Andrew Pinski 2006-10-05 05:05:42 UTC
On Thu, 2006-10-05 at 05:00 +0000, seongbae dot park at gmail dot com
wrote:
> With 4.1.0 i686-unknown-linux-gnu target:
> 
> # gcc -O2 m.c -S 

try -O2 -msse2, you get:
_Z8todoubledd:
        subl    $12, %esp
        fldl    24(%esp)
        faddl   16(%esp)
        fstpl   (%esp)
        movsd   (%esp), %xmm0
        addl    $12, %esp
        cvttsd2si       %xmm0, %eax
        ret


Though I think the movsd should not be there but that is a different
issue.

-- Pinski

Comment 8 Uroš Bizjak 2006-10-05 07:08:54 UTC
> try -O2 -msse2, you get:
> _Z8todoubledd:
>         subl    $12, %esp
>         fldl    24(%esp)
>         faddl   16(%esp)
>         fstpl   (%esp)
>         movsd   (%esp), %xmm0
>         addl    $12, %esp
>         cvttsd2si       %xmm0, %eax
>         ret
> 
> 
> Though I think the movsd should not be there but that is a different
> issue.

This is PR 19398. I have a patch that adds a bunch of peephole2 patterns to address this particular issue. The patch has already been approved and is waiting for stage1.