Bug 19009 - Loading of FP constants into FP reg via SSE reg
Summary: Loading of FP constants into FP reg via SSE reg
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.0.0
Importance: P2 minor
Target Milestone: 4.0.0
Assignee: Richard Henderson
URL:
Keywords: missed-optimization, ssemmx
Depends on:
Blocks:
 
Reported: 2004-12-15 12:15 UTC by Uroš Bizjak
Modified: 2005-01-14 13:16 UTC
CC List: 2 users

See Also:
Host: pentium4-pc-linux-gnu
Target: pentium4-pc-linux-gnu
Build: pentium4-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2005-01-12 02:49:20


Attachments
Reduced testcase from PovRay (560 bytes, text/plain)
2004-12-16 13:17 UTC, Uroš Bizjak

Description Uroš Bizjak 2004-12-15 12:15:02 UTC
This is something I noticed in the build of povray-3.50c, with '-march=pentium4
-mfpmath=387'. Loading of 0.0 into the FP register is quite strange:

 8051efb:    66 0f ef c9              pxor   %xmm1,%xmm1        <- clear SSE reg
 8051eff:    f2 0f 11 4c 24 38        movsd  %xmm1,0x38(%esp,1) <- move it to stack
 8051f05:    89 de                    mov    %ebx,%esi
 8051f07:    31 c9                    xor    %ecx,%ecx
 8051f09:    dd 03                    fldl   (%ebx)
 8051f0b:    d8 cd                    fmul   %st(5),%st
 8051f0d:    dd 43 08                 fldl   0x8(%ebx)
 8051f10:    d8 cd                    fmul   %st(5),%st
 8051f12:    de c1                    faddp  %st,%st(1)
 8051f14:    dd 43 10                 fldl   0x10(%ebx)
 8051f17:    d8 cc                    fmul   %st(4),%st
 8051f19:    de c1                    faddp  %st,%st(1)
 8051f1b:    d8 c2                    fadd   %st(2),%st
 8051f1d:    f2 0f 11 0c 24           movsd  %xmm1,(%esp,1) <- move it to stack
 8051f22:    dd 04 24                 fldl   (%esp,1)      <- load it to FP reg
                        
...
 8051f39:    dd 44 24 38              fldl   0x38(%esp,1)  <- load it to FP reg
 8051f3d:    df f1                    fcomip %st(1),%st
...

This could be implemented by fldz...

Another example of constant loading via XMM reg:

 ...
 805552b:    f2 0f 10 05 e0 63 16     movsd  0x81663e0,%xmm0    <- load constant
 8055532:    08
 8055533:    f2 0f 11 44 24 20        movsd  %xmm0,0x20(%esp,1) <- move to stack
 8055539:    dd 44 24 20              fldl   0x20(%esp,1)
 805553d:    df f1                    fcomip %st(1),%st
 ...
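
For illustration only, here is a minimal hypothetical C sketch of the source pattern behind the listings above: a double compared against a literal 0.0 and against a non-zero constant. The function names and the value 1.0e-7 are assumptions and are not taken from the PovRay sources or from the attached testcase. With -mfpmath=387 one would expect the zero to be loaded with fldz and the other constant with fldl directly from memory, with no XMM register involved.

```c
/* Hypothetical illustration; not the attached testcase. */

/* Compare against literal zero: with -mfpmath=387 the 0.0 operand
   could be materialized with fldz before the fcomip. */
int is_nonzero(double mod)
{
    return mod != 0.0;
}

/* Compare against a non-zero constant (value chosen arbitrarily):
   the constant could be loaded straight from memory with fldl. */
int exceeds_epsilon(double x)
{
    return x > 1.0e-7;
}
```

Compiling such a file with '-march=pentium4 -mfpmath=387 -S' and searching the output for xmm would show whether a given compiler is affected. This sketch is not guaranteed to reproduce the problem.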
Comment 1 Wolfgang Bangerth 2004-12-15 14:13:18 UTC
Uros, do you have a testcase? 
Comment 2 Uroš Bizjak 2004-12-16 06:28:21 UTC
(In reply to comment #1)

I'm trying to make a testcase out of the PovRay sources that triggers this behaviour.
I think that "*movdf_nointeger" is somehow confused and chooses an SSE register to
load the constant. After that, reload moves the SSE reg to an FP reg via the stack.
Perhaps ix86_preferred_reload_class () needs some fine-tuning with regard to the
-mfpmath parameter?
Comment 3 Uroš Bizjak 2004-12-16 13:17:27 UTC
Created attachment 7754
Reduced testcase from PovRay

Compile this source with:
gcc -O3 -march=pentium4 -mfpmath=387 -ffast-math -D__NO_MATH_INLINES zero.c

This has something to do with -O3. With -O2, SSE insns are _not_ generated.
Comment 4 Uroš Bizjak 2004-12-16 13:50:29 UTC
-finline-functions is needed to trigger the bug with -O2. 

The attached testcase should be compiled with '-O2 -march=pentium4 -mfpmath=387
-ffast-math -D__NO_MATH_INLINES -finline-functions' to get:
	...
	pxor	%xmm0, %xmm0
	movsd	%xmm0, -16(%ebp)
	fldl	-16(%ebp)
	fcomip	%st(1), %st
	je	.L23
	fld	%st(0)
	...

and:

grep xmm zero.s
        pxor    %xmm0, %xmm0
        movsd   %xmm0, -16(%ebp)
        movsd   %xmm0, -16(%ebp)
        movsd   %xmm0, -16(%ebp)
        movsd   %xmm0, -16(%ebp)

        movsd   %xmm0, 8(%edx)
        movsd   %xmm0, (%edx)

where movsds are followed by:
	fldl	-16(%ebp)

BTW: "#include <math.h>" can be removed from the testcase to avoid
-D__NO_MATH_INLINES. The result will be the same.
Comment 5 Uroš Bizjak 2004-12-16 14:35:51 UTC
Another candidate for TARGET_SSE_MATH cleanup...

(insn 21 20 22 0 (set (reg:CCFP 17 flags)
        (compare:CCFP (reg/v:DF 60 [ mod ])
            (reg:DF 70))) 24 {*cmpfp_i_sse} (nil)
    (nil))
Comment 6 Uroš Bizjak 2005-01-07 11:29:35 UTC
This bug report shows a similar problem for x87 code as PR 19252 shows for SSE
code. The cause of both problems is described in
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00394.html
Comment 7 Richard Henderson 2005-01-14 00:55:24 UTC
Fixed.
Comment 8 Uroš Bizjak 2005-01-14 13:08:01 UTC
It looks like the part of this bug where FP compares force a constant into the
wrong register is solved. However, the register allocator can still be confused
when a variable is initialized to a constant value:

grep fldz povray_dump.sse | wc -l
    117
grep fld1 povray_dump.sse | wc -l
    141
grep pxor povray_dump.387 | wc -l
     20

I'll try to provide a small testcase. The testcase (zero.c) attached to this
bug report is fixed, but I suggest that we reopen this bug anyway.
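
A hypothetical sketch of the remaining pattern, with variables initialized to the constants 0.0 and 1.0 rather than compared against them. This is not the promised testcase and the names are assumptions. It only shows the kind of source where fldz and fld1 would be the natural loads under -mfpmath=387.

```c
/* Hypothetical illustration; not a confirmed reproducer. */
double sum_and_scale(const double *v, int n, double *product)
{
    double sum = 0.0;   /* x87: the natural load is fldz */
    double prod = 1.0;  /* x87: the natural load is fld1 */
    int i;

    for (i = 0; i < n; i++) {
        sum += v[i];    /* running sum of the elements */
        prod *= v[i];   /* running product of the elements */
    }
    *product = prod;
    return sum;
}
```

Whether a particular compiler routes these initializers through an XMM register would have to be checked in the generated assembly.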
Comment 9 Andrew Pinski 2005-01-14 13:16:00 UTC
(In reply to comment #8)
> It looks like the part of this bug where FP compares force a constant into the
> wrong register is solved. However, the register allocator can still be confused
> when a variable is initialized to a constant value:

Can you open a new one with the full testcase? This is the register allocator
being dumb, which is usual in GCC.