[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

uros at kss-loka dot si gcc-bugzilla@gcc.gnu.org
Wed May 31 10:57:00 GMT 2006



------- Comment #7 from uros at kss-loka dot si  2006-05-31 10:56 -------
IMO the fact that gcc 3.x beats 4.x on this code could be attributed to pure
luck.

Looking at the 3.x RTL, the following can be observed:

The instruction that multiplies pA0 and rB0 is described as:

__.20.combine:

(insn 75 73 76 2 (set (reg:DF 84)
        (mult:DF (mem:DF (reg/v/f:DI 70 [ pA0 ]) [0 S8 A64])
            (reg/v:DF 78 [ rB0 ]))) 551 {*fop_df_comm_nosse} (insn_list 65 (nil))
    (nil))
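
For orientation, here is a minimal C sketch of the kind of source that leads
to such a multiply. The function name scaled_sum and the loop shape are
assumptions made for illustration; the actual test case from the bug report
is not reproduced in this comment. The point is only that rB0 stays live
across iterations, while each pA0 element is read from memory once.

```c
/* Hypothetical reduction, not the test case from the bug report.
   rB0 is reused on every iteration, so it has to stay in a register;
   pA0[i] is a one-shot memory operand of the multiply. */
double scaled_sum(const double *pA0, double rB0, int n)
{
    double rC0 = 0.0;
    for (int i = 0; i < n; i++)
        rC0 += pA0[i] * rB0;   /* (mult:DF (mem:DF pA0) (reg:DF rB0)) */
    return rC0;
}
```

Compiling something like this with -S -fverbose-asm (and -da for the RTL
dumps) should give comparable output, though the exact insns will likely
differ from the dumps quoted here.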

At this point, the first input operand does not satisfy the operand constraint,
so the register allocator pushes the memory operand into a register:

__.25.greg:

(insn 703 73 75 2 (set (reg:DF 8 st [84])
        (mem:DF (reg/v/f:DI 0 ax [orig:70 pA0 ] [70]) [0 S8 A64])) 96 {*movdf_integer} (nil)
    (nil))

(insn 75 703 76 2 (set (reg:DF 8 st [84])
        (mult:DF (reg:DF 8 st [84])
            (reg/v:DF 9 st(1) [orig:78 rB0 ] [78]))) 551 {*fop_df_comm_nosse} (insn_list 65 (nil))
    (nil))

This RTL produces the following asm sequence:

        fldl    (%rax)  #* pA0
        fmul    %st(1), %st     #


In the 4.x case, we have:

__.127r.combine:

(insn 60 58 61 4 (set (reg:DF 207)
        (mult:DF (reg/v:DF 187 [ rB0 ])
            (mem:DF (plus:DI (reg/v/f:DI 178 [ pA0.161 ])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

This instruction almost satisfies the operand constraints, so the register
allocator only adds a register-to-register copy of rB0:

__.138r.greg:

(insn 470 58 60 5 (set (reg:DF 12 st(4) [207])
        (reg/v:DF 8 st [orig:187 rB0 ] [187])) 94 {*movdf_integer} (nil)
    (nil))

(insn 60 470 61 5 (set (reg:DF 12 st(4) [207])
        (mult:DF (reg:DF 12 st(4) [207])
            (mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))

Stack handling then fixes this RTL to:

__.151r.stack:

(insn 470 58 60 4 (set (reg:DF 8 st)
        (reg:DF 8 st)) 94 {*movdf_integer} (nil)
    (nil))

(insn 60 470 61 4 (set (reg:DF 8 st)
        (mult:DF (reg:DF 8 st)
            (mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
                    (const_int 960 [0x3c0])) [0 S8 A64]))) 591 {*fop_df_comm_i387} (nil)
    (nil))


From your measurements, it looks as if, instead of:

        fld     %st(0)  #
        fmull   (%rax)  #* pA0.161

it is faster to emit

        fldl    (%rax)  #* pA0
        fmul    %st(1), %st     #,
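
To make the comparison concrete, the two sequences can be traced on a toy
model of the x87 register stack. This is only a sketch: the x87_stack type
and the seq_gcc3/seq_gcc4 helpers are made up for illustration, and the model
ignores 80-bit precision, the tag word and all timing. It shows that both
sequences leave the same stack contents (product in %st(0), rB0 still in
%st(1)); it says nothing about why one of them runs faster.

```c
#include <assert.h>

#define X87_DEPTH 8

/* Toy register stack: st[0] models %st(0), the top of stack. */
typedef struct {
    double st[X87_DEPTH];
    int depth;              /* number of occupied slots */
} x87_stack;

/* Push a value, shifting the existing entries down by one slot. */
static void push(x87_stack *s, double v)
{
    assert(s->depth < X87_DEPTH);
    for (int i = s->depth; i > 0; i--)
        s->st[i] = s->st[i - 1];
    s->st[0] = v;
    s->depth++;
}

/* gcc 4.x sequence: copy rB0 first, then multiply by the memory operand. */
x87_stack seq_gcc4(double rB0, const double *pA0)
{
    x87_stack s = { {0.0}, 0 };
    push(&s, rB0);          /* rB0 already lives in %st(0) */
    push(&s, s.st[0]);      /* fld   %st(0)                */
    s.st[0] *= *pA0;        /* fmull (%rax)                */
    return s;
}

/* gcc 3.x sequence: load the memory operand, then multiply by rB0. */
x87_stack seq_gcc3(double rB0, const double *pA0)
{
    x87_stack s = { {0.0}, 0 };
    push(&s, rB0);          /* rB0 already lives in %st(0) */
    push(&s, *pA0);         /* fldl  (%rax)                */
    s.st[0] *= s.st[1];     /* fmul  %st(1), %st           */
    return s;
}
```

In both cases the sequence costs one stack push and one multiply; the
difference is whether the memory access happens in the load or in the
multiply.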


-- 

uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at kss-loka dot si


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
