[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3
uros at kss-loka dot si
gcc-bugzilla@gcc.gnu.org
Wed May 31 10:57:00 GMT 2006
------- Comment #7 from uros at kss-loka dot si 2006-05-31 10:56 -------
IMO the fact that gcc 3.x beats 4.x on this code could be attributed to pure
luck.
Looking at the 3.x RTL, the following can be observed.
The instruction that multiplies pA0 and rB0 is described as:
__.20.combine:
(insn 75 73 76 2 (set (reg:DF 84)
(mult:DF (mem:DF (reg/v/f:DI 70 [ pA0 ]) [0 S8 A64])
(reg/v:DF 78 [ rB0 ]))) 551 {*fop_df_comm_nosse} (insn_list 65
(nil))
(nil))
At this point, the first input operand does not satisfy the operand constraint, so
the register allocator loads the memory operand into the output register:
__.25.greg:
(insn 703 73 75 2 (set (reg:DF 8 st [84])
(mem:DF (reg/v/f:DI 0 ax [orig:70 pA0 ] [70]) [0 S8 A64])) 96
{*movdf_integer} (nil)
(nil))
(insn 75 703 76 2 (set (reg:DF 8 st [84])
(mult:DF (reg:DF 8 st [84])
(reg/v:DF 9 st(1) [orig:78 rB0 ] [78]))) 551 {*fop_df_comm_nosse}
(insn_list 65 (nil))
(nil))
This RTL produces the following asm sequence:
fldl (%rax) #* pA0
fmul %st(1), %st #
In the 4.x case, we have:
__.127r.combine:
(insn 60 58 61 4 (set (reg:DF 207)
(mult:DF (reg/v:DF 187 [ rB0 ])
(mem:DF (plus:DI (reg/v/f:DI 178 [ pA0.161 ])
(const_int 960 [0x3c0])) [0 S8 A64]))) 591
{*fop_df_comm_i387} (nil)
(nil))
This instruction almost satisfies the operand constraint, so the register allocator
only has to copy rB0 into the output register:
__.138r.greg:
(insn 470 58 60 5 (set (reg:DF 12 st(4) [207])
(reg/v:DF 8 st [orig:187 rB0 ] [187])) 94 {*movdf_integer} (nil)
(nil))
(insn 60 470 61 5 (set (reg:DF 12 st(4) [207])
(mult:DF (reg:DF 12 st(4) [207])
(mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
(const_int 960 [0x3c0])) [0 S8 A64]))) 591
{*fop_df_comm_i387} (nil)
(nil))
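To make the operand-order point concrete, here is a toy Python model. It is not GCC's reload code. It assumes (a reading of the two greg dumps, not a quote of the i386.md pattern) that the x87 multiply pattern needs its first input to already be in the output register, while the second input may be a register or memory. `Reg`, `Mem`, `fix_operands` and `render` are hypothetical names. Under that assumption both operand orders need exactly one extra move; the order chosen by combine only decides whether that move is a load from memory or a copy of a register.

```python
# Toy model, NOT GCC's reload: assumes the multiply pattern requires its first
# input in the output register and accepts a register or memory as the second.
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class Mem:
    addr: str


def fix_operands(out, op1, op2):
    """Return the insns needed so that out = op1 * op2 fits the assumed constraint."""
    insns = []
    if op1 != out:
        # First input is memory or another register: move it into the output
        # register first (the extra move visible in both greg dumps).
        insns.append(("move", out, op1))
    insns.append(("mult", out, out, op2))
    return insns


def render(insns):
    """Map the toy insns to the AT&T x87 mnemonics quoted in this comment.

    Simplification: assumes rB0 sits in %st(0) before the sequence starts.
    """
    asm = []
    for insn in insns:
        if insn[0] == "move":
            src = insn[2]
            asm.append(f"fldl {src.addr}" if isinstance(src, Mem) else "fld %st(0)")
        else:
            src = insn[3]
            asm.append(f"fmull {src.addr}" if isinstance(src, Mem) else "fmul %st(1), %st")
    return asm


out, rB0 = Reg("tmp"), Reg("rB0")
gcc3 = render(fix_operands(out, Mem("(%rax)"), rB0))  # memory operand first (3.x)
gcc4 = render(fix_operands(out, rB0, Mem("(%rax)")))  # register operand first (4.x)
print(gcc3)  # ['fldl (%rax)', 'fmul %st(1), %st']
print(gcc4)  # ['fld %st(0)', 'fmull (%rax)']
```

In this model, neither order is "smarter"; which sequence comes out depends only on which operand combine happened to put first.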
The x87 stack-register pass then rewrites this RTL to:
__.151r.stack:
(insn 470 58 60 4 (set (reg:DF 8 st)
(reg:DF 8 st)) 94 {*movdf_integer} (nil)
(nil))
(insn 60 470 61 4 (set (reg:DF 8 st)
(mult:DF (reg:DF 8 st)
(mem:DF (plus:DI (reg/v/f:DI 0 ax [orig:178 pA0.161 ] [178])
(const_int 960 [0x3c0])) [0 S8 A64]))) 591
{*fop_df_comm_i387} (nil)
(nil))
From your measurements, it looks like instead of emitting:
fld %st(0) #
fmull (%rax) #* pA0.161
it is faster to emit:
fldl (%rax) #* pA0
fmul %st(1), %st #,
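The two sequences compute the same thing; they differ only in speed. A small Python model of the x87 register stack shows this. It assumes rB0 is live in %st(0) before either sequence, uses Python floats (IEEE double) rather than 80-bit extended precision, and ignores timing entirely. The helper names (`fld_mem`, `fld_st`, `fmul_st`, `fmul_mem`) are hypothetical stand-ins for the instructions named in their comments.

```python
# Toy x87 register stack: stack[-1] is %st(0), stack[-2] is %st(1), ...
def st(stack, i):
    return stack[-1 - i]


def fld_mem(stack, value):   # fldl mem: push a double loaded from memory
    stack.append(value)


def fld_st(stack, i):        # fld %st(i): push a copy of %st(i)
    stack.append(st(stack, i))


def fmul_st(stack, i):       # fmul %st(i), %st: %st(0) = %st(0) * %st(i)
    stack[-1] = stack[-1] * st(stack, i)


def fmul_mem(stack, value):  # fmull mem: %st(0) = %st(0) * mem
    stack[-1] = stack[-1] * value


def gcc3_sequence(rB0, a):
    stack = [rB0]            # assumption: rB0 is in %st(0) on entry
    fld_mem(stack, a)        # fldl (%rax)
    fmul_st(stack, 1)        # fmul %st(1), %st
    return stack


def gcc4_sequence(rB0, a):
    stack = [rB0]            # same assumed entry state
    fld_st(stack, 0)         # fld %st(0)
    fmul_mem(stack, a)       # fmull (%rax)
    return stack


print(gcc3_sequence(1.5, 2.25))  # [1.5, 3.375]
print(gcc4_sequence(1.5, 2.25))  # [1.5, 3.375]
```

Both leave the product in %st(0) and rB0 in %st(1), so the choice between them is a scheduling/performance question rather than a correctness one.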
--
uros at kss-loka dot si changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |uros at kss-loka dot si
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827