[BENCH] Improvements to popping x87 stack in reg-stack.c
Uros Bizjak
uros@kss-loka.si
Wed Dec 1 10:50:00 GMT 2004
Richard Henderson wrote:
>>>Surely that value is for a store to memory, not a register to
>>>register move....
>>>
>>>
>>No, this is register to register move.
>>
>>
>
>I can't believe that's anything except a typo or measurement error.
>
>The table says that the latency of a reg-reg move is larger than the
>latency of a floating point addition. That simply doesn't pass the
>sanity check.
>
>
I have run a povray benchmark with the following change to
output_387_reg_move() in i386.c:
Index: i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.747
diff -u -p -r1.747 i386.c
--- i386.c 25 Nov 2004 02:05:21 -0000 1.747
+++ i386.c 1 Dec 2004 10:22:20 -0000
@@ -15167,9 +15167,13 @@ output_387_reg_move (rtx insn, rtx *oper
if (REG_P (operands[1])
&& find_regno_note (insn, REG_DEAD, REGNO (operands[1])))
{
- if (REGNO (operands[0]) == FIRST_STACK_REG
- && TARGET_USE_FFREEP)
- return "ffreep\t%y0";
+ if (REGNO (operands[0]) == FIRST_STACK_REG)
+ {
+ if (TARGET_USE_FFREEP)
+ return "ffreep\t%y0";
+ if (TARGET_CMOVE)
+ return "fcomp\t%y0";
+ }
return "fstp\t%y0";
}
if (STACK_TOP_P (operands[0]))
Now "fcomp %st(0)" is emitted everywhere "fstp %st(0)" was emitted
before. [This is safe for TARGET_CMOVE targets, because they use fcomi
instructions for comparisons; fcomi sets EFLAGS directly, so it is
immune to fcomp clobbering the FP status word.] The benchmark results
are exactly the same for "fstp %st(0)" and "fcomp %st(0)":
Time For Parse: 0 hours 0 minutes 2.0 seconds (2 seconds)
Time For Photon: 0 hours 0 minutes 35.0 seconds (35 seconds)
Time For Trace: 0 hours 5 minutes 20.0 seconds (320 seconds)
Total Time: 0 hours 5 minutes 57.0 seconds (357 seconds)
As nothing depends on the result of "fstp %st(0)" or "fcomp %st(0)",
the latency of these instructions is not important. Because all of
these insns have the same reciprocal throughput, I would still suggest
using "fcomp %st(0)" instead of "fstp %st(0)", and "fcompp" instead of
"fstp %st(0); fstp %st(0)", in edge compensation code. These insns use
different execution units and could be mixed into optimal sequences
with "fstp %st(x)", where x != 0.
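To make the suggestion for edge compensation code concrete, here is a
minimal standalone sketch of how a run of top-of-stack pops could be
paired up. It is not code from reg-stack.c or i386.c: the helper name
sketch_emit_pops, its buffer interface and the use_fcomp flag (standing
in for a condition such as TARGET_CMOVE) are all assumptions made for
illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of GCC: write into BUF the instructions
   that pop COUNT values from the top of the x87 stack, one per line.
   USE_FCOMP is an assumed flag standing in for "clobbering the FP
   status word is harmless here" (e.g. TARGET_CMOVE).
   Returns the number of instructions written, or -1 if BUF is too small.  */
static int
sketch_emit_pops (int count, int use_fcomp, char *buf, size_t size)
{
  int emitted = 0;

  if (size == 0)
    return -1;
  buf[0] = '\0';

  while (count > 0)
    {
      const char *insn;

      if (use_fcomp && count >= 2)
        {
          /* fcompp compares st(0) with st(1) and pops both.  */
          insn = "fcompp";
          count -= 2;
        }
      else if (use_fcomp)
        {
          /* fcomp %st(0) compares st(0) with itself and pops once.  */
          insn = "fcomp\t%st(0)";
          count -= 1;
        }
      else
        {
          /* Fallback: the plain pop that leaves the status word alone.  */
          insn = "fstp\t%st(0)";
          count -= 1;
        }

      /* Need room for the mnemonic, a newline and the terminating NUL.  */
      if (strlen (buf) + strlen (insn) + 2 > size)
        return -1;

      strcat (buf, insn);
      strcat (buf, "\n");
      emitted++;
    }

  return emitted;
}
```

With use_fcomp set, two pops become a single "fcompp" and three pops
become "fcompp" followed by "fcomp %st(0)"; with it clear, every pop
stays an "fstp %st(0)". Mixing in "fstp %st(x)" for x != 0 is left out
of the sketch.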
Uros.