This is the mail archive of the mailing list for the GCC project.


Re: [BENCH] Improvements to popping x87 stack in reg-stack.c

Richard Henderson wrote:

>>> Surely that value is for a store to memory, not a register to
>>> register move....
>>
>> No, this is register to register move.
>
> I can't believe that's anything except a typo or measurement error.
>
> The table says that the latency of a reg-reg move is larger than the
> latency of a floating point addition. That simply doesn't pass the
> sanity check.

I ran a povray benchmark with the following change to output_387_reg_move() in i386.c:

Index: i386.c
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.747
diff -u -p -r1.747 i386.c
--- i386.c      25 Nov 2004 02:05:21 -0000      1.747
+++ i386.c      1 Dec 2004 10:22:20 -0000
@@ -15167,9 +15167,13 @@ output_387_reg_move (rtx insn, rtx *oper
  if (REG_P (operands[1])
      && find_regno_note (insn, REG_DEAD, REGNO (operands[1])))
-      if (REGNO (operands[0]) == FIRST_STACK_REG
-         && TARGET_USE_FFREEP)
-       return "ffreep\t%y0";
+      if (REGNO (operands[0]) == FIRST_STACK_REG)
+       {
+         if (TARGET_USE_FFREEP)
+           return "ffreep\t%y0";
+         if (TARGET_CMOVE)
+           return "fcomp\t%y0";
+       }
      return "fstp\t%y0";
  if (STACK_TOP_P (operands[0]))
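
For readers who want the decision in isolation, the patched branch can be modeled as a small stand-alone C function. This is only a sketch: struct pop_tuning and choose_pop_template() are hypothetical names that stand in for the TARGET_USE_FFREEP / TARGET_CMOVE macros and for the dead-register branch of output_387_reg_move(); nothing here is actual GCC code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the TARGET_USE_FFREEP and TARGET_CMOVE
   tuning macros used in the patch.  */
struct pop_tuning
{
  bool use_ffreep;
  bool cmove;
};

/* Return the output template for discarding a dead x87 register.
   dest_is_top models REGNO (operands[0]) == FIRST_STACK_REG.  */
const char *
choose_pop_template (bool dest_is_top, struct pop_tuning t)
{
  if (dest_is_top)
    {
      if (t.use_ffreep)
        return "ffreep\t%y0";   /* preferred where the target likes ffreep */
      if (t.cmove)
        return "fcomp\t%y0";    /* new case added by the patch */
    }
  return "fstp\t%y0";           /* fallback, as before the patch */
}
```

With both flags clear, or with a destination other than the top of the stack, the sketch falls through to "fstp", which matches the behavior before the patch.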

With this change, "fcomp %st(0)" is emitted everywhere "fstp %st(0)" was emitted before. [This is safe for TARGET_CMOVE targets, because they use fcomi instructions, which set EFLAGS directly and so are not affected by fcomp clobbering the x87 condition codes.] The benchmark results are exactly the same for "fstp %st(0)" and "fcomp %st(0)":

Time For Parse:    0 hours  0 minutes   2.0 seconds (2 seconds)
Time For Photon:   0 hours  0 minutes  35.0 seconds (35 seconds)
Time For Trace:    0 hours  5 minutes  20.0 seconds (320 seconds)
   Total Time:    0 hours  5 minutes  57.0 seconds (357 seconds)

Since nothing depends on the result of "fstp %st(0)" or "fcomp %st(0)", the latency of these instructions does not matter. Because all of these insns have the same reciprocal throughput, I would still suggest using "fcomp %st(0)" instead of "fstp %st(0)", and "fcompp" instead of "fstp %st(0); fstp %st(0)", in edge compensation code. The fcomp and fcompp insns use different execution units than fstp, so they could be interleaved with "fstp %st(x)", where x != 0, to form optimal sequences.
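
To make the pairing idea concrete, here is a rough sketch of how a run of top-of-stack pops could be turned into "fcompp" pairs plus at most one trailing "fcomp %st(0)". The helper emit_top_pops() is hypothetical and is not part of reg-stack.c; it also assumes a target where clobbering the x87 condition codes is harmless (the TARGET_CMOVE case discussed above).

```c
#include <stddef.h>
#include <stdio.h>

/* Write the insns that discard the top N x87 stack slots into BUF,
   one per line.  fcompp pops the stack twice, so pops are paired
   whenever at least two remain.  Returns the number of insns
   emitted, or -1 if BUF is too small.  Hypothetical helper.  */
int
emit_top_pops (int n, char *buf, size_t size)
{
  size_t used = 0;
  int count = 0;

  if (size > 0)
    buf[0] = '\0';

  while (n > 0)
    {
      int pops = (n >= 2) ? 2 : 1;
      const char *insn = (pops == 2) ? "fcompp" : "fcomp\t%st(0)";
      int written = snprintf (buf + used, size - used, "%s\n", insn);

      if (written < 0 || (size_t) written >= size - used)
        {
          if (size > used)
            buf[used] = '\0';   /* drop the partially written line */
          return -1;
        }

      used += (size_t) written;
      n -= pops;
      count++;
    }

  return count;
}
```

For three pops this yields one "fcompp" followed by one "fcomp %st(0)", that is, two insns where the plain "fstp %st(0)" sequence would need three.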

