This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [BENCH] Improvements to popping x87 stack in reg-stack.c

From: Uros Bizjak <uros at kss-loka dot si>
To: Richard Henderson <rth at redhat dot com>
Cc: Roger Sayle <roger at eyesopen dot com>, gcc-patches at gcc dot gnu dot org
Date: Wed, 01 Dec 2004 11:41:17 +0100
Subject: Re: [BENCH] Improvements to popping x87 stack in reg-stack.c
References: <41AADBDD.2030706@kss-loka.si> <20041130021009.GB1489@redhat.com> <41AD850F.4090208@kss-loka.si> <20041201085122.GA6460@redhat.com> <41AD8ABC.6080208@kss-loka.si> <20041201093829.GA6493@redhat.com>

Richard Henderson wrote:

Surely that value is for a store to memory, not a register to register move....

No, this is register to register move.

I can't believe that's anything except a typo or measurement error.

The table says that the latency of a reg-reg move is larger than the latency of a floating point addition. That simply doesn't pass the sanity check.

I ran a povray benchmark with the following change to output_387_reg_move() in i386.c:

Index: i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.747
diff -u -p -r1.747 i386.c
--- i386.c      25 Nov 2004 02:05:21 -0000      1.747
+++ i386.c      1 Dec 2004 10:22:20 -0000
@@ -15167,9 +15167,13 @@ output_387_reg_move (rtx insn, rtx *oper
  if (REG_P (operands[1])
      && find_regno_note (insn, REG_DEAD, REGNO (operands[1])))
    {
-      if (REGNO (operands[0]) == FIRST_STACK_REG
-         && TARGET_USE_FFREEP)
-       return "ffreep\t%y0";
+      if (REGNO (operands[0]) == FIRST_STACK_REG)
+       {
+         if (TARGET_USE_FFREEP)
+           return "ffreep\t%y0";
+         if (TARGET_CMOVE)
+           return "fcomp\t%y0";
+       }
      return "fstp\t%y0";
    }
  if (STACK_TOP_P (operands[0]))

With this change, "fcomp %st(0)" is emitted everywhere "fstp %st(0)" was emitted before. [This is safe for TARGET_CMOVE targets because they use fcomi instructions for comparisons; fcomi sets EFLAGS directly, so it is immune to fcomp clobbering the FP status word.] The benchmark results are exactly the same for "fstp %st(0)" and "fcomp %st(0)":

Time For Parse:    0 hours  0 minutes   2.0 seconds (2 seconds)
Time For Photon:   0 hours  0 minutes  35.0 seconds (35 seconds)
Time For Trace:    0 hours  5 minutes  20.0 seconds (320 seconds)
   Total Time:    0 hours  5 minutes  57.0 seconds (357 seconds)
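
For reference, the selection logic of the patched branch can be modelled outside of GCC. This is only a sketch: struct target_flags, its fields and pop_template() are hypothetical stand-ins for TARGET_USE_FFREEP, TARGET_CMOVE and the REG_DEAD branch of output_387_reg_move(), not the real i386.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the target macros; not GCC's real ones.  */
struct target_flags
{
  bool use_ffreep;   /* models TARGET_USE_FFREEP */
  bool cmove;        /* models TARGET_CMOVE */
};

/* Models the patched branch taken when the source register dies in
   this insn.  dest_is_top models
   REGNO (operands[0]) == FIRST_STACK_REG.  */
const char *
pop_template (struct target_flags t, bool dest_is_top)
{
  if (dest_is_top)
    {
      if (t.use_ffreep)
        return "ffreep\t%y0";
      /* fcomp overwrites the FP status word flags, so it is only
         chosen when comparisons go through fcomi.  */
      if (t.cmove)
        return "fcomp\t%y0";
    }
  return "fstp\t%y0";
}
```

With cmove set and use_ffreep clear, a pop of the stack top yields the fcomp template; every other destination still gets fstp.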

Since nothing depends on the result of "fstp %st(0)" or "fcomp %st(0)", the latency of these instructions does not matter. Because all of these insns have the same reciprocal throughput, I would still suggest using "fcomp %st(0)" instead of "fstp %st(0)", and "fcompp" instead of "fstp %st(0); fstp %st(0)", in edge compensation code. These insns use different execution units and could be mixed into optimal sequences with "fstp %st(x)", where x != 0.
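
A rough sketch of how the pairing could look when several dead values have to be popped from the top of the stack. emit_pops() is a hypothetical helper, not an existing reg-stack.c function, and it assumes the FP status word is not live at that point (as on TARGET_CMOVE targets), since both fcomp and fcompp overwrite its condition bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write into buf a sequence that pops n values
   from the top of the x87 stack, using "fcompp" for each pair and
   "fcomp %st(0)" for a leftover single pop.  Returns the number of
   instructions written, or -1 if buf is too small.  */
int
emit_pops (int n, char *buf, size_t size)
{
  size_t used = 0;
  int count = 0;

  assert (n >= 0 && size > 0);
  buf[0] = '\0';

  while (n > 0)
    {
      int pair = (n >= 2);
      const char *insn = pair ? "fcompp\n" : "fcomp\t%st(0)\n";
      size_t len = strlen (insn);

      if (used + len >= size)
        return -1;

      /* Copy the instruction including its terminating NUL.  */
      memcpy (buf + used, insn, len + 1);
      used += len;
      n -= pair ? 2 : 1;
      count++;
    }
  return count;
}
```

Three pops then take two instructions (one fcompp followed by one fcomp) instead of three fstp instructions.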

Uros.

