This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [BENCH] Improvements to popping x87 stack in reg-stack.c
- From: Uros Bizjak <uros at kss-loka dot si>
- To: Richard Henderson <rth at redhat dot com>,Roger Sayle <roger at eyesopen dot com>, gcc-patches at gcc dot gnu dot org
- Date: Wed, 01 Dec 2004 09:47:11 +0100
- Subject: Re: [BENCH] Improvements to popping x87 stack in reg-stack.c
- References: <41AADBDD.2030706@kss-loka.si> <20041130021009.GB1489@redhat.com>
Richard Henderson wrote:

> On Mon, Nov 29, 2004 at 09:20:45AM +0100, Uros Bizjak wrote:
>> There is a recommendation in Agner Fog's Pentium optimization guide,
>> that double pop can be implemented using fcompp instruction
>> (pentopt.pdf, page 126, section 19.1).
>
> I would be shocked if this applies to P4 or Athlon,
> and surprised if it still applied to P3.

The P4 case:
"fstp r" has a latency of 6 and a reciprocal throughput of 1; it uses
port 0 and the mov execution unit.
"fcomp r" has a latency of 2 and a reciprocal throughput of 1; it uses
port 1 and the fp execution unit.
"fcompp" has a latency of 2 and a reciprocal throughput of 1; it uses
port 1 and the fp execution unit.
Now, to quote Roger from
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html:
The new faster and smaller code is:
; initial state (a b C d E f g)
FREEP (b C d E f g)
FREEP (C d E f g)
FSTP 4 (d E f C)
FREEP (E f C)
FSTP 1 (E C)
For the current P4 case, this means the sequence:
fstp 0
fstp 0
fstp 4
fstp 0
fstp 1
All of these insns go to port 0 and the mov execution unit, so I guess
they can't be executed in parallel. It is not clear to me whether the
second fstp 0 depends on the "result" of the first fstp 0 (and similarly
for the others); if it does, each insn would cost its full latency of 6.
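
As a sanity check on the stack effect of the fstp sequence above, here is a minimal Python sketch that models only the register-stack contents (no flags, no timing). The helper names fstp and run_fstp_sequence are hypothetical, and the model assumes that "fstp n" copies st(0) into st(n) and then pops, with index 0 of the list standing for st(0).

```python
def fstp(stack, n):
    """Model `fstp st(n)`: copy st(0) into st(n), then pop.

    stack[0] stands for st(0). Returns a new list; the input is not modified.
    """
    result = list(stack)
    result[n] = result[0]
    return result[1:]


def run_fstp_sequence(stack, operands):
    """Apply `fstp` for each operand in turn and collect every intermediate state."""
    states = []
    for n in operands:
        stack = fstp(stack, n)
        states.append(stack)
    return states


# Initial state and operands taken from the listings above.
initial = ["a", "b", "C", "d", "E", "f", "g"]
operands = [0, 0, 4, 0, 1]
states = run_fstp_sequence(initial, operands)

for n, state in zip(operands, states):
    print(f"fstp {n}  ({' '.join(state)})")
```

Under these assumptions the intermediate states match the ones in Roger's listing, ending in (E C).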
Using fcomp(p) insns, the sequence would look like:
fcompp (port 1) (fp)
fstp 4 (port 0) (mov)
fcomp 0 (port 1) (fp)
fstp 1 (port 0) (mov)
Since they use different ports and execution units, they could be
executed in parallel. These insns are edge compensation sequences, emitted
on edges, so the compare insns clobbering the FP flags should not be a
problem.
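
To put rough numbers on the comparison, the following Python sketch checks that the fcomp(p) sequence leaves the same stack as the all-fstp sequence, and derives two crude bounds from the P4 figures quoted above. It is a back-of-the-envelope model, not a measurement: the "best" bound assumes the insns are fully independent and limited only by one issue per port per cycle (reciprocal throughput of 1), while the "worst" bound assumes every insn waits for the previous one to complete. The names P4, POPS, final_stack and bounds are hypothetical.

```python
from collections import Counter

# (port, latency) per insn, from the P4 figures quoted in this message.
# Reciprocal throughput is 1 for all three.
P4 = {"fstp": (0, 6), "fcomp": (1, 2), "fcompp": (1, 2)}

# Number of stack entries each insn pops.
POPS = {"fstp": 1, "fcomp": 1, "fcompp": 2}


def final_stack(stack, insns):
    """Apply the stack effect of each (name, operand) pair; stack[0] is st(0)."""
    stack = list(stack)
    for name, n in insns:
        if name == "fstp":
            stack[n] = stack[0]      # store st(0) into st(n) before popping
        del stack[:POPS[name]]       # compares only pop; their result is ignored
    return stack


def bounds(insns):
    """Return (best, worst) under the two assumptions described above."""
    per_port = Counter(P4[name][0] for name, _ in insns)
    best = max(per_port.values())                  # busiest port, 1 insn per cycle
    worst = sum(P4[name][1] for name, _ in insns)  # fully serialized latencies
    return best, worst


initial = ["a", "b", "C", "d", "E", "f", "g"]
fstp_seq = [("fstp", 0), ("fstp", 0), ("fstp", 4), ("fstp", 0), ("fstp", 1)]
fcomp_seq = [("fcompp", None), ("fstp", 4), ("fcomp", 0), ("fstp", 1)]

for label, seq in (("fstp only", fstp_seq), ("fcomp(p)", fcomp_seq)):
    print(label, final_stack(initial, seq), bounds(seq))
```

With these inputs both sequences end in (E C); the all-fstp sequence gets bounds of 5 and 30, and the fcomp(p) sequence gets 2 and 16. Where the real behaviour falls between those bounds depends on how the P4 tracks dependencies between successive pops, which is exactly the open question here.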
Uros.