This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[BENCH] Improvements to popping x87 stack in reg-stack.c
- From: Uros Bizjak <uros at kss-loka dot si>
- To: Roger Sayle <roger at eyesopen dot com>, gcc-patches at gcc dot gnu dot org
- Date: Mon, 29 Nov 2004 09:20:45 +0100
- Subject: [BENCH] Improvements to popping x87 stack in reg-stack.c
Hello Roger!
I have some benchmark numbers for povray-3.50c, with your patch for
reg-stack.c. Povray was compiled with:
CFLAGS = -O3 -march=pentium -mfpmath=387 -D__NO_MATH_INLINES \
    -finline-functions -ffast-math -fomit-frame-pointer -funroll-loops \
    -fexpensive-optimizations -malign-double -foptimize-sibling-calls \
    -minline-all-stringops -Wno-multichar
CXXFLAGS = $(NOMULTICHAR) -O3 -march=pentium -mfpmath=387 \
    -D__NO_MATH_INLINES -finline-functions -ffast-math -fomit-frame-pointer \
    -funroll-loops -fexpensive-optimizations -malign-double \
    -foptimize-sibling-calls -minline-all-stringops -Wno-multichar
Unpatched povray:
grep fxch povray | wc -l
15648
povray benchmark.pov (note - NOT the official benchmark procedure!):
Time For Parse: 0 hours 0 minutes 2.0 seconds (2 seconds)
Time For Photon: 0 hours 0 minutes 39.0 seconds (39 seconds)
Time For Trace: 0 hours 5 minutes 55.0 seconds (355 seconds)
Total Time: 0 hours 6 minutes 36.0 seconds (396 seconds)
Patched povray:
grep fxch bbb | wc -l
15659
Time For Parse: 0 hours 0 minutes 2.0 seconds (2 seconds)
Time For Photon: 0 hours 0 minutes 39.0 seconds (39 seconds)
Time For Trace: 0 hours 5 minutes 54.0 seconds (354 seconds)
Total Time: 0 hours 6 minutes 35.0 seconds (395 seconds)
Patched povray was a little faster (395 vs. 396 seconds total; only one
run, so it may be noise), but the patched gcc actually produced 11 _more_
fxch insns (15659 vs. 15648). Povray was compiled with -march=pentium to
eliminate all SSE instructions.
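To make the comparison explicit, here is a small sketch that parses the two
timing reports quoted above and prints the per-phase difference. The function
and variable names are made up for illustration; only the numbers come from
the runs above.

```python
import re

UNPATCHED = """\
Time For Parse: 0 hours 0 minutes 2.0 seconds (2 seconds)
Time For Photon: 0 hours 0 minutes 39.0 seconds (39 seconds)
Time For Trace: 0 hours 5 minutes 55.0 seconds (355 seconds)
Total Time: 0 hours 6 minutes 36.0 seconds (396 seconds)
"""

PATCHED = """\
Time For Parse: 0 hours 0 minutes 2.0 seconds (2 seconds)
Time For Photon: 0 hours 0 minutes 39.0 seconds (39 seconds)
Time For Trace: 0 hours 5 minutes 54.0 seconds (354 seconds)
Total Time: 0 hours 6 minutes 35.0 seconds (395 seconds)
"""

# fxch counts reported by "grep fxch ... | wc -l" above
FXCH_UNPATCHED = 15648
FXCH_PATCHED = 15659


def parse_times(report):
    """Map each phase name to the seconds given in parentheses."""
    pattern = re.compile(r"^(.*?):.*\((\d+) seconds\)$", re.M)
    return {name.strip(): int(sec) for name, sec in pattern.findall(report)}


def deltas(before, after):
    """Seconds gained (negative) or lost (positive) per phase."""
    return {phase: after[phase] - before[phase] for phase in before}


before = parse_times(UNPATCHED)
after = parse_times(PATCHED)
delta = deltas(before, after)

for phase, diff in delta.items():
    print(f"{phase}: {diff:+d} s")
print(f"fxch count: {FXCH_PATCHED - FXCH_UNPATCHED:+d}")
print(f"total change: {100.0 * delta['Total Time'] / before['Total Time']:+.2f} %")
```

The whole difference is one second in the trace phase, which is well within
what a single run can vary by.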
Looking at newly produced code:
The new faster and smaller code is:
; initial state  (a b C d E f g)
FREEP            (b C d E f g)
FREEP            (C d E f g)
FSTP 4           (d E f C)
FREEP            (E f C)
FSTP 1           (E C)
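The stack states in this listing can be checked with a toy model of the
register stack. This is only a sketch under a few assumptions: the leftmost
entry is taken to be st(0), FREEP is modelled as a plain pop that discards
st(0), and FSTP n copies st(0) into st(n) before popping. The helper names
are hypothetical and have nothing to do with the code in reg-stack.c.

```python
def freep(stack):
    """Pop st(0) and discard it (assumed meaning of FREEP in the listing)."""
    return stack[1:]


def fstp(stack, n):
    """Copy st(0) into st(n), then pop, as in `fstp %st(n)`."""
    regs = list(stack)
    regs[n] = regs[0]
    return regs[1:]


def run(stack, program):
    """Apply each (op, *args) step and record the stack after every step."""
    states = []
    for op, *args in program:
        stack = op(stack, *args)
        states.append(stack)
    return states


# Index 0 is st(0); upper-case entries are the values that must survive.
initial = list("abCdEfg")
program = [(freep,), (freep,), (fstp, 4), (freep,), (fstp, 1)]

states = run(initial, program)
for (op, *args), state in zip(program, states):
    name = " ".join([op.__name__.upper()] + [str(a) for a in args])
    print(f"{name:<8} ({' '.join(state)})")
```

Five instructions are enough to reduce the seven-entry stack to the two live
values E and C.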
Agner Fog's Pentium optimization guide recommends implementing a double
pop with the fcompp instruction (pentopt.pdf, page 126, section 19.1).
As your patch groups FFREEs together, perhaps adjacent pairs of pops
could be implemented in the recommended way with fcompp?
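A rough sketch of the idea, as a peephole over a list of instruction names:
every two adjacent single pops are merged into one fcompp. The instruction
strings and the pair_pops helper are made up for illustration; this is not
how reg-stack.c represents insns. Note that fcompp also compares st(0) with
st(1) and writes the condition bits of the FPU status word, so a real
implementation would presumably have to know that those bits are dead at
that point.

```python
def pair_pops(insns, pop="ffreep", double_pop="fcompp"):
    """Merge each pair of adjacent single pops into one double pop."""
    out = []
    i = 0
    while i < len(insns):
        if insns[i] == pop and i + 1 < len(insns) and insns[i + 1] == pop:
            out.append(double_pop)  # two pops become one instruction
            i += 2
        else:
            out.append(insns[i])    # everything else is copied unchanged
            i += 1
    return out


# Pop sequence from the example above, written with hypothetical names.
sequence = ["ffreep", "ffreep", "fstp 4", "ffreep", "fstp 1"]
print(pair_pops(sequence))
```

An odd run of pops leaves one single pop behind, so three pops in a row
become one fcompp followed by one ffreep.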
Uros.