This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


[BENCH] Improvements to popping x87 stack in reg-stack.c

From: Uros Bizjak <uros at kss-loka dot si>
To: Roger Sayle <roger at eyesopen dot com>, gcc-patches at gcc dot gnu dot org
Date: Mon, 29 Nov 2004 09:20:45 +0100
Subject: [BENCH] Improvements to popping x87 stack in reg-stack.c

Hello Roger!

I have some benchmark numbers for povray-3.50c, with your patch for reg-stack.c. Povray was compiled with:

CFLAGS = -O3 -march=pentium -mfpmath=387 -D__NO_MATH_INLINES -finline-functions -ffast-math -fomit-frame-pointer -funroll-loops -fexpensive-optimizations -malign-double -foptimize-sibling-calls -minline-all-stringops -Wno-multichar
CXXFLAGS = $(NOMULTICHAR) -O3 -march=pentium -mfpmath=387 -D__NO_MATH_INLINES -finline-functions -ffast-math -fomit-frame-pointer -funroll-loops -fexpensive-optimizations -malign-double -foptimize-sibling-calls -minline-all-stringops -Wno-multichar

Unpatched povray:

grep fxch povray | wc -l
 15648

povray benchmark.pov (note - NOT the official benchmark procedure!):
Time For Parse:    0 hours  0 minutes   2.0 seconds (2 seconds)
Time For Photon:   0 hours  0 minutes  39.0 seconds (39 seconds)
Time For Trace:    0 hours  5 minutes  55.0 seconds (355 seconds)
   Total Time:    0 hours  6 minutes  36.0 seconds (396 seconds)

Patched povray:

grep fxch bbb | wc -l
 15659

Time For Parse:    0 hours  0 minutes   2.0 seconds (2 seconds)
Time For Photon:   0 hours  0 minutes  39.0 seconds (39 seconds)
Time For Trace:    0 hours  5 minutes  54.0 seconds (354 seconds)
   Total Time:    0 hours  6 minutes  35.0 seconds (395 seconds)

Patched povray was a little faster (395 vs. 396 seconds total, but this was only one run, so it can be noise), yet the patched gcc actually produced a few _more_ fxch insns (15659 vs. 15648). povray was compiled with -march=pentium to eliminate all SSE instructions.

Looking at newly produced code:

The new faster and smaller code is:

	; initial state (a b C d E f g)
	FREEP		(b C d E f g)
	FREEP		(C d E f g)
	FSTP 4		(d E f C)
	FREEP		(E f C)
	FSTP 1		(E C)
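
To make the stack states in the listing above easy to check, here is a minimal Python sketch that replays the sequence on a list whose first element is st(0). It is only an illustrative model, not code from the patch: the helper names (freep, fstp, run) are hypothetical, FREEP is assumed to mean "discard the top of stack and pop", and FSTP n is assumed to mean "store st(0) into st(n), then pop". Upper-case letters stand for the values marked with capitals in the listing.

```python
def freep(stack):
    """Discard st(0) and pop (assumed meaning of FREEP in the listing)."""
    return stack[1:]


def fstp(stack, n):
    """Store st(0) into st(n), then pop."""
    if not 0 <= n < len(stack):
        raise ValueError("st(%d) is outside the current stack" % n)
    new = list(stack)
    new[n] = new[0]
    return new[1:]


def run(stack, program):
    """Apply each instruction in turn and return the state after each one."""
    states = []
    for op, *args in program:
        stack = freep(stack) if op == "FREEP" else fstp(stack, *args)
        states.append(stack)
    return states


INITIAL = list("abCdEfg")  # st(0) is the first element
PROGRAM = [("FREEP",), ("FREEP",), ("FSTP", 4), ("FREEP",), ("FSTP", 1)]
STATES = run(INITIAL, PROGRAM)

for (op, *args), state in zip(PROGRAM, STATES):
    text = " ".join([op] + [str(a) for a in args])
    print("%-8s (%s)" % (text, " ".join(state)))
```

The final state is (E C), matching the last line of the listing: five instructions leave only the two upper-case values on the stack.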

Agner Fog's Pentium optimization guide (pentopt.pdf, page 126, section 19.1) recommends implementing a double pop with the fcompp instruction. Since your patch groups FFREEs together, perhaps adjacent pairs could be implemented in the recommended way, with fcompp?
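
As a rough illustration of the suggestion, the sketch below fuses adjacent pairs of popping frees in an instruction list into a single FCOMPP. This is a hypothetical peephole written in Python for clarity, not reg-stack.c code; the function name fuse_double_pops and the string representation of instructions are assumptions. Note that fcompp also compares st(0) with st(1) and updates the FPU condition codes, so a real implementation would have to be sure those flags are not needed afterwards.

```python
def fuse_double_pops(program):
    """Replace each adjacent pair of FREEP with one FCOMPP (pops twice)."""
    out = []
    i = 0
    while i < len(program):
        is_pair = (
            program[i] == "FREEP"
            and i + 1 < len(program)
            and program[i + 1] == "FREEP"
        )
        if is_pair:
            out.append("FCOMPP")  # one instruction, two pops
            i += 2
        else:
            out.append(program[i])
            i += 1
    return out


ORIGINAL = ["FREEP", "FREEP", "FSTP 4", "FREEP", "FSTP 1"]
FUSED = fuse_double_pops(ORIGINAL)
print(FUSED)
```

On the sequence quoted above, only the first two pops are adjacent, so the five instructions become four; the lone FREEP between the two FSTPs is left alone.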

Uros.

Follow-Ups:
- Re: [BENCH] Improvements to popping x87 stack in reg-stack.c
  - From: Roger Sayle
- Re: [BENCH] Improvements to popping x87 stack in reg-stack.c
  - From: Richard Henderson
