This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Failed attempt to improve FP register allocation on alpha
- To: gcc at gcc dot gnu dot org, lucier at math dot purdue dot edu
- Subject: Re: Failed attempt to improve FP register allocation on alpha
- From: Brad Lucier <lucier at math dot purdue dot edu>
- Date: Wed, 5 Jan 2000 13:49:16 -0500 (EST)
- Cc: wilker at math dot purdue dot edu, feeley at iro dot umontreal dot ca
I've been trying to quantify how the Compaq C compiler (ccc) and mainline
gcc differ in generated code quality for the 21264 with ieee FP semantics,
and for more general codes, on Linux with the 2.2.13 kernel. It's all
over the map, but ccc does have a big advantage for the types of test
codes I've been posting to these mailing lists; see especially the
results for electrostatic force calculations.
Options for ccc-6.2.9.002-2:
ccc -intrinsics -assume nomath_errno -O2 -arch host -ieee
Options for gcc 2.96 20000104:
gcc -mcpu=ev6 -fno-math-errno -mieee -fPIC -O1
(-O2 pessimizes the code on these examples). This is on a 21264 with
a 4 Mbyte cache.
Time for 1,000,000 angle force calculations in molecular dynamics:
gcc: 931 ms cpu time
ccc: 797 ms cpu time
Time for 200,000,000 electrostatic force calculations:
gcc: 84903 ms cpu time
ccc: 33040 ms cpu time
Here, ccc unrolled the main loop 5 times and used no fmovs; gcc introduced
15 (!!) fmovs for 21 fp instructions (including one sqrttsu and one divtsu).
The ccc result is phenomenal: 165 nanoseconds per electrostatic
force calculation, including incrementing the force vectors.
Time to multiply two 1,048,576-bit numbers, with the main time being
three floating-point FFTs of 262,144 doubles using Ooura's FFT code:
gcc: 671.1425 ms
ccc: 771.9725 ms
Time to run an artificial (mainly integer and list) benchmark in the
interpreter:
gcc: 1.422851 secs
ccc: 3.359375 secs
Time to run the same benchmark, compiled:
gcc: .102539 secs
ccc: .601562 secs
So here gcc wins handily.
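
To compare the timings above on a common footing, the two helpers below
turn a pair of timings into a gcc/ccc ratio and a total time into a
per-call cost. This is just a sketch of the arithmetic; the function
names are made up for illustration.

```c
#include <assert.h>

/* Ratio of gcc time to ccc time (same units for both).
   Above 1 means ccc was faster; below 1 means gcc was faster. */
double gcc_over_ccc(double gcc_time, double ccc_time)
{
    assert(ccc_time > 0.0);
    return gcc_time / ccc_time;
}

/* Nanoseconds per call, given total milliseconds and a call count. */
double ns_per_call(double total_ms, double calls)
{
    assert(calls > 0.0);
    return total_ms * 1.0e6 / calls;
}
```

With the figures reported here, ns_per_call(33040, 200000000) gives
about 165 ns for ccc on the electrostatic test, and
gcc_over_ccc(84903, 33040) is a little under 2.6, while
gcc_over_ccc(.102539, .601562) is about 0.17 for the compiled benchmark.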
I realize that gcc has the problem only on one architecture (alpha) on
one version of the processor (ev6) with one setting of the options that
is still somewhat unusual for high-performance FP codes (ieee). However,
this is currently the fastest processor available for FP calculations,
and more and more codes are relying on ieee FP semantics (including the
current LAPACK for some algorithms for finding eigenvalues).
Next week I start teaching a course on Numerical Methods for Partial
Differential Equations and I need to choose a C compiler. It looks like
I'll use gcc as the default, but the above results imply to me that I
should arrange to have ccc be an easily accessible option to try on
numerical codes.
Brad Lucier