This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Failed attempt to improve FP register allocation on alpha


I've been trying to quantify how the Compaq C compiler (ccc) and mainline
gcc differ in generated code quality for the 21264 with ieee FP semantics,
and for more general codes, on Linux with the 2.2.13 kernel.  It's all
over the map, but ccc does have a big advantage for the types of test
codes I've been posting to these mailing lists; see especially the
results for electrostatic force calculations.

Options for ccc-6.2.9.002-2:

ccc -intrinsics -assume nomath_errno -O2 -arch host -ieee

Options for gcc 2.96 20000104:

gcc -mcpu=ev6 -fno-math-errno -mieee -fPIC -O1

(-O2 pessimizes the code on these examples.)  This is on a 21264 with
a 4 Mbyte cache.

Time for 1,000,000 angle force calculations in molecular dynamics:
gcc: 931 ms cpu time
ccc: 797 ms cpu time

Time for 200,000,000 electrostatic force calculations:
gcc: 84903 ms cpu time 
ccc: 33040 ms cpu time

Here, ccc unrolled the main loop 5 times and used no fmovs; gcc introduced
15 (!!) fmovs for 21 fp instructions (including one sqrttsu and one divtsu).
The ccc result is phenomenal: 165 nanoseconds per electrostatic
force calculation, including incrementing the force vectors.

Time to multiply two 1,048,576-bit numbers, with most of the time spent in
three floating-point FFTs of 262,144 doubles using Ooura's FFT code:

gcc: 671.1425 ms
ccc: 771.9725 ms

Time to run an artificial (mainly integer and list) benchmark in the
interpreter:

gcc: 1.422851 secs
ccc: 3.359375 secs

Time to run the same benchmark, compiled:

gcc: 0.102539 secs
ccc: 0.601562 secs

So on these mainly-integer benchmarks gcc wins handily.
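
To put the timings above on a common footing, the per-call cost and the
speed ratios can be worked out with a couple of one-line helpers.  This is
just arithmetic on the reported figures; the function names are made up
for this note.

```c
#include <assert.h>

/* Convert a total CPU time in milliseconds over `calls` iterations
   into nanoseconds per iteration. */
double ns_per_call(double total_ms, double calls)
{
    assert(calls > 0.0);
    return total_ms * 1.0e6 / calls;
}

/* How many times slower the first timing is than the second
   (both in the same unit). */
double slowdown(double slower, double faster)
{
    assert(faster > 0.0);
    return slower / faster;
}
```

With the numbers reported above, ns_per_call(33040, 200000000) gives about
165 ns for ccc and ns_per_call(84903, 200000000) about 425 ns for gcc, a
factor of roughly 2.6 in favor of ccc on the electrostatic loop, while
slowdown(0.601562, 0.102539) is roughly 5.9 in favor of gcc on the compiled
integer benchmark.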

I realize that gcc has the problem only on one architecture (alpha) on
one version of the processor (ev6) with one setting of the options that
is still somewhat unusual for high-performance FP codes (ieee).  However,
this is currently the fastest processor available for FP calculations,
and more and more codes are relying on ieee FP semantics (including the
current LAPACK for some algorithms for finding eigenvalues).

Next week I start teaching a course on Numerical Methods for Partial
Differential Equations and I need to choose a C compiler.  It looks like
I'll use gcc as the default, but the above results imply to me that I
should arrange to have ccc be an easily accessible option to try on
numerical codes.

Brad Lucier
