This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Large 3.1 performance anomalies on sparc
- From: Brad Lucier <lucier at math dot purdue dot edu>
- To: gcc at gcc dot gnu dot org
- Cc: lucier at math dot purdue dot edu (Brad Lucier), feeley at iro dot umontreal dot ca
- Date: Thu, 2 May 2002 19:13:10 -0500 (EST)
- Subject: Large 3.1 performance anomalies on sparc
I'm going through a fair number of performance tests with 3.1 on sparc
in various configurations. Some of these tests show slowdowns by a factor
of 20; it would be good if gcc 3.1 could be ruled out as the culprit.
The anomalies do reveal themselves with gprof in my tests.
The results of the runtime tests are at
http://www.math.purdue.edu/~lucier/runtimes.html
The tests were run as follows. The Gambit-C runtime was compiled with
the same options and linked to a shared library in my home directory.
My LD_LIBRARY_PATH is set to
/home/c/lucier/local/lib:/pkgs/gcc-3.1/lib:/usr/openwin/lib:/usr/lib:/home/c/lucier/local/gambit/lib
The Gambit runtime is in the last directory.
For 32-bit builds, the tests show a consistent 5-10% speedup from 3.0 to
3.1 with -mcpu=supersparc and -mtune=ultrasparc, together with
consistently smaller binaries. For 32-bit codes, -mcpu=ultrasparc yields
even more significant reductions in code size, but not always an increase
in speed.
However, for 64-bit ultrasparc builds, the speedups range from none to over
a factor of 20. That is, in some tests the 64-bit code on ultrasparc is more
than 20 times faster than the 32-bit code on ultrasparc.
I tried to analyze why this was, so I built a 32-bit profiled runtime library
and binary for fft, as one of the significant examples. Here is the fft line
from the table:
fft 8.1 93044 15.4 7.6 74192 14.6 7.7 29848 14.0 2.3 40376 0.9
              ^^^^           ^^^^           ^^^^           ^^^
The runtimes are indicated; the first three are 32-bit codes, the last
is the 64-bit result.
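To make the comparison in that row explicit, here is a minimal Python sketch that pulls out the four marked runtimes and prints each 32-bit time against the 64-bit one. It assumes the row is made of groups of three columns per configuration and that the runtime is the last number in each group, which is what the carets appear to mark; the meaning of the other two columns is not used. The function name `runtimes` is only illustrative.

```python
# The fft row from the runtimes table, copied verbatim from the message.
ROW = "fft 8.1 93044 15.4 7.6 74192 14.6 7.7 29848 14.0 2.3 40376 0.9"


def runtimes(row, group_size=3):
    """Return (benchmark name, list of runtimes), one runtime per configuration."""
    name, *fields = row.split()
    values = [float(f) for f in fields]
    # Assumption: every configuration contributes `group_size` columns and
    # the runtime is the last column of its group.
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    return name, [group[-1] for group in groups]


name, times = runtimes(ROW)
# Per the message: the first three runtimes are 32-bit, the last is 64-bit.
*times_32, time_64 = times
for t in times_32:
    print(f"{name}: 32-bit {t:.1f} vs 64-bit {time_64:.1f} -> {t / time_64:.1f}x")
```

For this particular row the ratios come out somewhat below the factor of 20 seen elsewhere in the table, but they are of the same order.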
The results were as one might expect:
banach-70% gcc -I/home/c/lucier/local/gambit/include -O1 -fschedule-insns2 -fno-strict-aliasing -fno-math-errno -mcpu=ultrasparc -mtune=ultrasparc -m32 -pg -o fft fft.c fft_.c /home/c/lucier/local/gambit/lib/libgambc.so -lm -ldl -lcurses -lsocket -lnsl -lresolv
banach-72% time ./fft
(time (run-bench name count run ok?))
28957 ms real time
28910 ms cpu time (28680 user, 230 system)
3 collections accounting for 13 ms real time (10 user, 10 system)
66768024 bytes allocated
no minor faults
no major faults
28.72u 0.27s 0:29.10 99.6%
and the gprof output told me nothing:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 76.76     12.22    12.22                             internal_mcount
 13.13     14.31     2.09                             ___H_four1
  9.99     15.90     1.59                             _mcount
  0.06     15.91     0.01                             ___H_main
  0.06     15.92     0.01                             ___H_run_2d_bench
  0.00     15.92     0.00        1     0.00     0.00  call___do_global_ctors_aux
  0.00     15.92     0.00        1     0.00     0.00  main
However, when I took the same .o files and built a static runtime library,
there was a tremendous speedup:
banach-78% gcc -I/home/c/lucier/local/gambit/include -O1 -fschedule-insns2 -fno-strict-aliasing -fno-math-errno -mcpu=ultrasparc -mtune=ultrasparc -m32 -pg -o fft fft.c fft_.c /home/c/lucier/local/gambit/lib/libgambc.a -lm -ldl -lcurses -lsocket -lnsl -lresolv
banach-79% rm gmon.out
rm: remove gmon.out (yes/no)? y
banach-80% time ./fft
(time (run-bench name count run ok?))
1650 ms real time
1650 ms cpu time (1420 user, 230 system)
3 collections accounting for 13 ms real time (20 user, 0 system)
66768024 bytes allocated
no minor faults
no major faults
1.69u 0.32s 0:02.06 97.5%
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 64.86      1.20     1.20    63530     0.02     0.02  ___H_four1
 25.95      1.68     0.48                             internal_mcount
...
This time (with the same binaries, only linked statically) is significantly
faster than any of the 32-bit runtimes with a dynamically linked library,
and approaches the runtime of the 64-bit binary.
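For reference, the arithmetic behind that comparison can be sketched in a few lines of Python. The figures are copied from the two profiled runs quoted above; the variable and function names are only illustrative. Note that both runs were built with -pg, so the ratio describes the profiled binaries and need not match the unprofiled times in the table.

```python
# Real times reported by the two profiled 32-bit runs, in milliseconds.
SHARED_MS = 28957  # runtime linked as libgambc.so
STATIC_MS = 1650   # runtime linked as libgambc.a

# gprof "% time" entries for the profiling hooks in the shared-library run.
SHARED_PROFILE_PCT = {"internal_mcount": 76.76, "_mcount": 9.99}


def speedup(slow_ms, fast_ms):
    """How many times faster the second run is than the first."""
    return slow_ms / fast_ms


ratio = speedup(SHARED_MS, STATIC_MS)
hook_share = sum(SHARED_PROFILE_PCT.values())

print(f"static vs shared (both with -pg): {ratio:.1f}x faster")
print(f"profiling hooks in shared run: {hook_share:.2f}% of samples")
```

In the shared-library run most of the sampled time is attributed to the profiling hooks themselves rather than to ___H_four1, which is consistent with the remark that the gprof output there was uninformative.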
So, is there a problem with dynamically-loaded 32-bit libraries generated
by gcc-3.1 and Solaris as/ld? One that doesn't show up with 64-bit
libraries and binaries? Could there be an alignment problem? Or are all
these questions just too naive to be useful;-)?
Brad