This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
RE: 3.0.1 performance
- To: "Kurt Garloff" <kurt at garloff dot de>, <gcc at gcc dot gnu dot org>
- Subject: RE: 3.0.1 performance
- From: "akbar A." <syedali011 at earthlink dot net>
- Date: Fri, 17 Aug 2001 04:46:08 -0700
interesting results.
on the topic of performance on numerical/fp intensive code, i think another
good benchmark of practical code might be dave eberly's lib?
http://www.magic-software.com/
and this lib is pretty commonly used in the game development community.
see sleep;
-akbar A.
-----Original Message-----
From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org]On Behalf Of
Kurt Garloff
Sent: Friday, August 17, 2001 2:17 AM
To: gcc@gcc.gnu.org
Subject: 3.0.1 performance
Hi,
doing some performance evaluation of the 3.0.1 prerelease (20010816)
gives rather disappointing results :-(
The code used is the benchmark from TBCI NumLib 2.2.1
http://pebbles.e-technik.uni-dortmund.de/numlib/
(If you want to test yourself, please ask me to send you an updated
Makefile.)
Doxygen documentation of this code is also available.
The code is heavily inlined templated numerical code representing Vectors
and Matrices and including solvers and similar stuff.
Here are the results:
3.0
---
garloff@pckurt:~/Physics/numerix-2.0-gcc3/bench $ time make OPT=1 OPT_ARCH=1
CC="gcc -V 3.0" CXX="g++ -V 3.0" SYSDEP=bin-ix86-300 -j2
[...]
real 2m32.318s user 3m48.110s sys 0m7.130s
garloff@pckurt:~/Physics/numerix-2.0-gcc3/bench $
bin-ix86-300/libbench_double
=============================================================
TBCI NumLib (2.2.1) benchmark 1.35 (Aug 17 2001 10:40:53)
[...]
Total for ALL : 20.394s e, 20.330s u+s
=============================================================
=> libbench SPEC values : 0.853 0.855
This value is compared with the same benchmark on this reference machine:
[Athlon1-700, 256MB PC-133, Linux-2.4, TBCI 2.1.2, gcc-3.0.0, OPT=1
OPT_ARCH=1]
(OPT_ARCH evaluates to -march=athlon there)
Flags are:
Compiler: Reading specs from
/raid/gcc300/lib/gcc-lib/i686-pc-linux-gnu/3.0/specs ../configure
--with-gcc-version-trigger=/raid/egcs/gcc/version.c --host=i686-pc-linux-gnu
--with-system-zlib --with-gnu-ld --with-gnu-as --enable-libstdcxx-v3
--prefix=/raid/gcc300 --enable-haifa --enable-threads=posix Thread model:
posix gcc driver version 3.0.1 20010816 (prerelease) executing gcc version
3.0 g++: No input files
Bench flags: -O2 -ffast-math -felide-constructors -O3 -fschedule-insns2
-funroll-loops -freduce-all-givs -frerun-loop-opt -fno-inline-functions
-fstrict-aliasing -mcpu=pentiumpro -march=pentiumpro -D__GNUC_SUBVER__=0
Machine info: pckurt.casa-etp.nl Intel Intel 686 686 Pentium III
(Coppermine) Pentium III (Coppermine) [8] [8] st.6 st.6 708.115 MHz 708.115
MHz 1412.30 Bogos 1415.57 Bogos 627 MB Linux 2.4.7-SMP SMP
Fri Aug 17 10:46:29 CEST 2001 up 1 day, 6:53, 10 users, load: 2.64, 2.81,
2.74
3.0.1
-----
garloff@pckurt:~/Physics/numerix-2.0-gcc3/bench $ time make OPT=1 OPT_ARCH=1
SYSDEP=bin-ix86-301 -j2
real 1m56.750s user 2m49.840s sys 0m3.970s
garloff@pckurt:~/Physics/numerix-2.0-gcc3/bench $
bin-ix86-301/libbench_double
[...]
Total for ALL : 54.430s e, 54.390s u+s
=============================================================
=> libbench SPEC values : 0.319 0.320
For reference:
2.95.3 (SuSE)
------
garloff@pckurt:~/Physics/numerix-2.0-gcc295/bench $ time make OPT=1
OPT_ARCH=1 -j2
real 2m24.167s user 3m35.440s sys 0m5.790s
garloff@pckurt:~/Physics/numerix-2.0-gcc295/bench $ bin-ix86/libbench_double
[...]
Total for ALL : 20.148s e, 20.010s u+s
=============================================================
=> libbench SPEC values : 0.863 0.869
Conclusion:
===========
While compile time with 3.0.1 improved by a factor of 1.34 over 3.0
(1.27 over 2.95.3), the run-time performance of the generated code dropped
dramatically, by a factor of 2.67 compared to 3.0 (2.71 compared to 2.95.3).
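The factors above can be re-derived from the figures quoted in the runs. The following is a minimal sketch in Python; it assumes (this is an inference, not something stated in the mail) that the compile factors are ratios of the "user" times reported by time, and that the run-time factors are ratios of the first libbench SPEC value. The function and variable names are made up for illustration.

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style stamp such as '2m32.318s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# "user" time of each build, copied from the `time make ...` output.
BUILD_USER = {"2.95.3": "3m35.440s", "3.0": "3m48.110s", "3.0.1": "2m49.840s"}

# First libbench SPEC value of each run (higher is better).
SPEC = {"2.95.3": 0.863, "3.0": 0.853, "3.0.1": 0.319}

def build_speedup(old, new="3.0.1"):
    """How many times faster `new` compiles the benchmark than `old`."""
    return to_seconds(BUILD_USER[old]) / to_seconds(BUILD_USER[new])

def run_slowdown(old, new="3.0.1"):
    """How many times slower the code generated by `new` runs than that of `old`."""
    return SPEC[old] / SPEC[new]

if __name__ == "__main__":
    for old in ("3.0", "2.95.3"):
        print(f"3.0.1 vs {old}: build {build_speedup(old):.2f}x faster, "
              f"benchmark {run_slowdown(old):.2f}x slower")
```

Under these assumptions the script reproduces the four factors quoted in the conclusion.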
Looking at the detailed results, the vector operations seem to have almost
the same performance, whereas the matrix operations are much slower, and
thus the solvers are also very slow.
Is this problem known?
I've not been following discussions on the gcc list too well the last weeks,
but I'd suspect the inlining limitations. I guess we've gone too far there.
Or did the meaning of -fno-inline-functions change, so that it now prevents
inlining completely rather than only automatic inlining?
I have not yet played with parameters, such as using plain -O3 (is it usable
in 3.0.1?) or -finline-limit. Should I?
Other things I should try?
Suggestions, patches, command-line parameters, ... welcome.
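One way to try the -finline-limit idea would be to rebuild the benchmark at several limits and compare the timings. The sketch below only generates the compile commands. The source file name bench.cc, the output names, and the limit values are placeholders, and the base flags are a subset of the bench flags quoted above. The real TBCI build goes through its Makefile, and how to pass extra flags to that Makefile is not shown in this mail.

```python
# Subset of the bench flags quoted in the mail; -fno-inline-functions is
# left out deliberately so that only the inline limit is varied.
BASE_FLAGS = ["-O2", "-ffast-math", "-funroll-loops", "-fstrict-aliasing"]

def sweep_commands(compiler, source, limits):
    """Return one compile command (as an argv list) per -finline-limit value."""
    commands = []
    for limit in limits:
        output = f"bench-limit-{limit}"  # hypothetical output name
        commands.append(
            [compiler, *BASE_FLAGS, f"-finline-limit={limit}", source, "-o", output]
        )
    return commands

if __name__ == "__main__":
    # The limits are arbitrary illustration values, not recommendations.
    for argv in sweep_commands("g++", "bench.cc", [600, 2000, 10000]):
        print(" ".join(argv))
```

Each generated binary could then be run and its "Total for ALL" line compared against the 3.0 and 2.95.3 results.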
(Please Cc: me when answering, I'm not subscribed to the gcc list.)
Regards,
--
Kurt Garloff <kurt@garloff.de> [Eindhoven, NL]
Physics: Plasma simulations <K.Garloff@Phys.TUE.NL> [TU Eindhoven, NL]
Linux: SCSI, Security <garloff@suse.de> [SuSE Nuernberg, DE]
(See mail header or public key servers for PGP2 and GPG public keys.)