This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Optimization comparison: 3.3, 3.4, mainline, tree-ssa
- From: Scott Robert Ladd <coyote at coyotegulch dot com>
- To: gcc mailing list <gcc at gcc dot gnu dot org>
- Date: Sun, 18 Apr 2004 16:31:58 -0400
- Subject: Optimization comparison: 3.3, 3.4, mainline, tree-ssa
Hello,
I've compared the speed of code generated by recent versions of GCC,
using a beta version of a C benchmark suite that I'm developing. The
benchmark suite currently comprises seven separate tests, each of
which times its inner loop and reports the result to a driver program.
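As a rough illustration of that arrangement (this is not code from the suite itself), a single test's timing harness might look something like the C sketch below. The names sample_kernel, time_kernel, and report_time are hypothetical, and timing with the standard clock() function is an assumption; the real tests may well use a different timer and reporting format.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a benchmark's inner loop: a partial
   harmonic sum, chosen only so there is some work to time. */
double sample_kernel(long iterations)
{
    double sum = 0.0;
    for (long i = 1; i <= iterations; ++i)
        sum += 1.0 / (double)i;
    return sum;
}

/* Run kernel(iterations) once and return the CPU time it used, in
   seconds, as measured by clock(). The kernel's result is stored
   through `result` (if non-NULL) so the compiler cannot discard
   the computation as dead code. */
double time_kernel(double (*kernel)(long), long iterations, double *result)
{
    assert(kernel != NULL);

    clock_t start = clock();
    double value = kernel(iterations);
    clock_t stop = clock();

    if (result != NULL)
        *result = value;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Write the elapsed time on a line of its own, so that a separate
   driver program could read it back (for example through a pipe). */
void report_time(FILE *out, double seconds)
{
    fprintf(out, "%.6f\n", seconds);
}
```

A test program built this way would call time_kernel() on its own inner loop and pass the returned value to report_time(stdout, ...); the driver would then collect one number per test.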
TEST RESULTS
AMD64/Opteron 240 (1.4GHz)
Gentoo AMD64 64-bit Linux 2.6.5
GCC options: -O3 -ffast-math -march=opteron (-march=athlon-mp for 3.3.4)
compiler         alma   evo   fft  huff   lin  mat1  tree  total
--------------   ----  ----  ----  ----  ----  ----  ----  -----
3.3.4            28.5  81.9  30.7  28.5  31.7  20.6  27.7  249.5
3.4.0            28.3  32.2  30.2  23.8  30.6  20.6  25.6  191.2
3.5 (mainline)   20.9  31.2  29.9  21.8  30.6  20.7  25.8  180.9
3.5 (tree-ssa)    9.3  21.8  31.1  30.1  29.5  20.3  19.3  171.5
On Opteron, 3.4 and later show the clear value of x86_64-specific
optimization. While tree-ssa wins the overall speed crown, it has a
major regression over previous compilers on the Huffman data compression
test. If that regression is corrected, tree-ssa will be smokin'...
Now, before we anoint tree-ssa as a vast improvement, I believe that
some of the above progression can be attributed to general improvements
in x86_64 (AMD64) code generation. The results for Pentium 4 tell a
somewhat different story:
Intel ia32/Pentium 4 Northwood 2.8GHz
Debian 32-bit Linux 2.6.5
GCC options: -O3 -ffast-math -march=pentium4
ICC options: -O3 -xN -ipo
compiler         alma   evo   fft  huff   lin  mat1  tree  total
--------------   ----  ----  ----  ----  ----  ----  ----  -----
3.3.4            26.4  53.8  27.7  22.9  19.3   6.0  19.6  175.7
3.4.0            26.3  53.4  28.0  19.2  19.7   6.6  17.8  171.1
3.5 (mainline)   26.1  53.4  27.6  16.5  19.6   7.1  17.3  167.5
3.5 (tree-ssa)   26.1  54.6  28.5  22.3  19.6   5.7  16.4  173.3
Intel ICC 8.0     9.2  37.9  30.8  16.4  19.5   5.8  18.4  137.7
Newer isn't automatically better when compiling for the Pentium 4
(tree-ssa's total trails both 3.4.0 and mainline), and Intel's compiler
is still king of the hill.
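To put the totals on a common footing, the relative change between two rows can be computed as (baseline - candidate) / baseline. The small C helper below, with the hypothetical name percent_faster, is only a sketch of that arithmetic and is not part of the benchmark suite:

```c
#include <assert.h>

/* Hypothetical helper: percentage by which the `candidate` total time
   is lower than the `baseline` total time. A negative result means
   the candidate was slower than the baseline. */
double percent_faster(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return (baseline - candidate) / baseline * 100.0;
}
```

Applied to the totals in the tables above, percent_faster(249.5, 171.5) gives roughly 31% for tree-ssa over 3.3.4 on the Opteron, while percent_faster(167.5, 137.7) gives roughly 18% for ICC 8.0 over GCC mainline on the Pentium 4.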
USUAL DISCLAIMER AND EXPLANATION STUFF:
The benchmark suite is *not* complete; I will be adding at least three
more tests (wavelets, fluid dynamics, big numbers), along with better
automated reporting facilities. If you would like a copy of the
benchmark suite, please request it from me by e-mail, as it's not ready
for general distribution. You can find my preliminary description of the
benchmarks at:
http://www.coyotegulch.com/acovea/acovea-4.html
Future directions I may consider include other optimization levels (e.g.,
-O1, -Os), and comparisons for mainstream applications such as povray.
I am *not* testing compilation speed.
I am only testing 64-bit code generation on the AMD64.
Please do not ask about other architectures; I don't have them, so I
can't test them. Well, I do have SPARC, but it's old, slow, and no one
asks me about SPARC anyway. ;)
Most users will compile with the highest optimization level possible
under the assumption that it will produce the fastest code. Additional
options (e.g. -funroll-loops) may improve generated code speed; in fact,
it is almost always possible to find a "-O1 and other options" set that
produces faster code than -O3 (see my Acovea articles). HOWEVER, in this
comparison, I'm looking at how general users are going to use the tool
at hand.
All GNU compilers were taken from anonymous CVS on 2004-04-18, and built
using:
--enable-shared
--enable-threads=posix
--enable-__cxa_atexit
--disable-checking
--disable-multilib (Opteron only)
Performance on the "alma" test depends heavily on the speed of the sin(),
cos(), and sqrt() functions. I've tried -D__NO_INLINE__, -fno-inline,
and other suggestions, without finding anything that improved GCC's
performance on this benchmark. Whether it's a problem in glibc or a
problem in gcc doesn't really matter to the user -- he or she just wants
fast code. It also appears that on AMD64, the tree-ssa compiler is
generating excellent code for almabench.
That's all for now. Let the kibitzing begin!
..Scott
--
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing