This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Optimization comparison: 3.3, 3.4, mainline, tree-ssa


Hello,

I've compared the speed of code generated by recent versions of GCC, using a beta version of a C benchmark suite that I'm developing. The suite currently comprises seven separate tests, each of which times its own inner loop and reports the result to a driver program.
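
As a rough illustration of the "time the inner loop, report to a driver" pattern, here is a minimal sketch. It is not code from the suite: kernel(), time_kernel(), report(), and the output format are hypothetical names chosen for this example, and the standard clock() call stands in for whatever timer the real tests use.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a benchmark kernel (a harmonic sum);
   it is not one of the suite's actual tests. */
static double kernel(long iterations)
{
    double sum = 0.0;
    for (long i = 1; i <= iterations; ++i)
        sum += 1.0 / (double)i;
    return sum;
}

/* Time only the kernel call, so setup and reporting are excluded.
   Returns CPU seconds as measured by clock(); the kernel's result is
   stored through 'result' so the work cannot be optimized away. */
static double time_kernel(long iterations, double *result)
{
    clock_t start = clock();
    *result = kernel(iterations);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Print one line on stdout that a driver program could parse
   (assumed format: test name, seconds, checksum). */
static void report(const char *name, double seconds, double result)
{
    printf("%s %.3f %.6f\n", name, seconds, result);
}
```

A test's main() would call time_kernel() once and pass the returned time to report(); the driver would then collect one such line per test and add up the times.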

TEST RESULTS

AMD64/Opteron 240 (1.4GHz)
Gentoo AMD64 64-bit Linux 2.6.5
GCC options: -O3 -ffast-math -march=opteron (-march=athlon-mp for 3.3.4)

compiler        alma  evo   fft   huff  lin   mat1  tree  total
--------------  ----  ----  ----  ----  ----  ----  ----  -----
3.3.4           28.5  81.9  30.7  28.5  31.7  20.6  27.7  249.5
3.4.0           28.3  32.2  30.2  23.8  30.6  20.6  25.6  191.2
3.5 (mainline)  20.9  31.2  29.9  21.8  30.6  20.7  25.8  180.9
3.5 (tree-ssa)   9.3  21.8  31.1  30.1  29.5  20.3  19.3  171.5

On Opteron, 3.4 and later show the clear value of x86_64-specific optimization. While tree-ssa wins the overall speed crown, it has a major regression over previous compilers on the Huffman data compression test. If that regression is corrected, tree-ssa will be smokin'...

Now, before we anoint tree-ssa as a vast improvement, I believe that some of the above progression can be attributed to general improvements in x86_64 (AMD64) code generation. The results for Pentium 4 tell a somewhat different story:


Intel ia32/Pentium 4 Northwood 2.8GHz
Debian 32-bit Linux 2.6.5
GCC options: -O3 -ffast-math -march=pentium4
ICC options: -O3 -xN -ipo

compiler        alma  evo   fft   huff  lin   mat1  tree  total
--------------  ----  ----  ----  ----  ----  ----  ----  -----
3.3.4           26.4  53.8  27.7  22.9  19.3   6.0  19.6  175.7
3.4.0           26.3  53.4  28.0  19.2  19.7   6.6  17.8  171.1
3.5 (mainline)  26.1  53.4  27.6  16.5  19.6   7.1  17.3  167.5
3.5 (tree-ssa)  26.1  54.6  28.5  22.3  19.6   5.7  16.4  173.3
Intel ICC 8.0    9.2  37.9  30.8  16.4  19.5   5.8  18.4  137.7

Newer obviously isn't better when compiling for the Pentium 4, and Intel's compiler is still king of the hill.


USUAL DISCLAIMER AND EXPLANATION STUFF:


The benchmark suite is *not* complete; I will be adding at least three more tests (wavelets, fluid dynamics, big numbers), along with better automated reporting facilities. If you would like a copy of the benchmark suite, please request it from me by e-mail, as it's not ready for general distribution. You can find my preliminary description of the benchmarks at:

http://www.coyotegulch.com/acovea/acovea-4.html

Future directions I may consider include other optimization levels (e.g., -O1, -Os) and comparisons using mainstream applications such as povray.

I am *not* testing compilation speed.

I am only testing 64-bit code generation on the AMD64.

Please do not ask about other architectures; I don't have them, so I can't test them. Well, I do have SPARC, but it's old, slow, and no one asks me about SPARC anyway. ;)

Most users will compile with the highest optimization level possible under the assumption that it will produce the fastest code. Additional options (e.g. -funroll-loops) may improve generated code speed; in fact, it is almost always possible to find a "-O1 and other options" set that produces faster code than -O3 (see my Acovea articles). HOWEVER, in this comparison, I'm looking at how general users are going to use the tool at hand.

All GNU compilers were taken from anonymous CVS on 2004-04-18, and built using:

    --enable-shared
    --enable-threads=posix
    --enable-__cxa_atexit
    --disable-checking
    --disable-multilib (Opteron only)

Performance on the "alma" test is predicated on the speed of the sin(), cos(), and sqrt() functions. I've tried -D__NO_INLINE__, -fno-inline, and other suggestions, without finding anything that improved GCC's performance on this benchmark. Whether it's a problem in glibc or a problem in gcc doesn't really matter to the user -- he or she just wants fast code. It also appears that on AMD64, the tree-ssa compiler is generating excellent code for almabench.

That's all for now. Let the kibitzing begin!

..Scott

--
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing

