This is the mail archive of the
`gcc@gcc.gnu.org`
mailing list for the GCC project.


*From*: Scott Robert Ladd <coyote at coyotegulch dot com>
*To*: gcc mailing list <gcc at gcc dot gnu dot org>
*Date*: Tue, 04 May 2004 10:16:12 -0400
*Subject*: C Optimization, Opteron, 4 May 2004, tree-ssa/3.5/3.4

New this time around:

- Use of -D__NO_MATH_INLINES and -D__NO_STRING_INLINES switches
- New arithmetic coding benchmark
- Separation of Pentium 4 and Opteron results
- Durations of some tests have been adjusted

I've compared the speed of code generated by recent versions of GCC, using a beta version of a C benchmark suite that I'm developing. The benchmark suite currently comprises nine separate tests, each of which times its inner loop and reports the result to a driver program.
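As a rough illustration of a test that times its own inner loop and reports the result, here is a minimal sketch in C. The workload, the function names (`inner_loop`, `time_inner_loop`, `report`), and the one-line output format are all assumptions made for illustration; the suite's actual timing code is not shown in this message. The sketch uses the standard `clock()` function, which measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical workload standing in for a benchmark's inner loop:
   sums the first `iterations` terms of the harmonic series. */
double inner_loop(long iterations)
{
    double sum = 0.0;
    for (long i = 1; i <= iterations; ++i)
        sum += 1.0 / (double)i;
    return sum;
}

/* Times only the inner loop. The workload's result is stored in *result
   so the compiler cannot discard the loop as dead code.
   Returns elapsed CPU time in seconds. */
double time_inner_loop(long iterations, double *result)
{
    assert(result != NULL);
    clock_t start = clock();
    *result = inner_loop(iterations);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Prints one line (test name, then seconds) that a driver program
   could read from the test's standard output. */
void report(const char *name, double seconds)
{
    printf("%s %.1f\n", name, seconds);
}
```

A driver along these lines could run each test as a separate process, read the single line it prints, and add the times into a total.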

In the following tables, times are in seconds, as computed for the inner loop being tested. The benchmarks are described at the end of this message.

AMD64/Opteron 240 (1.4GHz)
Gentoo AMD64 64-bit Linux 2.6.5
GCC options: -O3 -ffast-math -march=opteron -D__NO_MATH_INLINES -D__NO_STRING_INLINES

| test | 3.4.1 | mainline | tree-ssa |
|---|---|---|---|
| alma | 70.3 | 53.5 | 23.3 |
| arco | 24.8 | 24.4 | 26.1 |
| evo | 31.9 | 31.3 | 32.9 |
| fft | 29.9 | 31.2 | 31.6 |
| huff | 23.6 | 24.7 | 30.2 |
| lin | 29.9 | 29.5 | 29.6 |
| mat1 | 30.5 | 30.4 | 29.9 |
| mole | 29.5 | 63.2 | 71.5 |
| tree | 38.6 | 37.7 | 30.0 |
| total | 309.2 | 325.9 | 305.1 |

Because people have asked: When compiled with -O2, the total run times are { 312.7, 326.2, 313.0 } -- in other words, -O3 offers little or no advantage over -O2.
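To put a number on "little or no advantage", the two sets of totals can be compared as percentages. The sketch below assumes the three -O2 totals are listed in the same order as the table columns (3.4.1, mainline, tree-ssa); the message does not say so explicitly. The function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage by which the -O3 total is shorter than the -O2 total. */
double percent_faster(double o2_seconds, double o3_seconds)
{
    assert(o2_seconds > 0.0);
    return 100.0 * (o2_seconds - o3_seconds) / o2_seconds;
}

/* Prints the -O3 gain for each compiler, using the totals quoted above.
   Assumption: the -O2 totals follow the table's column order. */
void print_o3_gain(void)
{
    const char *compiler[] = { "3.4.1", "mainline", "tree-ssa" };
    const double o2[] = { 312.7, 326.2, 313.0 };
    const double o3[] = { 309.2, 325.9, 305.1 };

    for (int i = 0; i < 3; ++i)
        printf("%-8s %.1f%%\n", compiler[i], percent_faster(o2[i], o3[i]));
}
```

Under that ordering assumption, the gains work out to roughly 1.1%, 0.1%, and 2.5% respectively.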

ANALYSIS

The benchmark suite is *not* complete; I will be adding at least one more test, along with better automated reporting facilities. If you would like a copy of the benchmark suite, please request it from me by e-mail, as it's not ready for general distribution.

under the assumption that doing so will produce the fastest code. Additional options (e.g. -funroll-loops) may improve generated code speed; in fact, it is almost always possible to find a "-O1 and other options" set that produces faster code than -O3 (see my Acovea articles). HOWEVER, in this comparison, I'm looking at how general users are going to use the tool at hand.

All GNU compilers were taken from anonymous CVS on 2004-05-03, and built using:

--enable-shared --enable-threads=posix --enable-__cxa_atexit --disable-checking --disable-multilib (Opteron only)

BENCHMARKS

alma -- Calculates the daily ephemeris (at noon) for the years 2000-2099; tests array handling, floating-point math, and mathematical functions such as sin() and cos().

evo -- A simple genetic algorithm that maximizes a two-dimensional function; tests 64-bit math, loop generation, and floating-point math.

fft -- Uses a Fast Fourier Transform to multiply two very (very) large polynomials; tests the C99 _Complex type and basic floating-point math.

huff -- Compresses a large block of data using the Huffman algorithm; tests string manipulation, bit twiddling, and the use of large memory blocks.

lin -- Solves a large linear equation via LUP decomposition; tests basic floating-point math, two-dimensional array performance, and loop optimization.

mat1 -- Multiplies two very large matrices using the brute-force algorithm; tests loop optimization.

mole -- A molecular dynamics simulation, with performance predicated on matrix operations, loop efficiency, and sin() and cos(). I recently added this test, which exhibits very different characteristics from alma (even if they appear similar).

tree -- Creates and modifies a large B-tree in memory; tests integer looping and dynamic memory management.
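For readers unfamiliar with the "brute-force algorithm" mentioned for mat1, the kernel is the classic triple loop. What follows is a minimal sketch, not the benchmark's actual source: the function name `matmul_naive`, the square row-major layout, and the use of `double` are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Computes c = a * b for n-by-n matrices stored in row-major order.
   This is the O(n^3) textbook method: each output element is the dot
   product of one row of a with one column of b. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

The innermost loop walks `b` with a stride of `n` elements, so how well a compiler handles this loop nest has a visible effect on run time, which is presumably why it serves as a loop-optimization test.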

--
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing
