This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: GCC Benchmarks (coybench), AMD64 and i686, 14 August 2004


On Sun, 15 Aug 2004 10:55:07 -0400, Scott Robert Ladd
<coyote@coyotegulch.com> wrote:
> Good day,
> 
> Using a custom benchmark suite of my own design, I have compared the
> performance of code generated by recent and pending versions of GCC, for
> AMD Opteron and Intel Pentium 4 processors.
> 
> Raw Numbers
> ===========
> 
> System Corwin (x86_64)
>    Dual Opteron 240, 1.4GHz
>    Tyan K8W 2885 motherboard
>    120GB Maxtor 7200 RPM ATA-133 HD
>    2GB PC2700 DRAM (1GB per processor, NUMA)
>    Radeon 9200 Pro, 128MB, HP f1903 LCD
> 
>    Linux 2.6.7 #2 SMP Sat Jun 19 20:16:20 EDT 2004
>    GNU C Library 20040808 release version 2.3.4
>    GNU assembler 2.15.90.0.1.1 20040303
>    ln (coreutils) 5.2.1
> 
>                   3.2.3  3.3.3  3.4.2  3.5.0
>                   -----  -----  -----  -----
>       alma time:   43.1   43.4   42.3   28.1
>       arco time:   24.8   25.4   24.7   24.8
>        evo time:   47.0   65.9   25.0   24.8
>        fft time:   27.7   27.8   27.4   28.1
>       huff time:   28.4   28.3   23.6   22.4
>        lin time:   30.1   30.1   29.8   29.3
>       mat1 time:   28.5   28.3   28.7   29.7
>       mole time:   10.7   12.8   12.2   28.8
>       tree time:   41.8   40.9   37.7   30.4
> --------------   -----  -----  -----  -----
> total run time:  282.0  302.8  251.5  246.3
> 
> System Tycho (i686)
>    2.8GHz Pentium 4, HT enabled in BIOS and OS
>    Intel D850EMV2 motherboard
>    80GB Maxtor 6L080J4, 7200RPM ATA-100 HD
>    80GB Maxtor 6Y080P0, 7200RPM ATA-100 HD
>    512MB PC800 RDRAM
>    Radeon 9200 Pro, NEC FE990 monitor
> 
>    Linux 2.6.7 #1 SMP Sat Jun 26 12:39:11 EDT 2004
>    GNU C Library 20040808 release version 2.3.4
>    GNU assembler 2.14.90.0.8 20040114
>    ln (coreutils) 5.2.1
> 
>                   3.2.3  3.3.3  3.4.2  3.5.0  icc 8
>                   -----  -----  -----  -----  -----
>       alma time:   39.5   39.6   39.0   22.3   13.3
>       arco time:   27.8   26.9   25.1   27.3   27.7
>        evo time:   43.1   42.9   42.4   42.1   30.1
>        fft time:   27.4   27.4   27.0   27.3   30.2
>       huff time:   23.1   23.6   18.0   13.1   16.3
>        lin time:   19.1   19.1   18.9   19.5   19.1
>       mat1 time:    7.4    7.5    7.5    7.5    7.4
>       mole time:   31.6   30.5   30.9   31.3    5.1
>       tree time:   30.9   32.3   28.3   25.6   28.8
>      ----------   -----  -----  -----  -----  -----
>      total time:  249.9  249.7  237.1  215.8  178.0
> 
> General Thoughts
> ================
> 
> Overall, GCC 3.5 provides a minor improvement in generated code
> performance when compared to GCC 3.4. The historical comparison with
> earlier GCCs shows that code performance *is* improving with subsequent
> releases.
> 
> At this time, GCC 3.5 and 3.4 often produce comparable code -- but on a
> few benchmarks, they differ greatly. For the Opteron, GCC 3.5 generates
> significantly faster code for the alma and tree benchmarks -- but it
> suffers a massive regression on the mole test. For the Pentium 4, GCC
> 3.5 is superior for the alma, huff, and tree tests, but loses a bit of
> ground against 3.4 on others.
> 
> Intel C is still amazingly effective. HOWEVER, I do not have a more
> recent version of Intel C because my current commercial license has
> expired, and compiler updates won't install any more. In terms of
> intellectual and practical freedom, GCC wins hands down.

But you can get a non-commercial license and the updates. I've been
using it for two years.

> The Usual Explanations and Caveats
> ==================================
> 
> All compilers were built on the host systems, from official, unpatched
> archives (3.2 and 3.3) or CVS checkouts (3.4 and 3.5), acquired on the
> morning of 14 August 2004. The compiler configuration command was:
> 
> .../gcc/configure --prefix=/opt/gcc-3.?
>         --enable-shared
>         --enable-threads=posix
>         --enable-__cxa_atexit
>         --disable-checking
>         --disable-multilib
>         --enable-languages=c,c++,f77 (f95 for gcc 3.5)
> 
> The compilers were built with make -j2 bootstrap.
> 
> Since we're interested in generated code speed, all compiles were
> performed with the option set used by typical users:
> 
>    -O3 -ffast-math -march=pentium4
>    -O3 -ffast-math -march=athlon-mp (Opteron, for GCCs 3.2 and 3.3)
>    -O3 -ffast-math -march=opteron   (Opteron, for GCCs 3.4 and 3.5)
> 
> On the Pentium 4, I also compiled the code with Intel's ICC compiler,
> version 8.0.055 (build 20031211Z), using the options:
> 
>    -xN -O3 -ipo
> 
> As my Acovea program has shown, a selection of individual optimization
> flags often produces code that performs faster than what is generated by
> the generic -O? options. However, most programmers don't have the time
> or expertise required for finding optimal optimizations (!) -- and as
> such, they tend to use the most "powerful" composite options (e.g., -O3).
> 
> Some folk may object to my use of -ffast-math -- however, in numerous
> accuracy tests, -ffast-math produces code that is both faster *and* more
> accurate than code generated without it. Yes, -ffast-math has other
> aspects that make for interesting debate; however, such discussions
> belong in another article.
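> 
> As a purely illustrative aside (this function is not part of the suite),
> the sort of code -ffast-math changes is a simple reduction like the one
> below, where the compiler becomes free to reassociate the additions and
> vectorize in ways that strict IEEE evaluation order would forbid:
> 
>     /* Illustrative only: with -ffast-math the compiler may reorder
>        these additions; without it, they must be summed in sequence. */
>     double sum_squares(const double *x, int n)
>     {
>         double sum = 0.0;
>         for (int i = 0; i < n; ++i)
>             sum += x[i] * x[i];
>         return sum;
>     }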
> 
> This article is *NOT* a comparison of the Pentium 4 and Opteron
> processors; my two test systems are far too different for any such
> comparison to have meaning. Please do not ask me to test on systems I
> don't own, unless you're willing to send me hardware. Assuming I find
> some paying work this month, I'll be making some system upgrades in the
> near future; for now, what I've got is what I've got.
> 
> About the Benchmarks
> ====================
> 
> alma -- Calculates the daily ephemeris (at noon) for the years
> 2000-2099; tests array handling, floating-point math, and mathematical
> functions such as sin() and cos().
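> 
> Not the benchmark itself, but a minimal sketch of the kind of inner
> loop alma leans on -- arrays fed through sin() and cos(); the function
> name and shapes here are mine, purely for illustration:
> 
>     #include <math.h>
> 
>     /* Hypothetical kernel in the spirit of alma: derive coordinates
>        from an array of angles using sin() and cos(). */
>     void angles_to_xy(const double *angle, double *x, double *y, int n)
>     {
>         for (int i = 0; i < n; ++i) {
>             x[i] = cos(angle[i]);
>             y[i] = sin(angle[i]);
>         }
>     }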
> 
> evo -- A simple genetic algorithm that maximizes a two-dimensional
> function; tests 64-bit math, loop generation, and floating-point math.
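> 
> A hedged sketch of the flavor of work evo does -- a two-dimensional
> fitness function plus a 64-bit linear congruential step (Knuth's MMIX
> constants); none of these names come from the actual suite:
> 
>     #include <stdint.h>
>     #include <math.h>
> 
>     /* Illustrative only: evaluate a 2-D surface to be maximized. */
>     static double fitness(double x, double y)
>     {
>         return sin(x) * cos(y) - (x * x + y * y) / 100.0;
>     }
> 
>     /* Illustrative only: 64-bit LCG, the kind of integer math a
>        simple GA uses for mutation and selection. */
>     static uint64_t lcg64(uint64_t *state)
>     {
>         *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
>         return *state;
>     }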
> 
> fft -- Uses a Fast Fourier Transform to multiply two very (very) large
> polynomials; tests the C99 _Complex type and basic floating-point math.
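> 
> For readers unfamiliar with C99 _Complex, here is a stand-alone sketch
> (not the suite's code) of the pointwise product that sits between the
> forward and inverse transforms in FFT-based polynomial multiplication:
> 
>     #include <complex.h>
> 
>     /* Illustrative only: multiply two spectra element by element. */
>     void pointwise_product(const double _Complex *a,
>                            const double _Complex *b,
>                            double _Complex *c, int n)
>     {
>         for (int i = 0; i < n; ++i)
>             c[i] = a[i] * b[i];
>     }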
> 
> huff -- Compresses a large block of data using the Huffman algorithm;
> tests string manipulation, bit twiddling, and the use of large memory
> blocks.
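> 
> As an illustration of the bit twiddling involved -- not code from huff
> itself, and put_bits is a name I've made up -- packing a variable-length
> code into a zeroed byte buffer looks roughly like this:
> 
>     /* Illustrative only: append the 'len' low-order bits of 'code'
>        to a zero-initialized buffer, most significant bit first. */
>     void put_bits(unsigned char *buf, long *bitpos, unsigned code, int len)
>     {
>         for (int i = len - 1; i >= 0; --i, ++*bitpos)
>             if ((code >> i) & 1u)
>                 buf[*bitpos >> 3] |= (unsigned char)(0x80 >> (*bitpos & 7));
>     }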
> 
> lin -- Solves a large linear equation via LUP decomposition; tests basic
> floating-point math, two-dimensional array performance, and loop
> optimization.
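> 
> A much-simplified sketch of the core of such a solver -- in-place
> Doolittle LU factorization *without* the pivoting the real benchmark
> performs; the name and layout are mine:
> 
>     /* Illustrative only: factor a row-major n-by-n matrix in place
>        into L (below the diagonal) and U (on and above it). */
>     void lu_sketch(double *a, int n)
>     {
>         for (int k = 0; k < n; ++k)
>             for (int i = k + 1; i < n; ++i) {
>                 a[i*n + k] /= a[k*n + k];
>                 for (int j = k + 1; j < n; ++j)
>                     a[i*n + j] -= a[i*n + k] * a[k*n + j];
>             }
>     }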
> 
> mat1 -- Multiplies two very large matrices using the brute-force
> algorithm; tests loop optimization.
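> 
> "Brute force" here means the classic triple loop, give or take the index
> order mat1 actually uses; a minimal sketch:
> 
>     /* Illustrative only: naive O(n^3) multiply of row-major matrices. */
>     void matmul_sketch(const double *a, const double *b, double *c, int n)
>     {
>         for (int i = 0; i < n; ++i)
>             for (int j = 0; j < n; ++j) {
>                 double sum = 0.0;
>                 for (int k = 0; k < n; ++k)
>                     sum += a[i*n + k] * b[k*n + j];
>                 c[i*n + j] = sum;
>             }
>     }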
> 
> mole -- A molecular dynamics simulation, with performance predicated on
> matrix operations, loop efficiency, and sin() and cos(). I recently
> added this test, which exhibits very different characteristics from alma
> (even if they appear similar).
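> 
> Purely as a shape, not mole's actual code: the pairwise O(n^2) loop that
> dominates a small molecular-dynamics kernel looks something like this
> (the function name and the quantity accumulated are mine):
> 
>     #include <math.h>
> 
>     /* Illustrative only: accumulate an inverse-distance term over
>        all atom pairs. */
>     double pair_sum(const double *x, const double *y,
>                     const double *z, int n)
>     {
>         double acc = 0.0;
>         for (int i = 0; i < n; ++i)
>             for (int j = i + 1; j < n; ++j) {
>                 double dx = x[i] - x[j];
>                 double dy = y[i] - y[j];
>                 double dz = z[i] - z[j];
>                 acc += 1.0 / sqrt(dx*dx + dy*dy + dz*dz);
>             }
>         return acc;
>     }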
> 
> tree -- Creates and modifies a large B-tree in memory; tests integer
> looping and dynamic memory management.
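> 
> For concreteness only -- this layout is my own, not tree's -- a fixed-order
> B-tree node is the kind of structure such a test allocates, fills, and
> rebalances:
> 
>     /* Illustrative only: a B-tree node of arbitrarily chosen order. */
>     #define ORDER 32
> 
>     struct btree_node {
>         int                nkeys;             /* keys currently stored */
>         long               keys[ORDER - 1];   /* sorted keys           */
>         struct btree_node *child[ORDER];      /* NULL in leaf nodes    */
>     };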
> 
> My benchmark suite is still in development, and isn't packaged as nicely
> as I'd like for general distribution. If you want the benchmark source
> code, or have any questions about these tests, please e-mail me.
> 
> Thank you!
> 
> --
> Scott Robert Ladd
> Coyote Gulch Productions (http://www.coyotegulch.com)
> Software Invention for High-Performance Computing
> 


-- 
Natros

