This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: GCC Benchmarks (coybench), AMD64 and i686, 14 August 2004
- From: Natros <natros at gmail dot com>
- To: gcc mailing list <gcc at gcc dot gnu dot org>
- Date: Mon, 16 Aug 2004 17:12:20 +0100
- Subject: Re: GCC Benchmarks (coybench), AMD64 and i686, 14 August 2004
- References: <411F794B.8090704@coyotegulch.com>
- Reply-to: Natros <natros at gmail dot com>
On Sun, 15 Aug 2004 10:55:07 -0400, Scott Robert Ladd
<coyote@coyotegulch.com> wrote:
> Good day,
>
> Using a custom benchmark suite of my own design, I have compared the
> performance of code generated by recent and pending versions of GCC, for
> AMD Opteron and Intel Pentium 4 processors.
>
> Raw Numbers
> ===========
>
> System Corwin (x86_64)
> Dual Opteron 240, 1.4GHz
> Tyan K8W 2885 motherboard
> 120GB Maxtor 7200 RPM ATA-133 HD
> 2GB PC2700 DRAM (1GB per processor, NUMA)
> Radeon 9200 Pro, 128MB, HP f1903 LCD
>
> Linux 2.6.7 #2 SMP Sat Jun 19 20:16:20 EDT 2004
> GNU C Library 20040808 release version 2.3.4
> GNU assembler 2.15.90.0.1.1 20040303
> ln (coreutils) 5.2.1
>
>                   3.2.3  3.3.3  3.4.2  3.5.0
>                   -----  -----  -----  -----
> alma time:         43.1   43.4   42.3   28.1
> arco time:         24.8   25.4   24.7   24.8
> evo time:          47.0   65.9   25.0   24.8
> fft time:          27.7   27.8   27.4   28.1
> huff time:         28.4   28.3   23.6   22.4
> lin time:          30.1   30.1   29.8   29.3
> mat1 time:         28.5   28.3   28.7   29.7
> mole time:         10.7   12.8   12.2   28.8
> tree time:         41.8   40.9   37.7   30.4
> --------------    -----  -----  -----  -----
> total run time:   282.0  302.8  251.5  246.3
>
> System Tycho (i686)
> 2.8GHz Pentium 4, HT enabled in BIOS and OS
> Intel D850EMV2 motherboard
> 80GB Maxtor 6L080J4, 7200RPM ATA-100 HD
> 80GB Maxtor 6Y080P0, 7200RPM ATA-100 HD
> 512MB PC800 RDRAM
> Radeon 9200 Pro, NEC FE990 monitor
>
> Linux 2.6.7 #1 SMP Sat Jun 26 12:39:11 EDT 2004
> GNU C Library 20040808 release version 2.3.4
> GNU assembler 2.14.90.0.8 20040114
> ln (coreutils) 5.2.1
>
>                   3.2.3  3.3.3  3.4.2  3.5.0  icc 8
>                   -----  -----  -----  -----  -----
> alma time:         39.5   39.6   39.0   22.3   13.3
> arco time:         27.8   26.9   25.1   27.3   27.7
> evo time:          43.1   42.9   42.4   42.1   30.1
> fft time:          27.4   27.4   27.0   27.3   30.2
> huff time:         23.1   23.6   18.0   13.1   16.3
> lin time:          19.1   19.1   18.9   19.5   19.1
> mat1 time:          7.4    7.5    7.5    7.5    7.4
> mole time:         31.6   30.5   30.9   31.3    5.1
> tree time:         30.9   32.3   28.3   25.6   28.8
> ----------        -----  -----  -----  -----  -----
> total time:       249.9  249.7  237.1  215.8  178.0
>
> General Thoughts
> ================
>
> Overall, GCC 3.5 provides a minor improvement in generated code
> performance when compared to GCC 3.4. The historical comparison with
> earlier GCCs shows that code performance *is* improving with subsequent
> releases.
>
> At this time, GCC 3.5 and 3.4 often produce comparable code -- but on a
> few benchmarks, they differ greatly. For the Opteron, GCC 3.5 generates
> significantly faster code for the alma and tree benchmarks -- but it
> suffers a massive regression on the mole test. For the Pentium 4, GCC
> 3.5 is superior for the alma, huff, and tree tests, but loses a bit of
> ground against 3.4 on others.
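To put numbers on the Opteron comparison, here is a minimal sketch that computes the relative change in run time between GCC 3.4.2 and GCC 3.5.0. The timings are copied from the System Corwin table above. The names `opteron`, `percent_change`, and `changes` are illustrative only and are not part of the benchmark suite.

```python
# Opteron timings in seconds, copied from the "System Corwin" table.
# Each value is a pair: (GCC 3.4.2, GCC 3.5.0).
opteron = {
    "alma": (42.3, 28.1),
    "arco": (24.7, 24.8),
    "evo": (25.0, 24.8),
    "fft": (27.4, 28.1),
    "huff": (23.6, 22.4),
    "lin": (29.8, 29.3),
    "mat1": (28.7, 29.7),
    "mole": (12.2, 28.8),
    "tree": (37.7, 30.4),
}


def percent_change(old, new):
    """Relative change in run time; negative means the newer compiler is faster."""
    return (new - old) / old * 100.0


changes = {name: percent_change(old, new) for name, (old, new) in opteron.items()}

# Print the benchmarks from largest speedup to largest slowdown.
for name, delta in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{name:5s} {delta:+7.1f}%")
```

Sorting by the change puts alma and tree at the improved end of the list and mole at the regressed end, which matches the summary in the quoted paragraph.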
>
> Intel C is still amazingly effective. HOWEVER, I do not have a more
> recent version of Intel C because my current commercial license has
> expired, and compiler updates won't install any more. In terms of
> intellectual and practical freedom, GCC wins hands down.
But you can get a non-commercial license, along with the updates. I've
been using it for two years.
> The Usual Explanations and Caveats
> ==================================
>
> All compilers were built on the host systems, from official, unpatched
> archives (3.2 and 3.3) or CVS checkouts (3.4 and 3.5), acquired on the
> morning of 14 August 2004. The compiler configuration command was:
>
> .../gcc/configure --prefix=/opt/gcc-3.?
> --enable-shared
> --enable-threads=posix
> --enable-__cxa_atexit
> --disable-checking
> --disable-multilib
> --enable-languages=c,c++,f77 (f95 for gcc 3.5)
>
> The compilers were built with make -j2 bootstrap.
>
> Since we're interested in generated code speed, all compiles were
> performed with the option set used by typical users:
>
> -O3 -ffast-math -march=pentium4
> -O3 -ffast-math -march=athlon-mp (Opteron, for GCCs 3.2 and 3.3)
> -O3 -ffast-math -march=opteron (Opteron, for GCCs 3.4 and 3.5)
>
> On the Pentium 4, I also compiled the code with Intel's ICC compiler,
> version 8.0.055 (build 20031211Z), using the options:
>
> -xN -O3 -ipo
>
> As my Acovea program has shown, a selection of individual optimization
> flags often produces code that performs faster than what is generated by
> the generic options (-O?). However, most programmers don't have the time
> or expertise required for finding optimal optimizations (!) -- and as
> such, they tend to use the most "powerful" composite options (e.g., -O3).
>
> Some folk may object to my use of -ffast-math -- however, in numerous
> accuracy tests, -ffast-math produces code that is both faster *and* more
> accurate than code generated without it. Yes, -ffast-math has other
> aspects that make for interesting debate; however, such discussions
> belong in another article.
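One reason -ffast-math can change numerical results in either direction is that, among other relaxations, it lets the compiler regroup floating-point operations. Floating-point addition is not associative, so a different grouping can give a different rounded answer. The sketch below is not taken from the benchmarks and does not involve GCC at all. It simply evaluates two groupings of the same sum by hand and measures each against the exact sum of the stored values. The variable names are illustrative.

```python
from fractions import Fraction

a, b, c = 0.1, 0.2, 0.3

# Two groupings of the same three-term sum.
left = (a + b) + c
right = a + (b + c)

# Exact sum of the three stored doubles, using rational arithmetic.
exact = Fraction(a) + Fraction(b) + Fraction(c)

# Absolute rounding error of each grouping relative to the exact sum.
left_error = abs(Fraction(left) - exact)
right_error = abs(Fraction(right) - exact)

print("left  =", repr(left), " error =", float(left_error))
print("right =", repr(right), " error =", float(right_error))
print("same result:", left == right)
```

Which grouping lands closer to the exact value depends on the data, so regrouping alone does not make code systematically more or less accurate.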
>
> This article is *NOT* a comparison of the Pentium 4 and Opteron
> processors; my two test systems are far too different for any such
> comparison to have meaning. Please do not ask me to test on systems I
> don't own, unless you're willing to send me hardware. Assuming I find
> some paying work this month, I'll be making some system upgrades in the
> near future; for now, what I've got is what I've got.
>
> About the Benchmarks
> ====================
>
> alma -- Calculates the daily ephemeris (at noon) for the years
> 2000-2099; tests array handling, floating-point math, and mathematical
> functions such as sin() and cos().
>
> evo -- A simple genetic algorithm that maximizes a two-dimensional
> function; tests 64-bit math, loop generation, and floating-point math.
>
> fft -- Uses a Fast Fourier Transform to multiply two very (very) large
> polynomials; tests the C99 _Complex type and basic floating-point math.
>
> huff -- Compresses a large block of data using the Huffman algorithm;
> tests string manipulation, bit twiddling, and the use of large memory
> blocks.
>
> lin -- Solves a large linear equation via LUP decomposition; tests basic
> floating-point math, two-dimensional array performance, and loop
> optimization.
>
> mat1 -- Multiplies two very large matrices using the brute-force
> algorithm; tests loop optimization.
>
> mole -- A molecular dynamics simulation, with performance predicated on
> matrix operations, loop efficiency, and sin() and cos(). I recently
> added this test, which exhibits very different characteristics from alma
> (even if they appear similar).
>
> tree -- Creates and modifies a large B-tree in memory; tests integer
> looping, and dynamic memory management.
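For readers unfamiliar with the "brute-force" matrix multiplication mentioned for mat1, here is a minimal sketch of the textbook triple loop. It is an illustration of the general technique only, not the benchmark's source code. The function name `matmul_naive` and the list-of-lists representation are assumptions made for this example.

```python
def matmul_naive(a, b):
    """Multiply matrices a (n x m) and b (m x p) with the textbook triple loop."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")

    c = [[0.0] * p for _ in range(n)]
    for i in range(n):          # rows of a
        for j in range(p):      # columns of b
            total = 0.0
            for k in range(m):  # dot product of row i and column j
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


product = matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```

The innermost loop walks down a column of `b`, which is the kind of access pattern that makes this algorithm a common test of a compiler's loop optimizations.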
>
> My benchmark suite is still in development, and isn't packaged as nicely
> as I'd like for general distribution. If you want the benchmark source
> code, or have any questions about these tests, please e-mail me.
>
> Thank you!
>
> --
> Scott Robert Ladd
> Coyote Gulch Productions (http://www.coyotegulch.com)
> Software Invention for High-Performance Computing
>
--
Natros