This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

From: Vladimir Makarov <vmakarov at redhat dot com>
To: Xinliang David Li <davidxl at google dot com>
Cc: "gcc.gcc.gnu.org" <gcc at gcc dot gnu dot org>
Date: Thu, 29 Apr 2010 14:49:02 -0400
Subject: Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
References: <4BD9B2EB.9060207@redhat.com> <w2v522e93241004291117n7eb214ah55245e0209f3297e@mail.gmail.com>

Xinliang David Li wrote:

On Thu, Apr 29, 2010 at 9:25 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:

 GCC-4.5.0 and LLVM-2.7 were released recently.  To understand
where we stand after releasing GCC-4.5.0, I benchmarked it on SPEC2000
for x86/x86-64 and posted a comparison with the
previous GCC releases and LLVM-2.7.

 Even benchmarking SPEC2000 takes a lot of time on the fastest
machine I have, so I don't plan to use SPEC2006 for this in the near
future.

 You can find the comparison at
http://vmakarov.fedorapeople.org/spec/ (please just click the links at the
bottom of the left frame, starting with the link "GCC release comparison").

 If you need exact numbers, please use the tables (the links to them
are also given) which were used to generate the corresponding bar
graphs.


 In general, GCC-4.5.0 became faster (up to 10%) in -O2 mode.  This is
the first considerable compilation speed improvement since GCC-4.2.
GCC-4.5.0 also generates better code than the previous release
(1-2% on average, up to 4% for x86-64 SPECFP2000 in -O2 mode).
That does not include LTO and Graphite, which can give even
more (especially LTO) in many cases.

 GCC-4.5.0 has two big new optimizations, LTO and Graphite (more
accurately, Graphite was introduced in the previous release).
Therefore I ran additional benchmarks to test them.

LTO is a promising technology, especially for integer benchmarks, for which it results in smaller and faster code. But it can also cause degradations on SPECFP2000, mainly because of big degradations on a few benchmarks like wupwise or facerec. Another annoying thing about LTO is that it considerably slows down the compiler.

The LTO improvement on spec2000int is only 1.86%:

                4.5     4.5+lto  Improvement
164.gzip         955     950      -0.52%   <-- degrade
175.vpr          588     594       1.02%
176.gcc         1211    1216       0.41%
181.mcf          699     698      -0.14%
186.crafty      1011     987      -2.37%   <-- degrade
197.parser       792     813       2.65%
252.eon         1026    1023      -0.29%   <-- degrade
253.perlbmk     1312    1294      -1.37%   <-- degrade
254.gap         1021    1037       1.57%
255.vortex      1123    1319      17.45%
256.bzip2        737     768       4.21%
300.twolf        773     779       0.78%
---------------------------------------------------
SPECint2000      913     930       1.86%
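
For anyone who wants to check the arithmetic in the table, here is a minimal sketch of how the Improvement column can be recomputed. It assumes the two score columns are SPEC ratios (higher is better) and that the composite is a geometric mean of the per-benchmark ratios, as SPEC CPU composites are; the helper names `improvement` and `geomean` are made up for this illustration and are not part of any SPEC tooling.

```python
import math

# (benchmark, GCC 4.5 score, GCC 4.5 + LTO score), copied from the table.
SCORES = [
    ("164.gzip", 955, 950),
    ("175.vpr", 588, 594),
    ("176.gcc", 1211, 1216),
    ("181.mcf", 699, 698),
    ("186.crafty", 1011, 987),
    ("197.parser", 792, 813),
    ("252.eon", 1026, 1023),
    ("253.perlbmk", 1312, 1294),
    ("254.gap", 1021, 1037),
    ("255.vortex", 1123, 1319),
    ("256.bzip2", 737, 768),
    ("300.twolf", 773, 779),
]


def improvement(base, new):
    """Percent change of `new` relative to `base` (positive means better)."""
    return (new / base - 1.0) * 100.0


def geomean(values):
    """Geometric mean, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    for name, base, lto in SCORES:
        print(f"{name:12s} {base:5d} {lto:5d} {improvement(base, lto):+7.2f}%")
    base_mean = geomean(b for _, b, _ in SCORES)
    lto_mean = geomean(l for _, _, l in SCORES)
    print(f"{'geomean':12s} {base_mean:5.0f} {lto_mean:5.0f} "
          f"{improvement(base_mean, lto_mean):+7.2f}%")
```

The composite line is recomputed from the rounded per-benchmark scores, so it may differ slightly from the reported 913 and 930.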


This matches our previous observation that to bring the best out of
LTO, FDO is also needed. (As a reference, LIPO improves over plain FDO
by ~4.5%, vortex improves 23%).  You will probably see even smaller
improvement in SPEC2006.

Thanks for the comments. FDO will probably improve the SPEC2000 score, although that is not obvious for some tests: their train data sets differ from the reference data sets, so the profile might actually mislead the compiler.

FDO is important for optimizations when the branch probability distribution does not change much across all possible data sets. IMHO that is why FDO is not widely used by most developers (although I am sure that for Google applications it is extremely important), and therefore I don't measure it and it is not so interesting to me. A bigger reason not to use FDO, though, is how inconvenient it is for a regular compiler user.

As for the vortex FDO improvement, vortex contains a moderate-size loop in which most of the time is spent. The loop has an if-then-else at the top loop level. On all SPEC2000 data sets, one branch of the if is taken practically always (like 1 to 1,000,000). So it is not surprising to me that FDO gives such an improvement for vortex.
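
To make the point about branch bias and train vs. reference data concrete, here is a toy sketch. It only mimics the kind of per-branch counts a profile records; it is not how GCC instruments code, and the function names and data sets are invented for the illustration.

```python
from collections import Counter


def profile_branch(data, predicate):
    """Count how often the 'then' and 'else' arms of one branch run."""
    counts = Counter()
    for item in data:
        counts["then" if predicate(item) else "else"] += 1
    return counts


def taken_probability(counts):
    """Fraction of executions that went through the 'then' arm."""
    total = counts["then"] + counts["else"]
    return counts["then"] / total if total else 0.0


def is_common_case(x):
    # Hypothetical loop condition: nonzero items take the 'then' arm.
    return x != 0


if __name__ == "__main__":
    # Vortex-like case: the bias is the same on train and reference inputs,
    # so a profile from the train run describes the reference run well.
    train = [1] * 9_999 + [0]
    ref = [1] * 99_999 + [0]
    print("stable :",
          taken_probability(profile_branch(train, is_common_case)),
          taken_probability(profile_branch(ref, is_common_case)))

    # Misleading case: the train input is biased the opposite way from
    # the reference input, so the profile points the wrong direction.
    train = [0] * 90 + [1] * 10
    ref = [1] * 90 + [0] * 10
    print("skewed :",
          taken_probability(profile_branch(train, is_common_case)),
          taken_probability(profile_branch(ref, is_common_case)))
```

In the first case the train profile is a good predictor for the reference run; in the second, a compiler trusting the train profile would lay out the code for the rare path.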

It would be great to have numbers comparing LTO + FDO vs
plain FDO in the same setup.

I usually get a lot of requests after posting such comparisons. I'd like to do all of them, but unfortunately running the benchmarks and preparing the results takes a lot of my time. Maybe I'll do such a comparison next year.

