This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: LTO: Speedup -- some preliminary SPEC2000 results


Hi,
thanks for the report!  It is actually more promising than I
expected.  A while ago I did similar tests with whole-program and
--combine and the performance results were not very consistent (I also
saw code size reductions).  I guess the geometric mean will go down for
SPECint after vpr/gcc/perlbmk/gap work, since pretty much all of the gain
comes from EON's intermodule inlining.  I've just committed the patch to
fix the ipa-sra problem, which will hopefully allow clean SPEC runs.

The ipa-sra bug changed the calling convention of externally visible
functions.  The fix should not affect code size too much.
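
To illustrate why changing the convention of a visible function is a
problem, here is a hand-written sketch of the kind of rewrite a
scalar-replacement pass can do.  The struct and function names are made
up for illustration and this is not actual GCC output.  The rewrite is
only safe when every caller is known and rewritten too, which is why it
must be limited to functions that are not externally visible.

```c
#include <assert.h>

struct point { int x; int y; };

/* Before: the callee receives a pointer but only reads one field. */
static int get_x_before(const struct point *p)
{
    assert(p);
    return p->x;
}

/* After (hand-written): only the used field is passed, by value. */
static int get_x_after(int x)
{
    return x;
}

int sum_x_before(const struct point *a, const struct point *b)
{
    return get_x_before(a) + get_x_before(b);
}

/* Every caller has to be rewritten together with the callee.  A caller
   in another object file would still pass a pointer, so the same change
   on an externally visible function would break it. */
int sum_x_after(const struct point *a, const struct point *b)
{
    return get_x_after(a->x) + get_x_after(b->x);
}
```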

> >
> > With Jan's latest fixes, the results (for -O3 vs -O3 -flto
> > -fwhole-program) are
> >
> > x86:
> >  o Int2000:
> >    - LTO crashes the compiler on vortex.  LTO generates
> >      wrong code for vpr, gcc, perlbmk, and gap.
> >    - Compiler is 1.85 times slower with LTO
> >    - Average code size is almost 6% smaller:
> >
> >        Change        -O3        LTO  Benchmark
> >        4.615%      44287      46331  164.gzip
> >       -3.145%     144101     139569  175.vpr
> >        0.261%    1566926    1571009  176.gcc
> >      -12.118%      12279      10791  181.mcf
> >       11.130%     209956     233324  186.crafty
> >      -29.735%     155358     109162  197.parser
> >      -23.075%     497347     382585  252.eon
> >        8.904%     552163     601327  253.perlbmk
> >        1.516%     503006     510630  254.gap
> >      -20.891%      47465      37549  256.bzip2
> >       -3.047%     198365     192321  300.twolf
> >       Average = -5.96236%
> >
> >    - Performance is improved by almost 4%
> >
> >      Benchmark    -O3    LTO   Change
> >      164.gzip    1668   1629   -2.33813%
> >      181.mcf     5011   5020    0.17960%
> >      186.crafty  2268   2277    0.39682%
> >      197.parser  1928   1925   -0.15560%

There is a simple opportunity for improvement in parser under whole-program
optimization.  The hashtable size is held in a static variable and is a
constant prime (once it is initialized at benchmark startup).
Being able to constant-propagate this would noticeably help here.
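
A minimal sketch of the pattern I mean, with made-up names and a made-up
prime (this is not the actual 197.parser source).  The static variable is
written once at startup and only read afterwards.  If the compiler could
prove that, the second function shows roughly what it could generate
instead: a modulo by a known constant, which is typically cheaper than a
division by a value loaded from memory.

```c
#include <assert.h>

/* Hypothetical table size: written once in init_table(), read-only after. */
static unsigned table_size;

void init_table(void)
{
    table_size = 1021u;  /* an arbitrary prime chosen for this sketch */
}

/* As written: the divisor is loaded from memory on every call. */
unsigned bucket_for(unsigned hash)
{
    assert(table_size != 0);
    return hash % table_size;
}

/* Hand-written equivalent of what constant propagation would allow:
   the divisor is a compile-time constant. */
unsigned bucket_for_const(unsigned hash)
{
    return hash % 1021u;
}
```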

> >      252.eon     2477   2950   19.0957%
> >      256.bzip2   1894   1956    3.2735%
> >      300.twolf   2806   3026    7.84034%
> >      GeoMean     2416   2509    3.84934%
> >

> >
> > LTO is quite promising.  Actually it is in line with, or even better
> > than, the improvement obtained from other compilers (pathscale is the
> > most convenient compiler for checking LTO separately: there LTO gave
> > up to 5% improvement on SPECFP2000 and 3.5% on SPECInt2000, making the
> > compiler about 50% slower and the generated code up to 30% bigger).
> > LTO in GCC actually

I must say that I expect the geometric mean to go down after we fix the
broken benchmarks, but I would be happy to be wrong.  I wonder how
pathscale manages to make code size so much bigger with whole-program
assumptions.  Isn't this a comparison of single-file compilation against
the pathscale equivalent of -flto alone (i.e. not -flto -fwhole-program)?

The results also imply that on large units we probably still do quite
badly.  Doing more cloning and less inlining should help here, I would
guess.
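
A hand-written sketch of the difference, with hypothetical function names
and not taken from any benchmark.  When a function is always called with
the same constant argument, inlining copies the body into every call site,
while cloning creates one specialized copy that all those call sites
share.  For a body this small inlining is fine; the size argument only
matters for larger functions.

```c
#include <assert.h>

/* General version: round n up to a multiple of alignment. */
unsigned align_up(unsigned n, unsigned alignment)
{
    assert(alignment != 0);
    return ((n + alignment - 1) / alignment) * alignment;
}

/* Hand-written "clone" specialized for alignment == 16.  The division
   and multiplication fold into a mask, and the single copy can be shared
   by every caller that passes 16. */
unsigned align_up_16(unsigned n)
{
    return (n + 15u) & ~15u;
}
```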

Do you happen to have a comparison of -flto to -flto -fwhole-program?
Honza

