This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)


On 2013.03.25 at 15:17 +0100, Richard Biener wrote:
> On Mon, Mar 25, 2013 at 2:24 PM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
> > On 2013.03.25 at 14:11 +0100, Richard Biener wrote:
> >> On Mon, Mar 25, 2013 at 1:56 PM, Markus Trippelsdorf
> >> <markus@trippelsdorf.de> wrote:
> >> > On 2013.03.25 at 08:06 +0100, Markus Trippelsdorf wrote:
> >> >> On 2013.03.24 at 20:53 +0100, gcc_mailinglist@abwesend.de wrote:
> >> >> >
> >> >> > is it useful to compile gcc 4.8.0 with the lto option?
> >> >>
> >> >> If you want a (slightly) faster compiler then yes.
> >> >> Simply add "--with-build-config=bootstrap-lto" to your configuration.
> >> >> You can combine this with profile feedback: "make profiledbootstrap".
> >> >
> >> > To qualify "(slightly) faster" in the statement above, I built gcc with
> >> > four different configurations on my AMD64 4-core machine (vanilla, LTO
> >> > only, PGO only, LTO+PGO). Then I measured how much time it takes to
> >> > build the Linux kernel and Firefox. Here are the results:
> >> >
> >> > Firefox:
> >> > vanilla:  5143.27s user 267.27s system 346% cpu 26:02.03 total
> >> > PGO    :  4590.37s user 270.21s system 344% cpu 23:28.89 total
> >> > LTO    :  5056.11s user 268.04s system 348% cpu 25:28.73 total
> >> > LTO+PGO:  4598.79s user 269.01s system 347% cpu 23:22.13 total
> >> >
> >> > kernel (measured three times):
> >> > vanilla:  382.34s user 23.74s system 334% cpu 2:01.41 total
> >> >           382.08s user 24.05s system 333% cpu 2:01.93 total
> >> >           385.20s user 23.63s system 330% cpu 2:03.73 total
> >> > PGO    :  341.18s user 23.25s system 323% cpu 1:52.71 total
> >> >           341.72s user 23.66s system 323% cpu 1:52.93 total
> >> >           340.32s user 23.42s system 326% cpu 1:51.38 total
> >> > LTO    :  381.23s user 23.55s system 328% cpu 2:03.05 total
> >> >           380.41s user 24.35s system 328% cpu 2:03.24 total
> >> >           379.47s user 23.98s system 331% cpu 2:01.82 total
> >> > LTO+PGO:  347.12s user 25.11s system 317% cpu 1:57.34 total
> >> >           344.38s user 24.05s system 326% cpu 1:52.99 total
> >> >           344.74s user 24.61s system 323% cpu 1:54.03 total
> >> >
> >> > To summarize:
> >> >  * GCC built with PGO is ~10% faster than a vanilla bootstrapped compiler.
> >> >  * GCC built with LTO only is just ~2% faster when building Firefox. The
> >> >    kernel build time difference is in the noise.
> >> >  * An LTO+PGO build is almost exactly as fast as a pure PGO build.
> >> >
> >> > So it appears, contrary to the advice given above, that it is not useful
> >> > to build gcc 4.8.0 with the lto option at the moment.
> >>
> >> Probably Honza did too good a job of making sure the optimizations LTO does
> >> can be done without LTO as well by fixing up GCC sources ;)
> >>
> >> Did you compare binary sizes of the compiler itself (w/o debuginfo)?
> >
> > Vanilla:
> > -rwxr-xr-x 1 markus markus 16219976 Mar 25 09:28 cc1
> > -rwxr-xr-x 1 markus markus 17762824 Mar 25 09:28 cc1plus
> > -rwxr-xr-x 1 markus markus 15354320 Mar 25 09:28 lto1
> > -rwxr-xr-x 4 markus markus 664920 Mar 25 09:28 c++
> > -rwxr-xr-x 1 markus markus 663496 Mar 25 09:28 cpp
> > -rwxr-xr-x 4 markus markus 664920 Mar 25 09:28 g++
> > -rwxr-xr-x 3 markus markus 662464 Mar 25 09:28 gcc
> >
> > PGO:
> > -rwxr-xr-x 1 markus markus 14778600 Mar 25 09:14 cc1
> > -rwxr-xr-x 1 markus markus 16106120 Mar 25 09:14 cc1plus
> > -rwxr-xr-x 1 markus markus 14054448 Mar 25 09:14 lto1
> > -rwxr-xr-x 4 markus markus 579744 Mar 25 09:14 c++
> > -rwxr-xr-x 1 markus markus 575600 Mar 25 09:14 cpp
> > -rwxr-xr-x 4 markus markus 579744 Mar 25 09:14 g++
> > -rwxr-xr-x 3 markus markus 575560 Mar 25 09:14 gcc
> >
> > LTO:
> > -rwxr-xr-x 1 markus markus 17147688 Mar 25 08:58 cc1
> > -rwxr-xr-x 1 markus markus 18728200 Mar 25 08:58 cc1plus
> > -rwxr-xr-x 1 markus markus 16227224 Mar 25 08:58 lto1
> > -rwxr-xr-x 4 markus markus 567968 Mar 25 08:58 c++
> > -rwxr-xr-x 1 markus markus 568224 Mar 25 08:58 cpp
> > -rwxr-xr-x 4 markus markus 567968 Mar 25 08:58 g++
> > -rwxr-xr-x 3 markus markus 563728 Mar 25 08:58 gcc
> >
> > LTO+PGO:
> > -rwxr-xr-x 1 root root 16319480 Mar 22 13:02 cc1
> > -rwxr-xr-x 1 root root 17616608 Mar 22 13:02 cc1plus
> > -rwxr-xr-x 1 root root 15445824 Mar 22 13:02 lto1
> > -rwxr-xr-x 4 root root 492344 Mar 22 13:02 c++
> > -rwxr-xr-x 1 root root 492320 Mar 22 13:02 cpp
> > -rwxr-xr-x 4 root root 492344 Mar 22 13:02 g++
> > -rwxr-xr-x 3 root root 492232 Mar 22 13:02 gcc
> 
> Hmm, does the default --enable-plugin (GCC plugin support), which results
> in -rdynamic being used, maybe prevent some of the useful LTO optimizations
> (mainly due to cost constraints)?  That is, is an LTO + PGO build with
> --disable-plugin any different?

Yes, the binaries are 8-10% smaller. Unfortunately, there is no performance
improvement.

LTO+PGO-disable-plugin:
-rwxr-xr-x 1 markus markus 15025568 Mar 25 15:49 cc1
-rwxr-xr-x 1 markus markus 16198584 Mar 25 15:49 cc1plus
-rwxr-xr-x 1 markus markus 13907328 Mar 25 15:49 lto1
-rwxr-xr-x 4 markus markus 492360 Mar 25 15:49 c++
-rwxr-xr-x 1 markus markus 488240 Mar 25 15:49 cpp
-rwxr-xr-x 3 markus markus 488216 Mar 25 15:49 gcc
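
The 8-10% figure can be checked directly against the ls -l sizes. Below is a minimal Python sketch that compares the three main compiler binaries of the LTO+PGO build with and without --disable-plugin. The dictionary names and the `shrink_percent` helper are illustrative only and not part of any GCC tooling.

```python
# Sizes in bytes copied from the ls -l listings in this thread: LTO+PGO with
# the default plugin support vs. LTO+PGO configured with --disable-plugin.
WITH_PLUGIN = {"cc1": 16319480, "cc1plus": 17616608, "lto1": 15445824}
WITHOUT_PLUGIN = {"cc1": 15025568, "cc1plus": 16198584, "lto1": 13907328}


def shrink_percent(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


for name, before in WITH_PLUGIN.items():
    after = WITHOUT_PLUGIN[name]
    print(f"{name:8s} {before:>10,d} -> {after:>10,d} bytes "
          f"({shrink_percent(before, after):.1f}% smaller)")
```

On these numbers cc1 and cc1plus shrink by about 8% and lto1 by about 10%.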

Firefox:
LTO+PGO-disable-plugin: 4590.55s user 273.70s system 343% cpu 23:34.65 total

kernel:
LTO+PGO-disable-plugin: 344.11s user 23.59s system 322% cpu 1:54.08 total
                        340.94s user 23.65s system 326% cpu 1:51.56 total
                        339.66s user 23.41s system 329% cpu 1:50.09 total
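
The percentages quoted in this thread can be re-derived from the raw timings. The following is a minimal Python sketch. It assumes the zsh-style `time` output format used here, with the elapsed ("total") time printed as minutes:seconds, which holds for every run in this thread. `parse_runs`, `reduction_percent` and `report` are hypothetical helper names. The script averages the runs per configuration and reports how much user and wall-clock time drop relative to the vanilla compiler.

```python
import re
from statistics import mean

# One run in the zsh-style `time` output used in this thread, e.g.
# "382.34s user 23.74s system 334% cpu 2:01.41 total".
# Assumption: elapsed time is minutes:seconds (true for every run here).
RUN = re.compile(r"([\d.]+)s user ([\d.]+)s system \d+% cpu (\d+):([\d.]+) total")


def parse_runs(text: str) -> list[tuple[float, float]]:
    """Return (user_seconds, wall_seconds) for every run found in text."""
    return [(float(user), int(mins) * 60 + float(secs))
            for user, _system, mins, secs in RUN.findall(text)]


def reduction_percent(base: float, new: float) -> float:
    """How much smaller `new` is than `base`, in percent (negative means slower)."""
    return 100.0 * (base - new) / base


def report(title: str, results: dict[str, str]) -> dict[str, tuple[float, float]]:
    """Print the mean user/wall time change of each configuration vs. vanilla."""
    means = {}
    for config, text in results.items():
        runs = parse_runs(text)
        means[config] = (mean(u for u, _ in runs), mean(w for _, w in runs))
    base_user, base_wall = means["vanilla"]
    print(title)
    for config, (user, wall) in means.items():
        print(f"  {config:24s} user {reduction_percent(base_user, user):5.1f}% "
              f"wall {reduction_percent(base_wall, wall):5.1f}% faster")
    return means


# Timings copied from this thread (Firefox: one run; kernel: three runs each).
FIREFOX = {
    "vanilla": "5143.27s user 267.27s system 346% cpu 26:02.03 total",
    "PGO": "4590.37s user 270.21s system 344% cpu 23:28.89 total",
    "LTO": "5056.11s user 268.04s system 348% cpu 25:28.73 total",
    "LTO+PGO": "4598.79s user 269.01s system 347% cpu 23:22.13 total",
    "LTO+PGO-disable-plugin": "4590.55s user 273.70s system 343% cpu 23:34.65 total",
}
KERNEL = {
    "vanilla": "382.34s user 23.74s system 334% cpu 2:01.41 total "
               "382.08s user 24.05s system 333% cpu 2:01.93 total "
               "385.20s user 23.63s system 330% cpu 2:03.73 total",
    "PGO": "341.18s user 23.25s system 323% cpu 1:52.71 total "
           "341.72s user 23.66s system 323% cpu 1:52.93 total "
           "340.32s user 23.42s system 326% cpu 1:51.38 total",
    "LTO": "381.23s user 23.55s system 328% cpu 2:03.05 total "
           "380.41s user 24.35s system 328% cpu 2:03.24 total "
           "379.47s user 23.98s system 331% cpu 2:01.82 total",
    "LTO+PGO": "347.12s user 25.11s system 317% cpu 1:57.34 total "
               "344.38s user 24.05s system 326% cpu 1:52.99 total "
               "344.74s user 24.61s system 323% cpu 1:54.03 total",
    "LTO+PGO-disable-plugin": "344.11s user 23.59s system 322% cpu 1:54.08 total "
                              "340.94s user 23.65s system 326% cpu 1:51.56 total "
                              "339.66s user 23.41s system 329% cpu 1:50.09 total",
}

firefox = report("Firefox", FIREFOX)
kernel = report("kernel", KERNEL)
```

Reporting user and wall-clock time separately is useful here. For the kernel, PGO cuts mean user time by about 11%, while the mean wall-clock reduction over these three runs is closer to 8%.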

-- 
Markus

