[PATCH] Introduce 4-stages profiledbootstrap to get a better profile.
Jan Hubicka
hubicka@ucw.cz
Mon May 29 15:11:00 GMT 2017
> On 05/25/2017 01:22 PM, Markus Trippelsdorf wrote:
> > On 2017.05.25 at 11:55 +0200, Martin Liška wrote:
> >> Hi.
> >>
> >> As I discussed PGO with Honza and Richi, the current 3-stage profiledbootstrap is not ideal for the following
> >> 2 reasons:
> >>
> >> 1) the stageprofile compiler is trained just on the libraries that are built during stage2
> >> 2) apart from that, as the compiler is also used to build the final compiler, the profile
> >> is being updated during the build. So the stage2 compiler is making different decisions.
> >>
> >> Both problems can be resolved by adding another step in between the current stage2 and stage3
> >> where we train the stage2 compiler by building the compiler with default options.
> >>
> >> I'm going to do some measurements.
> >
> > I did some measurements on gcc67 (trunk with --enable-checking=release).
> > The apparent speedup is in the noise.
>
> Hello.
>
> Thanks for the measurements.
>
> I can see a difference for GCC 7.1:
>
> g++-7 tramp3d-v4.ii -O2 && time for i in `seq 1 10` ; do g++-7 tramp3d-v4.ii -O2 ; done
>
> before: real 2m26.464s
> after: real 2m25.133s
>
> which is 99.09124426480228%. It's probably within the noise level.
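The percentage above is the ratio of the two wall-clock times reported by `time`. A minimal sketch of that arithmetic follows; the helper names `parse_time` and `relative_time` are hypothetical and only illustrate the calculation, and the parser assumes the `NmS.SSSs` format shown in the measurements.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '2m25.133s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def relative_time(before, after):
    """Return the 'after' time as a percentage of the 'before' time."""
    return 100.0 * parse_time(after) / parse_time(before)


# The 'after' measurement from this thread, converted to seconds.
print(parse_time("2m25.133s"))
```

A result just below 100% means the new build is only marginally faster, which is why a single pair of runs cannot separate it from noise.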
>
> And apparently the file size of the binary is bigger:
>
> before (using bloaty):
>
>    VM SIZE                          FILE SIZE
> --------------                    --------------
>  59.0%  15.1Mi .text               15.1Mi  62.3%
>  21.3%  5.45Mi .rodata             5.45Mi  22.5%
>   6.6%  1.69Mi .eh_frame           1.69Mi   6.9%
>   5.4%  1.38Mi .bss                     0   0.0%
>   3.3%   874Ki .dynstr              874Ki   3.5%
>   1.8%   480Ki .dynsym              480Ki   1.9%
>   1.1%   285Ki .eh_frame_hdr        285Ki   1.1%
>   0.6%   158Ki .gnu.hash            158Ki   0.6%
>   0.5%   144Ki .hash                144Ki   0.6%
>   0.2%  44.4Ki .data               44.4Ki   0.2%
>   0.2%  40.0Ki .gnu.version        40.0Ki   0.2%
>   0.0%  11.1Ki .rela.plt           11.1Ki   0.0%
>   0.0%  7.44Ki .plt                7.44Ki   0.0%
>   0.0%  4.56Ki .data.rel.ro        4.56Ki   0.0%
>   0.0%  3.73Ki .got.plt            3.73Ki   0.0%
>   0.0%      38 [Unmapped]          2.75Ki   0.0%
>   0.0%     624 [ELF Headers]       2.55Ki   0.0%
>   0.0%     848 [Other]             1.13Ki   0.0%
>   0.0%     917 .gcc_except_table      917   0.0%
>   0.0%     608 .dynamic               608   0.0%
>   0.0%      16 [None]                   0   0.0%
> 100.0%  25.7Mi TOTAL               24.3Mi 100.0%
>
> after:
>
>    VM SIZE                          FILE SIZE
> --------------                    --------------
>  58.3%  14.6Mi .text               14.6Mi  54.2%
>  21.6%  5.41Mi .rodata             5.41Mi  20.1%
>   0.0%       0 .strtab             2.13Mi   7.9%
>   6.7%  1.67Mi .eh_frame           1.67Mi   6.2%
>   5.5%  1.38Mi .bss                     0   0.0%
>   0.0%       0 .symtab             1.11Mi   4.1%
>   3.4%   876Ki .dynstr              876Ki   3.2%
>   1.9%   480Ki .dynsym              480Ki   1.7%
>   1.1%   280Ki .eh_frame_hdr        280Ki   1.0%
>   0.6%   158Ki .gnu.hash            158Ki   0.6%
>   0.6%   144Ki .hash                144Ki   0.5%
>   0.2%  44.4Ki .data               44.4Ki   0.2%
>   0.2%  40.1Ki .gnu.version        40.1Ki   0.1%
>   0.0%  11.1Ki .rela.plt           11.1Ki   0.0%
>   0.0%  7.44Ki .plt                7.44Ki   0.0%
>   0.0%  4.56Ki .data.rel.ro        4.56Ki   0.0%
>   0.0%  3.73Ki .got.plt            3.73Ki   0.0%
>   0.0%      58 [Unmapped]          3.11Ki   0.0%
>   0.0%     624 [ELF Headers]       2.61Ki   0.0%
>   0.0%  2.32Ki [Other]             2.60Ki   0.0%
>   0.0%      16 [None]                   0   0.0%
> 100.0%  25.1Mi TOTAL               26.9Mi 100.0%
>
> As I discussed with Honza, we still have a problem in GCC: with the current working sets,
> get_hot_bb_threshold () is very close to the number of runs, which is effectively 1 for a single
> run. That's a mistake and it should be fixed.
Yep, with the LTO+PGO bootstrap I think we also hit the problem that the PGO inliner was never
seriously tuned (we basically use the very first badness metric I introduced and we never
experimented with its parameters). The reason is that hot/cold partitioning, even when it
is very coarse, works reasonably well for the per-file compilation model. With LTO we
are facing very many inline decisions and there is probably a lot of low-hanging fruit.

GCC is currently in transition to the new profile counter code. I will push out the initial
patch retiring gcov_type soon (once I finish updating it to the current tree, which is very
annoying). That will let us track hotness more conservatively and fix the old
problem that a count becomes unrealistically low because of broken profile updates and thus
becomes cold. This should make it possible to increase the threshold and start
re-tuning (hopefully this week or next).
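
To illustrate why a working-set-based threshold can collapse to 1 for a single training run, here is a rough sketch. It is not GCC's implementation (GCC derives the value from the working-set data in the profile summary); the function name `hot_threshold`, the sample profile, and the 999 permille default are assumptions made for the example.

```python
def hot_threshold(counts, permille=999):
    """Return the smallest count among the hottest blocks that together
    account for at least permille/1000 of all recorded executions."""
    total = sum(counts)
    if total == 0:
        return 0
    covered = 0
    for count in sorted(counts, reverse=True):
        covered += count
        # Integer comparison avoids floating-point rounding at the cutoff.
        if covered * 1000 >= total * permille:
            return count
    return 0  # fallback; not reached for permille <= 1000


# Hypothetical single-run profile: two hot blocks and a tail of blocks run once.
profile = [1000, 500, 1, 1, 1]
print(hot_threshold(profile))       # cutoff at 999 permille
print(hot_threshold(profile, 990))  # cutoff at 990 permille
```

On this sample the 999 permille cutoff has to reach into the tail of blocks executed once, so the threshold is 1 and almost everything counts as hot; at 990 permille the threshold stays at 500. A count that a broken profile update pushes below such a threshold is then treated as cold even though the code is hot.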
Honza
>
> Martin