This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level
Aditya,
the hot/cold partitioning is currently organized as follows:
1) we have static branch prediction code in predict.c and profile
feedback which we store into cfg and callgraph.
2) we have predicates
optimize_*_for_speed_p/size_p
where * can be function, basic block, cfg edge, callgraph edge, loop
or loop nest
3) all optimization passes trading code size for speed should consult
the corresponding predicates about whether to do the transform.
4) ipa-split pass is mostly there to enable partial inlining
5) hot-cold partition in bb-reorder offlines stores
6) we have ipa-icf pass to identify identical functions and some code in
FRE to identify identical code within a single function.
7) we do shrink wrapping to mitigate register pressure problems caused
in cold regions of code
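
To make items 1 and 5 a bit more concrete, here is a small sketch in C of
what the static prediction sees from user code. The names fail,
checked_sum and demo_checked_sum are made up for this example;
__builtin_expect and the cold/noreturn attributes are GCC extensions.
Since the guarded path ends in abort(), the explicit hint is probably
redundant here. With -O2 on x86, GCC may move the cold path into a
separate checked_sum.cold symbol in .text.unlikely, but whether it does
depends on the GCC version and flags.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error handler. The cold attribute marks calls to it as
   unlikely; noreturn tells the compiler the path ends here. */
__attribute__((cold, noreturn))
static void fail(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}

/* Sum the first n entries of v. The NULL check guards a path that ends
   in abort(), so it is a natural candidate for the cold partition. */
long checked_sum(const int *v, size_t n)
{
    if (__builtin_expect(v == NULL, 0))
        fail("null input");

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Small usage check that exercises only the hot path. */
void demo_checked_sum(void)
{
    int data[] = {1, 2, 3, 4};
    assert(checked_sum(data, 4) == 10);
}
```

Comparing the assembly from "gcc -O2 -S" with and without
-fno-reorder-blocks-and-partition is one way to see whether the
partitioning kicked in.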
I think this is a bit stronger than what LLVM does currently, which
relies on outlining SESE regions earlier rather than going through the
pain of implementing support for partitioning.
Building clang10 with GCC and FDO leads to 37MB .text section
Building clang10 with GCC and LTO+FDO leads to 34MB .text section
Building clang10 with clang10 and FDO leads to 53MB .text section
Building clang10 with clang10 and thinlto+FDO leads to 67MB .text section
GCC built clang is about 2-3% faster building Firefox.
There are many things which I think could/should improve in our
framework.
1) our size optimization is very aggressive (llvm's -Oz) and enabling it
based on heuristics may lead to huge regressions. We probably want
the optimize_for_size_p predicate to have two levels and use the less
aggressive mode in places where we are not sure the code is really
very cold.
2) ipa-split is very simplistic and only splits when no value computed
in the header of the function is used in the tail. We should support
adding extra parameters for the values computed and do more general
SESE outlining.
Note that we do SESE outlining for OpenMP but this code is not
interfaced generically enough to be easily used by ipa-split.
The original implementation of ipa-split was kind of a "first cut",
with the plan to clean up interfaces to the rest of the compiler and
implement fancier features later. This never happened, so there is
certainly room for improvement here.
We also do all splitting before the actual IPA optimization, while it
may be more reasonable to identify potential split points and let IPA
optimization decide on the transforms (currently we rely on the inliner
to inline back useless splits).
3) function partitioning is enabled only for x86. I never had time to
get it working on other targets and no one picked up this task.
4) ipa-icf and in-function code merging are currently very conservative
in comparing metadata like type-based aliasing info (I plan to work
on this next stage1).
5) we have only very limited logic to detect cold regions without
profile feedback and thus the amount of offlined code is very small
(this is also because of 1).
We basically know that code leading to abort/exception handling etc
is cold and consider everything else hot.
6) We lack code placement pass (though Martin has WIP implementation of
it)
7) We do no partitioning of data segment which may be also interesting
to do.
8) Most non-x86 backends do not implement the hot/cold code models and
instruction choice very well.
9) Shrink-wrapping and register allocation are not always able to move
spilling to cold code paths, but this is generally a very hard problem
to tackle.
So there is a lot of room for improvement (and I am sure more can be
found) and I would be happy to help you with it.
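
To illustrate item 2 of the list above, here is a hand-written sketch in
C of the two shapes of function splitting. This is not what ipa-split
emits; all function names are made up, and the noinline attribute only
stands in for "this part was split off". The first pair is the simple
case where the tail needs nothing computed in the header. The second
pair is the case described as unsupported: the header computes a value
(len) that the tail also needs, so it has to travel as an extra
parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Case 1: the split-off tail uses only the original parameters. */
__attribute__((noinline))
static long sum_squares_tail(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long)v[i] * v[i];
    return total;
}

/* Cheap header with an early exit; small enough to inline into callers. */
long sum_squares(const int *v, size_t n)
{
    if (n == 0)
        return 0;
    return sum_squares_tail(v, n);
}

/* Case 2: the tail needs len, which was computed in the header. */
__attribute__((noinline))
static size_t count_char_tail(const char *s, char c, size_t len)
{
    size_t hits = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            hits++;
    return hits;
}

/* The header computes len, tests it, and passes it on to the tail
   as an extra parameter. */
size_t count_char(const char *s, char c)
{
    size_t len = strlen(s);
    if (len == 0)
        return 0;
    return count_char_tail(s, c, len);
}

/* Small usage check covering both early exits and both tails. */
void demo_split(void)
{
    int v[] = {1, 2, 3};
    assert(sum_squares(v, 3) == 14);
    assert(sum_squares(v, 0) == 0);
    assert(count_char("banana", 'a') == 3);
    assert(count_char("", 'a') == 0);
}
```

In both cases the caller-visible function stays tiny, which is what
makes partial inlining of the early-exit check attractive.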
Honza
>
> Hi Martin,
> Thank you for explaining the status quo. After reading the code of bb-reorder.c,
> it looks pretty good and seems it doesn't need any significant improvements.
> In that case, the only value GIMPLE level hot/cold splitting could bring is to enable aggressive code-size optimization
> by merging of similar/identical functions: after outlining cold regions, they may be suitable candidates for function merging.
> ipa-split might be enabling some of that; having region-based function splitting could improve ipa-split.
>
> -Aditya
>
>
> --
> From: Martin Liška <mliska@suse.cz>
> Sent: Tuesday, March 3, 2020 2:47 AM
> To: Aditya K <hiraditya@msn.com>; gcc@gcc.gnu.org <gcc@gcc.gnu.org>
> Cc: Jan Hubicka <hubicka@ucw.cz>
> Subject: Re: GSoC topic: Implement hot cold splitting at GIMPLE IR level
>
> Hello.
> Thank you for the idea. I would like to provide some comments about what GCC can currently
> do and I'm curious whether we need something extra on top of what we do.
> Right now, GCC can do hot/cold partitioning based on functions and basic blocks. With
> a PGO profile, the optimization is quite aggressive and can move quite a lot of code
> into a cold partition, where it is optimized for size. Without a profile,
> we do a static profile guess (predict.c), where we also propagate information about cold
> blocks (determine_unlikely_bbs). Later in RTL, we utilize the information and make
> the real reordering (bb-reorder.c).
>
> Martin