This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Fw: GSoC topic: Implement hot cold splitting at GIMPLE IR level


Aditya,
the hot/cold partitioning is currently organized as follows:

1) we have static branch prediction code in predict.c and profile
   feedback which we store into cfg and callgraph.
2) we have predicates

   optimize_*_for_speed_p/size_p

   where * can be function, basic block, cfg edge, callgraph edge, loop
   or loop nest
3) all optimization passes trading code size for speed should consult
   the corresponding predicates to decide whether to do the transform.
4) ipa-split pass is mostly there to enable partial inlining
5) hot-cold partitioning in bb-reorder offlines cold code
6) we have the ipa-icf pass to identify identical functions and some
   code in FRE to identify identical code within a single function.
7) we do shrink wrapping to mitigate register pressure problems caused
   by cold regions of code
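
As a rough illustration of where the static prediction in point 1 gets
some of its input at the source level, here is a minimal sketch using
GCC's __builtin_expect and the cold function attribute.  The function
names (report_failure, checked_div) are made up for the example; what
the compiler actually does with the hints depends on the options used
(e.g. -O2 -freorder-blocks-and-partition) and on the target.

```c
#include <stdio.h>

/* Hypothetical error handler.  The cold attribute tells GCC that the
   function is unlikely to be executed, so paths leading to a call of it
   are predicted as not taken.  */
__attribute__((cold, noinline))
int report_failure(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
    return -1;
}

/* Hypothetical hot function with one unlikely branch.  */
int checked_div(int a, int b)
{
    if (__builtin_expect(b == 0, 0))   /* hint: b == 0 is rare */
        return report_failure("division by zero");
    return a / b;
}
```

Here checked_div(10, 2) takes the hot path and returns 5, while
checked_div(1, 0) goes through the cold handler and returns -1.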

I think this is a bit stronger than what LLVM currently does, which
relies on outlining SESE regions earlier rather than going through the
pain of implementing support for partitioning.

Building clang10 with GCC and FDO leads to a 37MB .text section
Building clang10 with GCC and LTO+FDO leads to a 34MB .text section
Building clang10 with clang10 and FDO leads to a 53MB .text section
Building clang10 with clang10 and thinlto+FDO leads to a 67MB .text section

The GCC-built clang is about 2-3% faster at building Firefox.

There are many things which I think could/should improve in our
framework.
 1) our size optimization is very aggressive (like LLVM's -Oz) and
    enabling it based on heuristics may lead to huge regressions.  We
    probably want the optimize_for_size_p predicate to have two levels
    and use a less aggressive mode in places where we are not sure the
    code is really very cold.
 2) ipa-split is very simplistic and only splits when no value computed
    in the header of the function is used in the tail.  We should
    support adding extra parameters for such computed values and do more
    general SESE outlining.

    Note that we do SESE outlining for OpenMP but this code is not
    interfaced generically enough to be easily used by ipa-split.

    The original implementation of ipa-split was kind of a "first cut",
    with the plan to clean up interfaces to the rest of the compiler and
    implement more fancy features later. This never happened so there is
    certainly space for improvements here.

    We also do all splitting before the actual IPA optimization, while
    it may be more reasonable to identify potential split points and let
    IPA optimization decide on the transforms (currently we rely on the
    inliner to inline back useless splits).
 3) function partitioning is enabled only for x86. I never had time to
    get it working on other targets and no one picked up this task.
 4) ipa-icf and in-function code merging are currently very
    conservative when comparing metadata like type-based aliasing info
    (I plan to work on this next stage1).
 5) we have only very limited logic to detect cold regions without
    profile feedback and thus the amount of offlined code is very small
    (this is also because of 1).
    We basically know that code leading to abort/exception handling etc.
    is cold and consider everything else hot.
 6) We lack a code placement pass (though Martin has a WIP
    implementation of it).
 7) We do no partitioning of the data segment, which may also be
    interesting to do.
 8) Most non-x86 backends do not implement the hot/cold code models and
    instruction choice very well.
 9) Shrink-wrapping and register allocation are not always able to move
    spilling to cold paths, but this is generally a very hard problem to
    tackle.
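
To make point 2 a bit more concrete, here is a hand-written sketch of
the kind of split that ipa-split does not do today: the header of the
function computes a value (len) that the tail needs, so the outlined
part has to take it as an extra parameter.  The names lookup and
lookup_slow and the toy hash are purely illustrative and are not taken
from GCC.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tail, outlined by hand.  The value len was computed in
   the header of lookup() and is passed in as an extra parameter.  */
__attribute__((noinline))
unsigned lookup_slow(const char *key, size_t len)
{
    unsigned h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31u + (unsigned char) key[i];
    return h & 0xffu;
}

/* Header: one cheap test, small enough to be inlined into callers once
   the tail has been split off.  */
unsigned lookup(const char *key)
{
    size_t len = strlen(key);       /* value computed in the header ... */
    if (len == 0)
        return 0;                   /* early exit, no tail needed */
    return lookup_slow(key, len);   /* ... and used in the tail */
}
```

With a split like this, a caller that inlines lookup() only pays for
the strlen and the test on the early-exit path, and the loop stays out
of line.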

So there is a lot of room for improvement (and I am sure more can be
found) and I would be happy to help you with it.

Honza
> 
> Hi Martin,
> Thank you for explaining the status quo. After reading the code of bb-reorder.c,
>  it looks pretty good and seems it doesn't need any significant improvements.
> In that case, the only value GIMPLE level hot/cold splitting could bring is to enable aggressive code-size optimization
> by merging of similar/identical functions: after outlining cold regions, they may be suitable candidates for function merging.
> ipa-split might be enabling some of that, having a region based function splitting could improve ipa-split.
> 
> -Aditya
> 
> 
> --
> From: Martin Liška <mliska@suse.cz>
> Sent: Tuesday, March 3, 2020 2:47 AM
> To: Aditya K <hiraditya@msn.com>; gcc@gcc.gnu.org <gcc@gcc.gnu.org>
> Cc: Jan Hubicka <hubicka@ucw.cz>
> Subject: Re: GSoC topic: Implement hot cold splitting at GIMPLE IR level
> 
> Hello.
> Thank you for the idea. I would like to provide some comments about what GCC can currently
> do and I'm curious whether we need something extra on top of what we do.
> Right now, GCC can do hot/cold partitioning based on functions and basic blocks. With
> a PGO profile, the optimization is quite aggressive and can save quite a lot of code
> by placing it into a cold partition and optimizing it for size. Without a profile,
> we do a static profile guess (predict.c), where we also propagate information about cold
> blocks (determine_unlikely_bbs). Later in RTL, we utilize the information and make
> the real reordering (bb-reorder.c).
> 
> Martin
> 
> 
> 

