This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Inliner heuristics TLC 1/n - let small function inlinng to ignore cold portion of program


This looks fine.  LIPO has similar change too.  Other directions worth
looking into:

1) To model icache effect better,  weighted callee size need to be
used with profile. The weight for BB may look like: min(1,
FREQ(BB)/FREQ(ENTRY)).
2) When function splitting is turned on, are any inline heuristic
changes are needed? E.g. only consider the hot code part of node for
unit growth computation?

We are also looking into more aggressive approach to track per loop
(inter-procedural) region growth limit, instead of using one single
global limit.

David

On Thu, Apr 17, 2014 at 10:52 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> I think for 4.10 we should revisit inliner behaviour to be more LTO and LTO+FDO
> ready. This is first of small patches I made to sanitize behaviour of current bounds.
>
> The main problem LTO brings is that we get way too many inline candidates. In per-file
> model one gets only small percentage of calls inlinable, since most of them go to other
> units, so our current heuristics behave quite well, inlining usually all calls that it
> consider benefical.
>
> With LTO almost all calls are inlinable and if we inline everything we consider
> profitable we get insane code size growths, so practically always we hit our
> 30% unit growth threshold.  This is not always a good idea.  Reducing
> inline-insns-auto/inline-insns-single to avoid inliner hitting the growth limit
> would cause a regression on benchmarks that needs inlining of large functions.
>
> LLVM seems to get around the problem by doing code expanding inlining at compile
> time (in equivalent of our early inliner). This makes functions big, so the LTO
> doesn't inline much, but it also misses useful cross-module inlines and replace
> them by less usefull inter-module.
>
> Other approach would be to have inline-insns-crossmodule that is significantly
> smaller than inline-insns-auto. We already have crossmodule hint that probably
> ought to be made smarter to not fire on COMDAT functions.
> I do not want to do it, since the numbers I collected in
> http://hubicka.blogspot.ca/2014/04/devirtualization-in-c-part-5-feedback.html
> suggest that inline-insns-auto is already quite bad limit.
>
> I would be happy to hear about alternative solutions to this.  We may want
> to switch whole program inliner into temperature style bound, like open64 does.
>
> Well, this patch actually goes bit different direction - making unit growth
> threashold more sane.
>
> While looking into inliner behaviour at Firefox to write my blog entry
> I noticed that with profile feedback only very small portion of the program
> is trained (15%) and only around 7% of code contains something that we consider
> hot.
>
> Inliner however still hits the inline-unit-growth limit with:
> Unit growth for small function inlining: 7232256->9220597 (27%)
> Inlined 183353 calls, eliminated 54652 function
>
> We do not grow the code in the cold portions of program, but because of
> the "dead" padding we grow everything we consider hot 4 times, instead
> of 1.3 times as we would usually do if it was unpadded.
>
> This patch fixes the problem by considering only non-cold functions for
> frequency calculation.  We now get:
>
> Unit growth for small function inlining: 2083217->2537163 (21%)
> Inlined 134611 calls, eliminated 53586 functions
>
> So while the relative growth is still close to 30%, the absolute
> growth is only 22% of the previous one.  We inline fewer calls but
> in the dynamic stats there is very minor (sub 0.01%) diference.
>
> Bootstrapped/regtested x86_64-linux, will commit it shortly.
>
> Honza
>
>         * ipa-inline.c (inline_small_functions): Account only non-cold
>         functions.
>
>         * doc/invoke.texi (inline-unit-growth): Update documentation.
> Index: ipa-inline.c
> ===================================================================
> --- ipa-inline.c        (revision 209461)
> +++ ipa-inline.c        (working copy)
> @@ -1585,7 +1590,10 @@ inline_small_functions (void)
>             struct inline_summary *info = inline_summary (node);
>             struct ipa_dfs_info *dfs = (struct ipa_dfs_info *) node->aux;
>
> -           if (!DECL_EXTERNAL (node->decl))
> +           /* Do not account external functions, they will be optimized out
> +              if not inlined.  Also only count the non-cold portion of program.  */
> +           if (!DECL_EXTERNAL (node->decl)
> +               && node->frequency != NODE_FREQUENCY_UNLIKELY_EXECUTED)
>               initial_size += info->size;
>             info->growth = estimate_growth (node);
>             if (dfs && dfs->next_cycle)
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi     (revision 209461)
> +++ doc/invoke.texi     (working copy)
> @@ -9409,7 +9409,8 @@ before applying @option{--param inline-u
>  @item inline-unit-growth
>  Specifies maximal overall growth of the compilation unit caused by inlining.
>  The default value is 30 which limits unit growth to 1.3 times the original
> -size.
> +size. Cold functions (either marked cold via an attribibute or by profile
> +feedback) are not accounted into the unit size.
>
>  @item ipcp-unit-growth
>  Specifies maximal overall growth of the compilation unit caused by


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]