This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Inliner heuristics TLC 1/n - let small function inlinng to ignore cold portion of program

I think for 4.10 we should revisit inliner behaviour to be more LTO and LTO+FDO
ready. This is first of small patches I made to sanitize behaviour of current bounds.

The main problem LTO brings is that we get way too many inline candidates. In per-file
model one gets only small percentage of calls inlinable, since most of them go to other
units, so our current heuristics behave quite well, inlining usually all calls that it
consider benefical.

With LTO almost all calls are inlinable and if we inline everything we consider
profitable we get insane code size growths, so practically always we hit our
30% unit growth threshold.  This is not always a good idea.  Reducing
inline-insns-auto/inline-insns-single to avoid inliner hitting the growth limit
would cause a regression on benchmarks that needs inlining of large functions.

LLVM seems to get around the problem by doing code expanding inlining at compile
time (in equivalent of our early inliner). This makes functions big, so the LTO
doesn't inline much, but it also misses useful cross-module inlines and replace
them by less usefull inter-module.

Other approach would be to have inline-insns-crossmodule that is significantly
smaller than inline-insns-auto. We already have crossmodule hint that probably
ought to be made smarter to not fire on COMDAT functions.
I do not want to do it, since the numbers I collected in
suggest that inline-insns-auto is already quite bad limit.

I would be happy to hear about alternative solutions to this.  We may want
to switch whole program inliner into temperature style bound, like open64 does.

Well, this patch actually goes bit different direction - making unit growth
threashold more sane.

While looking into inliner behaviour at Firefox to write my blog entry
I noticed that with profile feedback only very small portion of the program
is trained (15%) and only around 7% of code contains something that we consider

Inliner however still hits the inline-unit-growth limit with:
Unit growth for small function inlining: 7232256->9220597 (27%)
Inlined 183353 calls, eliminated 54652 function

We do not grow the code in the cold portions of program, but because of
the "dead" padding we grow everything we consider hot 4 times, instead
of 1.3 times as we would usually do if it was unpadded.

This patch fixes the problem by considering only non-cold functions for
frequency calculation.  We now get:

Unit growth for small function inlining: 2083217->2537163 (21%)
Inlined 134611 calls, eliminated 53586 functions

So while the relative growth is still close to 30%, the absolute
growth is only 22% of the previous one.  We inline fewer calls but
in the dynamic stats there is very minor (sub 0.01%) diference.

Bootstrapped/regtested x86_64-linux, will commit it shortly.


	* ipa-inline.c (inline_small_functions): Account only non-cold

	* doc/invoke.texi (inline-unit-growth): Update documentation.
Index: ipa-inline.c
--- ipa-inline.c	(revision 209461)
+++ ipa-inline.c	(working copy)
@@ -1585,7 +1590,10 @@ inline_small_functions (void)
 	    struct inline_summary *info = inline_summary (node);
 	    struct ipa_dfs_info *dfs = (struct ipa_dfs_info *) node->aux;
-	    if (!DECL_EXTERNAL (node->decl))
+	    /* Do not account external functions, they will be optimized out
+	       if not inlined.  Also only count the non-cold portion of program.  */
+	    if (!DECL_EXTERNAL (node->decl)
+		&& node->frequency != NODE_FREQUENCY_UNLIKELY_EXECUTED)
 	      initial_size += info->size;
 	    info->growth = estimate_growth (node);
 	    if (dfs && dfs->next_cycle)
Index: doc/invoke.texi
--- doc/invoke.texi	(revision 209461)
+++ doc/invoke.texi	(working copy)
@@ -9409,7 +9409,8 @@ before applying @option{--param inline-u
 @item inline-unit-growth
 Specifies maximal overall growth of the compilation unit caused by inlining.
 The default value is 30 which limits unit growth to 1.3 times the original
+size. Cold functions (either marked cold via an attribibute or by profile
+feedback) are not accounted into the unit size.
 @item ipcp-unit-growth
 Specifies maximal overall growth of the compilation unit caused by

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]