This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
-O2 inliner returning 1/n: reduce EARLY_INLINING_INSNS for O1 and O2
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: gcc-patches at gcc dot gnu dot org
- Date: Mon, 16 Sep 2019 18:39:14 +0200
- Subject: -O2 inliner returning 1/n: reduce EARLY_INLINING_INSNS for O1 and O2
Hi,
as discussed at Cauldron this week, I plan to push out changes enabling
-finline-functions at -O2 with limited parameters, aiming for overall
better performance without large code size increases.
Currently we aggressively inline functions declared inline, we inline
when function size is expected to shrink, and we also do limited
auto-inlining in the early inliner for non-inline functions even if code
grows. This is handled by PARAM_EARLY_INLINING_INSNS.
This patch tunes it down for -O1 and -O2 in order to leave some room for
the real IPA inliner to do its work.
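
As a rough illustration of what the tuning means, here is a minimal C
sketch of the growth checks, using the default values from the params.def
hunk of this patch (14 for -O3/-Ofast, 6 for -O1/-O2). The function names
early_inlining_limit and growth_within_limit are hypothetical and exist
only for this example; the real logic lives in
want_early_inline_function_p and has further checks that are left out
here.

```c
#include <stdbool.h>

/* Default limits taken from the params.def hunk of this patch.  */
enum { EARLY_INLINING_INSNS = 14, EARLY_INLINING_INSNS_O2 = 6 };

/* Hypothetical helper: growth limit for a caller compiled at the given
   optimization level (3 or more selects the -O3/-Ofast value).  */
int
early_inlining_limit (int optimize_level)
{
  return optimize_level >= 3 ? EARLY_INLINING_INSNS
			     : EARLY_INLINING_INSNS_O2;
}

/* Simplified model of the two early-inlining-insns checks only.
   GROWTH is the estimated growth of the caller, CALLEE_CALLS the number
   of calls found in the callee.  */
bool
growth_within_limit (int optimize_level, int growth, int callee_calls)
{
  int limit = early_inlining_limit (optimize_level);

  /* Growth of a single call exceeds the limit.  */
  if (growth > limit)
    return false;

  /* Callee contains calls: the limit is effectively divided by the
     number of calls plus one.  */
  if (callee_calls != 0 && growth * (callee_calls + 1) > limit)
    return false;

  return true;
}
```

With these defaults, an estimated growth of 10 passes the check for a
caller built at -O3 but is rejected at -O2, which is the room this patch
leaves for the IPA inliner.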
The combined effect of my changes is shown in
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=ddee20190fa78935338bc3161c1b29b8528d82dd%2C9b247ee17d1030b88462531225cc842251507bb6
This involves further forking inline-insns-auto, inline-insns-single and
big-speedup params.
Generally I was able to mostly improve SPEC 2006 and 2017 scores as
follows:
O2 Kabylake
SPEC/SPEC2006/INT/total 0.58%
SPEC/SPEC2006/FP/total 0.19%
SPEC/SPEC2017/FP/total 0.45%
SPEC/SPEC2017/INT/total 0.18%
O2 LTO Kabylake
SPEC/SPEC2006/INT/total 1.08%
SPEC/SPEC2006/FP/total 0.60%
O2 Zen
SPEC/SPEC2006/INT/total 1.64%
SPEC/SPEC2006/FP/total 0.23%
SPEC/SPEC2017/INT/total -0.58%
SPEC/SPEC2017/FP/total 0.52%
O2 Zen LTO
SPEC/SPEC2006/FP/total 1.40%
SPEC/SPEC2006/INT/total 1.26%
SPEC/SPEC2017/INT/total 0.93%
SPEC/SPEC2017/FP/total -0.22%
The SPEC2017 FP score on Zen is affected by a 10% regression on CactusBSSN
that seems to be due to microarchitectural behaviour depending on code
layout rather than any inlining changes in hot parts of the program. The
other notable regression is omnetpp, which shows on Zen only too.
Comparing the Zen and Kabylake results, it seems that the only consistent
losers are gcc (3%) and xalancbmk (2.8%), both with non-LTO only. I plan
to investigate those if the regression persists, even though it is a bit
small and there is no obvious problem in the backtrace.
Code size improves by 0.67% for SPEC2006 non-LTO and regresses by 1.64% with LTO.
For SPEC2017 it is a 2.2% improvement and a 2.4% regression respectively.
The difference between LTO and non-LTO is mostly due to the fact that LTO
units tend to hit the overall unit growth cap of inlining since there are
too many inline candidates. For this reason the patch is not as
effective on Firefox and other really big packages as I would like. I
still plan a number of changes to the inliner this stage1, so this is not
the final situation, but I think it is better to do the change early so it
gets tested on other architectures. (And it was the consensus of the
Cauldron discussion, by my understanding.)
This patch is not enabling -finline-functions, so it will temporarily
regress performance (and improve code size). I am doing this in
incremental steps to get more data on both inliners.
Bootstrapped/regtested x86_64-linux, plan to commit it later today.
Honza
* ipa-inline.c (want_early_inline_function_p): Use
PARAM_EARLY_INLINING_INSNS_O2.
* params.def (PARAM_EARLY_INLINING_INSNS_O2): New.
(PARAM_EARLY_INLINING_INSNS): Update documentation.
* doc/invoke.texi (early-inlining-insns-O2): New.
(early-inlining-insns): Update documentation.
Index: ipa-inline.c
===================================================================
--- ipa-inline.c (revision 275716)
+++ ipa-inline.c (working copy)
@@ -641,6 +641,10 @@ want_early_inline_function_p (struct cgr
{
int growth = estimate_edge_growth (e);
int n;
+ int early_inlining_insns = opt_for_fn (e->caller->decl, optimize) >= 3
+ ? PARAM_VALUE (PARAM_EARLY_INLINING_INSNS)
+ : PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_O2);
+
if (growth <= PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE))
;
@@ -654,26 +658,28 @@ want_early_inline_function_p (struct cgr
growth);
want_inline = false;
}
- else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+ else if (growth > early_inlining_insns)
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt,
" will not early inline: %C->%C, "
- "growth %i exceeds --param early-inlining-insns\n",
- e->caller, callee,
- growth);
+ "growth %i exceeds --param early-inlining-insns%s\n",
+ e->caller, callee, growth,
+ opt_for_fn (e->caller->decl, optimize) >= 3
+ ? "" : "-O2");
want_inline = false;
}
else if ((n = num_calls (callee)) != 0
- && growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+ && growth * (n + 1) > early_inlining_insns)
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt,
" will not early inline: %C->%C, "
- "growth %i exceeds --param early-inlining-insns "
+ "growth %i exceeds --param early-inlining-insns%s "
"divided by number of calls\n",
- e->caller, callee,
- growth);
+ e->caller, callee, growth,
+ opt_for_fn (e->caller->decl, optimize) >= 3
+ ? "" : "-O2");
want_inline = false;
}
}
Index: params.def
===================================================================
--- params.def (revision 275716)
+++ params.def (working copy)
@@ -233,8 +233,12 @@ DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
10, 0, 0)
DEFPARAM(PARAM_EARLY_INLINING_INSNS,
"early-inlining-insns",
- "Maximal estimated growth of function body caused by early inlining of single call.",
+ "Maximal estimated growth of function body caused by early inlining of single call with -O3 and -Ofast.",
14, 0, 0)
+DEFPARAM(PARAM_EARLY_INLINING_INSNS_O2,
+ "early-inlining-insns-O2",
+ "Maximal estimated growth of function body caused by early inlining of single call with -O1 and -O2.",
+ 6, 0, 0)
DEFPARAM(PARAM_LARGE_STACK_FRAME,
"large-stack-frame",
"The size of stack frame to be considered large.",
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 275716)
+++ doc/invoke.texi (working copy)
@@ -11290,9 +11290,17 @@ recursion depth can be guessed from the
via a given call expression. This parameter limits inlining only to call
expressions whose probability exceeds the given threshold (in percents).
+@item early-inlining-insns-O2
+Specify growth that the early inliner can make. In effect it increases
+the amount of inlining for code having a large abstraction penalty.
+This is applied to functions compiled with @option{-O1} or @option{-O2}
+optimization levels.
+
@item early-inlining-insns
Specify growth that the early inliner can make. In effect it increases
the amount of inlining for code having a large abstraction penalty.
+This is applied to functions compiled with @option{-O3} or @option{-Ofast}
+optimization levels.
@item max-early-inliner-iterations
Limit of iterations of the early inliner. This basically bounds