This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Set inline-insns-single-O2 to 70
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: gcc-patches at gcc dot gnu dot org
- Date: Thu, 14 Nov 2019 13:38:19 +0100
- Subject: Set inline-insns-single-O2 to 70
Hi,
this patch bumps inline-insns-single-O2 from 30 to 70. I originally reduced
it from 120 to 30 when forking the -O2 and -O3 parameters, which had
quite significant code size benefits.
This parameter controls how large a function declared inline by the user can
be and still get inlined (sadly we really can't inline them all).
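As a rough illustration of what the limit applies to, here is a minimal sketch of a user-declared inline function and its caller. The function names and the file name are hypothetical, and whether GCC actually inlines the call depends on its own size estimate of the body, which this sketch does not try to predict. The --param spelling in the comment assumes a GCC revision where the parameter still has the name shown in the patch below.

```cpp
// Hypothetical example, not taken from any benchmark mentioned here.
// Possible build line (assumes the parameter name used in this patch):
//   g++ -O2 --param max-inline-insns-single-O2=70 example.cpp
#include <cstddef>

// Declared inline by the user, so at -O2 its estimated size is compared
// against the limit discussed in this mail.
inline double dot3(const double *a, const double *b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < 3; ++i)
    sum += a[i] * b[i];
  return sum;
}

// Caller in which the inliner may or may not inline dot3.
double length_squared(const double *v)
{
  return dot3(v, v);
}
```

Comparing the output of -fdump-ipa-inline with different values of the parameter is one way to see which calls are affected.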
However, while this transform is mostly SPEC neutral, it has turned out to cause
performance regressions for tramp3d, botan and some of the Firefox benchmarks with
LTO.
I re-measured everything with 30, 50, 70 and 90 values as seen here:
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=ead4ea7bb1b1b531c2d8ba72fc5c1f1b14ddc454%2Ced81e91c55436bb949fab8556c138488b598af9e%2C44f7fe6bc09fc2365c4ec9ec7aea2863593d87fc%2C6f4da220ebfa0c3a3db02109dcb371da27516a3b%2C7aa+fd3ce8a81b45e040dae74e38a6849c65883ba
Ignore the benzen results since it ran only part of the tests.
Also ignore everything which is not with -O2. It is noise.
The off-noise observations are:
- 6% regression for Povray at 50, 70, 90 (6%). This is a bit of an independent
problem which I will treat separately
- 4% improvement for gcc for 90
- 4% improvement for xalancbmk for 90
- 2% improvement for parest for 70, 90
- 12% improvement for Deepsjeng for 50, 70, 90
The most sensitive code size wise is xalanc, with about 15% growth for 50+.
To see code size one needs to click "Display all ELF stats", set the minimum
threshold to 0.001 and click generate. Once the page is fully loaded, add
total.*text to Filter.
The overall outcome is growth:
                 50     70     90
spec 2006      0.51%  0.89%  1.12%
spec 2006 LTO  0.34%  0.60%  0.79%
spec 2017      2.06%  2.48%  2.57%
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=ead4ea7bb1b1b531c2d8ba72fc5c1f1b14ddc454%2Ced81e91c55436bb949fab8556c138488b598af9e%2C44f7fe6bc09fc2365c4ec9ec7aea2863593d87fc%2C6f4da220ebfa0c3a3db02109dcb371da27516a3b%2C7aa+fd3ce8a81b45e040dae74e38a6849c65883ba
Short story:
- many of the botan tests like bumping the limit up, about 1/3 of them all the way to
90 (there were no improvements for 120).
- nbench likes an increase to 50 and more
- polyhedron ttf2 likes 50 and more
- tramp3d likes 90
I also ran Firefox LTO benchmarks:
30 https://treeherder.mozilla.org/#/jobs?repo=try&revision=90a908c19de521482cad5ff864f8f67fec6dbc75
50 https://treeherder.mozilla.org/#/jobs?repo=try&revision=7efe0bfd2f5acb55b1bcf0ba4a162e59b1a3be99
70 https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a7cf9728e4a952eff5190abeb72a4a95571d95d
90 https://treeherder.mozilla.org/#/jobs?repo=try&revision=5157552ce80419ed5bd0594668a53a13a04786d2
In all cases I used --param inline-unit-growth=12000, since this limit otherwise blocks
the inliner before it gets to the function sizes in question. Code size is as follows:
libxul.so size:
30 103798151
50 108490103 (+4%)
70 114372911 (+10%)
90 116104639 (+11%)
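The percentages in parentheses appear to be the growth relative to the 30 build, truncated to a whole percent (this is an assumption; the 90 build, for example, works out to roughly 11.9%). A minimal sketch of that arithmetic, with a hypothetical helper name:

```cpp
#include <cstdint>

// Growth of `size` relative to `base`, in percent, truncated toward zero.
// Hypothetical helper; the truncation is an assumption that happens to
// reproduce the figures quoted above.
inline int growth_percent(std::int64_t base, std::int64_t size)
{
  return static_cast<int>((size - base) * 100 / base);
}
```

For example, growth_percent(103798151, 114372911) gives 10, matching the figure listed for 70.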
Compares:
30 to 50: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=5157552ce80419ed5bd0594668a53a13a04786d2&newProject=try&newRevision=7efe0bfd2f5acb55b1bcf0ba4a162e59b1a3be99&framework=1
(this shows almost nothing)
30 to 70: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=90a908c19de521482cad5ff864f8f67fec6dbc75&newProject=try&newRevision=6a7cf9728e4a952eff5190abeb72a4a95571d95d&framework=1
(here there is a 14% improvement for the dromaeo benchmark and 5% in overall
responsiveness; there is a regression in tsvgx/tresize which can be
tracked down to quite low-level hand-optimized code in the SKIA graphics
rendering library, which does not define ALWAY_INLINE to always_inline
for GCC (only for clang))
30 to 90: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=90a908c19de521482cad5ff864f8f67fec6dbc75&newProject=try&newRevision=5157552ce80419ed5bd0594668a53a13a04786d2&framework=1
(generally similar to previous one)
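For reference, the kind of portability macro involved in the tsvgx/tresize regression can be sketched as follows. This is only an illustration under assumptions: the macro and function names are hypothetical and are not the ones used by the library. The point is that when the GCC branch falls back to plain inline, hand-optimized helpers become subject to the inliner limits discussed here instead of being inlined unconditionally.

```cpp
// Hypothetical macro name, not the one from the library mentioned above.
#if defined(__GNUC__) || defined(__clang__)
// GCC and clang both accept this attribute spelling.
#  define EXAMPLE_ALWAYS_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define EXAMPLE_ALWAYS_INLINE __forceinline
#else
// Fallback: an ordinary inline function, subject to the inliner heuristics.
#  define EXAMPLE_ALWAYS_INLINE inline
#endif

// Integer blend of one 8-bit channel, with alpha in [0, 255].
EXAMPLE_ALWAYS_INLINE unsigned blend_channel(unsigned src, unsigned dst,
                                             unsigned alpha)
{
  return (src * alpha + dst * (255u - alpha)) / 255u;
}

// Caller where the helper is expected to be inlined regardless of its size.
unsigned blend_half(unsigned src, unsigned dst)
{
  return blend_channel(src, dst, 128u);
}
```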
So most improvements show up with 70, and 50 seems not to be enough to get
performance for Firefox. We still lose on tramp3d and some of botan, but I
think this is generally -O3/-Ofast type of code so I hope it is acceptable.
The SPEC code sizes are not very realistic, since a lot of the codebases are
Fortran or old C which do not use the inline keyword at all. On the other hand,
Firefox sizes are not realistic either (in the other direction) since I disabled
the inline-unit-growth parameter.
I hope that once Martin gets Tumbleweed builds with the GCC 10 branch working, we
can verify how much difference this makes on a larger scale.
Bootstrapped/regtested x86_64-linux, will commit it shortly.
* params.opt (inline-insns-single-O2): Bump from 30 to 70.
Index: params.opt
===================================================================
--- params.opt (revision 278216)
+++ params.opt (working copy)
@@ -487,7 +487,7 @@ Common Joined UInteger Var(param_max_inl
 The maximum number of instructions in a single function eligible for inlining with -O3 and -Ofast.

 -param=max-inline-insns-single-O2=
-Common Joined UInteger Var(param_max_inline_insns_single_o2) Init(30) Param
+Common Joined UInteger Var(param_max_inline_insns_single_o2) Init(70) Param
 The maximum number of instructions in a single function eligible for inlining.

 -param=max-inline-insns-size=