This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Set inline-insns-single-O2 to 70


Hi,
this patch bumps inline-insns-single-O2 from 30 to 70.  I originally reduced
it from 120 to 50 when forking the -O2 and -O3 parameters, which had
quite significant code size benefits.

This parameter controls how large a function the user declared inline can be
and still get inlined (sadly, we really can't inline them all).

However, while this transform is mostly SPEC neutral, it has turned out to cause
performance regressions for tramp3d, botan and some of the Firefox benchmarks
with LTO.

I re-measured everything with 30, 50, 70 and 90 values as seen here:

https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=ead4ea7bb1b1b531c2d8ba72fc5c1f1b14ddc454%2Ced81e91c55436bb949fab8556c138488b598af9e%2C44f7fe6bc09fc2365c4ec9ec7aea2863593d87fc%2C6f4da220ebfa0c3a3db02109dcb371da27516a3b%2C7aa+fd3ce8a81b45e040dae74e38a6849c65883ba

Ignore the benzen results, since it ran only part of the tests.
Also ignore everything that is not run with -Ofast; it is noise.

The observations above the noise level are:

 - 6% regression for povray at 50, 70 and 90.  This is a somewhat independent
   problem which I will treat separately.
 - 4% improvement for gcc at 90
 - 4% improvement for xalancbmk at 90
 - 2% improvement for parest at 70 and 90
 - 12% improvement for deepsjeng at 50, 70 and 90

The most sensitive benchmark code-size-wise is xalancbmk, with about 15% growth
for 50 and above.

To see code sizes, click "Display all ELF stats", set the minimum threshold
to 0.001 and click generate.  Once the page is fully loaded, add
total.*text to Filter.

The overall code size growth is:

                50    70    90
 spec 2006      0.51% 0.89% 1.12%
 spec 2006 LTO  0.34% 0.60% 0.79%
 spec 2017      2.06% 2.48% 2.57%

https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=ead4ea7bb1b1b531c2d8ba72fc5c1f1b14ddc454%2Ced81e91c55436bb949fab8556c138488b598af9e%2C44f7fe6bc09fc2365c4ec9ec7aea2863593d87fc%2C6f4da220ebfa0c3a3db02109dcb371da27516a3b%2C7aa+fd3ce8a81b45e040dae74e38a6849c65883ba

Short story:
 - many of the botan tests like bumping the limits up, about 1/3 of them all
   the way to 90 (there were no improvements for 120).
 - nbench likes an increase to 50 or more
 - polyhedron ttf2 likes 50 or more
 - tramp3d likes 90

I also ran Firefox LTO benchmarks:

30 https://treeherder.mozilla.org/#/jobs?repo=try&revision=90a908c19de521482cad5ff864f8f67fec6dbc75
50 https://treeherder.mozilla.org/#/jobs?repo=try&revision=7efe0bfd2f5acb55b1bcf0ba4a162e59b1a3be99
70 https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a7cf9728e4a952eff5190abeb72a4a95571d95d
90 https://treeherder.mozilla.org/#/jobs?repo=try&revision=5157552ce80419ed5bd0594668a53a13a04786d2

In all cases I used --param inline-unit-growth=12000, since otherwise this limit
blocks the inliner before it reaches the function sizes in question.  Code sizes
are as follows:

libxul.so size:

30 103798151
50 108490103 (+4%)
70 114372911 (+10%)
90 116104639 (+11%)

Compares:
30 to 50: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=5157552ce80419ed5bd0594668a53a13a04786d2&newProject=try&newRevision=7efe0bfd2f5acb55b1bcf0ba4a162e59b1a3be99&framework=1
 (this shows almost no difference)
30 to 70: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=90a908c19de521482cad5ff864f8f67fec6dbc75&newProject=try&newRevision=6a7cf9728e4a952eff5190abeb72a4a95571d95d&framework=1
 (here there is a 14% improvement for the dromaeo benchmark and 5% in overall
 responsiveness; there is a regression in tsvgx/tresize which can be
 tracked down to quite low-level, hand-optimized code in the Skia graphics
 rendering library, which does not define ALWAYS_INLINE to always_inline
 for GCC (only for clang))
30 to 90: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=90a908c19de521482cad5ff864f8f67fec6dbc75&newProject=try&newRevision=5157552ce80419ed5bd0594668a53a13a04786d2&framework=1
 (generally similar to the previous one)

So most improvements show up with 70, and 50 does not seem to be enough to get
the performance for Firefox.  We still lose on tramp3d and some of botan, but I
think this is generally -O3/-Ofast type of code, so I hope it is acceptable.

The SPEC code sizes are not very realistic, since many of the codebases are
Fortran or old C, which do not use the inline keyword at all.  On the other hand,
the Firefox sizes are not realistic either (in the other direction), since I
disabled the inline-unit-growth parameter.

I hope that once Martin gets the Tumbleweed builds with the GCC 10 branch
working, we can verify how much difference this makes on a larger scale.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

	* params.opt (inline-insns-single-O2): Bump from 30 to 70.

Index: params.opt
===================================================================
--- params.opt	(revision 278216)
+++ params.opt	(working copy)
@@ -487,7 +487,7 @@ Common Joined UInteger Var(param_max_inl
 The maximum number of instructions in a single function eligible for inlining with -O3 and -Ofast.
 
 -param=max-inline-insns-single-O2=
-Common Joined UInteger Var(param_max_inline_insns_single_o2) Init(30) Param
+Common Joined UInteger Var(param_max_inline_insns_single_o2) Init(70) Param
 The maximum number of instructions in a single function eligible for inlining.
 
 -param=max-inline-insns-size=

