Reduce inline limits a bit to compensate changes in inlining metrics
Richard Biener
rguenther@suse.de
Mon Feb 12 09:22:00 GMT 2018
On Fri, 9 Feb 2018, Jan Hubicka wrote:
> Hi,
> this patch addresses the code size regression by reducing
> max-inline-insns-auto 40->30 and increasing inline-min-speedup 8->15.
>
> The main reason why we need retuning is following
>
> - inline-min-speedup works in a way that if expected runtime
> of caller+calee combo after inlining reduces by more than 8%
> the inliner is going to bypass inline-insns-auto (because it knows the
> inlining is benefical rather than just inlining in hope it will be).
> The decrease either happens because callee is very lightweight at
> average or because we can track it will optimize well.
>
> During GCC 8 development I have switched time estimates from int to sreal.
> Original estimates was capping time to about 1000 instructions and thus
> large function rarely saw speedup because that was based comparing caped
> numbers. With sreals we can now track benefits better
>
> - We made quite some progress on early optimizations making function
> bodies to appear smaller to inliner which in turn inlines more of them.
> This is reason why we want to decrease inline-min-speedup to gain some code
> size back.
>
> The code size estimate difference at beggining of inlning is about 6% to
> gcc 6 and about 12% to gcc 4.9.
>
> I have benchmarked patch on Haswell SPEC2000, SPEC2006, polyhedron and our C++
> benchmarks. Here I found no off-noise changes on SPEC2000/2006. I know that
> reducing inline-insns-auto to 10 still produces no regressions and even
> improves facerec 6600->8000 but that seems bit of effect of good luck (it also
> depends on setting of branch predictor weights and needs to be analyzed
> independently). min-speedup can be increased to 30 without measurable effects
> as well.
>
> On C++ benchmark suite I know that cray degrades with min-speedup set to 30 (it
> needs value of 22). Also there is degradation with profile-generate on tramp3d.
>
> So overall I believe that for Haswell the reduction of inline limits is doing
> very consistent code size improvement without perofrmance tradeoffs.
>
> I also tested Itanium and here things are slightly more sensitive. The
> reduction of limits affects gzip 337->332 (-1.5%), vpr 1000->980 (-2%), crafty
> (925->935) (+2%) and vortex (1165->1180) (+1%). So overall it is specint2000
> neutral. Reducing inline-isns-auto to 10 brings off noise overall degradation
> by -1% and 20 is in-between.
>
> specfp2000 reacts positively by improving applu 520->525 (+1%) and mgrid
> 391->397 (+1.3%). It would let me to reduct inline-isns-auto to 10 without
> any other regressions.
>
> C++ benchmarks does not show any off-noise changes.
>
> I have also did some limited testing on ppc and arm. They reacted more similarly
> to Haswell also showing no important changes for reducing the inlining limits.
>
> Now reducing inline limits triggers failure of testsuite/g++.dg/pr83239.C
> which tests that inlining happens. The reason why it does not happen is
> becuae ipa-fnsplit is trying to second guess if inliner will evnetually consider
> function for inlining and the test is out of date. I decided to hack around
> it for stage4 and will try to clean these things up next stage1.
>
> Bootstraped/regtested x86_64-linux. I know it is late in stage4, but would it
> be OK to for GCC 8?
Ok.
Richard.
> PR middle-end/83665
> * params.def (inline-min-speedup): Increase from 8 to 15.
> (max-inline-insns-auto): Decrease from 40 to 30.
> * ipa-split.c (consider_split): Add some buffer for function to
> be considered inlining candidate.
> * invoke.texi (max-inline-insns-auto, inline-min-speedup): UPdate
> default values.
> Index: params.def
> ===================================================================
> --- params.def (revision 257520)
> +++ params.def (working copy)
> @@ -52,13 +52,13 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCO
> DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
> "inline-min-speedup",
> "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-insns-auto.",
> - 8, 0, 0)
> + 15, 0, 0)
>
> /* The single function inlining limit. This is the maximum size
> of a function counted in internal gcc instructions (not in
> real machine instructions) that is eligible for inlining
> by the tree inliner.
> - The default value is 450.
> + The default value is 400.
> Only functions marked inline (or methods defined in the class
> definition for C++) are affected by this.
> There are more restrictions to inlining: If inlined functions
> @@ -77,11 +77,11 @@ DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE,
> that is applied to functions marked inlined (or defined in the
> class declaration in C++) given by the "max-inline-insns-single"
> parameter.
> - The default value is 40. */
> + The default value is 30. */
> DEFPARAM (PARAM_MAX_INLINE_INSNS_AUTO,
> "max-inline-insns-auto",
> "The maximum number of instructions when automatically inlining.",
> - 40, 0, 0)
> + 30, 0, 0)
>
> DEFPARAM (PARAM_MAX_INLINE_INSNS_RECURSIVE,
> "max-inline-insns-recursive",
> Index: ipa-split.c
> ===================================================================
> --- ipa-split.c (revision 257520)
> +++ ipa-split.c (working copy)
> @@ -558,10 +558,13 @@ consider_split (struct split_point *curr
> " Refused: split size is smaller than call overhead\n");
> return;
> }
> + /* FIXME: The logic here is not very precise, because inliner does use
> + inline predicates to reduce function body size. We add 10 to anticipate
> + that. Next stage1 we should try to be more meaningful here. */
> if (current->header_size + call_overhead
> >= (unsigned int)(DECL_DECLARED_INLINE_P (current_function_decl)
> ? MAX_INLINE_INSNS_SINGLE
> - : MAX_INLINE_INSNS_AUTO))
> + : MAX_INLINE_INSNS_AUTO) + 10)
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file,
> @@ -574,7 +577,7 @@ consider_split (struct split_point *curr
> Limit this duplication. This is consistent with limit in tree-sra.c
> FIXME: with LTO we ought to be able to do better! */
> if (DECL_ONE_ONLY (current_function_decl)
> - && current->split_size >= (unsigned int) MAX_INLINE_INSNS_AUTO)
> + && current->split_size >= (unsigned int) MAX_INLINE_INSNS_AUTO + 10)
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file,
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi (revision 257520)
> +++ doc/invoke.texi (working copy)
> @@ -10131,13 +10131,14 @@ a lot of functions that would otherwise
> by the compiler are investigated. To those functions, a different
> (more restrictive) limit compared to functions declared inline can
> be applied.
> -The default value is 40.
> +The default value is 30.
>
> @item inline-min-speedup
> When estimated performance improvement of caller + callee runtime exceeds this
> threshold (in percent), the function can be inlined regardless of the limit on
> @option{--param max-inline-insns-single} and @option{--param
> max-inline-insns-auto}.
> +The default value is 15.
>
> @item large-function-insns
> The limit specifying really large functions. For functions larger than this
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
More information about the Gcc-patches
mailing list