This is the mail archive of the mailing list for the GCC project.

Enable inliner to bypass inline-insns-single/auto when it knows the performance will improve

With inliner predicates, the inliner heuristic is now able to prove that
some of the inlined function body will be optimized out after inlining.
This makes it possible to estimate the speedup, which is already used to drive
the badness metric but is ignored in the actual decision of whether a function
is an inline candidate.
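
To make the predicate idea concrete, here is a minimal sketch of a call where
most of the callee body is provably dead once the argument is known.  The
functions clamp_index and fast_lookup are made up for illustration and are not
part of GCC or of this patch.

```c
/* Hypothetical callee: the range checks run only when CHECKED is nonzero.  */
static int
clamp_index (int i, int n, int checked)
{
  if (checked)
    {
      if (i < 0)
        return 0;
      if (i >= n)
        return n - 1;
    }
  return i;
}

/* Hypothetical caller: CHECKED is the constant 0 at this call site, so
   after inlining only "return i" remains of the callee body.  */
int
fast_lookup (const int *table, int n, int i)
{
  return table[clamp_index (i, n, 0)];
}
```

An estimate that knows the checked branch cannot execute for this edge sees a
much cheaper inlined body than the callee's overall size would suggest.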

In general, the decision on when to inline can be
 1) conservative on code size - when we know the code will shrink, it is almost
    surely a win
 2) uninformed guess - we can just inline and hope something will simplify.
    This makes sense for small enough functions, especially when the user
    asks for -O3
 3) informed inline - we know something important will simplify.

We already have inline hints handling some cases of 3), like loop strides
and bounds.  This patch just adds the time-based hint.
If the speedup of the runtime of caller+callee exceeds 10%, it is quite likely
that inlining is a win.
The inlining still may not happen in the end due to other inlining limits.
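
As a rough illustration of the 10% rule, the check can be sketched in plain
integer arithmetic as below.  This is not the code from the patch (that lives
in big_speedup_p); the function name, the constant and the int64_t time type
are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inline-min-speedup parameter, in percent.  */
enum { MIN_SPEEDUP_PERCENT = 10 };

/* Return true when inlining saves more than MIN_SPEEDUP_PERCENT of the
   estimated caller+callee time.  Both arguments are abstract time
   estimates in the same unit.  */
static bool
speedup_exceeds_threshold (int64_t uninlined_time, int64_t inlined_time)
{
  int64_t saved = uninlined_time - inlined_time;

  /* Same as saved / uninlined_time > percent / 100, without division.  */
  return saved * 100 > uninlined_time * MIN_SPEEDUP_PERCENT;
}
```

With these definitions an estimate going from 1000 to 850 passes the test,
while 1000 to 900 (exactly 10%) does not, since the speedup has to exceed the
threshold.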

Bootstrapped/regtested x86_64-linux. Benchmarked on SPEC2k, SPEC2k6, C++
tests, polyhedron and Mozilla.  The largest single win is on c-ray, where we
now inline ray_sphere because it will become loop invariant.  There are also
improvements on polyhedron and Mozilla.
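
For readers unfamiliar with this kind of win, the sketch below shows how
inlining can expose loop-invariant work.  It is a generic illustration and not
the c-ray code; weighted_term, sum_terms and sum_terms_hoisted are
hypothetical names.

```c
#include <stddef.h>

/* Hypothetical callee: NORM depends only on SCALE, not on X.  */
static double
weighted_term (double scale, double x)
{
  double norm = scale * scale + 1.0;
  return x / norm;
}

/* As written: NORM is recomputed on every call inside the loop.  */
double
sum_terms (const double *xs, size_t n, double scale)
{
  double total = 0.0;
  for (size_t i = 0; i < n; i++)
    total += weighted_term (scale, xs[i]);
  return total;
}

/* What the optimizer can produce once the call is inlined: the NORM
   computation is loop invariant and is hoisted out of the loop.  */
double
sum_terms_hoisted (const double *xs, size_t n, double scale)
{
  double norm = scale * scale + 1.0;
  double total = 0.0;
  for (size_t i = 0; i < n; i++)
    total += xs[i] / norm;
  return total;
}
```

The two versions compute the same sum; the second is written by hand only to
show the shape of the code after inlining and invariant motion.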

Will commit it today or tomorrow depending on when autotesters will hit other


	PR middle-end/48636
	* ipa-inline.c (big_speedup_p): New function.
	(want_inline_small_function_p): Use it.
	(edge_badness): Dump it.
	* params.def (inline-min-speedup): New parameter.
	* doc/invoke.texi (inline-min-speedup): Document.

Index: doc/invoke.texi
*** doc/invoke.texi	(revision 193284)
--- doc/invoke.texi	(working copy)
*************** by the compiler are investigated.  To th
*** 8941,8946 ****
--- 8941,8952 ----
  be applied.
  The default value is 40.
+ @item inline-min-speedup
+ When estimated performance improvement of caller + callee runtime exceeds this
+ threshold (in percent), the function can be inlined regardless of the limit on
+ @option{--param max-inline-insns-single} and @option{--param
+ max-inline-insns-auto}.
  @item large-function-insns
  The limit specifying really large functions.  For functions larger than this
  limit after inlining, inlining is constrained by
Index: ipa-inline.c
*** ipa-inline.c	(revision 193284)
--- ipa-inline.c	(working copy)
*************** compute_inlined_call_time (struct cgraph
*** 493,498 ****
--- 493,514 ----
    return time;
+ /* Return true if the speedup for inlining E is bigger than
+    PARAM_INLINE_MIN_SPEEDUP percent of the uninlined time.  */
+ static bool
+ big_speedup_p (struct cgraph_edge *e)
+ {
+   gcov_type time = compute_uninlined_call_time (inline_summary (e->callee),
+ 					  e);
+   gcov_type inlined_time = compute_inlined_call_time (e,
+ 					        estimate_edge_time (e));
+   if (time - inlined_time
+       > RDIV (time * PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP), 100))
+     return true;
+   return false;
+ }
  /* Return true if we are interested in inlining small function.
     When REPORT is true, report reason to dump file.  */
*************** want_inline_small_function_p (struct cgr
*** 514,519 ****
--- 530,536 ----
        int growth = estimate_edge_growth (e);
        inline_hints hints = estimate_edge_hints (e);
+       bool big_speedup = big_speedup_p (e);
        if (growth <= 0)
*************** want_inline_small_function_p (struct cgr
*** 521,526 ****
--- 538,544 ----
  	 hints suggests that inlining given function is very profitable.  */
        else if (DECL_DECLARED_INLINE_P (callee->symbol.decl)
  	       && growth >= MAX_INLINE_INSNS_SINGLE
+ 	       && !big_speedup
  	       && !(hints & (INLINE_HINT_indirect_call
  			     | INLINE_HINT_loop_iterations
  			     | INLINE_HINT_loop_stride)))
*************** want_inline_small_function_p (struct cgr
*** 574,579 ****
--- 592,598 ----
  	 Upgrade it to MAX_INLINE_INSNS_SINGLE when hints suggests that
  	 inlining given function is very profitable.  */
        else if (!DECL_DECLARED_INLINE_P (callee->symbol.decl)
+ 	       && !big_speedup
  	       && growth >= ((hints & (INLINE_HINT_indirect_call
  				       | INLINE_HINT_loop_iterations
  				       | INLINE_HINT_loop_stride))
*************** edge_badness (struct cgraph_edge *edge, 
*** 836,841 ****
--- 855,862 ----
        dump_inline_hints (dump_file, hints);
+       if (big_speedup_p (edge))
+ 	fprintf (dump_file, " big_speedup");
        fprintf (dump_file, "\n");
Index: params.def
--- params.def  (revision 193286)
+++ params.def  (working copy)
          "Maximal estimated outcome of branch considered predictable",
          2, 0, 50)
+DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
+         "inline-min-speedup",
+         "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-insns-auto",
+         10, 0, 0)
 /* The single function inlining limit. This is the maximum size
    of a function counted in internal gcc instructions (not in
    real machine instructions) that is eligible for inlining
