This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] rewrite incremental inlining heuristics


Hi,
the attached patch replaces non-unit-at-a-time inlining code by new incremental
inline decision code in cgraph.c.  The code use simple inlining algorithm
mostly equivalent to what GCC does with RTL inlining (believed to work better
than current tree-inlining code) and re-use most of inlinining infrastructure I
did for unit-at-a-time so it have same parameters except for overall unit
growth test removed (there is no way to test it before knowing compilation unit size).

It has two important limitations.  Firstly it always does only in-order
inlining even for functions compiled out-of-order.  This is simple
lazyness to avoid recursion in the decision process and can be removed
if considered important limitation.

Secondly the class instances produced by C++ frontend only when referenced are
not inlined first time because they are produced later.  Originally C++
frontend did instantiate these functions when tree_inlinable_p was called on
them.  We can restore the behaviour with frontend hook, but I am not 100%
convienced that this is needed given the fact that algorithm is used only for
-O0 and -O1.

I benchmarked part of SPEC testsuite (gcc, eon, vpr, perl) that work on my
128MB notebook and got mostly the same results as with old algorithm.  For C++
testcases consuming a lot of memory the situation seems to be slightly better,
but not too much (for some of these the overall unit growth parameter fire that
is not present here, perhaps we can use some kind of incremental notion of
this, I will look into this).  In general Gerald's testcases works fine, POOMA
testcases still require a lot of memory, but the memory scales lineraly with
increasing the inline limits, while originally it did scaled exponentially.
For tramp3d.ii the consumption with current inline limits goes down from 250MB
to 110MB, unit-at-a-time needs 20MB.  There is still something strange
happening as most memory is consumed by single function that is still not too
large, so backend seems to be very inefecient on it, I will investiagte why.

Bootstrappd/regtested i386, OK (for the cgraph*.* bits)?

Honza

Sat Oct 11 13:19:47 CEST 2003  Jan Hubicka  <jh@suse.cz>
	* cgraphunit.c (cgraph_decide_inlining_incrementally):  New function.
	(cgraph_finalize_function): Use it.
	(cgraph_mark_inline): Allow incrmental decisions
	* invoke.texi (max-inline-slope, min-inline-insns): Kill.
	* params.def (PARAM_MAX_INLINE_SLOPE, PARAM_MIN_INLINE_INSNS): Kill.
	* tree-inline.c (limits_allow_inlining): Kill.
	(expand_call_inline): Always use unit-at-a-time path.

*** gcc.old/cgraphunit.c	Sat Sep 27 23:30:30 2003
--- gcc/cgraphunit.c	Sat Oct 11 13:38:34 2003
*************** static void cgraph_mark_local_functions 
*** 49,60 ****
--- 49,61 ----
  static void cgraph_optimize_function (struct cgraph_node *);
  static bool cgraph_default_inline_p (struct cgraph_node *n);
  static void cgraph_analyze_function (struct cgraph_node *node);
+ static void cgraph_decide_inlining_incrementally (struct cgraph_node *);
  
  /* Statistics we collect about inlining algorithm.  */
  static int ncalls_inlined;
  static int nfunctions_inlined;
  static int initial_insns;
  static int overall_insns;
  
  /* Records tree nodes seen in cgraph_create_edges.  Simply using
     walk_tree_without_duplicates doesn't guarantee each node is visited
*************** cgraph_finalize_function (tree decl, boo
*** 206,212 ****
    /* If not unit at a time, then we need to create the call graph
       now, so that called functions can be queued and emitted now.  */
    if (!flag_unit_at_a_time)
!     cgraph_analyze_function (node);
  
    if (decide_is_function_needed (node, decl))
      cgraph_mark_needed_node (node);
--- 241,250 ----
    /* If not unit at a time, then we need to create the call graph
       now, so that called functions can be queued and emitted now.  */
    if (!flag_unit_at_a_time)
!     {
!       cgraph_analyze_function (node);
!       cgraph_decide_inlining_incrementally (node);
!     }
  
    if (decide_is_function_needed (node, decl))
      cgraph_mark_needed_node (node);
*************** cgraph_mark_inline (struct cgraph_node *
*** 849,854 ****
--- 887,893 ----
    to->global.insns = new_insns;
  
    if (!called && !what->needed && !what->origin
+       && flag_unit_at_a_time
        && !DECL_EXTERNAL (what->decl))
      {
        if (!what->global.will_be_output)
*************** cgraph_decide_inlining (void)
*** 1183,1188 ****
--- 1222,1280 ----
    free (inlined_callees);
  }
  
+ /* Decide on the inlining.  We do so in the topological order to avoid
+    expenses on updating datastructures.  */
+ 
+ static void
+ cgraph_decide_inlining_incrementally (struct cgraph_node *node)
+ {
+   struct cgraph_edge *e;
+   struct cgraph_node **inlined =
+     xmalloc (sizeof (struct cgraph_node *) * cgraph_n_nodes);
+   struct cgraph_node **inlined_callees =
+     xmalloc (sizeof (struct cgraph_node *) * cgraph_n_nodes);
+   int ninlined;
+   int ninlined_callees;
+   int y;
+ 
+   ninlined = cgraph_inlined_into (node, inlined);
+ 
+   /* First of all look for always inline functions.  */
+   for (e = node->callees; e; e = e->next_callee)
+     if (e->callee->local.disregard_inline_limits && !e->callee->output
+ 	&& e->callee != node && !e->inline_call)
+       {
+ 	ninlined_callees = cgraph_inlined_callees (e->callee, inlined_callees);
+ 	cgraph_mark_inline (node, e->callee, inlined, ninlined,
+ 			    inlined_callees, ninlined_callees);
+ 	for (y = 0; y < ninlined_callees; y++)
+ 	  inlined_callees[y]->output = 0, node->aux = 0;
+       }
+ 
+   /* Now do the automatic inlining.  */
+   for (e = node->callees; e; e = e->next_callee)
+     if (e->callee->local.inlinable && !e->callee->output
+ 	&& e->callee != node && !e->inline_call
+         && cgraph_default_inline_p (e->callee)
+ 	&& cgraph_check_inline_limits (node, e->callee, inlined,
+ 				       ninlined))
+       {
+ 	ninlined_callees = cgraph_inlined_callees (e->callee, inlined_callees);
+ 	cgraph_mark_inline (node, e->callee, inlined, ninlined,
+ 			    inlined_callees, ninlined_callees);
+ 	for (y = 0; y < ninlined_callees; y++)
+ 	  inlined_callees[y]->output = 0, node->aux = 0;
+       }
+ 
+   /* Clear the flags set by cgraph_inlined_into.  */
+   for (y = 0; y < ninlined; y++)
+     inlined[y]->output = 0, node->aux = 0;
+ 
+   free (inlined);
+   free (inlined_callees);
+ }
+ 
+ 
  /* Return true when CALLER_DECL should be inlined into CALLEE_DECL.  */
  
  bool
diff -rc3p gcc.old/doc/invoke.texi gcc/doc/invoke.texi
*** gcc.old/doc/invoke.texi	Sat Sep 27 23:30:29 2003
--- gcc/doc/invoke.texi	Sat Oct 11 13:06:27 2003
*************** larger binaries.  Very high values are n
*** 4738,4759 ****
  binaries may adversely affect runtime performance.
  The default value is 200.
  
- @item max-inline-slope
- After exceeding the maximum number of inlined instructions by repeated
- inlining, a linear function is used to decrease the allowable size
- for single functions.  The slope of that function is the negative
- reciprocal of the number specified here.
- This parameter is ignored when @option{-funit-at-a-time} is used.
- The default value is 32.
- 
- @item min-inline-insns
- The repeated inlining is throttled more and more by the linear function
- after exceeding the limit.  To avoid too much throttling, a minimum for
- this function is specified here to allow repeated inlining for very small
- functions even when a lot of repeated inlining already has been done.
- This parameter is ignored when @option{-funit-at-a-time} is used.
- The default value is 10.
- 
  @item large-function-insns
  The limit specifying really large functions.  For functions greater than this
  limit inlining is constrained by @option{--param large-function-growth}.
--- 4738,4743 ----
diff -rc3p gcc.old/flags.h gcc/flags.h
*** gcc.old/flags.h	Sat Sep 27 23:30:32 2003
--- gcc/flags.h	Sun Sep 28 00:52:20 2003
*************** extern int flag_signaling_nans;
*** 692,697 ****
--- 692,701 ----
  
  extern int flag_unit_at_a_time;
  
+ /* Nonzero means that we defer emitting functions until they are actually
+    used.  */
+ extern int flag_remove_unreachable_functions;
+ 
  /* A string that's used when a random name is required.  NULL means
     to make it really random.  */
  
diff -rc3p gcc.old/opts.c gcc/opts.c
*** gcc.old/opts.c	Sat Sep 27 23:30:39 2003
--- gcc/opts.c	Sat Oct 11 13:40:48 2003
*************** common_handle_option (size_t scode, cons
*** 1056,1068 ****
        set_param_value ("max-inline-insns-single", value / 2);
        set_param_value ("max-inline-insns-auto", value / 2);
        set_param_value ("max-inline-insns-rtl", value);
-       if (value / 4 < MIN_INLINE_INSNS)
- 	{
- 	  if (value / 4 > 10)
- 	    set_param_value ("min-inline-insns", value / 4);
- 	  else
- 	    set_param_value ("min-inline-insns", 10);
- 	}
        break;
  
      case OPT_finstrument_functions:
--- 1056,1061 ----
diff -rc3p gcc.old/params.def gcc/params.def
*** gcc.old/params.def	Sat Sep 27 23:30:39 2003
--- gcc/params.def	Sat Oct 11 13:06:04 2003
*************** DEFPARAM (PARAM_MAX_INLINE_INSNS,
*** 84,115 ****
  	  "The maximum number of instructions by repeated inlining before gcc starts to throttle inlining",
  	  200)
  
- /* After the repeated inline limit has been exceeded (see
-    "max-inline-insns" parameter), a linear function is used to
-    decrease the size of single functions eligible for inlining.
-    The slope of this linear function is given the negative
-    reciprocal value (-1/x) of this parameter. 
-    The default value is 32.
-    This linear function is used until it falls below a minimum
-    value specified by the "min-inline-insns" parameter.  */
- DEFPARAM (PARAM_MAX_INLINE_SLOPE,
- 	  "max-inline-slope",
- 	  "The slope of the linear function throttling inlining after the recursive inlining limit has been reached is given by the negative reciprocal value of this parameter",
- 	  32)
- 
- /* When gcc has inlined so many instructions (by repeated
-    inlining) that the throttling limits the inlining very much,
-    inlining for very small functions is still desirable to
-    achieve good runtime performance.  The size of single functions 
-    (measured in gcc instructions) which will still be eligible for 
-    inlining then is given by this parameter.  It defaults to 130.
-    Only much later (after exceeding 128 times the recursive limit)
-    inlining is cut down completely.  */
- DEFPARAM (PARAM_MIN_INLINE_INSNS,
- 	  "min-inline-insns",
- 	  "The number of instructions in a single functions still eligible to inlining after a lot recursive inlining",
- 	  10)
- 
  /* For languages that (still) use the RTL inliner, we can specify
     limits for the RTL inliner separately.
     The parameter here defines the maximum number of RTL instructions
--- 84,89 ----
Only in gcc: remove
diff -rc3p gcc.old/testsuite/g++.dg/opt/inline4.C gcc/testsuite/g++.dg/opt/inline4.C
*** gcc.old/testsuite/g++.dg/opt/inline4.C	Sat Sep 27 23:30:46 2003
--- gcc/testsuite/g++.dg/opt/inline4.C	Sat Oct 11 14:42:37 2003
***************
*** 1,4 ****
! // { dg-options "-O2 -ftemplate-depth-20000 --param min-inline-insns=100 --param max-inline-insns=3" }
  
  template <int I>
  inline void g() { g<I-1>(); return; }
--- 1,4 ----
! // { dg-options "-O2 -ftemplate-depth-20000" }
  
  template <int I>
  inline void g() { g<I-1>(); return; }
diff -rc3p gcc.old/testsuite/gcc.dg/inline-2.c gcc/testsuite/gcc.dg/inline-2.c
*** gcc.old/testsuite/gcc.dg/inline-2.c	Sat Sep 27 23:30:50 2003
--- gcc/testsuite/gcc.dg/inline-2.c	Sat Oct 11 13:42:13 2003
*************** static int foo(void)
*** 11,17 ****
  
  int bar(void)
  {
!   return foo() + 1;
  }
  
  /* { dg-final { scan-assembler-not "jsr" { target alpha*-*-* } } } */
--- 11,18 ----
  
  int bar(void)
  {
!   /* Call twice to avoid bypassing the limit for functions called once.  */
!   return foo() + foo() + 1;
  }
  
  /* { dg-final { scan-assembler-not "jsr" { target alpha*-*-* } } } */
diff -rc3p gcc.old/tree-inline.c gcc/tree-inline.c
*** gcc.old/tree-inline.c	Sat Sep 27 23:30:31 2003
--- gcc/tree-inline.c	Sat Oct 11 13:05:28 2003
*************** static tree copy_body (inline_data *);
*** 119,125 ****
  static tree expand_call_inline (tree *, int *, void *);
  static void expand_calls_inline (tree *, inline_data *);
  static bool inlinable_function_p (tree);
- static int limits_allow_inlining (tree, inline_data *);
  static tree remap_decl (tree, inline_data *);
  #ifndef INLINER_FOR_JAVA
  static tree initialize_inlined_parameters (inline_data *, tree, tree);
--- 119,124 ----
*************** inlinable_function_p (tree fn)
*** 1107,1203 ****
    return inlinable;
  }
  
- /* We can't inline functions that are too big.  Only allow a single
-    function to be of MAX_INLINE_INSNS_SINGLE size.  Make special
-    allowance for extern inline functions, though.
- 
-    Return nonzero if the function FN can be inlined into the inlining
-    context ID.  */
- 
- static int
- limits_allow_inlining (tree fn, inline_data *id)
- {
-   int estimated_insns = 0;
-   size_t i;
- 
-   /* Don't even bother if the function is not inlinable.  */
-   if (!inlinable_function_p (fn))
-     return 0;
- 
-   /* Investigate the size of the function.  Return at once
-      if the function body size is too large.  */
-   if (!(*lang_hooks.tree_inlining.disregard_inline_limits) (fn))
-     {
-       int currfn_max_inline_insns;
- 
-       /* If we haven't already done so, get an estimate of the number of
- 	 instructions that will be produces when expanding this function.  */
-       if (!DECL_ESTIMATED_INSNS (fn))
- 	DECL_ESTIMATED_INSNS (fn)
- 	  = (*lang_hooks.tree_inlining.estimate_num_insns) (fn);
-       estimated_insns = DECL_ESTIMATED_INSNS (fn);
- 
-       /* We may be here either because fn is declared inline or because
- 	 we use -finline-functions.  For the second case, we are more
- 	 restrictive.
- 
- 	 FIXME: -finline-functions should imply -funit-at-a-time, it's
- 		about equally expensive but unit-at-a-time produces
- 		better code.  */
-       currfn_max_inline_insns = DECL_DECLARED_INLINE_P (fn) ?
- 		MAX_INLINE_INSNS_SINGLE : MAX_INLINE_INSNS_AUTO;
- 
-       /* If the function is too big to be inlined, adieu.  */
-       if (estimated_insns > currfn_max_inline_insns)
- 	return 0;
- 
-       /* We now know that we don't disregard the inlining limits and that 
- 	 we basically should be able to inline this function.
- 	 We always allow inlining functions if we estimate that they are
- 	 smaller than MIN_INLINE_INSNS.  Otherwise, investigate further.  */
-       if (estimated_insns > MIN_INLINE_INSNS)
- 	{
- 	  int sum_insns = (id ? id->inlined_insns : 0) + estimated_insns;
- 
- 	  /* In the extreme case that we have exceeded the recursive inlining
- 	     limit by a huge factor (128), we just say no.
- 
- 	     FIXME:  Should not happen in real life, but people have reported
- 		     that it actually does!?  */
- 	  if (sum_insns > MAX_INLINE_INSNS * 128)
- 	    return 0;
- 
- 	  /* If we did not hit the extreme limit, we use a linear function
- 	     with slope -1/MAX_INLINE_SLOPE to exceedingly decrease the
- 	     allowable size.  */
- 	  else if (sum_insns > MAX_INLINE_INSNS)
- 	    {
- 	      if (estimated_insns > currfn_max_inline_insns
- 			- (sum_insns - MAX_INLINE_INSNS) / MAX_INLINE_SLOPE)
- 	        return 0;
- 	    }
- 	}
-     }
- 
-   /* Don't allow recursive inlining.  */
-   for (i = 0; i < VARRAY_ACTIVE_SIZE (id->fns); ++i)
-     if (VARRAY_TREE (id->fns, i) == fn)
-       return 0;
- 
-   if (DECL_INLINED_FNS (fn))
-     {
-       int j;
-       tree inlined_fns = DECL_INLINED_FNS (fn);
- 
-       for (j = 0; j < TREE_VEC_LENGTH (inlined_fns); ++j)
- 	if (TREE_VEC_ELT (inlined_fns, j) == VARRAY_TREE (id->fns, 0))
- 	  return 0;
-     }
- 
-   /* Go ahead, this function can be inlined.  */
-   return 1;
- }
- 
  /* If *TP is a CALL_EXPR, replace it with its inline expansion.  */
  
  static tree
--- 1106,1111 ----
*************** expand_call_inline (tree *tp, int *walk_
*** 1284,1291 ****
      return NULL_TREE;
  
    /* Turn forward declarations into real ones.  */
!   if (flag_unit_at_a_time)
!     fn = cgraph_node (fn)->decl;
  
    /* If fn is a declaration of a function in a nested scope that was
       globally declared inline, we don't set its DECL_INITIAL.
--- 1192,1198 ----
      return NULL_TREE;
  
    /* Turn forward declarations into real ones.  */
!   fn = cgraph_node (fn)->decl;
  
    /* If fn is a declaration of a function in a nested scope that was
       globally declared inline, we don't set its DECL_INITIAL.
*************** expand_call_inline (tree *tp, int *walk_
*** 1301,1309 ****
  
    /* Don't try to inline functions that are not well-suited to
       inlining.  */
!   if ((flag_unit_at_a_time
!        && (!DECL_SAVED_TREE (fn) || !cgraph_inline_p (id->current_decl, fn)))
!       || (!flag_unit_at_a_time && !limits_allow_inlining (fn, id)))
      {
        if (warn_inline && DECL_INLINE (fn) && DECL_DECLARED_INLINE_P (fn)
  	  && !DECL_IN_SYSTEM_HEADER (fn))
--- 1208,1214 ----
  
    /* Don't try to inline functions that are not well-suited to
       inlining.  */
!   if (!DECL_SAVED_TREE (fn) || !cgraph_inline_p (id->current_decl, fn))
      {
        if (warn_inline && DECL_INLINE (fn) && DECL_DECLARED_INLINE_P (fn)
  	  && !DECL_IN_SYSTEM_HEADER (fn))
*************** expand_call_inline (tree *tp, int *walk_
*** 1541,1547 ****
    id->inlined_insns += DECL_ESTIMATED_INSNS (fn) - 1;
  
    /* Update callgraph if needed.  */
!   if (id->decl && flag_unit_at_a_time)
      {
        cgraph_remove_call (id->decl, fn);
        cgraph_create_edges (id->decl, *inlined_body);
--- 1446,1452 ----
    id->inlined_insns += DECL_ESTIMATED_INSNS (fn) - 1;
  
    /* Update callgraph if needed.  */
!   if (id->decl)
      {
        cgraph_remove_call (id->decl, fn);
        cgraph_create_edges (id->decl, *inlined_body);


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]