This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
[RFC/patch] Callgraph based inlining heuristics

From: Jan Hubicka <jh at suse dot cz>
To: gcc at gcc dot gnu dot org, mark at codesourcery dot com, rth at redhat dot com,aj at suse dot de
Date: Mon, 23 Jun 2003 01:20:32 +0200
Subject: [RFC/patch] Callgraph based inlining heuristics
Hi,
I finally got C++ unit-at-a-time into shape so it don't consume all my free
time and got into implementing first try of my proposed inlining heuristics
that uses callgraph to decide on inlining top-down while inline using current
code bottom-up hopefully getting better from the both worlds.

I am attaching somewhat raw implementation of the idea.  At the moment I am
using cgraph edges to flag on whether particular inlining will happen so
function can be inlinined into once caller but not inlined into another, but
the decisions must be transitive.

(that is not the case of attribute leafify proposed - doing the datastructure
in full generality allowing each particular walk in the callgraph  to be either
collapsed or not is probably too expensive so attribute leafify can be
implemented as a special case).

I sort the functions topologically and then do 3 stages - in the first stage I
set inline to all callers of always_inline functions watching for the cycles.
After flag is set the size of function bodies is update by adding the size of
inlined body minus 1.  (this is the same as estimation in tree-inline.c.  Not
very good.  We can consider choosing functions that may inline and trying to
analyze them harder, in the extreme case even produce rtl and profile and
copmpute average amount of insns executed).

In the second stage I proceed topologically again inline small functions
updating the size of bodies and in the third stage I decide on inlining once.

I've added new constant specifying large functions (30000 statements) and I am
limiting growth of the large functions by factor of 2.  This avoids quadratic
bottleneck seen on insn-recog during bootstrap.

I tested libstdc++ compilation with -O2 and get 10% savings of the .so library
size (with debug info in) and about 5% compiler speedup.  I also tried PR/8362 testcase.
With -O2 -fno-unit-at-a-time I get:

 garbage collection    :  14.43 (11%) usr   0.03 ( 1%) sys  14.49 (11%) wall
 cfg construction      :   1.83 ( 1%) usr   0.00 ( 0%) sys   1.83 ( 1%) wall
 cfg cleanup           :   2.79 ( 2%) usr   0.01 ( 0%) sys   2.80 ( 2%) wall
 trivially dead code   :   1.95 ( 1%) usr   0.00 ( 0%) sys   1.95 ( 1%) wall
 life analysis         :   4.47 ( 3%) usr   0.00 ( 0%) sys   4.47 ( 3%) wall
 life info update      :   1.91 ( 1%) usr   0.00 ( 0%) sys   1.91 ( 1%) wall
 alias analysis        :   3.03 ( 2%) usr   0.05 ( 2%) sys   3.08 ( 2%) wall
 register scan         :   1.16 ( 1%) usr   0.01 ( 0%) sys   1.17 ( 1%) wall
 rebuild jump labels   :   0.80 ( 1%) usr   0.00 ( 0%) sys   0.81 ( 1%) wall
 preprocessing         :   0.39 ( 0%) usr   0.06 ( 2%) sys   0.44 ( 0%) wall
 parser                :  17.02 (13%) usr   1.23 (43%) sys  18.32 (13%) wall
 name lookup           :   8.08 ( 6%) usr   0.84 (29%) sys   8.92 ( 6%) wall
 expand                :  13.09 (10%) usr   0.12 ( 4%) sys  13.26 (10%) wall
 varconst              :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall
 integration           :   4.55 ( 3%) usr   0.03 ( 1%) sys   4.60 ( 3%) wall
 jump                  :   0.76 ( 1%) usr   0.01 ( 0%) sys   0.77 ( 1%) wall
 CSE                   :  13.96 (10%) usr   0.01 ( 0%) sys  14.01 (10%) wall
 global CSE            :   7.38 ( 5%) usr   0.01 ( 0%) sys   7.39 ( 5%) wall
 loop analysis         :   1.00 ( 1%) usr   0.01 ( 0%) sys   1.01 ( 1%) wall
 bypass jumps          :   1.12 ( 1%) usr   0.01 ( 0%) sys   1.13 ( 1%) wall
 CSE 2                 :   5.69 ( 4%) usr   0.01 ( 0%) sys   5.72 ( 4%) wall
 branch prediction     :   1.45 ( 1%) usr   0.01 ( 0%) sys   1.46 ( 1%) wall
 flow analysis         :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall
 combiner              :   3.54 ( 3%) usr   0.00 ( 0%) sys   3.55 ( 3%) wall
 if-conversion         :   0.37 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall
 regmove               :   0.84 ( 1%) usr   0.00 ( 0%) sys   0.84 ( 1%) wall
 local alloc           :   2.53 ( 2%) usr   0.02 ( 1%) sys   2.55 ( 2%) wall
 global alloc          :   5.47 ( 4%) usr   0.05 ( 2%) sys   5.52 ( 4%) wall
 reload CSE regs       :   2.24 ( 2%) usr   0.00 ( 0%) sys   2.24 ( 2%) wall
 flow 2                :   0.64 ( 0%) usr   0.00 ( 0%) sys   0.64 ( 0%) wall
 if-conversion 2       :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall
 peephole 2            :   0.45 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall
 rename registers      :   3.18 ( 2%) usr   0.01 ( 0%) sys   3.19 ( 2%) wall
 scheduling 2          :   3.23 ( 2%) usr   0.10 ( 3%) sys   3.34 ( 2%) wall
 reorder blocks        :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%) wall
 shorten branches      :   0.56 ( 0%) usr   0.02 ( 1%) sys   0.58 ( 0%) wall
 final                 :   1.02 ( 1%) usr   0.16 ( 6%) sys   1.18 ( 1%) wall
 symout                :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 rest of compilation   :   2.53 ( 2%) usr   0.04 ( 1%) sys   2.58 ( 2%) wall
 TOTAL                 : 134.67             2.86           137.79

With -O2 -funit-at-a-time I get:

Execution times (seconds)
 garbage collection    :  10.21 (11%) usr   0.05 ( 2%) sys  10.29 (11%) wall
 cfg construction      :   0.69 ( 1%) usr   0.00 ( 0%) sys   0.69 ( 1%) wall
 cfg cleanup           :   1.38 ( 1%) usr   0.01 ( 0%) sys   1.39 ( 1%) wall
 trivially dead code   :   1.03 ( 1%) usr   0.01 ( 0%) sys   1.04 ( 1%) wall
 life analysis         :   3.29 ( 4%) usr   0.00 ( 0%) sys   3.35 ( 3%) wall
 life info update      :   1.52 ( 2%) usr   0.01 ( 0%) sys   1.53 ( 2%) wall
 alias analysis        :   1.76 ( 2%) usr   0.01 ( 0%) sys   1.77 ( 2%) wall
 register scan         :   0.64 ( 1%) usr   0.00 ( 0%) sys   0.64 ( 1%) wall
 rebuild jump labels   :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall
 preprocessing         :   0.42 ( 0%) usr   0.05 ( 2%) sys   0.49 ( 1%) wall
 parser                :  18.16 (20%) usr   1.20 (38%) sys  19.48 (20%) wall
 name lookup           :   7.03 ( 8%) usr   1.08 (34%) sys   8.14 ( 8%) wall
 expand                :   4.37 ( 5%) usr   0.05 ( 2%) sys   4.43 ( 5%) wall
 varconst              :   2.86 ( 3%) usr   0.43 (13%) sys   3.31 ( 3%) wall
 integration           :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall
 jump                  :   0.37 ( 0%) usr   0.01 ( 0%) sys   0.37 ( 0%) wall
 CSE                   :   8.19 ( 9%) usr   0.00 ( 0%) sys   8.21 ( 9%) wall
 global CSE            :   3.19 ( 3%) usr   0.00 ( 0%) sys   3.20 ( 3%) wall
 loop analysis         :   0.50 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 1%) wall
 bypass jumps          :   0.78 ( 1%) usr   0.00 ( 0%) sys   0.79 ( 1%) wall
 CSE 2                 :   4.07 ( 4%) usr   0.00 ( 0%) sys   4.07 ( 4%) wall
 branch prediction     :   0.87 ( 1%) usr   0.00 ( 0%) sys   0.87 ( 1%) wall
 flow analysis         :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
 combiner              :   2.15 ( 2%) usr   0.01 ( 0%) sys   2.16 ( 2%) wall
 if-conversion         :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall
 regmove               :   0.52 ( 1%) usr   0.00 ( 0%) sys   0.52 ( 1%) wall
 local alloc           :   1.79 ( 2%) usr   0.00 ( 0%) sys   1.79 ( 2%) wall
 global alloc          :   4.15 ( 4%) usr   0.01 ( 0%) sys   4.16 ( 4%) wall
 reload CSE regs       :   1.81 ( 2%) usr   0.00 ( 0%) sys   2.17 ( 2%) wall
 flow 2                :   0.40 ( 0%) usr   0.01 ( 0%) sys   0.41 ( 0%) wall
 if-conversion 2       :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall
 peephole 2            :   0.43 ( 0%) usr   0.00 ( 0%) sys   0.43 ( 0%) wall
 rename registers      :   3.05 ( 3%) usr   0.00 ( 0%) sys   3.05 ( 3%) wall
 scheduling 2          :   2.87 ( 3%) usr   0.01 ( 0%) sys   2.88 ( 3%) wall
 reorder blocks        :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall
 shorten branches      :   0.28 ( 0%) usr   0.02 ( 1%) sys   0.30 ( 0%) wall
 reg stack             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 final                 :   0.93 ( 1%) usr   0.19 ( 6%) sys   1.13 ( 1%) wall
 symout                :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 rest of compilation   :   1.33 ( 1%) usr   0.00 ( 0%) sys   1.33 ( 1%) wall
 TOTAL                 :  92.61             3.19            96.50

And I get 24% fewer instructions produced.

I also tested it on some of C SPEC2000 sources and it does not seem to supress
any important inlining.  I hope to do some behchmarking tomorrow as I need some
sleep now.

So :) any comments are welcome.  Also I would like to know about other
testcases where inliner behaves in broken way.

Honza

*** cgraph.c.in	Sun Jun 22 23:47:35 2003
--- cgraph.c	Mon Jun 23 00:26:38 2003
*************** create_edge (caller, callee)
*** 160,165 ****
--- 160,179 ----
       struct cgraph_node *caller, *callee;
  {
    struct cgraph_edge *edge = xmalloc (sizeof (struct cgraph_edge));
+   struct cgraph_edge *edge2;
+ 
+   edge->inline_call = false;
+   /* At the moment we don't associate calls with specific CALL_EXPRs
+      as we probably ought to, so we must preserve inline_call flags to
+      be the same in all copies of the same edge.  */
+   if (cgraph_global_info_ready)
+     for (edge2 = caller->callees; edge2; edge2 = edge2->next_caller)
+       if (edge2->callee == callee)
+ 	{
+ 	  edge->inline_call = edge2->inline_call;
+ 	  break;
+ 	}
+ 
  
    edge->caller = caller;
    edge->callee = callee;
*** cgraph.h.in	Sun Jun 22 21:56:33 2003
--- cgraph.h	Mon Jun 23 01:36:16 2003
*************** struct cgraph_local_info
*** 30,40 ****
    /* Set when function function is visiable in current compilation unit only
       and it's address is never taken.  */
    bool local;
!   /* Set when function is small enought to be inlinable many times.  */
!   bool inline_many;
!   /* Set when function can be inlined once (false only for functions calling
!      alloca, using varargs and so on).  */
!   bool can_inline_once;
  };
  
  /* Information about the function that needs to be computed globally
--- 30,42 ----
    /* Set when function function is visiable in current compilation unit only
       and it's address is never taken.  */
    bool local;
! 
!   /* False when there is something making inlining impossible (such as va_arg) */
!   bool inlinable;
!   /* True when function should be inlined independently on it's size.  */
!   bool disgread_inline_limits;
!   /* Size of the function before inlining.  */
!   int self_insns;
  };
  
  /* Information about the function that needs to be computed globally
*************** struct cgraph_global_info
*** 44,49 ****
--- 46,54 ----
  {
    /* Set when the function will be inlined exactly once.  */
    bool inline_once;
+ 
+   /* Estimated size of the function after inlining.  */
+   int insns;
  };
  
  /* Information about the function that is propagated by the RTL backend.
*************** struct cgraph_edge
*** 95,100 ****
--- 100,106 ----
    struct cgraph_node *caller, *callee;
    struct cgraph_edge *next_caller;
    struct cgraph_edge *next_callee;
+   bool inline_call;
  };
  
  extern struct cgraph_node *cgraph_nodes;
*************** void cgraph_finalize_compilation_unit	PA
*** 120,124 ****
--- 126,131 ----
  void cgraph_create_edges		PARAMS ((tree, tree));
  void cgraph_optimize			PARAMS ((void));
  void cgraph_mark_needed_node		PARAMS ((struct cgraph_node *, int));
+ bool cgraph_inline_p			PARAMS ((tree, tree));
  
  #endif  /* GCC_CGRAPH_H  */
*** cgraphunit.c.in	Sun Jun 22 21:56:26 2003
--- cgraphunit.c	Mon Jun 23 01:38:47 2003
*************** Software Foundation, 59 Temple Place - S
*** 35,47 ****
  #include "cgraph.h"
  #include "varpool.h"
  #include "diagnostic.h"
  
  static void cgraph_expand_functions PARAMS ((void));
  static void cgraph_mark_functions_to_output PARAMS ((void));
  static void cgraph_expand_function PARAMS ((struct cgraph_node *));
  static tree record_call_1 PARAMS ((tree *, int *, void *));
  static void cgraph_mark_local_functions PARAMS ((void));
- static void cgraph_mark_functions_to_inline_once PARAMS ((void));
  static void cgraph_optimize_function PARAMS ((struct cgraph_node *));
  
  /* Analyze function once it is parsed.  Set up the local information
--- 35,53 ----
  #include "cgraph.h"
  #include "varpool.h"
  #include "diagnostic.h"
+ #include "c-common.h"
+ #include "params.h"
+ 
+ #define INSNS_PER_STMT 10
+ #define INSNS_PER_CALL 10
+ #define LARGE_FUNCTION_INSNS 30000
+ #define LARGE_FUNCTION_GROWTH 200
  
  static void cgraph_expand_functions PARAMS ((void));
  static void cgraph_mark_functions_to_output PARAMS ((void));
  static void cgraph_expand_function PARAMS ((struct cgraph_node *));
  static tree record_call_1 PARAMS ((tree *, int *, void *));
  static void cgraph_mark_local_functions PARAMS ((void));
  static void cgraph_optimize_function PARAMS ((struct cgraph_node *));
  
  /* Analyze function once it is parsed.  Set up the local information
*************** cgraph_finalize_function (decl, body)
*** 75,88 ****
        cgraph_mark_needed_node (node, 1);
      }
  
!   if (!node->needed && !DECL_COMDAT (node->decl))
!     node->local.can_inline_once = tree_inlinable_function_p (decl, 1);
!   else
!     node->local.can_inline_once = 0;
!   if (flag_inline_trees)
!     node->local.inline_many = tree_inlinable_function_p (decl, 0);
!   else
!     node->local.inline_many = 0;
  
    (*debug_hooks->deferred_inline_function) (decl);
  }
--- 81,91 ----
        cgraph_mark_needed_node (node, 1);
      }
  
!   node->local.inlinable = tree_inlinable_function_p (decl, 1);
!   node->local.self_insns = DECL_NUM_STMTS (decl) * INSNS_PER_STMT;
!   if (node->local.inlinable)
!     node->local.disgread_inline_limits
!       = (*lang_hooks.tree_inlining.disregard_inline_limits) (decl);
  
    (*debug_hooks->deferred_inline_function) (decl);
  }
*************** cgraph_mark_functions_to_output ()
*** 231,244 ****
    for (node = cgraph_nodes; node; node = node->next)
      {
        tree decl = node->decl;
  
        /* We need to output all local functions that are used and not
  	 always inlined, as well as those that are reachable from
  	 outside the current compilation unit.  */
        if (DECL_SAVED_TREE (decl)
  	  && (node->needed
! 	      || (!node->local.inline_many && !node->global.inline_once
! 		  && node->reachable)
  	      || (DECL_ASSEMBLER_NAME_SET_P (decl)
  	          && TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl))))
  	  && !TREE_ASM_WRITTEN (decl) && !node->origin
--- 234,251 ----
    for (node = cgraph_nodes; node; node = node->next)
      {
        tree decl = node->decl;
+       struct cgraph_edge *e;
+ 
+       for (e = node->callers; e; e = e->next_caller)
+ 	if (!e->inline_call)
+ 	  break;
  
        /* We need to output all local functions that are used and not
  	 always inlined, as well as those that are reachable from
  	 outside the current compilation unit.  */
        if (DECL_SAVED_TREE (decl)
  	  && (node->needed
! 	      || (e && node->reachable)
  	      || (DECL_ASSEMBLER_NAME_SET_P (decl)
  	          && TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl))))
  	  && !TREE_ASM_WRITTEN (decl) && !node->origin
*************** cgraph_optimize_function (node)
*** 255,260 ****
--- 262,269 ----
  {
    tree decl = node->decl;
  
+   /* optimize_inline_calls avoid inlining of current_function_decl.  */
+   current_function_decl = 0;
    if (flag_inline_trees)
      optimize_inline_calls (decl);
    if (node->nested)
*************** cgraph_expand_function (node)
*** 271,276 ****
--- 280,286 ----
       struct cgraph_node *node;
  {
    tree decl = node->decl;
+   struct cgraph_edge *e;
  
    announce_function (decl);
  
*************** cgraph_expand_function (node)
*** 280,320 ****
       via lang_expand_decl_stmt.  */
    (*lang_hooks.callgraph.expand_function) (decl);
  
!   /* When we decided to inline the function once, we never ever should
!      need to output it separately.  */
!   if (node->global.inline_once)
!     abort ();
!   if (!node->local.inline_many
!       || !node->callers)
      DECL_SAVED_TREE (decl) = NULL;
    current_function_decl = NULL;
  }
  
! 
! /* Expand all functions that must be output. 
!   
!    Attempt to topologically sort the nodes so function is output when
!    all called functions are already assembled to allow data to be
!    propagated accross the callgraph.  Use a stack to get smaller distance
!    between a function and it's callees (later we may choose to use a more
!    sophisticated algorithm for function reordering; we will likely want
!    to use subsections to make the output functions appear in top-down
!    order.  */
! 
! static void
! cgraph_expand_functions ()
  {
    struct cgraph_node *node, *node2;
-   struct cgraph_node **stack =
-     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
-   struct cgraph_node **order =
-     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
    int stack_size = 0;
    int order_pos = 0;
    struct cgraph_edge *edge, last;
-   int i;
  
!   cgraph_mark_functions_to_output ();
  
    /* We have to deal with cycles nicely, so use a depth first traversal
       output algorithm.  Ignore the fact that some functions won't need
--- 290,315 ----
       via lang_expand_decl_stmt.  */
    (*lang_hooks.callgraph.expand_function) (decl);
  
!   for (e = node->callers; e; e = e->next_caller)
!     if (e->inline_call)
!       break;
!   if (e)
      DECL_SAVED_TREE (decl) = NULL;
    current_function_decl = NULL;
  }
  
! /* Fill array order with all nodes with output flag set in the reverse
!    topological order.  */
! static int
! cgraph_postorder (struct cgraph_node **order)
  {
    struct cgraph_node *node, *node2;
    int stack_size = 0;
    int order_pos = 0;
    struct cgraph_edge *edge, last;
  
!   struct cgraph_node **stack =
!     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
  
    /* We have to deal with cycles nicely, so use a depth first traversal
       output algorithm.  Ignore the fact that some functions won't need
*************** cgraph_expand_functions ()
*** 360,487 ****
  	      }
  	  }
        }
!   for (i = order_pos - 1; i >= 0; i--)
      {
!       node = order[i];
!       if (node->output)
  	{
! 	  if (!node->reachable)
! 	    abort ();
! 	  node->output = 0;
! 	  cgraph_expand_function (node);
  	}
      }
    free (stack);
!   free (order);
  }
  
! /* Mark all local functions.
!    We can not use node->needed directly as it is modified during
!    execution of cgraph_optimize.  */
  
  static void
! cgraph_mark_local_functions ()
  {
    struct cgraph_node *node;
  
!   if (!quiet_flag)
!     fprintf (stderr, "\n\nMarking local functions:");
  
-   /* Figure out functions we want to assemble.  */
    for (node = cgraph_nodes; node; node = node->next)
      {
!       node->local.local = (!node->needed
! 		           && DECL_SAVED_TREE (node->decl)
! 			   && !DECL_COMDAT (node->decl)
! 		           && !TREE_PUBLIC (node->decl));
!       if (node->local.local)
! 	announce_function (node->decl);
      }
! }
  
! /* Decide what function should be inlined because they are invoked once
!    (so inlining won't result in duplication of the code).  */
  
! static void
! cgraph_mark_functions_to_inline_once ()
! {
!   struct cgraph_node *node, *node1;
  
    if (!quiet_flag)
      fprintf (stderr, "\n\nMarking functions to inline once:");
  
!   /* Now look for function called only once and mark them to inline.
!      From this point number of calls to given function won't grow.  */
!   for (node = cgraph_nodes; node; node = node->next)
      {
        if (node->callers && !node->callers->next_caller && !node->needed
! 	  && node->local.can_inline_once)
  	{
  	  bool ok = true;
  
  	  /* Verify that we won't duplicate the caller.  */
  	  for (node1 = node->callers->caller;
! 	       node1->local.inline_many
! 	       && node1->callers
! 	       && ok;
! 	       node1 = node1->callers->caller)
  	    if (node1->callers->next_caller || node1->needed)
  	      ok = false;
  	  if (ok)
  	    {
! 	      node->global.inline_once = true;
! 	      announce_function (node->decl);
  	    }
  	}
      }
  }
  
  
  /* Perform simple optimizations based on callgraph.  */
  
  void
  cgraph_optimize ()
  {
-   struct cgraph_node *node;
-   bool changed = true;
- 
    cgraph_mark_local_functions ();
  
!   cgraph_mark_functions_to_inline_once ();
  
    cgraph_global_info_ready = true;
    if (!quiet_flag)
      fprintf (stderr, "\n\nAssembling functions:");
  
!   /* Output everything.  
!      ??? Our inline heuristic may decide to not inline functions previously
!      marked as inlinable thus adding new function bodies that must be output.
!      Later we should move all inlining decisions to callgraph code to make
!      this impossible.  */
    cgraph_expand_functions ();
-   if (!quiet_flag)
-     fprintf (stderr, "\n\nAssembling functions that failed to inline:");
-   while (changed && !errorcount && !sorrycount)
-     {
-       changed = false;
-       for (node = cgraph_nodes; node; node = node->next)
- 	{
- 	  tree decl = node->decl;
- 	  if (!node->origin
- 	      && !TREE_ASM_WRITTEN (decl)
- 	      && DECL_SAVED_TREE (decl)
- 	      && !DECL_EXTERNAL (decl))
- 	    {
- 	      struct cgraph_edge *edge;
- 
- 	      for (edge = node->callers; edge; edge = edge->next_caller)
- 		if (TREE_ASM_WRITTEN (edge->caller->decl))
- 		  {
- 		    changed = true;
- 		    cgraph_expand_function (node);
- 		    break;
- 		  }
- 	    }
- 	}
-     }
  }
--- 355,818 ----
  	      }
  	  }
        }
!   free (stack);
!   return order_pos;
! }
! 
! #define INLINED_TIMES(node) ((size_t)(node)->aux)
! #define SET_INLINED_TIMES(node,times) ((node)->aux = (void *)(times))
! 
! /* Return list of nodes we decided to inline NODE into, set their output
!    flag and compute INLINED_TIMES. 
! 
!    We do simple backgracing to get INLINED_TIMES right.  This should not be
!    expensive as we limit amount of inlining.  Alternatively we may first
!    discover set of nodes, to topological sort on these and propagate
!    information.  */
! static int
! cgraph_inlined_into (struct cgraph_node *node, struct cgraph_node **array)
! {
!   int nfound = 0;
!   struct cgraph_edge **stack;
!   struct cgraph_edge *e, *e1;
!   int sp;
!   int i;
! 
!   /* Fast path: since we traverse in mostly topological order, we will likely
!      find no edges.  */
!   for (e = node->callers; e; e = e->next_caller)
!     if (e->inline_call)
!       break;
! 
!   if (!e)
!     return 0;
! 
!   /* Allocate stack for back-tracking up callgraph.  */
!   stack = xmalloc ((cgraph_n_nodes + 1) * sizeof (struct cgraph_edge));
!   sp = 0;
! 
!   /* Push the first edge on to the stack.  */
!   stack[sp++] = e;
! 
!   while (sp)
      {
!       struct cgraph_node *caller;
! 
!       /* Look at the edge on the top of the stack.  */
!       e = stack[sp - 1];
!       caller = e->caller;
! 
!       /* Check if the caller destination has been visited yet.  */
!       if (!caller->output)
  	{
! 	  array[nfound++] = e->caller;
! 	  /* Mark that we have visited the destination.  */
! 	  caller->output = true;
! 	  SET_INLINED_TIMES (caller, 0);
! 	}
!       SET_INLINED_TIMES (caller, INLINED_TIMES (caller) + 1);
! 
!       for (e1 = caller->callers; e1; e1 = e1->next_caller)
! 	if (e1->inline_call)
! 	  break;
!       if (e1)
! 	stack[sp++] = e1;
!       else
! 	{
! 	  while (true)
! 	    {
! 	      for (e1 = e->next_caller; e1; e1 = e1->next_caller)
! 		if (e1->inline_call)
! 		  break;
! 
! 	      if (e1)
! 		{
! 		  stack[sp - 1] = e1;
! 		  break;
! 		}
! 	      else
! 		{
! 		  sp--;
! 		  if (!sp)
! 		    break;
! 		  e = stack[sp - 1];
! 		}
! 	    }
  	}
      }
+ 
    free (stack);
! 
! 
!   if (!quiet_flag)
!     {
!       fprintf (stderr, "\nFound inline predecesors of ");
!       announce_function (node->decl);
!       fprintf (stderr, ":");
!       for (i = 0; i < nfound; i++)
! 	{
! 	  announce_function (array[i]->decl);
! 	  if (INLINED_TIMES (array[i]) != 1)
! 	    fprintf (stderr, " (%i times)", INLINED_TIMES (array[i]));
! 	}
!       fprintf (stderr, "\n");
!     }
! 
!   return nfound;
  }
  
! /* Estimate size of the function after inlining WHAT into TO.  */
! static int
! cgraph_estimate_size_after_inlining (int times,
! 				     struct cgraph_node *to,
! 				     struct cgraph_node *what)
! {
!   return to->global.insns + (what->global.insns - INSNS_PER_CALL * times);
! }
  
+ /* Update insn sizes after inlining WHAT into TO that is already inlined into
+    all nodes in INLINED array.  */
  static void
! cgraph_mark_inline (struct cgraph_node *to,
! 		    struct cgraph_node *what,
! 		    struct cgraph_node **inlined, int ninlined)
! {
!   int i;
!   int times = 0;
!   struct cgraph_edge *e;
! 
!   for (e = to->callees; e; e = e->next_callee)
!     if (e->callee == what)
!       {
! 	if (e->inline_call)
! 	  abort ();
! 	e->inline_call = true;
! 	times++;
!       }
!   if (!times)
!     abort ();
! 
!   to->global.insns = cgraph_estimate_size_after_inlining (times, to, what);
!   for (i = 0; i < ninlined; i++)
!     inlined[i]->global.insns =
!       cgraph_estimate_size_after_inlining (INLINED_TIMES (inlined[i]) * times,
! 					   inlined[i], what);
! }
! 
! /* Return false when inlining WHAT into TO is not good idea as it would cause
!    too large growth of function bodies.  */
! static bool
! cgraph_check_inline_limits (struct cgraph_node *to,
! 			    struct cgraph_node *what,
! 			    struct cgraph_node **inlined, int ninlined)
! {
!   int i;
!   int times = 0;
!   struct cgraph_edge *e;
!   int newsize;
! 
!   for (e = to->callees; e; e = e->next_callee)
!     if (e->callee == what)
!       times++;
! 
!   to->global.insns = cgraph_estimate_size_after_inlining (times, to, what);
!   newsize = cgraph_estimate_size_after_inlining (times, to, what);
!   if (newsize > LARGE_FUNCTION_INSNS
!       && newsize > to->local.self_insns * LARGE_FUNCTION_GROWTH / 100)
!     return false;
!   for (i = 0; i < ninlined; i++)
!     {
!       newsize =
! 	cgraph_estimate_size_after_inlining (INLINED_TIMES (inlined[i]) *
! 					     times, inlined[i], what);
!       if (newsize > LARGE_FUNCTION_INSNS
! 	  && newsize >
! 	  inlined[i]->local.self_insns * LARGE_FUNCTION_GROWTH / 100)
! 	return false;
!     }
!   return true;
! }
! 
! /* Return true when function N is small enought to be inlined.  */
! static bool
! cgraph_default_inline_p (struct cgraph_node *n)
! {
!   if (!DECL_INLINE (n->decl))
!     return false;
!   if (DID_INLINE_FUNC (n->decl))
!     return n->global.insns < MAX_INLINE_INSNS_AUTO;
!   else
!     return n->global.insns < MAX_INLINE_INSNS_SINGLE;
! }
! 
! static void
! cgraph_decide_inlining (void)
  {
    struct cgraph_node *node;
+   int nnodes;
+   struct cgraph_node **order =
+     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
+   struct cgraph_node **inlined =
+     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
+   int ninlined;
+   int i, y;
  
!   /* Cgraph_postorder ignore nodes with no node->output set.  */
!   for (node = cgraph_nodes; node; node = node->next)
!     {
!       node->global.insns = node->local.self_insns;
!       if (DECL_SAVED_TREE (node->decl))
! 	node->output = 1;
!     }
! 
!   nnodes = cgraph_postorder (order);
  
    for (node = cgraph_nodes; node; node = node->next)
+     node->output = 0;
+ 
+   /* In the first pass mark all always_inline edges.  Do this with a priority
+      so no our decisions makes this impossible.  */
+   if (!quiet_flag)
+     fprintf (stderr, "\n\nDeciding on always_inline functions:");
+   for (i = nnodes - 1; i >= 0; i--)
      {
!       struct cgraph_edge *e;
! 
!       node = order[i];
! 
!       for (e = node->callees; e; e = e->next_callee)
! 	if (e->callee->local.disgread_inline_limits)
! 	  break;
!       if (!e)
! 	continue;
!       ninlined = cgraph_inlined_into (order[i], inlined);
!       for (; e; e = e->next_callee)
! 	{
! 	  if (e->inline_call || !e->callee->local.disgread_inline_limits)
! 	    continue;
! 	  if (e->callee->output || e->callee == node)
! 	    {
! 	      current_function_decl = e->callee->decl;
! 	      warning_with_decl (node->decl,
! 				 "Always inline flag ignored in respect to `%s' to avoid cycle");
! 	      continue;
! 	    }
! 	  if (!quiet_flag)
! 	    {
! 	      fprintf (stderr, "Always inlining");
! 	      announce_function (e->callee->decl);
! 	      fprintf (stderr, " to");
! 	      announce_function (node->decl);
! 	      fprintf (stderr, "\n");
! 	    }
! 	  cgraph_mark_inline (node, e->callee, inlined, ninlined);
! 	}
!       for (y = 0; y < ninlined; y++)
! 	inlined[y]->output = 0;
      }
!   /* In the first pass mark all always_inline edges.  Do this with a priority
!      so no our decisions makes this impossible.  */
!   if (!quiet_flag)
!     fprintf (stderr, "\n\nDeciding on inlining:");
!   for (i = nnodes - 1; i >= 0; i--)
!     {
!       struct cgraph_edge *e;
  
!       node = order[i];
  
!       for (e = node->callees; e; e = e->next_callee)
! 	if (!e->inline_call && cgraph_default_inline_p (e->callee))
! 	  break;
!       if (!e)
! 	continue;
!       ninlined = cgraph_inlined_into (order[i], inlined);
!       for (; e; e = e->next_callee)
! 	{
! 	  if (e->inline_call || !cgraph_default_inline_p (e->callee))
! 	    continue;
! 	  if (e->callee == node)
! 	    continue;
! 	  if (e->callee->output)
! 	    {
! 	      if (warn_inline && DECL_INLINE (e->callee->decl))
! 		{
! 		  current_function_decl = e->callee->decl;
! 		  warning_with_decl (node->decl,
! 				     "Not inlining to `%s' to avoid cycle");
! 		}
! 	      continue;
! 	    }
! 	  if (!cgraph_check_inline_limits
! 	      (node, e->callee, inlined, ninlined))
! 	    {
! 	      if (!quiet_flag)
! 		{
! 		  fprintf (stderr, "Not inlining");
! 		  announce_function (e->callee->decl);
! 		  fprintf (stderr, " to");
! 		  announce_function (node->decl);
! 		  fprintf (stderr, "as inline limits exceeds\n");
! 		}
! 	      if (warn_inline && DECL_INLINE (e->callee->decl))
! 		{
! 		  current_function_decl = e->callee->decl;
! 		  warning_with_decl (node->decl,
! 				     "Not inlining to `%s' to avoid too large function bodies");
! 		}
! 	      continue;
! 	    }
! 	  if (!quiet_flag)
! 	    {
! 	      fprintf (stderr, "Inlining");
! 	      announce_function (e->callee->decl);
! 	      fprintf (stderr, " to");
! 	      announce_function (node->decl);
! 	      fprintf (stderr, "\n");
! 	    }
! 	  cgraph_mark_inline (node, e->callee, inlined, ninlined);
! 	}
!       for (y = 0; y < ninlined; y++)
! 	inlined[y]->output = 0;
!     }
  
    if (!quiet_flag)
      fprintf (stderr, "\n\nMarking functions to inline once:");
  
!   for (i = nnodes - 1; i >= 0; i--)
      {
+       node = order[i];
+ 
        if (node->callers && !node->callers->next_caller && !node->needed
! 	  && node->local.inlinable && !node->callers->inline_call)
  	{
  	  bool ok = true;
+ 	  struct cgraph_node *node1;
  
  	  /* Verify that we won't duplicate the caller.  */
  	  for (node1 = node->callers->caller;
! 	       node1->callers && node1->callers->inline_call
! 	       && ok; node1 = node1->callers->caller)
  	    if (node1->callers->next_caller || node1->needed)
  	      ok = false;
  	  if (ok)
  	    {
! 	      ninlined = cgraph_inlined_into (node, inlined);
! 	      if (!cgraph_check_inline_limits
! 		  (node->callers->caller, node, inlined, ninlined))
! 		{
! 		  if (!quiet_flag)
! 		    {
! 		      fprintf (stderr, "Not inlining once");
! 		      announce_function (node->decl);
! 		      fprintf (stderr, " to");
! 		      announce_function (node1->decl);
! 		      fprintf (stderr, "as inline limits exceeds\n");
! 		    }
! 		}
! 	      else
! 		{
! 		  cgraph_mark_inline (node->callers->caller, node, inlined,
! 				      ninlined);
! 		  announce_function (node->decl);
! 		}
! 	      for (y = 0; y < ninlined; y++)
! 		inlined[y]->output = 0;
  	    }
  	}
      }
+ 
+   free (order);
+   free (inlined);
+ }
+ 
+ /* Return true when CALLER_DECL should be inlined into CALLEE_DECL.  */
+ bool
+ cgraph_inline_p (tree caller_decl, tree callee_decl)
+ {
+   struct cgraph_node *caller = cgraph_node (caller_decl);
+   struct cgraph_node *callee = cgraph_node (callee_decl);
+   struct cgraph_edge *e;
+ 
+   for (e = caller->callees; e; e = e->next_callee)
+     if (e->callee == callee)
+       return e->inline_call;
+   abort ();
+ }
+ 
+ /* Expand all functions that must be output. 
+   
+    Attempt to topologically sort the nodes so function is output when
+    all called functions are already assembled to allow data to be
+    propagated accross the callgraph.  Use a stack to get smaller distance
+    between a function and it's callees (later we may choose to use a more
+    sophisticated algorithm for function reordering; we will likely want
+    to use subsections to make the output functions appear in top-down
+    order.  */
+ 
+ static void
+ cgraph_expand_functions ()
+ {
+   struct cgraph_node *node;
+   struct cgraph_node **order =
+     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
+   int order_pos = 0;
+   int i;
+ 
+   cgraph_mark_functions_to_output ();
+ 
+   order_pos = cgraph_postorder (order);
+ 
+   for (i = order_pos - 1; i >= 0; i--)
+     {
+       node = order[i];
+       if (node->output)
+ 	{
+ 	  if (!node->reachable)
+ 	    abort ();
+ 	  node->output = 0;
+ 	  cgraph_expand_function (node);
+ 	}
+     }
+   free (order);
  }
  
+ /* Mark all local functions.
+    We can not use node->needed directly as it is modified during
+    execution of cgraph_optimize.  */
+ 
+ static void
+ cgraph_mark_local_functions ()
+ {
+   struct cgraph_node *node;
+ 
+   if (!quiet_flag)
+     fprintf (stderr, "\n\nMarking local functions:");
+ 
+   /* Figure out functions we want to assemble.  */
+   for (node = cgraph_nodes; node; node = node->next)
+     {
+       node->local.local = (!node->needed
+ 		           && DECL_SAVED_TREE (node->decl)
+ 			   && !DECL_COMDAT (node->decl)
+ 		           && !TREE_PUBLIC (node->decl));
+       if (node->local.local)
+ 	announce_function (node->decl);
+     }
+ }
  
  /* Perform simple optimizations based on callgraph.  */
  
  void
  cgraph_optimize ()
  {
    cgraph_mark_local_functions ();
  
!   cgraph_decide_inlining ();
  
    cgraph_global_info_ready = true;
    if (!quiet_flag)
      fprintf (stderr, "\n\nAssembling functions:");
  
!   /* Output everything.   */
    cgraph_expand_functions ();
  }
*** tree-inline.c.in	Mon Jun 23 00:50:08 2003
--- tree-inline.c	Mon Jun 23 01:24:02 2003
*************** typedef struct inline_data
*** 106,111 ****
--- 106,112 ----
    htab_t tree_pruner;
    /* Decl of function we are inlining into.  */
    tree decl;
+   tree current_decl;
  } inline_data;
  
  /* Prototypes.  */
*************** expand_call_inline (tp, walk_subtrees, d
*** 1204,1212 ****
  
    /* Don't try to inline functions that are not well-suited to
       inlining.  */
!   if ((!flag_unit_at_a_time || !DECL_SAVED_TREE (fn)
!        || !cgraph_global_info (fn)->inline_once)
!       && !inlinable_function_p (fn, id, 0))
      {
        if (warn_inline && DECL_INLINE (fn))
  	{
--- 1205,1213 ----
  
    /* Don't try to inline functions that are not well-suited to
       inlining.  */
!   if (!DECL_SAVED_TREE (fn)
!       || (flag_unit_at_a_time && !cgraph_inline_p (id->current_decl, fn))
!       || (!flag_unit_at_a_time && !inlinable_function_p (fn, id, 0)))
      {
        if (warn_inline && DECL_INLINE (fn))
  	{
*************** expand_call_inline (tp, walk_subtrees, d
*** 1451,1457 ****
      }
  
    /* Recurse into the body of the just inlined function.  */
!   expand_calls_inline (inlined_body, id);
    VARRAY_POP (id->fns);
  
    /* If we've returned to the top level, clear out the record of how
--- 1452,1463 ----
      }
  
    /* Recurse into the body of the just inlined function.  */
!   {
!     tree old_decl = id->current_decl;
!     id->current_decl = fn;
!     expand_calls_inline (inlined_body, id);
!     id->current_decl = old_decl;
!   }
    VARRAY_POP (id->fns);
  
    /* If we've returned to the top level, clear out the record of how
*************** optimize_inline_calls (fn)
*** 1497,1502 ****
--- 1503,1509 ----
    memset (&id, 0, sizeof (id));
  
    id.decl = fn;
+   id.current_decl = fn;
    /* Don't allow recursion into FN.  */
    VARRAY_TREE_INIT (id.fns, 32, "fns");
    VARRAY_PUSH_TREE (id.fns, fn);
Follow-Ups:
- Re: [RFC/patch] Callgraph based inlining heuristics
  - From: Gerald Pfeifer
- Re: [RFC/patch] Callgraph based inlining heuristics
  - From: Steven Bosscher
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]