This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Inliner heuristics updated


Hi,
I am about to commit the patch in the attached form (with a testcase using
Janis' new tree-prof scripts, thanks!).
I did some extra testing on SPEC, Gerald's testcase and tramp3d.  On SPEC
the new inliner consistently brings slightly better results at slightly
lower compile time; the important differences are only in the profile-driven
runs I sent last time.
Gerald's testcase scores as follows:

[0]: /usr/bin/time dl-mainline-O3
[1]: /usr/bin/time dl-newinliner-O3
[2]: /usr/bin/time dl-prof

                     |     [0]      |     [1]      |     [2]      |
---------------------+--------------+--------------+--------------+
      STRATCOMP1-ALL |  2.41 (0.00) |  2.34 (0.00) |  2.23 (0.01) |
   STRATCOMP-770.2-Q |  0.40 (0.00) |  0.41 (0.00) |  0.40 (0.00) |
               2QBF1 | 13.58 (0.03) | 13.46 (0.05) | 11.58 (0.03) |
          PRIMEIMPL2 |  5.33 (0.03) |  5.04 (0.04) |  4.74 (0.01) |
       3COL-SIMPLEX1 |  5.52 (0.74) |  4.96 (0.26) |  4.41 (0.29) |
        3COL-RANDOM1 |  9.18 (0.05) |  9.03 (0.05) |  8.90 (0.08) |
          HP-RANDOM1 | 11.30 (0.12) | 11.09 (0.13) | 11.39 (0.08) |
       HAMCYCLE-FREE |  0.76 (0.00) |  0.69 (0.00) |  0.61 (0.00) |
             DECOMP2 |  8.89 (0.05) |  8.86 (0.05) |  9.06 (0.14) |
        BW-P5-nopush |  6.20 (0.03) |  6.23 (0.14) |  6.14 (0.01) |
       BW-P5-pushbin |  5.22 (0.01) |  5.19 (0.01) |  5.18 (0.00) |
     BW-P5-nopushbin |  1.52 (0.01) |  1.51 (0.00) |  1.52 (0.00) |
        HANOI-Towers |  3.10 (0.02) |  3.07 (0.06) |  3.01 (0.01) |
              RAMSEY |  4.33 (0.01) |  4.33 (0.01) |  4.33 (0.03) |
             CRISTAL |  7.17 (0.13) |  7.06 (0.02) |  7.34 (0.01) |
           21-QUEENS |  5.02 (0.07) |  5.00 (0.07) |  4.70 (0.04) |
   MSTDir[V=13,A=40] |  7.84 (0.02) |  7.46 (0.02) |  7.47 (0.02) |
   MSTDir[V=15,A=40] |  7.83 (0.01) |  7.46 (0.03) |  7.47 (0.02) |
 MSTUndir[V=13,A=40] |  4.31 (0.01) |  4.09 (0.01) |  4.41 (0.02) |
         TIMETABLING |  6.00 (0.07) |  6.03 (0.06) |  5.17 (0.04) |
---------------------+--------------+--------------+--------------+

There actually is a regression on tramp3d compiled with -O3 (-O2 scores
are identical).  Mainline needs 4s per iteration, the new inliner 5.4s per
iteration; with profile feedback, however, it drops to 3.2s.  Given the
quite positive evidence from the other benchmarks, I am going to commit it
anyway.  The 5.4s score is the same as mainline had before the cfg inliner
was applied, and it should be possible to improve it by small shifts in the
inliner parameters (see the example below).  I would welcome any help in
figuring out what is really going on here, though.
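
For reference, the profile-feedback runs (the dl-prof column above) can be
reproduced roughly as follows.  The file names, inputs and --param values
below are only illustrative; the flags are the ones used by the new
tree-prof harness, and the parameters are among those the patch reads:

gcc -O3 -ftree-based-profiling -fprofile-generate dl.c -o dl-train
./dl-train < train.input
gcc -O3 -ftree-based-profiling -fprofile-use \
    --param inline-unit-growth=50 --param max-inline-insns-single=450 \
    dl.c -o dl-prof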

There is surely a fair amount of room for improvement in the inliner cost
scheme (we might consider taking constant arguments and the like into
account), but I will leave that for incremental experimentation.
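
To make the metric concrete, here is a minimal standalone sketch of the
badness computation the patch introduces (cgraph_edge_badness).  The
struct, the field names and the numbers in main are simplified stand-ins
for the real cgraph data, not the actual interface:

/* Minimal sketch of the edge badness metric; lower (more negative)
   badness means the edge is inlined earlier.  */
#include <limits.h>
#include <stdio.h>

struct edge
{
  long long count;       /* Number of executions of this call (profile).  */
  int loop_nest;         /* Loop nesting depth of the call site.  */
  int growth;            /* Estimated caller growth from inlining this call.  */
  int overall_growth;    /* Estimated growth from inlining callee everywhere.  */
  int recursive;         /* Nonzero for a recursive call.  */
};

static long long max_count;  /* Largest edge count seen in the unit.  */

static int
edge_badness (struct edge *e)
{
  if (max_count)
    {
      /* Profile feedback: inlining that does not grow the caller always
         goes first; otherwise hot edges with small growth win.  */
      if (e->growth <= 0)
        return INT_MIN - e->growth;
      return ((int) ((double) e->count * INT_MIN / max_count)) / e->growth;
    }
  else
    {
      /* No profile: drive by estimated unit growth, boosted by loop depth;
         recursive edges are pushed after everything else.  */
      int nest = e->loop_nest < 8 ? e->loop_nest : 8;
      int badness = (e->overall_growth * 256) >> nest;
      return e->recursive ? badness + 1 : badness;
    }
}

int
main (void)
{
  struct edge hot = { 900, 0, 12, 30, 0 };
  struct edge cold = { 3, 0, 12, 30, 0 };

  max_count = 1000;
  printf ("hot badness %d, cold badness %d\n",
          edge_badness (&hot), edge_badness (&cold));
  return 0;
}

With profile data the hot edge gets a far more negative key than the cold
one, so the fibheap pops it first; without profile data the metric falls
back to the estimated unit growth scaled by loop depth.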

Honza

Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1490
diff -c -3 -p -r1.1490 Makefile.in
*** Makefile.in	24 May 2005 14:05:59 -0000	1.1490
--- Makefile.in	25 May 2005 11:42:10 -0000
*************** cgraphunit.o : cgraphunit.c $(CONFIG_H) 
*** 2064,2070 ****
  ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H) 
  ipa-inline.o : ipa-inline.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
     $(TREE_H) langhooks.h tree-inline.h $(FLAGS_H) $(CGRAPH_H) intl.h \
!    $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h
  coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h \
     $(TM_H) $(RTL_H) $(TREE_H) $(FLAGS_H) output.h $(REGS_H) $(EXPR_H) \
     function.h toplev.h $(GGC_H) langhooks.h $(COVERAGE_H) gt-coverage.h \
--- 2064,2071 ----
  ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H) 
  ipa-inline.o : ipa-inline.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
     $(TREE_H) langhooks.h tree-inline.h $(FLAGS_H) $(CGRAPH_H) intl.h \
!    $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
!    $(COVERAGE_H)
  coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h \
     $(TM_H) $(RTL_H) $(TREE_H) $(FLAGS_H) output.h $(REGS_H) $(EXPR_H) \
     function.h toplev.h $(GGC_H) langhooks.h $(COVERAGE_H) gt-coverage.h \
Index: cgraph.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.c,v
retrieving revision 1.72
diff -c -3 -p -r1.72 cgraph.c
*** cgraph.c	19 May 2005 10:38:37 -0000	1.72
--- cgraph.c	25 May 2005 11:42:10 -0000
*************** cgraph_create_node (void)
*** 169,174 ****
--- 169,175 ----
    if (cgraph_nodes)
      cgraph_nodes->previous = node;
    node->previous = NULL;
+   node->global.estimated_growth = INT_MIN;
    cgraph_nodes = node;
    cgraph_n_nodes++;
    return node;
Index: cgraph.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.h,v
retrieving revision 1.53
diff -c -3 -p -r1.53 cgraph.h
*** cgraph.h	19 May 2005 10:38:37 -0000	1.53
--- cgraph.h	25 May 2005 11:42:10 -0000
*************** struct cgraph_global_info GTY(())
*** 69,74 ****
--- 69,77 ----
    /* Estimated size of the function after inlining.  */
    int insns;
  
+   /* Estimated growth after inlining.  INT_MIN if not computed.  */
+   int estimated_growth;
+ 
    /* Set iff the function has been inlined at least once.  */
    bool inlined;
  };
Index: ipa-inline.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/ipa-inline.c,v
retrieving revision 2.4
diff -c -3 -p -r2.4 ipa-inline.c
*** ipa-inline.c	19 May 2005 10:38:38 -0000	2.4
--- ipa-inline.c	25 May 2005 11:42:10 -0000
*************** Software Foundation, 59 Temple Place - S
*** 78,89 ****
--- 78,92 ----
  #include "fibheap.h"
  #include "intl.h"
  #include "tree-pass.h"
+ #include "coverage.h"
  
  /* Statistics we collect about inlining algorithm.  */
  static int ncalls_inlined;
  static int nfunctions_inlined;
  static int initial_insns;
  static int overall_insns;
+ static int max_insns;
+ static gcov_type max_count;
  
  /* Estimate size of the function after inlining WHAT into TO.  */
  
*************** static int
*** 91,102 ****
  cgraph_estimate_size_after_inlining (int times, struct cgraph_node *to,
  				     struct cgraph_node *what)
  {
!   tree fndecl = what->decl;
!   tree arg;
    int call_insns = PARAM_VALUE (PARAM_INLINE_CALL_COST);
    for (arg = DECL_ARGUMENTS (fndecl); arg; arg = TREE_CHAIN (arg))
      call_insns += estimate_move_cost (TREE_TYPE (arg));
!   return (what->global.insns - call_insns) * times + to->global.insns;
  }
  
  /* E is expected to be an edge being inlined.  Clone destination node of
--- 94,108 ----
  cgraph_estimate_size_after_inlining (int times, struct cgraph_node *to,
  				     struct cgraph_node *what)
  {
!   int size;
!   tree fndecl = what->decl, arg;
    int call_insns = PARAM_VALUE (PARAM_INLINE_CALL_COST);
+ 
    for (arg = DECL_ARGUMENTS (fndecl); arg; arg = TREE_CHAIN (arg))
      call_insns += estimate_move_cost (TREE_TYPE (arg));
!   size = (what->global.insns - call_insns) * times + to->global.insns;
!   gcc_assert (size >= 0);
!   return size;
  }
  
  /* E is expected to be an edge being inlined.  Clone destination node of
*************** cgraph_estimate_growth (struct cgraph_no
*** 209,214 ****
--- 215,222 ----
  {
    int growth = 0;
    struct cgraph_edge *e;
+   if (node->global.estimated_growth != INT_MIN)
+     return node->global.estimated_growth;
  
    for (e = node->callers; e; e = e->next_caller)
      if (e->inline_failed)
*************** cgraph_estimate_growth (struct cgraph_no
*** 221,226 ****
--- 229,235 ----
    if (!node->needed && !DECL_EXTERNAL (node->decl))
      growth -= node->global.insns;
  
+   node->global.estimated_growth = growth;
    return growth;
  }
  
*************** cgraph_recursive_inlining_p (struct cgra
*** 298,349 ****
    return recursive;
  }
  
! /* Recompute heap nodes for each of callees.  */
  static void
! update_callee_keys (fibheap_t heap, struct fibnode **heap_node,
! 		    struct cgraph_node *node)
  {
    struct cgraph_edge *e;
  
    for (e = node->callees; e; e = e->next_callee)
!     if (e->inline_failed && heap_node[e->callee->uid])
!       fibheap_replace_key (heap, heap_node[e->callee->uid],
! 			   cgraph_estimate_growth (e->callee));
      else if (!e->inline_failed)
!       update_callee_keys (heap, heap_node, e->callee);
  }
  
! /* Enqueue all recursive calls from NODE into queue linked via aux pointers
!    in between FIRST and LAST.  WHERE is used for bookkeeping while looking
!    int calls inlined within NODE.  */
  static void
  lookup_recursive_calls (struct cgraph_node *node, struct cgraph_node *where,
! 			struct cgraph_edge **first, struct cgraph_edge **last)
  {
    struct cgraph_edge *e;
    for (e = where->callees; e; e = e->next_callee)
      if (e->callee == node)
        {
! 	if (!*first)
! 	  *first = e;
! 	else
! 	  (*last)->aux = e;
! 	*last = e;
        }
    for (e = where->callees; e; e = e->next_callee)
      if (!e->inline_failed)
!       lookup_recursive_calls (node, e->callee, first, last);
  }
  
  /* Decide on recursive inlining: in the case function has recursive calls,
     inline until body size reaches given argument.  */
! static void
  cgraph_decide_recursive_inlining (struct cgraph_node *node)
  {
    int limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO);
    int max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH_AUTO);
!   struct cgraph_edge *first_call = NULL, *last_call = NULL;
!   struct cgraph_edge *last_in_current_depth;
    struct cgraph_edge *e;
    struct cgraph_node *master_clone;
    int depth = 0;
--- 307,451 ----
    return recursive;
  }
  
! /* Return true if the call can be hot.  */
! static bool
! cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
! {
!   if (profile_info && flag_branch_probabilities
!       && (edge->count
! 	  <= profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)))
!     return false;
!   return true;
! }
! 
! /* A cost model driving the inlining heuristics so that edges with the
!    smallest badness are inlined first.  After each inlining is performed,
!    the costs of all caller edges of the affected nodes are recomputed, so
!    the metric may accurately depend on values such as the number of
!    inlinable callers of the function or the function body size.
! 
!    For the moment we drive the inlining by the estimated growth caused by
!    inlining the callee into all its callers, but once loop depth or
!    frequency information is readily available we should do better.
! 
!    With profiling we use the number of executions of each edge to drive
!    the cost.  We should also distinguish hot and cold calls, where cold
!    calls are inlined only when overall code size improves.
! 
!    Value INT_MAX can be returned to prevent the function from being
!    inlined.  */
! 
! static int
! cgraph_edge_badness (struct cgraph_edge *edge)
! {
!   if (max_count)
!     {
!       int growth =
! 	cgraph_estimate_size_after_inlining (1, edge->caller, edge->callee);
!       growth -= edge->caller->global.insns;
! 
! 	/* Always prefer inlining that saves code size.  */
!       if (growth <= 0)
! 	return INT_MIN - growth;
!       return ((int)((double)edge->count * INT_MIN / max_count)) / growth;
!     }
!   else
!   {
!     int nest = MIN (edge->loop_nest, 8);
!     int badness = cgraph_estimate_growth (edge->callee) * 256;
! 		    
!     badness >>= nest;
! 
!     /* Make recursive inlining happen always after other inlining is done.  */
!     if (cgraph_recursive_inlining_p (edge->caller, edge->callee, NULL))
!       return badness + 1;
!     else
!       return badness;
!   }
! }
! 
! /* Recompute heap nodes for each caller edge.  */
! 
  static void
! update_caller_keys (fibheap_t heap, struct cgraph_node *node,
! 		    bitmap updated_nodes)
! {
!   struct cgraph_edge *edge;
! 
!   if (!node->local.inlinable || node->local.disregard_inline_limits
!       || node->global.inlined_to)
!     return;
!   if (bitmap_bit_p (updated_nodes, node->uid))
!     return;
!   bitmap_set_bit (updated_nodes, node->uid);
! 
!   for (edge = node->callers; edge; edge = edge->next_caller)
!     if (edge->inline_failed)
!       {
! 	int badness = cgraph_edge_badness (edge);
! 	if (edge->aux)
! 	  {
! 	    fibnode_t n = edge->aux;
! 	    gcc_assert (n->data == edge);
! 	    if (n->key == badness)
! 	      continue;
! 
! 	    /* fibheap_replace_key only increase the keys.  */
! 	    if (fibheap_replace_key (heap, n, badness))
! 	      continue;
! 	    fibheap_delete_node (heap, edge->aux);
! 	  }
! 	edge->aux = fibheap_insert (heap, badness, edge);
!       }
! }
! 
! /* Recompute heap nodes for the caller edges of each callee.  */
! 
! static void
! update_callee_keys (fibheap_t heap, struct cgraph_node *node,
! 		    bitmap updated_nodes)
  {
    struct cgraph_edge *e;
+   node->global.estimated_growth = INT_MIN;
  
    for (e = node->callees; e; e = e->next_callee)
!     if (e->inline_failed)
!       update_caller_keys (heap, e->callee, updated_nodes);
      else if (!e->inline_failed)
!       update_callee_keys (heap, e->callee, updated_nodes);
  }
  
! /* Enqueue all recursive calls from NODE into the priority queue, depending
!    on how likely we are to want to inline the call recursively.  */
! 
  static void
  lookup_recursive_calls (struct cgraph_node *node, struct cgraph_node *where,
! 			fibheap_t heap)
  {
+   static int priority;
    struct cgraph_edge *e;
    for (e = where->callees; e; e = e->next_callee)
      if (e->callee == node)
        {
! 	/* FIXME: Once counts and frequencies are available we should drive the
! 	   order by these.  For now force the order to be a simple queue, since
! 	   this gives us an order dependent on recursion depth for free.  */
!         fibheap_insert (heap, priority++, e);
        }
    for (e = where->callees; e; e = e->next_callee)
      if (!e->inline_failed)
!       lookup_recursive_calls (node, e->callee, heap);
  }
  
  /* Decide on recursive inlining: in the case function has recursive calls,
     inline until body size reaches given argument.  */
! 
! static bool
  cgraph_decide_recursive_inlining (struct cgraph_node *node)
  {
    int limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO);
    int max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH_AUTO);
!   fibheap_t heap;
    struct cgraph_edge *e;
    struct cgraph_node *master_clone;
    int depth = 0;
*************** cgraph_decide_recursive_inlining (struct
*** 358,371 ****
    /* Make sure that function is small enough to be considered for inlining.  */
    if (!max_depth
        || cgraph_estimate_size_after_inlining (1, node, node)  >= limit)
!     return;
!   lookup_recursive_calls (node, node, &first_call, &last_call);
!   if (!first_call)
!     return;
  
    if (dump_file)
      fprintf (dump_file, 
! 	     "\nPerforming recursive inlining on %s\n",
  	     cgraph_node_name (node));
  
    /* We need original clone to copy around.  */
--- 460,477 ----
    /* Make sure that function is small enough to be considered for inlining.  */
    if (!max_depth
        || cgraph_estimate_size_after_inlining (1, node, node)  >= limit)
!     return false;
!   heap = fibheap_new ();
!   lookup_recursive_calls (node, node, heap);
!   if (fibheap_empty (heap))
!     {
!       fibheap_delete (heap);
!       return false;
!     }
  
    if (dump_file)
      fprintf (dump_file, 
! 	     "  Performing recursive inlining on %s\n",
  	     cgraph_node_name (node));
  
    /* We need original clone to copy around.  */
*************** cgraph_decide_recursive_inlining (struct
*** 376,407 ****
        cgraph_clone_inlined_nodes (e, true);
  
    /* Do the inlining and update list of recursive call during process.  */
!   last_in_current_depth = last_call;
!   while (first_call
  	 && cgraph_estimate_size_after_inlining (1, node, master_clone) <= limit)
      {
!       struct cgraph_edge *curr = first_call;
  
!       first_call = first_call->aux;
!       curr->aux = NULL;
  
        cgraph_redirect_edge_callee (curr, master_clone);
        cgraph_mark_inline_edge (curr);
!       lookup_recursive_calls (node, curr->callee, &first_call, &last_call);
! 
!       if (last_in_current_depth
! 	  && ++depth >= max_depth)
! 	break;
        n++;
      }
  
!   /* Cleanup queue pointers.  */
!   while (first_call)
!     {
!       struct cgraph_edge *next = first_call->aux;
!       first_call->aux = NULL;
!       first_call = next;
!     }
    if (dump_file)
      fprintf (dump_file, 
  	     "\n   Inlined %i times, body grown from %i to %i insns\n", n,
--- 482,511 ----
        cgraph_clone_inlined_nodes (e, true);
  
    /* Do the inlining and update list of recursive call during process.  */
!   while (!fibheap_empty (heap)
  	 && cgraph_estimate_size_after_inlining (1, node, master_clone) <= limit)
      {
!       struct cgraph_edge *curr = fibheap_extract_min (heap);
!       struct cgraph_node *node;
  
!       depth = 0;
!       for (node = curr->caller;
! 	   node; node = node->global.inlined_to)
! 	if (node->decl == curr->callee->decl)
! 	  depth++;
!       if (depth > max_depth)
! 	continue;
  
+       if (dump_file)
+ 	fprintf (dump_file, 
+ 		 "   Inlining call of depth %i\n", depth);
        cgraph_redirect_edge_callee (curr, master_clone);
        cgraph_mark_inline_edge (curr);
!       lookup_recursive_calls (node, curr->callee, heap);
        n++;
      }
  
!   fibheap_delete (heap);
    if (dump_file)
      fprintf (dump_file, 
  	     "\n   Inlined %i times, body grown from %i to %i insns\n", n,
*************** cgraph_decide_recursive_inlining (struct
*** 415,420 ****
--- 519,525 ----
      if (node->global.inlined_to == master_clone)
        cgraph_remove_node (node);
    cgraph_remove_node (master_clone);
+   return true;
  }
  
  /* Set inline_failed for all callers of given function to REASON.  */
*************** static void
*** 442,452 ****
  cgraph_decide_inlining_of_small_functions (void)
  {
    struct cgraph_node *node;
    fibheap_t heap = fibheap_new ();
!   struct fibnode **heap_node =
!     xcalloc (cgraph_max_uid, sizeof (struct fibnode *));
!   int max_insns = ((HOST_WIDEST_INT) initial_insns
! 		   * (100 + PARAM_VALUE (PARAM_INLINE_UNIT_GROWTH)) / 100);
  
    /* Put all inline candidates into the heap.  */
  
--- 547,558 ----
  cgraph_decide_inlining_of_small_functions (void)
  {
    struct cgraph_node *node;
+   struct cgraph_edge *edge;
    fibheap_t heap = fibheap_new ();
!   bitmap updated_nodes = BITMAP_ALLOC (NULL);
! 
!   if (dump_file)
!     fprintf (dump_file, "\nDeciding on smaller functions:\n");
  
    /* Put all inline candidates into the heap.  */
  
*************** cgraph_decide_inlining_of_small_function
*** 455,541 ****
        if (!node->local.inlinable || !node->callers
  	  || node->local.disregard_inline_limits)
  	continue;
  
        if (!cgraph_default_inline_p (node))
  	{
  	  cgraph_set_inline_failed (node,
  	    N_("--param max-inline-insns-single limit reached"));
  	  continue;
  	}
-       heap_node[node->uid] =
- 	fibheap_insert (heap, cgraph_estimate_growth (node), node);
-     }
  
!   if (dump_file)
!     fprintf (dump_file, "\nDeciding on smaller functions:\n");
!   while (overall_insns <= max_insns && (node = fibheap_extract_min (heap)))
      {
-       struct cgraph_edge *e, *next;
        int old_insns = overall_insns;
  
-       heap_node[node->uid] = NULL;
        if (dump_file)
- 	fprintf (dump_file, 
- 		 "\nConsidering %s with %i insns\n"
- 		 " Estimated growth is %+i insns.\n",
- 		 cgraph_node_name (node), node->global.insns,
- 		 cgraph_estimate_growth (node));
-       if (!cgraph_default_inline_p (node))
  	{
! 	  cgraph_set_inline_failed (node,
! 	    N_("--param max-inline-insns-single limit reached after inlining into the callee"));
! 	  continue;
  	}
!       for (e = node->callers; e; e = next)
  	{
! 	  next = e->next_caller;
! 	  if (e->inline_failed)
  	    {
! 	      struct cgraph_node *where;
! 
! 	      if (cgraph_recursive_inlining_p (e->caller, e->callee,
! 				      	       &e->inline_failed)
! 		  || !cgraph_check_inline_limits (e->caller, e->callee,
! 			  			  &e->inline_failed))
! 		{
! 		  if (dump_file)
! 		    fprintf (dump_file, " Not inlining into %s:%s.\n",
! 			     cgraph_node_name (e->caller), e->inline_failed);
! 		  continue;
! 		}
! 	      next = cgraph_mark_inline (e);
! 	      where = e->caller;
! 	      if (where->global.inlined_to)
! 		where = where->global.inlined_to;
! 
! 	      if (heap_node[where->uid])
! 		fibheap_replace_key (heap, heap_node[where->uid],
! 				     cgraph_estimate_growth (where));
! 
  	      if (dump_file)
! 		fprintf (dump_file, 
! 			 " Inlined into %s which now has %i insns.\n",
! 			 cgraph_node_name (e->caller),
! 			 e->caller->global.insns);
  	    }
  	}
  
!       cgraph_decide_recursive_inlining (node);
! 
!       /* Similarly all functions called by the function we just inlined
!          are now called more times; update keys.  */
!       update_callee_keys (heap, heap_node, node);
  
        if (dump_file)
  	fprintf (dump_file, 
  		 " Inlined for a net change of %+i insns.\n",
  		 overall_insns - old_insns);
      }
!   while ((node = fibheap_extract_min (heap)) != NULL)
!     if (!node->local.disregard_inline_limits)
!       cgraph_set_inline_failed (node, N_("--param inline-unit-growth limit reached"));
    fibheap_delete (heap);
!   free (heap_node);
  }
  
  /* Decide on the inlining.  We do so in the topological order to avoid
--- 561,721 ----
        if (!node->local.inlinable || !node->callers
  	  || node->local.disregard_inline_limits)
  	continue;
+       if (dump_file)
+ 	fprintf (dump_file, "Considering inline candidate %s.\n", cgraph_node_name (node));
  
+       node->global.estimated_growth = INT_MIN;
        if (!cgraph_default_inline_p (node))
  	{
  	  cgraph_set_inline_failed (node,
  	    N_("--param max-inline-insns-single limit reached"));
  	  continue;
  	}
  
!       for (edge = node->callers; edge; edge = edge->next_caller)
! 	if (edge->inline_failed)
! 	  {
! 	    gcc_assert (!edge->aux);
! 	    edge->aux = fibheap_insert (heap, cgraph_edge_badness (edge), edge);
! 	  }
!     }
!   while (overall_insns <= max_insns && (edge = fibheap_extract_min (heap)))
      {
        int old_insns = overall_insns;
+       struct cgraph_node *where;
+       int growth =
+ 	cgraph_estimate_size_after_inlining (1, edge->caller, edge->callee);
+ 
+       growth -= edge->caller->global.insns;
  
        if (dump_file)
  	{
! 	  fprintf (dump_file, 
! 		   "\nConsidering %s with %i insns to be inlined into %s\n"
! 		   " Estimated growth after inlined into all callees is %+i insns.\n"
! 		   " Estimated badness is %i.\n",
! 		   cgraph_node_name (edge->callee),
! 		   edge->callee->global.insns,
! 		   cgraph_node_name (edge->caller),
! 		   cgraph_estimate_growth (edge->callee),
! 		   cgraph_edge_badness (edge));
! 	  if (edge->count)
! 	    fprintf (dump_file," Called "HOST_WIDEST_INT_PRINT_DEC"x\n", edge->count);
  	}
!       gcc_assert (edge->aux);
!       edge->aux = NULL;
!       if (!edge->inline_failed)
! 	continue;
! 
!       /* When profile info is not available we do not weight in any way the
!          position of the call in the procedure itself.  This means that if a
! 	 call of function A from function B seems profitable to inline, the
! 	 recursive call of A in the inline copy of A in B will look profitable
! 	 too, and we end up inlining until we reach the maximal function growth.
! 	 This is not a good idea, so prohibit the recursive inlining.
! 
! 	 ??? When the frequencies are taken into account we might not need this
! 	 restriction.   */
!       if (!max_count)
  	{
! 	  where = edge->caller;
! 	  while (where->global.inlined_to)
  	    {
! 	      if (where->decl == edge->callee->decl)
! 		break;
! 	      where = where->callers->caller;
! 	    }
! 	  if (where->global.inlined_to)
! 	    {
! 	      edge->inline_failed
! 		= (edge->callee->local.disregard_inline_limits ? N_("recursive inlining") : "");
  	      if (dump_file)
! 		fprintf (dump_file, " inline_failed:Recursive inlining performed only for function itself.\n");
! 	      continue;
  	    }
  	}
  
!       if (!cgraph_maybe_hot_edge_p (edge) && growth > 0)
! 	{
!           if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				            &edge->inline_failed))
! 	    {
! 	      edge->inline_failed = 
! 		N_("call is unlikely");
! 	      if (dump_file)
! 		fprintf (dump_file, " inline_failed:%s.\n", edge->inline_failed);
! 	    }
! 	  continue;
! 	}
!       if (!cgraph_default_inline_p (edge->callee))
! 	{
!           if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				            &edge->inline_failed))
! 	    {
! 	      edge->inline_failed = 
! 		N_("--param max-inline-insns-single limit reached after inlining into the callee");
! 	      if (dump_file)
! 		fprintf (dump_file, " inline_failed:%s.\n", edge->inline_failed);
! 	    }
! 	  continue;
! 	}
!       if (cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				       &edge->inline_failed))
! 	{
! 	  where = edge->caller;
! 	  if (where->global.inlined_to)
! 	    where = where->global.inlined_to;
! 	  if (!cgraph_decide_recursive_inlining (where))
! 	    continue;
!           update_callee_keys (heap, where, updated_nodes);
! 	}
!       else
! 	{
! 	  if (!cgraph_check_inline_limits (edge->caller, edge->callee,
! 					   &edge->inline_failed))
! 	    {
! 	      if (dump_file)
! 		fprintf (dump_file, " Not inlining into %s:%s.\n",
! 			 cgraph_node_name (edge->caller), edge->inline_failed);
! 	      continue;
! 	    }
! 	  cgraph_mark_inline_edge (edge);
!          update_callee_keys (heap, edge->callee, updated_nodes);
! 	}
!       where = edge->caller;
!       if (where->global.inlined_to)
! 	where = where->global.inlined_to;
! 
!       /* Our profitability metric can depend on local properties
! 	 such as the number of inlinable calls and the size of the function
! 	 body.  After inlining, these properties might change for the function
! 	 we inlined into (since its body size changed) and for the functions
! 	 called by the function we inlined (since the number of their inlinable
! 	 callers might change).  */
!       update_caller_keys (heap, where, updated_nodes);
!       bitmap_clear (updated_nodes);
  
        if (dump_file)
  	fprintf (dump_file, 
+ 		 " Inlined into %s which now has %i insns.\n",
+ 		 cgraph_node_name (edge->caller),
+ 		 edge->caller->global.insns);
+       if (dump_file)
+ 	fprintf (dump_file, 
  		 " Inlined for a net change of %+i insns.\n",
  		 overall_insns - old_insns);
      }
!   while ((edge = fibheap_extract_min (heap)) != NULL)
!     {
!       gcc_assert (edge->aux);
!       edge->aux = NULL;
!       if (!edge->callee->local.disregard_inline_limits && edge->inline_failed
!           && !cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				           &edge->inline_failed))
! 	edge->inline_failed = N_("--param inline-unit-growth limit reached");
!     }
    fibheap_delete (heap);
!   BITMAP_FREE (updated_nodes);
  }
  
  /* Decide on the inlining.  We do so in the topological order to avoid
*************** cgraph_decide_inlining (void)
*** 551,559 ****
    int old_insns = 0;
    int i;
  
    for (node = cgraph_nodes; node; node = node->next)
!     initial_insns += node->local.self_insns;
    overall_insns = initial_insns;
  
    nnodes = cgraph_postorder (order);
  
--- 731,751 ----
    int old_insns = 0;
    int i;
  
+   timevar_push (TV_INLINE_HEURISTICS);
+   max_count = 0;
    for (node = cgraph_nodes; node; node = node->next)
!     {
!       struct cgraph_edge *e;
!       initial_insns += node->local.self_insns;
!       for (e = node->callees; e; e = e->next_callee)
! 	if (max_count < e->count)
! 	  max_count = e->count;
!     }
    overall_insns = initial_insns;
+   gcc_assert (!max_count || (profile_info && flag_branch_probabilities));
+ 
+   max_insns = ((HOST_WIDEST_INT) overall_insns
+ 	       * (100 + PARAM_VALUE (PARAM_INLINE_UNIT_GROWTH)) / 100);
  
    nnodes = cgraph_postorder (order);
  
*************** cgraph_decide_inlining (void)
*** 668,673 ****
--- 860,866 ----
    /* We will never output extern functions we didn't inline. 
       ??? Perhaps we can prevent accounting of growth of external
       inline functions.  */
+ 
    cgraph_remove_unreachable_nodes (false, dump_file);
  
    if (dump_file)
*************** cgraph_decide_inlining (void)
*** 677,682 ****
--- 870,876 ----
  	     ncalls_inlined, nfunctions_inlined, initial_insns,
  	     overall_insns);
    free (order);
+   timevar_pop (TV_INLINE_HEURISTICS);
  }
  
  /* Decide on the inlining.  We do so in the topological order to avoid
Index: passes.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/passes.c,v
retrieving revision 2.90
diff -c -3 -p -r2.90 passes.c
*** passes.c	27 Apr 2005 21:35:20 -0000	2.90
--- passes.c	25 May 2005 11:42:10 -0000
*************** static void
*** 1427,1433 ****
  rest_of_clean_state (void)
  {
    rtx insn, next;
-   coverage_end_function ();
  
    /* It is very important to decompose the RTL instruction chain here:
       debug information keeps pointing into CODE_LABEL insns inside the function
--- 1427,1432 ----
Index: profile.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/profile.c,v
retrieving revision 1.157
diff -c -3 -p -r1.157 profile.c
*** profile.c	21 Apr 2005 09:17:19 -0000	1.157
--- profile.c	25 May 2005 11:42:10 -0000
*************** branch_prob (void)
*** 1141,1146 ****
--- 1141,1147 ----
    free_edge_list (el);
    if (flag_branch_probabilities)
      profile_status = PROFILE_READ;
+   coverage_end_function ();
  }
  
  /* Union find algorithm implementation for the basic blocks using
Index: timevar.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/timevar.def,v
retrieving revision 1.48
diff -c -3 -p -r1.48 timevar.def
*** timevar.def	17 May 2005 20:28:22 -0000	1.48
--- timevar.def	25 May 2005 11:42:10 -0000
*************** DEFTIMEVAR (TV_CPP		     , "preprocessin
*** 60,65 ****
--- 60,66 ----
  DEFTIMEVAR (TV_LEX		     , "lexical analysis")
  DEFTIMEVAR (TV_PARSE                 , "parser")
  DEFTIMEVAR (TV_NAME_LOOKUP           , "name lookup")
+ DEFTIMEVAR (TV_INLINE_HEURISTICS     , "inline heuristics")
  DEFTIMEVAR (TV_INTEGRATION           , "integration")
  DEFTIMEVAR (TV_TREE_GIMPLIFY	     , "tree gimplify")
  DEFTIMEVAR (TV_TREE_EH		     , "tree eh")
Index: tree-optimize.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-optimize.c,v
retrieving revision 2.101
diff -c -3 -p -r2.101 tree-optimize.c
*** tree-optimize.c	19 May 2005 10:38:42 -0000	2.101
--- tree-optimize.c	25 May 2005 11:42:10 -0000
*************** init_tree_optimization_passes (void)
*** 377,387 ****
    NEXT_PASS (pass_build_cfg); 
    NEXT_PASS (pass_pre_expand);
    NEXT_PASS (pass_warn_function_return);
    *p = NULL;
  
    p = &all_passes;
    NEXT_PASS (pass_fixup_cfg);
-   NEXT_PASS (pass_tree_profile);
    NEXT_PASS (pass_init_datastructures);
    NEXT_PASS (pass_all_optimizations);
    NEXT_PASS (pass_warn_function_noreturn);
--- 377,387 ----
    NEXT_PASS (pass_build_cfg); 
    NEXT_PASS (pass_pre_expand);
    NEXT_PASS (pass_warn_function_return);
+   NEXT_PASS (pass_tree_profile);
    *p = NULL;
  
    p = &all_passes;
    NEXT_PASS (pass_fixup_cfg);
    NEXT_PASS (pass_init_datastructures);
    NEXT_PASS (pass_all_optimizations);
    NEXT_PASS (pass_warn_function_noreturn);
*************** tree_lowering_passes (tree fn)
*** 682,688 ****
  void
  ipa_passes (void)
  {
!    execute_pass_list (all_ipa_passes);
  }
  
  
--- 682,690 ----
  void
  ipa_passes (void)
  {
!   bitmap_obstack_initialize (NULL);
!   execute_pass_list (all_ipa_passes);
!   bitmap_obstack_release (NULL);
  }
  
  
Index: testsuite/gcc.dg/tree-prof/inliner-1.c
===================================================================
RCS file: testsuite/gcc.dg/tree-prof/inliner-1.c
diff -N testsuite/gcc.dg/tree-prof/inliner-1.c
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- testsuite/gcc.dg/tree-prof/inliner-1.c	25 May 2005 11:42:11 -0000
***************
*** 0 ****
--- 1,34 ----
+ /* { dg-options "-O2 -fdump-tree-optimized -ftree-based-profiling" } */
+ int a;
+ int b[100];
+ void abort (void);
+ 
+ inline void
+ cold_function ()
+ {
+   int i;
+   for (i = 0; i < 99; i++)
+     if (b[i] / (b[i+1] + 1))
+       abort ();
+ }
+ 
+ inline void
+ hot_function ()
+ {
+   int i;
+   for (i = 0; i < 99; i++)
+     if (b[i] / (b[i+1] + 1))
+       abort ();
+ }
+ 
+ main ()
+ {
+   if (a)
+     cold_function ();
+   else
+     hot_function ();
+ }
+ 
+ /* cold function should be inlined, while hot function should not.  */
+ /* { dg-final-use { scan-tree-dump "cold_function.*tail" "optimized"} } */
+ /* { dg-final-use { scan-tree-dump-not "hot_function.*tail" "optimized"} } */
Index: testsuite/gcc.dg/tree-prof/tree-prof.exp
===================================================================
RCS file: testsuite/gcc.dg/tree-prof/tree-prof.exp
diff -N testsuite/gcc.dg/tree-prof/tree-prof.exp
*** /dev/null	1 Jan 1970 00:00:00 -0000
--- testsuite/gcc.dg/tree-prof/tree-prof.exp	25 May 2005 11:42:11 -0000
***************
*** 0 ****
--- 1,53 ----
+ #   Copyright (C) 2001, 2002, 2004, 2005 Free Software Foundation, Inc.
+ 
+ # This program is free software; you can redistribute it and/or modify
+ # it under the terms of the GNU General Public License as published by
+ # the Free Software Foundation; either version 2 of the License, or
+ # (at your option) any later version.
+ # 
+ # This program is distributed in the hope that it will be useful,
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ # GNU General Public License for more details.
+ # 
+ # You should have received a copy of the GNU General Public License
+ # along with this program; if not, write to the Free Software
+ # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  
+ 
+ # Test the functionality of programs compiled with profile-directed block
+ # ordering using -fprofile-arcs followed by -fbranch-probabilities.
+ 
+ load_lib target-supports.exp
+ 
+ # Some targets don't support tree profiling.
+ if { ![check_profiling_available "-ftree-based-profiling"] } {
+     return
+ }
+ 
+ # The procedures in profopt.exp need these parameters.
+ set tool gcc
+ set prof_ext "gcda gcno"
+ 
+ # Override the list defined in profopt.exp.
+ set PROFOPT_OPTIONS [list {}]
+ 
+ if $tracelevel then {
+     strace $tracelevel
+ }
+ 
+ # Load support procs.
+ load_lib profopt.exp
+ 
+ # These are globals used by profopt-execute.  The first is options
+ # needed to generate profile data, the second is options to use the
+ # profile data.
+ set profile_option "-ftree-based-profiling -fprofile-generate"
+ set feedback_option "-ftree-based-profiling -fprofile-use"
+ 
+ foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
+     # If we're only testing specific files and this isn't one of them, skip it.
+     if ![runtest_file_p $runtests $src] then {
+         continue
+     }
+     profopt-execute $src
+ }

