Rewrite of inlining heuristics


Hi,
this patch merges the inlining heuristics rewrite from the tree-profiling-branch.
The main goal was to make profile driven inlining easy, but it affects the
default heuristics too.  Basically, the old heuristics ordered all inline
candidates (functions) in a priority queue keyed on the estimated cost of the
inlining and inlined as many of them as possible until reaching the code growth
limits.

The rewrite orders the call sites (cgraph edges) instead of functions, so
different calls of the same function can have different priorities.  Without
profiling it currently takes the loop depth into account, which seems to bring
some improvements on gap and Gerald's application benchmarks.
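
To illustrate the effect of the loop depth, here is a tiny toy model (made-up
numbers, not GCC code) of the non-profile branch of cgraph_edge_badness from
the patch below; a smaller badness means the call site is inlined earlier:

#include <stdio.h>

/* Toy model: estimated growth of the callee, scaled down exponentially
   by the loop depth of the call site (capped at 8).  */
static int
badness (int estimated_growth, int loop_nest)
{
  int nest = loop_nest < 8 ? loop_nest : 8;
  return (estimated_growth * 256) >> nest;
}

int
main (void)
{
  /* The same callee (growth of 40 insns) is a better candidate when
     called from inside a loop than from straight-line code.  */
  printf ("top level:    %d\n", badness (40, 0));  /* 10240 */
  printf ("loop depth 2: %d\n", badness (40, 2));  /* 2560 */
  return 0;
}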

The profile driven inlining is in too.  It requires -ftree-based-profiling to
be active, and if you want the RTL-land profile guided optimizations, you need
-fno-loop-optimize.  I plan to benchmark it, and if this combination turns out
to be a win (I hope so), I will change -fprofile-generate/-fprofile-use to
imply these flags.
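
With profiling, the badness of an edge is instead derived from its execution
count relative to the hottest edge in the unit, divided by the estimated code
growth (size-shrinking inlining always wins).  A toy sketch with invented
counts, mirroring the profile branch of cgraph_edge_badness in the patch below:

#include <stdio.h>
#include <limits.h>

/* Toy model: EDGE_COUNT is the profiled execution count of the call,
   MAX_COUNT the largest count of any call edge in the unit, GROWTH the
   estimated size increase of the caller in insns.  */
static int
badness (long long edge_count, long long max_count, int growth)
{
  /* Inlining that shrinks the caller is always preferred.  */
  if (growth <= 0)
    return INT_MIN - growth;
  /* Hot and cheap calls get the most negative badness and so are
     extracted from the heap first.  */
  return (int) ((double) edge_count * INT_MIN / max_count) / growth;
}

int
main (void)
{
  /* A call executed half as often as the hottest edge beats a rarely
     executed call causing the same growth.  */
  printf ("hot call:  %d\n", badness (500000, 1000000, 20));
  printf ("cold call: %d\n", badness (100, 1000000, 20));
  return 0;
}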

I am still running some benchmarks, so I will post them later today or
tomorrow, but I am pushing out the patch now so people can try it or comment on
it before I commit it.

The SPEC scores at -O3 are below.  In each results table a row lists the
reference time, run time and ratio for the base run, then the same for the
peak run; an X marks benchmarks that did not run:


Size of binaries:
 164.gzip: Base: 71377 bytes
 164.gzip: Peak: 71377 bytes
 176.gcc: Base: 1886381 bytes
 176.gcc: Peak: 1830505 bytes
 181.mcf: Base: 30495 bytes
 181.mcf: Peak: 26879 bytes
 186.crafty: Base: 237732 bytes
 186.crafty: Peak: 237732 bytes
 197.parser: Base: 203166 bytes
 197.parser: Peak: 187870 bytes
 253.perlbmk: Base: 748770 bytes
 253.perlbmk: Peak: 747612 bytes
 254.gap: Base: 610495 bytes
 254.gap: Peak: 606207 bytes
 255.vortex: Base: 684446 bytes
 255.vortex: Peak: 684446 bytes
 256.bzip2: Base: 63018 bytes
 256.bzip2: Peak: 63018 bytes
 300.twolf: Base: 239745 bytes
 300.twolf: Peak: 239745 bytes
 =============================
 Total: Base: 4775625 bytes
 Total: Peak: 4695391 bytes

Total time for base compilation: 275 s
Total time for peak compilation: 270 s

GCC was configured as: configure --enable-threads=posix --enable-languages="c,c++,f95"
GCC bootstrap times for 'make -j1 bootstrap && make install':
Base compiler: 2519 s
Peak compiler: 2516 s

   164.gzip          1400   156       900    *     1400   156       899    *
   175.vpr                                   X                             X
   176.gcc           1100   101      1089    *     1100   101      1086    *
   181.mcf           1800   431       417    *     1800   432       417    *
   186.crafty        1000    63.5    1574    *     1000    63.7    1569    *
   197.parser        1800   267       673    *     1800   262       688    *
   252.eon                                   X                             X
   253.perlbmk       1800   166      1086    *     1800   165      1088    *
   254.gap           1100   128       856    *     1100   125       878    *
   255.vortex        1900   147      1294    *     1900   146      1304    *
   256.bzip2         1500   176       852    *     1500   176       853    *
   300.twolf         3000   324       927    *     3000   325       922    *
   Est. SPECint_base2000              915    
   Est. SPECint2000                                                 919    

Size of binaries:
 168.wupwise: Base: 45287 bytes
 168.wupwise: Peak: 45279 bytes
 171.swim: Base: 23114 bytes
 171.swim: Peak: 23106 bytes
 172.mgrid: Base: 27538 bytes
 172.mgrid: Peak: 27530 bytes
 173.applu: Base: 92251 bytes
 173.applu: Peak: 92243 bytes
 177.mesa: Base: 639871 bytes
 177.mesa: Peak: 635785 bytes
 179.art: Base: 31472 bytes
 179.art: Peak: 31472 bytes
 183.equake: Base: 42823 bytes
 183.equake: Peak: 42823 bytes
 187.facerec: Base: 83208 bytes
 187.facerec: Peak: 83200 bytes
 188.ammp: Base: 174610 bytes
 188.ammp: Peak: 174642 bytes
 189.lucas: Base: 84715 bytes
 189.lucas: Peak: 84707 bytes
 191.fma3d: Base: 1286905 bytes
 191.fma3d: Peak: 1286897 bytes
 200.sixtrack: Base: 1048817 bytes
 200.sixtrack: Peak: 1048809 bytes
 301.apsi: Base: 150880 bytes
 301.apsi: Peak: 150872 bytes
 =============================
 Total: Base: 3731491 bytes
 Total: Peak: 3727365 bytes

Total time for base compilation: 276 s
Total time for peak compilation: 272 s

   168.wupwise                               X                             X
   171.swim                                  X                             X
   172.mgrid                                 X                             X
   173.applu                                 X                             X
   177.mesa          1400   113      1238    *     1400   113      1235    *
   178.galgel                                X                             X
   179.art           2600   341       762    *     2600   344       756    *
   183.equake        1300   161       807    *     1300   161       808    *
   187.facerec                               X                             X
   188.ammp          2200   231       954    *     2200   231       954    *
   189.lucas                                 X                             X
   191.fma3d                                 X                             X
   200.sixtrack                              X                             X
   301.apsi                                  X                             X
   Est. SPECfp_base2000               923    
   Est. SPECfp2000                                                  921    


Of course the patch is most interesting with profile feedback and IMA.
Unfortunately mainline is quite broken there, so only a few benchmarks actually
work, and those are the more trivial and thus less interesting ones.  On
tree-profiling the speedups are somewhere around 2-3% with considerable code
size savings (and I am testing a fix for the gzip failure; it is an unrelated
latent bug).

Size of binaries:
 164.gzip: Base: 80321 bytes
 181.mcf: Base: 33875 bytes
 181.mcf: Peak: 29779 bytes
 186.crafty: Base: 296462 bytes
 186.crafty: Peak: 267790 bytes
 197.parser: Base: 246457 bytes
 197.parser: Peak: 180857 bytes
 254.gap: Base: 665521 bytes
 254.gap: Peak: 559098 bytes
 256.bzip2: Base: 71182 bytes
 256.bzip2: Peak: 58766 bytes
 =============================
 Total: Base: 1393818 bytes
 Total: Peak: 1096290 bytes

Total time for base compilation: 380 s
Total time for peak compilation: 370 s

   164.gzip          1400   150       934    *                             
   175.vpr                                   X                             X
   176.gcc                                   X                             X
   181.mcf           1800   427       422    *     1800   427       422    *
   186.crafty        1000    64.6    1549    *     1000    62.2    1608    *
   197.parser        1800   222       810    *     1800   222       810    *
   252.eon                                   X                             X
   253.perlbmk                               X                             X
   254.gap           1100   126       875    *     1100   123       895    *
   255.vortex                                X                             X
   256.bzip2         1500   181       827    *     1500   182       826    *
   300.twolf                                 X                             X
   Est. SPECint_base2000              843    
   Est. SPECint2000                                                 835    

More benchmarks to come ;)

The testcase works only with Janis's tree-profiling testsuite bits.

/* { dg-options "-O2 -fdump-tree-optimized -fdump-ipa-inline -ftree-based-profiling" } */
int a;
int b[100];
void abort (void);

inline void
cold_function ()
{
  int i;
  for (i = 0; i < 99; i++)
    if (b[i] / (b[i+1] + 1))
      abort ();
}

inline void
hot_function ()
{
  int i;
  for (i = 0; i < 99; i++)
    if (b[i] / (b[i+1] + 1))
      abort ();
}

int
main ()
{
  if (a)
    cold_function ();
  else
    hot_function ();
}

/* Cold function should not be inlined, while hot function should be.
   Look for the "cold_function () [tail call];" call statement, not for the
   declaration or other appearances of the string in the dump.  */
/* { dg-final-use { scan-tree-dump "cold_function .. .tail" "optimized"} } */
/* { dg-final-use { scan-tree-dump-not "hot_function .. .tail" "optimized"} } */

2005-05-24  Jan Hubicka  <jh@suse.cz>

	* Makefile.in (ipa-inline.o): Add COVERAGE_H dependency.
	* cgraph.c (cgraph_create_node): Reset estimated_growth.
	* cgraph.h (cgraph_global_info): Add estimated_growth.
	* ipa-inline.c: Include coverage.h.
	(max_insns, max_count): New static variables.
	(cgraph_estimate_size_after_inlining): Sanity check that the
	estimated size is non-negative.
	(cgraph_estimate_growth): Cache the result.
	(cgraph_maybe_hot_edge_p, cgraph_edge_badness, update_caller_keys):
	New functions.
	(update_callee_keys, lookup_recursive_calls,
	cgraph_decide_recursive_inlining,
	cgraph_decide_inlining_of_small_functions): Rework to order call
	graph edges in a single fibheap by badness.
	(cgraph_decide_inlining): Compute max_count and max_insns; push and
	pop TV_INLINE_HEURISTICS timevar.
	* passes.c (rest_of_clean_state): Kill coverage_end_function call.
	* profile.c (branch_prob): Call coverage_end_function here.
	* timevar.def (TV_INLINE_HEURISTICS): New timevar.
	* tree-optimize.c (init_tree_optimization_passes): Move profiling
	before inlining.
	(ipa_passes): Initialize and release the bitmap obstack.
Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1489
diff -c -3 -p -r1.1489 Makefile.in
*** Makefile.in	20 May 2005 21:17:40 -0000	1.1489
--- Makefile.in	23 May 2005 09:49:04 -0000
*************** cgraphunit.o : cgraphunit.c $(CONFIG_H) 
*** 2063,2069 ****
  ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H) 
  ipa-inline.o : ipa-inline.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
     $(TREE_H) langhooks.h tree-inline.h $(FLAGS_H) $(CGRAPH_H) intl.h \
!    $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h
  coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h \
     $(TM_H) $(RTL_H) $(TREE_H) $(FLAGS_H) output.h $(REGS_H) $(EXPR_H) \
     function.h toplev.h $(GGC_H) langhooks.h $(COVERAGE_H) gt-coverage.h \
--- 2063,2070 ----
  ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H) 
  ipa-inline.o : ipa-inline.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
     $(TREE_H) langhooks.h tree-inline.h $(FLAGS_H) $(CGRAPH_H) intl.h \
!    $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
!    $(COVERAGE_H)
  coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h \
     $(TM_H) $(RTL_H) $(TREE_H) $(FLAGS_H) output.h $(REGS_H) $(EXPR_H) \
     function.h toplev.h $(GGC_H) langhooks.h $(COVERAGE_H) gt-coverage.h \
Index: cgraph.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.c,v
retrieving revision 1.72
diff -c -3 -p -r1.72 cgraph.c
*** cgraph.c	19 May 2005 10:38:37 -0000	1.72
--- cgraph.c	23 May 2005 09:49:04 -0000
*************** cgraph_create_node (void)
*** 169,174 ****
--- 169,175 ----
    if (cgraph_nodes)
      cgraph_nodes->previous = node;
    node->previous = NULL;
+   node->global.estimated_growth = INT_MIN;
    cgraph_nodes = node;
    cgraph_n_nodes++;
    return node;
Index: cgraph.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.h,v
retrieving revision 1.53
diff -c -3 -p -r1.53 cgraph.h
*** cgraph.h	19 May 2005 10:38:37 -0000	1.53
--- cgraph.h	23 May 2005 09:49:04 -0000
*************** struct cgraph_global_info GTY(())
*** 69,74 ****
--- 69,77 ----
    /* Estimated size of the function after inlining.  */
    int insns;
  
+   /* Estimated growth after inlining.  INT_MIN if not computed.  */
+   int estimated_growth;
+ 
    /* Set iff the function has been inlined at least once.  */
    bool inlined;
  };
Index: ipa-inline.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/ipa-inline.c,v
retrieving revision 2.4
diff -c -3 -p -r2.4 ipa-inline.c
*** ipa-inline.c	19 May 2005 10:38:38 -0000	2.4
--- ipa-inline.c	23 May 2005 09:49:04 -0000
*************** Software Foundation, 59 Temple Place - S
*** 78,89 ****
--- 78,92 ----
  #include "fibheap.h"
  #include "intl.h"
  #include "tree-pass.h"
+ #include "coverage.h"
  
  /* Statistics we collect about inlining algorithm.  */
  static int ncalls_inlined;
  static int nfunctions_inlined;
  static int initial_insns;
  static int overall_insns;
+ static int max_insns;
+ static gcov_type max_count;
  
  /* Estimate size of the function after inlining WHAT into TO.  */
  
*************** static int
*** 91,102 ****
  cgraph_estimate_size_after_inlining (int times, struct cgraph_node *to,
  				     struct cgraph_node *what)
  {
!   tree fndecl = what->decl;
!   tree arg;
    int call_insns = PARAM_VALUE (PARAM_INLINE_CALL_COST);
    for (arg = DECL_ARGUMENTS (fndecl); arg; arg = TREE_CHAIN (arg))
      call_insns += estimate_move_cost (TREE_TYPE (arg));
!   return (what->global.insns - call_insns) * times + to->global.insns;
  }
  
  /* E is expected to be an edge being inlined.  Clone destination node of
--- 94,108 ----
  cgraph_estimate_size_after_inlining (int times, struct cgraph_node *to,
  				     struct cgraph_node *what)
  {
!   int size;
!   tree fndecl = what->decl, arg;
    int call_insns = PARAM_VALUE (PARAM_INLINE_CALL_COST);
+ 
    for (arg = DECL_ARGUMENTS (fndecl); arg; arg = TREE_CHAIN (arg))
      call_insns += estimate_move_cost (TREE_TYPE (arg));
!   size = (what->global.insns - call_insns) * times + to->global.insns;
!   gcc_assert (size >= 0);
!   return size;
  }
  
  /* E is expected to be an edge being inlined.  Clone destination node of
*************** cgraph_estimate_growth (struct cgraph_no
*** 209,214 ****
--- 215,222 ----
  {
    int growth = 0;
    struct cgraph_edge *e;
+   if (node->global.estimated_growth != INT_MIN)
+     return node->global.estimated_growth;
  
    for (e = node->callers; e; e = e->next_caller)
      if (e->inline_failed)
*************** cgraph_estimate_growth (struct cgraph_no
*** 221,226 ****
--- 229,235 ----
    if (!node->needed && !DECL_EXTERNAL (node->decl))
      growth -= node->global.insns;
  
+   node->global.estimated_growth = growth;
    return growth;
  }
  
*************** cgraph_recursive_inlining_p (struct cgra
*** 298,349 ****
    return recursive;
  }
  
! /* Recompute heap nodes for each of callees.  */
  static void
! update_callee_keys (fibheap_t heap, struct fibnode **heap_node,
! 		    struct cgraph_node *node)
  {
    struct cgraph_edge *e;
  
    for (e = node->callees; e; e = e->next_callee)
!     if (e->inline_failed && heap_node[e->callee->uid])
!       fibheap_replace_key (heap, heap_node[e->callee->uid],
! 			   cgraph_estimate_growth (e->callee));
      else if (!e->inline_failed)
!       update_callee_keys (heap, heap_node, e->callee);
  }
  
! /* Enqueue all recursive calls from NODE into queue linked via aux pointers
!    in between FIRST and LAST.  WHERE is used for bookkeeping while looking
!    int calls inlined within NODE.  */
  static void
  lookup_recursive_calls (struct cgraph_node *node, struct cgraph_node *where,
! 			struct cgraph_edge **first, struct cgraph_edge **last)
  {
    struct cgraph_edge *e;
    for (e = where->callees; e; e = e->next_callee)
      if (e->callee == node)
        {
! 	if (!*first)
! 	  *first = e;
! 	else
! 	  (*last)->aux = e;
! 	*last = e;
        }
    for (e = where->callees; e; e = e->next_callee)
      if (!e->inline_failed)
!       lookup_recursive_calls (node, e->callee, first, last);
  }
  
  /* Decide on recursive inlining: in the case function has recursive calls,
     inline until body size reaches given argument.  */
! static void
  cgraph_decide_recursive_inlining (struct cgraph_node *node)
  {
    int limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO);
    int max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH_AUTO);
!   struct cgraph_edge *first_call = NULL, *last_call = NULL;
!   struct cgraph_edge *last_in_current_depth;
    struct cgraph_edge *e;
    struct cgraph_node *master_clone;
    int depth = 0;
--- 307,451 ----
    return recursive;
  }
  
! /* Return true if the call can be hot.  */
! static bool
! cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
! {
!   if (profile_info && flag_branch_probabilities
!       && (edge->count
! 	  <= profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)))
!     return false;
!   return true;
! }
! 
! /* A cost model driving the inlining heuristics in a way so the edges with
!    smallest badness are inlined first.  After each inlining is performed
!    the costs of all caller edges of the affected nodes are recomputed so the
!    metrics may accurately depend on values such as number of inlinable callers
!    of the function or function body size.
! 
!    For the moment we use the estimated growth caused by inlining the callee
!    into all its callers for driving the inlining, but once we have loop depth
!    or frequency information readily available we should do better.
! 
!    With profiling we use the number of executions of each edge to drive the
!    cost.  We should also distinguish hot and cold calls, where the cold calls
!    are inlined only when the overall code size improves.
! 
!    The value INT_MAX can be returned to prevent the function from being
!    inlined.
!    */
! 
! static int
! cgraph_edge_badness (struct cgraph_edge *edge)
! {
!   if (max_count)
!     {
!       int growth =
! 	cgraph_estimate_size_after_inlining (1, edge->caller, edge->callee);
!       growth -= edge->caller->global.insns;
! 
! 	/* Always prefer inlining that saves code size.  */
!       if (growth <= 0)
! 	return INT_MIN - growth;
!       return ((int)((double)edge->count * INT_MIN / max_count)) / growth;
!     }
!   else
!   {
!     int nest = MIN (edge->loop_nest, 8);
!     int badness = cgraph_estimate_growth (edge->callee) * 256;
! 		    
!     badness >>= nest;
! 
!     /* Make recursive inlining happen always after other inlining is done.  */
!     if (cgraph_recursive_inlining_p (edge->caller, edge->callee, NULL))
!       return badness + 1;
!     else
!       return badness;
!   }
! }
! 
! /* Recompute heap nodes for each caller edge of NODE.  */
! 
  static void
! update_caller_keys (fibheap_t heap, struct cgraph_node *node,
! 		    bitmap updated_nodes)
! {
!   struct cgraph_edge *edge;
! 
!   if (!node->local.inlinable || node->local.disregard_inline_limits
!       || node->global.inlined_to)
!     return;
!   if (bitmap_bit_p (updated_nodes, node->uid))
!     return;
!   bitmap_set_bit (updated_nodes, node->uid);
! 
!   for (edge = node->callers; edge; edge = edge->next_caller)
!     if (edge->inline_failed)
!       {
! 	int badness = cgraph_edge_badness (edge);
! 	if (edge->aux)
! 	  {
! 	    fibnode_t n = edge->aux;
! 	    gcc_assert (n->data == edge);
! 	    if (n->key == badness)
! 	      continue;
! 
! 	    /* fibheap_replace_key only increase the keys.  */
! 	    if (fibheap_replace_key (heap, n, badness))
! 	      continue;
! 	    fibheap_delete_node (heap, edge->aux);
! 	  }
! 	edge->aux = fibheap_insert (heap, badness, edge);
!       }
! }
! 
! /* Recompute heap nodes for the caller edges of each callee.  */
! 
! static void
! update_callee_keys (fibheap_t heap, struct cgraph_node *node,
! 		    bitmap updated_nodes)
  {
    struct cgraph_edge *e;
+   node->global.estimated_growth = INT_MIN;
  
    for (e = node->callees; e; e = e->next_callee)
!     if (e->inline_failed)
!       update_caller_keys (heap, e->callee, updated_nodes);
      else if (!e->inline_failed)
!       update_callee_keys (heap, e->callee, updated_nodes);
  }
  
! /* Enqueue all recursive calls from NODE into the priority queue depending on
!    how much we want to recursively inline the call.  */
! 
  static void
  lookup_recursive_calls (struct cgraph_node *node, struct cgraph_node *where,
! 			fibheap_t heap)
  {
+   static int priority;
    struct cgraph_edge *e;
    for (e = where->callees; e; e = e->next_callee)
      if (e->callee == node)
        {
! 	/* FIXME: Once counts and frequencies are available we should drive the
! 	   order by these.  For now force the order to be a simple queue, since
! 	   this gives us an order dependent on the recursion depth for free.  */
!         fibheap_insert (heap, priority++, e);
        }
    for (e = where->callees; e; e = e->next_callee)
      if (!e->inline_failed)
!       lookup_recursive_calls (node, e->callee, heap);
  }
  
  /* Decide on recursive inlining: in the case function has recursive calls,
     inline until body size reaches given argument.  */
! 
! static bool
  cgraph_decide_recursive_inlining (struct cgraph_node *node)
  {
    int limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO);
    int max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH_AUTO);
!   fibheap_t heap;
    struct cgraph_edge *e;
    struct cgraph_node *master_clone;
    int depth = 0;
*************** cgraph_decide_recursive_inlining (struct
*** 358,371 ****
    /* Make sure that function is small enough to be considered for inlining.  */
    if (!max_depth
        || cgraph_estimate_size_after_inlining (1, node, node)  >= limit)
!     return;
!   lookup_recursive_calls (node, node, &first_call, &last_call);
!   if (!first_call)
!     return;
  
    if (dump_file)
      fprintf (dump_file, 
! 	     "\nPerforming recursive inlining on %s\n",
  	     cgraph_node_name (node));
  
    /* We need original clone to copy around.  */
--- 460,477 ----
    /* Make sure that function is small enough to be considered for inlining.  */
    if (!max_depth
        || cgraph_estimate_size_after_inlining (1, node, node)  >= limit)
!     return false;
!   heap = fibheap_new ();
!   lookup_recursive_calls (node, node, heap);
!   if (fibheap_empty (heap))
!     {
!       fibheap_delete (heap);
!       return false;
!     }
  
    if (dump_file)
      fprintf (dump_file, 
! 	     "  Performing recursive inlining on %s\n",
  	     cgraph_node_name (node));
  
    /* We need original clone to copy around.  */
*************** cgraph_decide_recursive_inlining (struct
*** 376,407 ****
        cgraph_clone_inlined_nodes (e, true);
  
    /* Do the inlining and update list of recursive call during process.  */
!   last_in_current_depth = last_call;
!   while (first_call
  	 && cgraph_estimate_size_after_inlining (1, node, master_clone) <= limit)
      {
!       struct cgraph_edge *curr = first_call;
  
!       first_call = first_call->aux;
!       curr->aux = NULL;
  
        cgraph_redirect_edge_callee (curr, master_clone);
        cgraph_mark_inline_edge (curr);
!       lookup_recursive_calls (node, curr->callee, &first_call, &last_call);
! 
!       if (last_in_current_depth
! 	  && ++depth >= max_depth)
! 	break;
        n++;
      }
  
!   /* Cleanup queue pointers.  */
!   while (first_call)
!     {
!       struct cgraph_edge *next = first_call->aux;
!       first_call->aux = NULL;
!       first_call = next;
!     }
    if (dump_file)
      fprintf (dump_file, 
  	     "\n   Inlined %i times, body grown from %i to %i insns\n", n,
--- 482,511 ----
        cgraph_clone_inlined_nodes (e, true);
  
    /* Do the inlining and update list of recursive call during process.  */
!   while (!fibheap_empty (heap)
  	 && cgraph_estimate_size_after_inlining (1, node, master_clone) <= limit)
      {
!       struct cgraph_edge *curr = fibheap_extract_min (heap);
!       struct cgraph_node *node;
  
!       depth = 0;
!       for (node = curr->caller;
! 	   node; node = node->global.inlined_to)
! 	if (node->decl == curr->callee->decl)
! 	  depth++;
!       if (depth > max_depth)
! 	continue;
  
+       if (dump_file)
+ 	fprintf (dump_file, 
+ 		 "   Inlining call of depth %i\n", depth);
        cgraph_redirect_edge_callee (curr, master_clone);
        cgraph_mark_inline_edge (curr);
!       lookup_recursive_calls (node, curr->callee, heap);
        n++;
      }
  
!   fibheap_delete (heap);
    if (dump_file)
      fprintf (dump_file, 
  	     "\n   Inlined %i times, body grown from %i to %i insns\n", n,
*************** cgraph_decide_recursive_inlining (struct
*** 415,420 ****
--- 519,525 ----
      if (node->global.inlined_to == master_clone)
        cgraph_remove_node (node);
    cgraph_remove_node (master_clone);
+   return true;
  }
  
  /* Set inline_failed for all callers of given function to REASON.  */
*************** static void
*** 442,452 ****
  cgraph_decide_inlining_of_small_functions (void)
  {
    struct cgraph_node *node;
    fibheap_t heap = fibheap_new ();
!   struct fibnode **heap_node =
!     xcalloc (cgraph_max_uid, sizeof (struct fibnode *));
!   int max_insns = ((HOST_WIDEST_INT) initial_insns
! 		   * (100 + PARAM_VALUE (PARAM_INLINE_UNIT_GROWTH)) / 100);
  
    /* Put all inline candidates into the heap.  */
  
--- 547,558 ----
  cgraph_decide_inlining_of_small_functions (void)
  {
    struct cgraph_node *node;
+   struct cgraph_edge *edge;
    fibheap_t heap = fibheap_new ();
!   bitmap updated_nodes = BITMAP_ALLOC (NULL);
! 
!   if (dump_file)
!     fprintf (dump_file, "\nDeciding on smaller functions:\n");
  
    /* Put all inline candidates into the heap.  */
  
*************** cgraph_decide_inlining_of_small_function
*** 455,541 ****
        if (!node->local.inlinable || !node->callers
  	  || node->local.disregard_inline_limits)
  	continue;
  
        if (!cgraph_default_inline_p (node))
  	{
  	  cgraph_set_inline_failed (node,
  	    N_("--param max-inline-insns-single limit reached"));
  	  continue;
  	}
-       heap_node[node->uid] =
- 	fibheap_insert (heap, cgraph_estimate_growth (node), node);
-     }
  
!   if (dump_file)
!     fprintf (dump_file, "\nDeciding on smaller functions:\n");
!   while (overall_insns <= max_insns && (node = fibheap_extract_min (heap)))
      {
-       struct cgraph_edge *e, *next;
        int old_insns = overall_insns;
  
-       heap_node[node->uid] = NULL;
        if (dump_file)
- 	fprintf (dump_file, 
- 		 "\nConsidering %s with %i insns\n"
- 		 " Estimated growth is %+i insns.\n",
- 		 cgraph_node_name (node), node->global.insns,
- 		 cgraph_estimate_growth (node));
-       if (!cgraph_default_inline_p (node))
  	{
! 	  cgraph_set_inline_failed (node,
! 	    N_("--param max-inline-insns-single limit reached after inlining into the callee"));
! 	  continue;
  	}
!       for (e = node->callers; e; e = next)
  	{
! 	  next = e->next_caller;
! 	  if (e->inline_failed)
  	    {
! 	      struct cgraph_node *where;
! 
! 	      if (cgraph_recursive_inlining_p (e->caller, e->callee,
! 				      	       &e->inline_failed)
! 		  || !cgraph_check_inline_limits (e->caller, e->callee,
! 			  			  &e->inline_failed))
! 		{
! 		  if (dump_file)
! 		    fprintf (dump_file, " Not inlining into %s:%s.\n",
! 			     cgraph_node_name (e->caller), e->inline_failed);
! 		  continue;
! 		}
! 	      next = cgraph_mark_inline (e);
! 	      where = e->caller;
! 	      if (where->global.inlined_to)
! 		where = where->global.inlined_to;
! 
! 	      if (heap_node[where->uid])
! 		fibheap_replace_key (heap, heap_node[where->uid],
! 				     cgraph_estimate_growth (where));
! 
  	      if (dump_file)
! 		fprintf (dump_file, 
! 			 " Inlined into %s which now has %i insns.\n",
! 			 cgraph_node_name (e->caller),
! 			 e->caller->global.insns);
  	    }
  	}
  
!       cgraph_decide_recursive_inlining (node);
! 
!       /* Similarly all functions called by the function we just inlined
!          are now called more times; update keys.  */
!       update_callee_keys (heap, heap_node, node);
  
        if (dump_file)
  	fprintf (dump_file, 
  		 " Inlined for a net change of %+i insns.\n",
  		 overall_insns - old_insns);
      }
!   while ((node = fibheap_extract_min (heap)) != NULL)
!     if (!node->local.disregard_inline_limits)
!       cgraph_set_inline_failed (node, N_("--param inline-unit-growth limit reached"));
    fibheap_delete (heap);
!   free (heap_node);
  }
  
  /* Decide on the inlining.  We do so in the topological order to avoid
--- 561,721 ----
        if (!node->local.inlinable || !node->callers
  	  || node->local.disregard_inline_limits)
  	continue;
+       if (dump_file)
+ 	fprintf (dump_file, "Considering inline candidate %s.\n", cgraph_node_name (node));
  
+       node->global.estimated_growth = INT_MIN;
        if (!cgraph_default_inline_p (node))
  	{
  	  cgraph_set_inline_failed (node,
  	    N_("--param max-inline-insns-single limit reached"));
  	  continue;
  	}
  
!       for (edge = node->callers; edge; edge = edge->next_caller)
! 	if (edge->inline_failed)
! 	  {
! 	    gcc_assert (!edge->aux);
! 	    edge->aux = fibheap_insert (heap, cgraph_edge_badness (edge), edge);
! 	  }
!     }
!   while (overall_insns <= max_insns && (edge = fibheap_extract_min (heap)))
      {
        int old_insns = overall_insns;
+       struct cgraph_node *where;
+       int growth =
+ 	cgraph_estimate_size_after_inlining (1, edge->caller, edge->callee);
+ 
+       growth -= edge->caller->global.insns;
  
        if (dump_file)
  	{
! 	  fprintf (dump_file, 
! 		   "\nConsidering %s with %i insns to be inlined into %s\n"
! 		   " Estimated growth after inlined into all callees is %+i insns.\n"
! 		   " Estimated badness is %i.\n",
! 		   cgraph_node_name (edge->callee),
! 		   edge->callee->global.insns,
! 		   cgraph_node_name (edge->caller),
! 		   cgraph_estimate_growth (edge->callee),
! 		   cgraph_edge_badness (edge));
! 	  if (edge->count)
! 	    fprintf (dump_file," Called "HOST_WIDEST_INT_PRINT_DEC"x\n", edge->count);
  	}
!       gcc_assert (edge->aux);
!       edge->aux = NULL;
!       if (!edge->inline_failed)
! 	continue;
! 
!       /* When we do not have profile info ready, we do not weight in any way
! 	 the position of the call within the procedure itself.  This means
! 	 that if a call of function A from function B seems profitable to
! 	 inline, the recursive call of function A in the inline copy of A in B
! 	 will look profitable too, and we end up inlining until reaching the
! 	 maximal function growth.  This is not a good idea, so prohibit the
! 	 recursive inlining.
! 
! 	 ??? When the frequencies are taken into account we might not need this
! 	 restriction.   */
!       if (!max_count)
  	{
! 	  where = edge->caller;
! 	  while (where->global.inlined_to)
  	    {
! 	      if (where->decl == edge->callee->decl)
! 		break;
! 	      where = where->callers->caller;
! 	    }
! 	  if (where->global.inlined_to)
! 	    {
! 	      edge->inline_failed
! 		= (edge->callee->local.disregard_inline_limits ? N_("recursive inlining") : "");
  	      if (dump_file)
! 		fprintf (dump_file, " inline_failed:Recursive inlining performed only for function itself.\n");
! 	      continue;
  	    }
  	}
  
!       if (!cgraph_maybe_hot_edge_p (edge) && growth > 0)
! 	{
!           if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				            &edge->inline_failed))
! 	    {
! 	      edge->inline_failed = 
! 		N_("call is unlikely");
! 	      if (dump_file)
! 		fprintf (dump_file, " inline_failed:%s.\n", edge->inline_failed);
! 	    }
! 	  continue;
! 	}
!       if (!cgraph_default_inline_p (edge->callee))
! 	{
!           if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				            &edge->inline_failed))
! 	    {
! 	      edge->inline_failed = 
! 		N_("--param max-inline-insns-single limit reached after inlining into the callee");
! 	      if (dump_file)
! 		fprintf (dump_file, " inline_failed:%s.\n", edge->inline_failed);
! 	    }
! 	  continue;
! 	}
!       if (cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				       &edge->inline_failed))
! 	{
! 	  where = edge->caller;
! 	  if (where->global.inlined_to)
! 	    where = where->global.inlined_to;
! 	  if (!cgraph_decide_recursive_inlining (where))
! 	    continue;
!           update_callee_keys (heap, where, updated_nodes);
! 	}
!       else
! 	{
! 	  if (!cgraph_check_inline_limits (edge->caller, edge->callee,
! 					   &edge->inline_failed))
! 	    {
! 	      if (dump_file)
! 		fprintf (dump_file, " Not inlining into %s:%s.\n",
! 			 cgraph_node_name (edge->caller), edge->inline_failed);
! 	      continue;
! 	    }
! 	  cgraph_mark_inline_edge (edge);
! 	  update_callee_keys (heap, edge->callee, updated_nodes);
! 	}
!       where = edge->caller;
!       if (where->global.inlined_to)
! 	where = where->global.inlined_to;
! 
!       /* Our profitability metric can depend on local properties
! 	 such as the number of inlinable calls and the size of the function
! 	 body.  After inlining these properties might change for the function
! 	 we inlined into (since its body size changed) and for the functions
! 	 called by the function we inlined (since the number of their
! 	 inlinable callers might change).  */
!       update_caller_keys (heap, where, updated_nodes);
!       bitmap_clear (updated_nodes);
  
        if (dump_file)
  	fprintf (dump_file, 
+ 		 " Inlined into %s which now has %i insns.\n",
+ 		 cgraph_node_name (edge->caller),
+ 		 edge->caller->global.insns);
+       if (dump_file)
+ 	fprintf (dump_file, 
  		 " Inlined for a net change of %+i insns.\n",
  		 overall_insns - old_insns);
      }
!   while ((edge = fibheap_extract_min (heap)) != NULL)
!     {
!       gcc_assert (edge->aux);
!       edge->aux = NULL;
!       if (!edge->callee->local.disregard_inline_limits && edge->inline_failed
!           && !cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				           &edge->inline_failed))
! 	edge->inline_failed = N_("--param inline-unit-growth limit reached");
!     }
    fibheap_delete (heap);
!   BITMAP_FREE (updated_nodes);
  }
  
  /* Decide on the inlining.  We do so in the topological order to avoid
*************** cgraph_decide_inlining (void)
*** 551,559 ****
    int old_insns = 0;
    int i;
  
    for (node = cgraph_nodes; node; node = node->next)
!     initial_insns += node->local.self_insns;
    overall_insns = initial_insns;
  
    nnodes = cgraph_postorder (order);
  
--- 731,751 ----
    int old_insns = 0;
    int i;
  
+   timevar_push (TV_INLINE_HEURISTICS);
+   max_count = 0;
    for (node = cgraph_nodes; node; node = node->next)
!     {
!       struct cgraph_edge *e;
!       initial_insns += node->local.self_insns;
!       for (e = node->callees; e; e = e->next_callee)
! 	if (max_count < e->count)
! 	  max_count = e->count;
!     }
    overall_insns = initial_insns;
+   gcc_assert (!max_count || (profile_info && flag_branch_probabilities));
+ 
+   max_insns = ((HOST_WIDEST_INT) overall_insns
+ 	       * (100 + PARAM_VALUE (PARAM_INLINE_UNIT_GROWTH)) / 100);
  
    nnodes = cgraph_postorder (order);
  
*************** cgraph_decide_inlining (void)
*** 668,673 ****
--- 860,866 ----
    /* We will never output extern functions we didn't inline. 
       ??? Perhaps we can prevent accounting of growth of external
       inline functions.  */
+ 
    cgraph_remove_unreachable_nodes (false, dump_file);
  
    if (dump_file)
*************** cgraph_decide_inlining (void)
*** 677,682 ****
--- 870,876 ----
  	     ncalls_inlined, nfunctions_inlined, initial_insns,
  	     overall_insns);
    free (order);
+   timevar_pop (TV_INLINE_HEURISTICS);
  }
  
  /* Decide on the inlining.  We do so in the topological order to avoid
Index: passes.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/passes.c,v
retrieving revision 2.90
diff -c -3 -p -r2.90 passes.c
*** passes.c	27 Apr 2005 21:35:20 -0000	2.90
--- passes.c	23 May 2005 09:49:05 -0000
*************** static void
*** 1427,1433 ****
  rest_of_clean_state (void)
  {
    rtx insn, next;
-   coverage_end_function ();
  
    /* It is very important to decompose the RTL instruction chain here:
       debug information keeps pointing into CODE_LABEL insns inside the function
--- 1427,1432 ----
Index: profile.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/profile.c,v
retrieving revision 1.157
diff -c -3 -p -r1.157 profile.c
*** profile.c	21 Apr 2005 09:17:19 -0000	1.157
--- profile.c	23 May 2005 09:49:05 -0000
*************** branch_prob (void)
*** 1141,1146 ****
--- 1141,1147 ----
    free_edge_list (el);
    if (flag_branch_probabilities)
      profile_status = PROFILE_READ;
+   coverage_end_function ();
  }
  
  /* Union find algorithm implementation for the basic blocks using
Index: timevar.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/timevar.def,v
retrieving revision 1.48
diff -c -3 -p -r1.48 timevar.def
*** timevar.def	17 May 2005 20:28:22 -0000	1.48
--- timevar.def	23 May 2005 09:49:05 -0000
*************** DEFTIMEVAR (TV_CPP		     , "preprocessin
*** 60,65 ****
--- 60,66 ----
  DEFTIMEVAR (TV_LEX		     , "lexical analysis")
  DEFTIMEVAR (TV_PARSE                 , "parser")
  DEFTIMEVAR (TV_NAME_LOOKUP           , "name lookup")
+ DEFTIMEVAR (TV_INLINE_HEURISTICS     , "inline heuristics")
  DEFTIMEVAR (TV_INTEGRATION           , "integration")
  DEFTIMEVAR (TV_TREE_GIMPLIFY	     , "tree gimplify")
  DEFTIMEVAR (TV_TREE_EH		     , "tree eh")
Index: tree-optimize.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-optimize.c,v
retrieving revision 2.101
diff -c -3 -p -r2.101 tree-optimize.c
*** tree-optimize.c	19 May 2005 10:38:42 -0000	2.101
--- tree-optimize.c	23 May 2005 09:49:05 -0000
*************** init_tree_optimization_passes (void)
*** 377,387 ****
    NEXT_PASS (pass_build_cfg); 
    NEXT_PASS (pass_pre_expand);
    NEXT_PASS (pass_warn_function_return);
    *p = NULL;
  
    p = &all_passes;
    NEXT_PASS (pass_fixup_cfg);
-   NEXT_PASS (pass_tree_profile);
    NEXT_PASS (pass_init_datastructures);
    NEXT_PASS (pass_all_optimizations);
    NEXT_PASS (pass_warn_function_noreturn);
--- 377,387 ----
    NEXT_PASS (pass_build_cfg); 
    NEXT_PASS (pass_pre_expand);
    NEXT_PASS (pass_warn_function_return);
+   NEXT_PASS (pass_tree_profile);
    *p = NULL;
  
    p = &all_passes;
    NEXT_PASS (pass_fixup_cfg);
    NEXT_PASS (pass_init_datastructures);
    NEXT_PASS (pass_all_optimizations);
    NEXT_PASS (pass_warn_function_noreturn);
*************** tree_lowering_passes (tree fn)
*** 682,688 ****
  void
  ipa_passes (void)
  {
!    execute_pass_list (all_ipa_passes);
  }
  
  
--- 682,690 ----
  void
  ipa_passes (void)
  {
!   bitmap_obstack_initialize (NULL);
!   execute_pass_list (all_ipa_passes);
!   bitmap_obstack_release (NULL);
  }
  
  

