[tree-profiling] Reimplement profile driven heuristics


Hi,
the attached patch commonizes the profile driven and the default inlining
heuristic implementations and also fixes multiple problems in profile
updating within the inliner.  While testing the patch I also experimented
with profiledbootstrap and --enable-intermodule and fixed some of the
problems reproducible with these.
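
To illustrate the profile updating scheme: when a call edge is cloned for
inlining, its execution count is split between the clone and the original
so the totals stay consistent.  Here is a minimal sketch condensed from
cgraph_clone_edge in the patch below (the helper name split_count is made
up for illustration):

typedef long long gcov_type;
#define REG_BR_PROB_BASE 10000

/* Give an inline clone its share of the original execution count, with
   SCALE expressed in units of REG_BR_PROB_BASE, and subtract that share
   from the out-of-line copy so both counts still sum to the original.  */
static gcov_type
split_count (gcov_type *count, int scale)
{
  gcov_type clone_count = *count * scale / REG_BR_PROB_BASE;
  *count -= clone_count;
  return clone_count;
}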

I tested the patch on SPEC compiled with IMA and -O2.  It improves
performance by roughly 1% (on K8; PPC testing is in progress) both with and
without profiling, relative to the original heuristics that did not take
profiling into account.  With profiling the code size savings are roughly
40% (I hope to get close to the -O2 numbers once the heuristic is tuned
carefully enough).

On the tramp3d.cc benchmark the -O3 performance is somewhat worse, but
looking into the dumps this is caused only by instability in the ordering
of functions having the same priority, and increasing the unit growth
limits somewhat solves it (see PR 18704).  With profiling the performance
is mostly identical to Richard's leafify patch, and I hope to get closer to
its score without profiling by taking loop nest (and later the estimated
profile, once we are in SSA) into account.

On this benchmark there are performance issues with training the
application, because instrumenting functions before inlining causes
redundant counters to be produced (one gets one counter per inlined copy of
each function), which causes significant performance degradation of the
-fprofile-generate build.  This can be solved later by inlining small
forwarder functions before instrumenting, but that needs some restructuring
to be done first.
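
To make the counter problem concrete, consider this made-up example (all
names hypothetical).  With instrumentation running before inlining, each
caller that inlines the forwarder gets a private copy of the counter for
the real_work call, even though all copies measure the same source-level
edge:

static int real_work (int x) { return x * x; }

/* Trivial forwarder; its call to real_work is counted once per
   inlined copy rather than once overall.  */
static int forward (int x) { return real_work (x); }

int use1 (int a) { return forward (a); }
int use2 (int b) { return forward (b); }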

For the gerald application the results are as follows:

[0]: /usr/bin/time dl-mainline  (mainline as of yesterday),    1754980 bytes
[1]: /usr/bin/time dl-profold   (tree-profiling before patch), 2272188 bytes
[2]: /usr/bin/time dl-profnew   (tree-profiling with patch),   1911644 bytes
[3]: /usr/bin/time dl-profuse   (ditto but profiling enabled), 1540212 bytes

                     |     [0]      |     [1]      |     [2]      |     [3]      |
---------------------+--------------+--------------+--------------+--------------+
      STRATCOMP1-ALL |  2.28 (0.00) |  2.24 (0.00) |  2.26 (0.00) |  2.23 (0.00) |
   STRATCOMP-770.2-Q |  0.41 (0.00) |  0.40 (0.00) |  0.40 (0.00) |  0.45 (0.00) |
               2QBF1 | 12.96 (0.01) | 12.72 (0.01) | 12.73 (0.01) | 11.04 (0.01) |
          PRIMEIMPL2 |  5.43 (0.00) |  5.25 (0.00) |  5.22 (0.02) |  4.79 (0.00) |
       3COL-SIMPLEX1 |  4.97 (0.04) |  4.67 (0.24) |  4.71 (0.23) |  4.72 (0.17) |
        3COL-RANDOM1 |  9.73 (0.09) |  9.50 (0.03) |  9.54 (0.06) |  8.74 (0.05) |
          HP-RANDOM1 | 11.32 (0.03) | 11.33 (0.12) | 11.39 (0.06) | 11.30 (0.07) |
       HAMCYCLE-FREE |  0.73 (0.00) |  0.74 (0.00) |  0.75 (0.00) |  0.69 (0.00) |
             DECOMP2 |  9.41 (0.22) |  8.76 (0.13) |  8.69 (0.07) | 12.25 (0.10) |
        BW-P5-nopush |  6.37 (0.02) |  6.34 (0.02) |  6.31 (0.01) |  6.30 (0.01) |
       BW-P5-pushbin |  5.32 (0.00) |  5.33 (0.00) |  5.32 (0.00) |  5.31 (0.03) |
     BW-P5-nopushbin |  1.55 (0.00) |  1.54 (0.00) |  1.54 (0.00) |  1.57 (0.00) |
        HANOI-Towers |  3.08 (0.00) |  2.96 (0.03) |  3.08 (0.00) |  3.24 (0.03) |
              RAMSEY |  4.32 (0.02) |  4.19 (0.00) |  4.24 (0.01) |  4.33 (0.00) |
             CRISTAL |  6.99 (0.01) |  6.92 (0.00) |  6.96 (0.01) |  7.08 (0.00) |
           21-QUEENS |  5.09 (0.10) |  5.08 (0.05) |  5.07 (0.01) |  4.75 (0.10) |
   MSTDir[V=13,A=40] |  7.90 (0.00) |  7.70 (0.00) |  8.15 (0.00) |  7.40 (0.00) |
   MSTDir[V=15,A=40] |  7.86 (0.00) |  7.67 (0.01) |  8.10 (0.01) |  7.36 (0.01) |
 MSTUndir[V=13,A=40] |  4.33 (0.00) |  4.24 (0.00) |  4.45 (0.00) |  4.08 (0.00) |
         TIMETABLING |  5.86 (0.00) |  5.71 (0.02) |  5.84 (0.07) |  5.78 (0.00) |
---------------------+--------------+--------------+--------------+--------------+

So there is one regression (DECOMP2 with profiling enabled), but otherwise the results are fine.


I think there is a lot of work remaining (a better cost model, better
constraints, and better guessing when the profile is missing), but I think
the main framework is now functional (it still needs cleanups, especially
in tree-inline.c), so any ideas and feedback are very welcome!  (I would
be especially interested in tests of C++ programs and also in the
experience Apple developers got while working on the earlier, pre-tree-ssa,
profile driven inlining implementation.)
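
For reference, the per-edge badness the new heuristics minimize boils down
to the following condensed sketch of cgraph_edge_badness from the patch
(illustrative only; edge_badness and its parameter names are invented here
for exposition):

#include <limits.h>

typedef long long gcov_type;

/* Smaller badness is inlined first.  GROWTH is the estimated caller size
   change from inlining this edge, COUNT the edge execution count, and
   MAX_COUNT the largest edge count in the unit (zero without profile).  */
static int
edge_badness (int growth, gcov_type count, gcov_type max_count)
{
  if (max_count)
    {
      if (growth <= 0)  /* Inlining shrinks code: always prefer it.  */
        return INT_MIN - growth;
      return (int) ((double) count * INT_MIN / max_count) / growth;
    }
  return growth;        /* No profile: fall back to estimated growth.  */
}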

In some sense this is the end of the original plan I had concerning profile
driven optimizations in GCC, so I feel a bit sentimental and I would like to
thank everyone who helped with this task!

Bootstrapped/regtested ppc-linux, will commit it shortly.
Honza

	* basic-block.h (REG_BR_PROB_BASE): Define here.
	* cgraph.c (cgraph_create_node): Set estimated_growth.
	(dump_cgraph_node): Print profile.
	(cgraph_clone_edge): Add count argument.
	(cgraph_clone_node): Likewise; update profile.
	* cgraph.h (cgraph_global_info): Add estimated_growth.
	(cgraph_node): Add count; kill most_desirable.
	(cgraph_edge): Kill desirability and undesirable fields.
	(cgraph_clone_edge, cgraph_clone_node): Update prototypes.
	* cgraphunit.c (cgraph_reset_node): Requeue nodes to reanalyze.
	(cgraph_analyze_function): Initialize profile, do not perform IPA.
	(cgraph_finalize_compilation_unit): Update comment; avoid
	redundant work when called multiple times; do not dump cgraph
	to avoid quadratic dumping on IMA; prettier -Q output.
	(cgraph_optimize): Prettier -Q output; analyze functions.
	* ipa-inline.c: Change heuristics to work per edge, not per inline
	candidate, replace previous profile driven heuristics by common
	implementation.
	(max_insns): Change into int.
	(max_count): New static variable.
	(cgraph_clone_inlined_nodes, cgraph_clone_inlined_nodes_1,
	cgraph_mark_inline_edge): Revert the changes from the cfginliner patch;
	update profile.
	(cgraph_estimate_growth): Cache result.
	(update_caller_keys): Likewise.
	(update_callee_keys): Use update_caller_keys.
	(lookup_recursive_calls): Use priority heap.
	(cgraph_decide_recursive_inlining): Reorganize to use priority heap.
	(cgraph_edge_badness): New function.
	(cgraph_decide_inlining_of_small_functions): Reorganize to work per
	edge instead of per inline candidate.
	(cgraph_decide_inlining): Add TV_INLINE_HEURISTICS timevar; 
	do not call cgraph_profile_driven_inlining.
	(cgraph_profile_driven_inlining, cgraph_pick_most_desirable_edge,
	cgraph_desirability): Kill.
	* rtl.h (REG_BR_PROB_BASE): Move definition to basic-block.h.
	* timevar.def (TV_INLINE_HEURISTICS): New TV.
	* tree-inline.c (copy_cfg_body, copy_body): New count argument;
	update profile.
	(copy_body_r, expand_call_inline): Update profiling.
	* tree-optimize.c (tree_rest_of_compilation): Update.
Index: basic-block.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/basic-block.h,v
retrieving revision 1.153.2.46.2.15
diff -c -3 -p -r1.153.2.46.2.15 basic-block.h
*** basic-block.h	17 Nov 2004 22:07:29 -0000	1.153.2.46.2.15
--- basic-block.h	5 Dec 2004 16:58:43 -0000
*************** struct edge_list
*** 541,546 ****
--- 541,549 ----
    edge *index_to_edge;
  };
  
+ /* The base value for branch probability notes.  */
+ #define REG_BR_PROB_BASE  10000
+ 
  /* This is the value which indicates no edge is present.  */
  #define EDGE_INDEX_NO_EDGE	-1
  
Index: cgraph.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.c,v
retrieving revision 1.4.4.18.2.20
diff -c -3 -p -r1.4.4.18.2.20 cgraph.c
*** cgraph.c	3 Dec 2004 12:44:45 -0000	1.4.4.18.2.20
--- cgraph.c	5 Dec 2004 16:58:43 -0000
*************** cgraph_create_node (void)
*** 166,171 ****
--- 166,172 ----
      cgraph_nodes->previous = node;
    node->previous = NULL;
    node->static_vars_info = NULL;
+   node->global.estimated_growth = INT_MIN;
    cgraph_nodes = node;
    cgraph_n_nodes++;
    return node;
*************** dump_cgraph_node (FILE *f, struct cgraph
*** 504,509 ****
--- 505,513 ----
    fprintf (f, "%s/%i:", cgraph_node_name (node), node->uid);
    if (node->master_clone && node->master_clone->uid != node->uid)
      fprintf (f, "(%i)", node->master_clone->uid);
+   if (node->count)
+     fprintf (f, " executed "HOST_WIDEST_INT_PRINT_DEC"x",
+ 	     (HOST_WIDEST_INT)node->count);
    if (node->global.inlined_to)
      fprintf (f, " (inline copy in %s/%i)",
  	     cgraph_node_name (node->global.inlined_to),
*************** dump_cgraph_node (FILE *f, struct cgraph
*** 548,553 ****
--- 552,560 ----
  	       edge->caller->uid);
        if (!edge->inline_failed)
  	fprintf(f, "(inlined) ");
+       if (edge->count)
+ 	fprintf (f, "("HOST_WIDEST_INT_PRINT_DEC"x) ",
+ 		 (HOST_WIDEST_INT)edge->count);
      }
  
    fprintf (f, "\n  calls: ");
*************** dump_cgraph_node (FILE *f, struct cgraph
*** 557,562 ****
--- 564,572 ----
  	       edge->callee->uid);
        if (!edge->inline_failed)
  	fprintf(f, "(inlined) ");
+       if (edge->count)
+ 	fprintf (f, "("HOST_WIDEST_INT_PRINT_DEC"x) ",
+ 		 (HOST_WIDEST_INT)edge->count);
      }
    fprintf (f, "\n  cycle: ");
    n = node->next_cycle;
*************** cgraph_function_possibly_inlined_p (tree
*** 791,810 ****
  
  /* Create clone of E in the node N represented by CALL_EXPR the callgraph.  */
  struct cgraph_edge *
! cgraph_clone_edge (struct cgraph_edge *e, struct cgraph_node *n, tree call_expr)
  {
    struct cgraph_edge *new = cgraph_create_edge (n, e->callee, call_expr);
  
    new->inline_failed = e->inline_failed;
    return new;
  }
  
! /* Create node representing clone of N.  */
  struct cgraph_node *
! cgraph_clone_node (struct cgraph_node *n)
  {
    struct cgraph_node *new = cgraph_create_node ();
    struct cgraph_edge *e;
  
    new->decl = n->decl;
    new->origin = n->origin;
--- 801,825 ----
  
  /* Create clone of E in the node N represented by CALL_EXPR the callgraph.  */
  struct cgraph_edge *
! cgraph_clone_edge (struct cgraph_edge *e, struct cgraph_node *n,
! 		   tree call_expr, int count_scale)
  {
    struct cgraph_edge *new = cgraph_create_edge (n, e->callee, call_expr);
  
    new->inline_failed = e->inline_failed;
+   new->count = e->count * count_scale / REG_BR_PROB_BASE;
+   e->count -= new->count;
    return new;
  }
  
! /* Create node representing clone of N executed COUNT times.  Decrease
!    the execution counts from original node too.  */
  struct cgraph_node *
! cgraph_clone_node (struct cgraph_node *n, gcov_type count)
  {
    struct cgraph_node *new = cgraph_create_node ();
    struct cgraph_edge *e;
+   int count_scale;
  
    new->decl = n->decl;
    new->origin = n->origin;
*************** cgraph_clone_node (struct cgraph_node *n
*** 819,827 ****
    new->global = n->global;
    new->rtl = n->rtl;
    new->master_clone = n->master_clone;
  
    for (e = n->callees;e; e=e->next_callee)
!     cgraph_clone_edge (e, new, e->call_expr);
  
    new->next_clone = n->next_clone;
    n->next_clone = new;
--- 834,848 ----
    new->global = n->global;
    new->rtl = n->rtl;
    new->master_clone = n->master_clone;
+   new->count = count;
+   if (n->count)
+     count_scale = new->count * REG_BR_PROB_BASE / n->count;
+   else
+     count_scale = 0;
+   n->count -= count;
  
    for (e = n->callees;e; e=e->next_callee)
!     cgraph_clone_edge (e, new, e->call_expr, count_scale);
  
    new->next_clone = n->next_clone;
    n->next_clone = new;
Index: cgraph.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.h,v
retrieving revision 1.1.4.16.2.19
diff -c -3 -p -r1.1.4.16.2.19 cgraph.h
*** cgraph.h	3 Dec 2004 03:58:44 -0000	1.1.4.16.2.19
--- cgraph.h	5 Dec 2004 16:58:43 -0000
*************** struct cgraph_global_info GTY(())
*** 91,96 ****
--- 91,98 ----
  
    /* Estimated size of the function after inlining.  */
    int insns;
+   /* Estimated growth after inlining.  INT_MIN if not computed.  */
+   int estimated_growth;
  
    /* Set iff the function has been inlined at least once.  */
    bool inlined;
*************** struct cgraph_node GTY((chain_next ("%h.
*** 140,145 ****
--- 142,149 ----
       variables modified by function calls.  */
    ipa_static_vars_info_t GTY ((skip)) static_vars_info;
  
+   /* Expected number of executions: calculated in profile.c.  */
+   gcov_type count;
    /* Unique id of the node.  */
    int uid;
    /* Set when function must be output - it is externally visible
*************** struct cgraph_node GTY((chain_next ("%h.
*** 157,164 ****
    bool output;
    /* Used only while constructing the callgraph.  */
    basic_block current_basic_block;
-   /* The CALL within this function that we'd like to inline next.  */
-   struct cgraph_edge *most_desirable;
  };
  
  struct cgraph_edge GTY((chain_next ("%h.next_caller")))
--- 161,166 ----
*************** struct cgraph_edge GTY((chain_next ("%h.
*** 174,183 ****
    const char *inline_failed;
    /* Expected number of executions: calculated in profile.c.  */
    gcov_type count;
-   /* Desirability of this edge for inlining.  Higher numbers are more
-      likely to be inlined.  */
-   gcov_type desirability;
-   bool undesirable;
  };
  
  /* The cgraph_varpool data structure.
--- 176,181 ----
*************** struct cgraph_local_info *cgraph_local_i
*** 237,244 ****
  struct cgraph_global_info *cgraph_global_info (tree);
  struct cgraph_rtl_info *cgraph_rtl_info (tree);
  const char * cgraph_node_name (struct cgraph_node *);
! struct cgraph_edge * cgraph_clone_edge (struct cgraph_edge *, struct cgraph_node *, tree);
! struct cgraph_node * cgraph_clone_node (struct cgraph_node *);
  
  struct cgraph_varpool_node *cgraph_varpool_node (tree);
  void cgraph_varpool_mark_needed_node (struct cgraph_varpool_node *);
--- 235,242 ----
  struct cgraph_global_info *cgraph_global_info (tree);
  struct cgraph_rtl_info *cgraph_rtl_info (tree);
  const char * cgraph_node_name (struct cgraph_node *);
! struct cgraph_edge * cgraph_clone_edge (struct cgraph_edge *, struct cgraph_node *, tree, int);
! struct cgraph_node * cgraph_clone_node (struct cgraph_node *, gcov_type);
  
  struct cgraph_varpool_node *cgraph_varpool_node (tree);
  void cgraph_varpool_mark_needed_node (struct cgraph_varpool_node *);
Index: cgraphunit.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraphunit.c,v
retrieving revision 1.1.4.35.2.34
diff -c -3 -p -r1.1.4.35.2.34 cgraphunit.c
*** cgraphunit.c	3 Dec 2004 12:44:45 -0000	1.1.4.35.2.34
--- cgraphunit.c	5 Dec 2004 16:58:43 -0000
*************** cgraph_reset_node (struct cgraph_node *n
*** 315,325 ****
    memset (&node->local, 0, sizeof (node->local));
    memset (&node->global, 0, sizeof (node->global));
    memset (&node->rtl, 0, sizeof (node->rtl));
    node->analyzed = node->local.finalized = false;
    node->local.redefined_extern_inline = true;
    while (node->callees)
      cgraph_remove_edge (node->callees);
- 
    /* We may need to re-queue the node for assembling in case
       we already proceeded it and ignored as not needed.  */
    if (node->reachable && !flag_unit_at_a_time)
--- 315,334 ----
    memset (&node->local, 0, sizeof (node->local));
    memset (&node->global, 0, sizeof (node->global));
    memset (&node->rtl, 0, sizeof (node->rtl));
+   /* Requeue the node to be re-analyzed if it has already been seen in
+      another unit.
+      FIXME: Currently intermodule optimization never inlines extern inline
+      functions defined in multiple units.  This is very wrong.  */
+   if (node->analyzed && flag_unit_at_a_time)
+     {
+       node->next_needed = cgraph_nodes_queue;
+       cgraph_nodes_queue = node;
+     }
    node->analyzed = node->local.finalized = false;
    node->local.redefined_extern_inline = true;
    while (node->callees)
      cgraph_remove_edge (node->callees);
    /* We may need to re-queue the node for assembling in case
       we already proceeded it and ignored as not needed.  */
    if (node->reachable && !flag_unit_at_a_time)
*************** cgraph_analyze_function (struct cgraph_n
*** 779,784 ****
--- 788,795 ----
    if (flag_unit_at_a_time)
      tree_early_local_passes (decl);
  
+   node->count = ENTRY_BLOCK_PTR->count;
+ 
    /* First kill forward declaration so reverse inlining works properly.  */
    cgraph_create_edges (node, decl);
  
*************** cgraph_analyze_function (struct cgraph_n
*** 787,794 ****
       don't want to add more passes like this, it should be OK.  */
    if (!flag_unit_at_a_time)
      cgraph_analyze_function_inlinability (node);
-   else
-     ipa_analyze_function (node);
  
    node->analyzed = true;
    current_function_decl = NULL;
--- 798,803 ----
*************** cgraph_analyze_function (struct cgraph_n
*** 796,807 ****
    timevar_pop (TV_IPA_ANALYSIS);
  }
  
! /* Analyze the whole compilation unit once it is parsed completely.  */
  
  void
  cgraph_finalize_compilation_unit (void)
  {
    struct cgraph_node *node;
  
    if (!flag_unit_at_a_time)
      {
--- 805,822 ----
    timevar_pop (TV_IPA_ANALYSIS);
  }
  
! /* Analyze the whole (source level) compilation unit once it is parsed
!    completely.  For frontends supporting multiple compilation units to
!    be parsed at once this function shall be called for each of them so
!    unreachable static functions are eliminated early.  */
  
  void
  cgraph_finalize_compilation_unit (void)
  {
    struct cgraph_node *node;
+   /* Keep track of already processed nodes when called multiple times for
+      intermodule optimization.  */
+   static struct cgraph_node *first_analyzed;
  
    if (!flag_unit_at_a_time)
      {
*************** cgraph_finalize_compilation_unit (void)
*** 809,823 ****
        return;
      }
  
-   cgraph_varpool_analyze_pending_decls ();
    if (!quiet_flag)
!     fprintf (stderr, "\nAnalyzing compilation unit\n");
  
    timevar_push (TV_CGRAPH);
    if (cgraph_dump_file)
      {
        fprintf (cgraph_dump_file, "Initial entry points:");
!       for (node = cgraph_nodes; node; node = node->next)
  	if (node->needed && DECL_SAVED_TREE (node->decl))
  	  fprintf (cgraph_dump_file, " %s", cgraph_node_name (node));
        fprintf (cgraph_dump_file, "\n");
--- 824,841 ----
        return;
      }
  
    if (!quiet_flag)
!     {
!       fprintf (stderr, "\nAnalyzing compilation unit");
!       fflush (stderr);
!     }
  
    timevar_push (TV_CGRAPH);
+   cgraph_varpool_analyze_pending_decls ();
    if (cgraph_dump_file)
      {
        fprintf (cgraph_dump_file, "Initial entry points:");
!       for (node = cgraph_nodes; node != first_analyzed; node = node->next)
  	if (node->needed && DECL_SAVED_TREE (node->decl))
  	  fprintf (cgraph_dump_file, " %s", cgraph_node_name (node));
        fprintf (cgraph_dump_file, "\n");
*************** cgraph_finalize_compilation_unit (void)
*** 862,878 ****
    if (cgraph_dump_file)
      {
        fprintf (cgraph_dump_file, "Unit entry points:");
!       for (node = cgraph_nodes; node; node = node->next)
  	if (node->needed && DECL_SAVED_TREE (node->decl))
  	  fprintf (cgraph_dump_file, " %s", cgraph_node_name (node));
-       fprintf (cgraph_dump_file, "\n\nInitial ");
-       dump_cgraph (cgraph_dump_file);
      }
  
    if (cgraph_dump_file)
      fprintf (cgraph_dump_file, "\nReclaiming functions:");
  
!   for (node = cgraph_nodes; node; node = node->next)
      {
        tree decl = node->decl;
  
--- 880,894 ----
    if (cgraph_dump_file)
      {
        fprintf (cgraph_dump_file, "Unit entry points:");
!       for (node = cgraph_nodes; node != first_analyzed; node = node->next)
  	if (node->needed && DECL_SAVED_TREE (node->decl))
  	  fprintf (cgraph_dump_file, " %s", cgraph_node_name (node));
      }
  
    if (cgraph_dump_file)
      fprintf (cgraph_dump_file, "\nReclaiming functions:");
  
!   for (node = cgraph_nodes; node != first_analyzed; node = node->next)
      {
        tree decl = node->decl;
  
*************** cgraph_finalize_compilation_unit (void)
*** 895,905 ****
        gcc_assert (!node->local.finalized || DECL_SAVED_TREE (decl));
        gcc_assert (node->analyzed == node->local.finalized);
      }
!   if (cgraph_dump_file)
!     {
!       fprintf (cgraph_dump_file, "\n\nReclaimed ");
!       dump_cgraph (cgraph_dump_file);
!     }
    ggc_collect ();
    timevar_pop (TV_CGRAPH);
  }
--- 911,919 ----
        gcc_assert (!node->local.finalized || DECL_SAVED_TREE (decl));
        gcc_assert (node->analyzed == node->local.finalized);
      }
!   first_analyzed = cgraph_nodes;
!   if (!quiet_flag)
!     fprintf (stderr, "\n\n");
    ggc_collect ();
    timevar_pop (TV_CGRAPH);
  }
*************** cgraph_preserve_function_body_p (tree de
*** 1122,1127 ****
--- 1136,1142 ----
  void
  cgraph_optimize (void)
  {
+   struct cgraph_node *node;
  #ifdef ENABLE_CHECKING
    verify_cgraph ();
  #endif
*************** cgraph_optimize (void)
*** 1131,1136 ****
--- 1146,1161 ----
        return;
      }
    timevar_push (TV_IPA_OPT);
+   if (!quiet_flag)
+     {
+       fprintf (stderr, "Performing intraprocedural optimizations");
+       fflush (stderr);
+     }
+   if (cgraph_dump_file)
+     {
+       fprintf (cgraph_dump_file, "\n\nInitial ");
+       dump_cgraph (cgraph_dump_file);
+     }
  
    /* Frontend may output common variables after the unit has been finalized.
       It is safe to deal with them here as they are always zero initialized.  */
*************** cgraph_optimize (void)
*** 1138,1147 ****
  
    cgraph_function_and_variable_visibility ();
  
    if (flag_ipa_cp && flag_ipa_no_cloning)
      ipcp_driver ();
-   if (!quiet_flag)
-     fprintf (stderr, "Performing intraprocedural optimizations\n");
    if (cgraph_dump_file)
      {
        fprintf (cgraph_dump_file, "Marked ");
--- 1163,1174 ----
  
    cgraph_function_and_variable_visibility ();
  
+   for (node = cgraph_nodes; node; node = node->next)
+     if (node->analyzed)
+       ipa_analyze_function (node);
+ 
    if (flag_ipa_cp && flag_ipa_no_cloning)
      ipcp_driver ();
    if (cgraph_dump_file)
      {
        fprintf (cgraph_dump_file, "Marked ");
*************** cgraph_optimize (void)
*** 1164,1170 ****
  
    /* Output everything.  */
    if (!quiet_flag)
!     fprintf (stderr, "Assembling functions:\n");
  #ifdef ENABLE_CHECKING
    verify_cgraph ();
  #endif
--- 1191,1197 ----
  
    /* Output everything.  */
    if (!quiet_flag)
!     fprintf (stderr, "\nAssembling functions:\n");
  #ifdef ENABLE_CHECKING
    verify_cgraph ();
  #endif
Index: ipa-inline.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Attic/ipa-inline.c,v
retrieving revision 1.1.2.3
diff -c -3 -p -r1.1.2.3 ipa-inline.c
*** ipa-inline.c	2 Dec 2004 15:50:27 -0000	1.1.2.3
--- ipa-inline.c	5 Dec 2004 16:58:43 -0000
*************** static int ncalls_inlined;
*** 87,95 ****
  static int nfunctions_inlined;
  static int initial_insns;
  static int overall_insns;
! static HOST_WIDEST_INT max_insns;
! 
! static struct cgraph_node *already_cloned;
  
  /* Estimate size of the function after inlining WHAT into TO.  */
  
--- 87,94 ----
  static int nfunctions_inlined;
  static int initial_insns;
  static int overall_insns;
! static int max_insns;
! static gcov_type max_count;
  
  /* Estimate size of the function after inlining WHAT into TO.  */
  
*************** cgraph_estimate_size_after_inlining (int
*** 108,121 ****
     DUPLICATE is used for bookkeeping on whether we are actually creating new
     clones or re-using node originally representing out-of-line function call.
     */
! static void
! cgraph_clone_inlined_nodes_1 (struct cgraph_edge *e, bool duplicate)
  {
!   struct cgraph_node *n = e->callee;
!   struct cgraph_edge *step_edge;
! 
!   if (e->callee->aux)
!     abort();
  
    /* We may eliminate the need for out-of-line copy to be output.  In that
       case just go ahead and re-use it.  */
--- 107,116 ----
     DUPLICATE is used for bookkeeping on whether we are actually creating new
     clones or re-using node originally representing out-of-line function call.
     */
! void
! cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate)
  {
!   struct cgraph_node *n;
  
    /* We may eliminate the need for out-of-line copy to be output.  In that
       case just go ahead and re-use it.  */
*************** cgraph_clone_inlined_nodes_1 (struct cgr
*** 131,175 ****
      }
     else if (duplicate)
      {
!       e->callee->aux = already_cloned;
!       already_cloned = e->callee;
!       n = cgraph_clone_node (e->callee);
        cgraph_redirect_edge_callee (e, n);
      }
  
    if (e->caller->global.inlined_to)
!     n->global.inlined_to = e->caller->global.inlined_to;
    else
!     n->global.inlined_to = e->caller;
  
!   /* Recursively clone all bodies.  Non-zero aux means we've handled
!      this edge already; skip it to avoid confusing ourselves.  */
!   for (step_edge = n->callees; step_edge; step_edge = step_edge->next_callee)
!     if (!step_edge->inline_failed && !step_edge->callee->aux)
!       cgraph_clone_inlined_nodes_1 (step_edge, duplicate);
! }
! 
! /* E is expected to be an edge being inlined.  Clone destination node of
!    the edge and redirect it to the new clone.
!    DUPLICATE is used for bookeeping on whether we are actually creating new
!    clones or re-using node originally representing out-of-line function call.
!    */
! void
! cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate)
! {
!   struct cgraph_node *next_node;
!   struct cgraph_node *step_node;
!   struct cgraph_node end_node;        /* A non-NULL end-of-chain marker.  */
!   
!   end_node.aux = (struct cgraph_node *)0;
!   already_cloned = &end_node;
!   cgraph_clone_inlined_nodes_1 (e, duplicate);
!   for (step_node = already_cloned; step_node != &end_node; step_node = next_node)
!     {
!       next_node = step_node->aux;
!       step_node->aux = (struct cgraph_node *)0;
!     }
!   already_cloned = (struct cgraph_node *)0;
  }
  
  /* Mark edge E as inlined and update callgraph accordingly.  */
--- 126,144 ----
      }
     else if (duplicate)
      {
!       n = cgraph_clone_node (e->callee, e->count);
        cgraph_redirect_edge_callee (e, n);
      }
  
    if (e->caller->global.inlined_to)
!     e->callee->global.inlined_to = e->caller->global.inlined_to;
    else
!     e->callee->global.inlined_to = e->caller;
  
!   /* Recursively clone all bodies.  */
!   for (e = e->callee->callees; e; e = e->next_callee)
!     if (!e->inline_failed)
!       cgraph_clone_inlined_nodes (e, duplicate);
  }
  
  /* Mark edge E as inlined and update callgraph accordingly.  */
*************** cgraph_mark_inline_edge (struct cgraph_e
*** 179,185 ****
  {
    int old_insns = 0, new_insns = 0;
    struct cgraph_node *to = NULL, *what;
-   bool simple_recursion = (e->callee == e->caller);
  
    gcc_assert (e->inline_failed);
    e->inline_failed = NULL;
--- 148,153 ----
*************** cgraph_mark_inline_edge (struct cgraph_e
*** 191,209 ****
    cgraph_clone_inlined_nodes (e, true);
  
    what = e->callee;
-   if (simple_recursion)
-     {
-       struct cgraph_edge* e2;
-       /* New edge from original to copy should be marked !inline_failed, but
- 	 edge(s) back from copy to original (which we just created by
- 	 cloning the original->copy edge) should not be.  Indirect
- 	 recursion doesn't have the problem. */
-       for (e2 = what->callees; e2; e2 = e2->next_callee)
- 	if (e2->callee == e->caller)
- 	  e2->inline_failed = N_("function not considered for inlining");
-     }
- 
-   /* Install the just-cloned cgraph_node on the list.  */
  
    /* Now update size of caller and all functions caller is inlined into.  */
    for (;e && !e->inline_failed; e = e->caller->callers)
--- 159,164 ----
*************** cgraph_estimate_growth (struct cgraph_no
*** 257,262 ****
--- 212,219 ----
  {
    int growth = 0;
    struct cgraph_edge *e;
+   if (node->global.estimated_growth != INT_MIN)
+     return node->global.estimated_growth;
  
    for (e = node->callers; e; e = e->next_caller)
      if (e->inline_failed)
*************** cgraph_estimate_growth (struct cgraph_no
*** 269,274 ****
--- 226,232 ----
    if (!node->needed && !DECL_EXTERNAL (node->decl))
      growth -= node->global.insns;
  
+   node->global.estimated_growth = growth;
    return growth;
  }
  
*************** cgraph_recursive_inlining_p (struct cgra
*** 347,401 ****
    return recursive;
  }
  
! /* Recompute heap nodes for each of callees.  */
  
  static void
! update_callee_keys (fibheap_t heap, struct fibnode **heap_node,
! 		    struct cgraph_node *node)
  {
    struct cgraph_edge *e;
  
    for (e = node->callees; e; e = e->next_callee)
!     if (e->inline_failed && heap_node[e->callee->uid])
!       fibheap_replace_key (heap, heap_node[e->callee->uid],
! 			   cgraph_estimate_growth (e->callee));
      else if (!e->inline_failed)
!       update_callee_keys (heap, heap_node, e->callee);
  }
  
! /* Enqueue all recursive calls from NODE into queue linked via aux pointers
!    in between FIRST and LAST.  WHERE is used for bookkeeping while looking
!    int calls inlined within NODE.  */
  
  static void
  lookup_recursive_calls (struct cgraph_node *node, struct cgraph_node *where,
! 			struct cgraph_edge **first, struct cgraph_edge **last)
  {
    struct cgraph_edge *e;
    for (e = where->callees; e; e = e->next_callee)
      if (e->callee == node)
        {
! 	if (!*first)
! 	  *first = e;
! 	else
! 	  (*last)->aux = e;
! 	*last = e;
        }
    for (e = where->callees; e; e = e->next_callee)
      if (!e->inline_failed)
!       lookup_recursive_calls (node, e->callee, first, last);
  }
  
  /* Decide on recursive inlining: in the case function has recursive calls,
     inline until body size reaches given argument.  */
  
! static void
  cgraph_decide_recursive_inlining (struct cgraph_node *node)
  {
    int limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO);
    int max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH_AUTO);
!   struct cgraph_edge *first_call = NULL, *last_call = NULL;
!   struct cgraph_edge *last_in_current_depth;
    struct cgraph_edge *e;
    struct cgraph_node *master_clone;
    int depth = 0;
--- 305,447 ----
    return recursive;
  }
  
! /* Return true if the call can be hot.  */
! static bool
! cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
! {
!   if (profile_info && flag_branch_probabilities
!       && (edge->count
! 	  < profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)))
!     return false;
!   return true;
! }
! 
! /* A cost model driving the inlining heuristics in a way so that the edges
!    with the smallest badness are inlined first.  After each inlining is
!    performed, the costs of all caller edges of the affected nodes are
!    recomputed, so the metrics may accurately depend on values such as the
!    number of inlinable callers of the function or the function body size.
! 
!    For the moment we use the estimated growth caused by inlining the callee
!    into all its callers for driving the inlining, but once we have loop
!    depth or frequency information readily available we should do better.
! 
!    With profiling we use the number of executions of each edge to drive the
!    cost.  We also should distinguish hot and cold calls, where the cold
!    calls are inlined only when the code size is overall improved.
! 
!    Value INT_MAX can be returned to prevent the function from being
!    inlined.  */
! 
! static int
! cgraph_edge_badness (struct cgraph_edge *edge)
! {
!   if (profile_info && flag_branch_probabilities && max_count)
!     {
!       int growth =
! 	cgraph_estimate_size_after_inlining (1, edge->caller, edge->callee);
!       growth -= edge->caller->global.insns;
! 
!       /* Always prefer inlining that saves code size.  */
!       if (growth <= 0)
! 	return INT_MIN - growth;
!       return ((int)((double)edge->count * INT_MIN / max_count)) / growth;
!     }
!   else
!   {
!     int growth = cgraph_estimate_growth (edge->callee);
! 
!     /* Make recursive inlining happen always after other inlining is done.  */
!     if (cgraph_recursive_inlining_p (edge->caller, edge->callee, NULL))
!       return growth + 1;
!     else
!       return growth;
!   }
! }
! 
! /* Recompute heap nodes for each of caller edge.  */
  
  static void
! update_caller_keys (fibheap_t heap, struct cgraph_node *node,
! 		    bitmap updated_nodes)
! {
!   struct cgraph_edge *edge;
! 
!   if (!node->local.inlinable || node->local.disregard_inline_limits
!       || node->global.inlined_to)
!     return;
!   if (bitmap_bit_p (updated_nodes, node->uid))
!     return;
!   bitmap_set_bit (updated_nodes, node->uid);
! 
!   for (edge = node->callers; edge; edge = edge->next_caller)
!     if (edge->inline_failed)
!       {
! 	int badness = cgraph_edge_badness (edge);
! 	if (edge->aux)
! 	  {
! 	    fibnode_t n = edge->aux;
! 	    gcc_assert (n->data == edge);
! 	    if (n->key == badness)
! 	      continue;
! 	    /* FIXME: Why does this produce different results?  */
! #if 0
! 	    fibheap_replace_key (heap, n, badness);
! 	    continue;
! #endif
! 	    fibheap_delete_node (heap, edge->aux);
! 	  }
! 	edge->aux = fibheap_insert (heap, badness, edge);
!       }
! }
! 
! /* Recompute heap nodes for each of caller edges of each of callees.  */
! 
! static void
! update_callee_keys (fibheap_t heap, struct cgraph_node *node,
! 		    bitmap updated_nodes)
  {
    struct cgraph_edge *e;
+   node->global.estimated_growth = INT_MIN;
  
    for (e = node->callees; e; e = e->next_callee)
!     if (e->inline_failed)
!       update_caller_keys (heap, e->callee, updated_nodes);
      else if (!e->inline_failed)
!       update_callee_keys (heap, e->callee, updated_nodes);
  }
  
! /* Enqueue all recursive calls from NODE into priority queue depending on
!    how likely we want to recursively inline the call.  */
  
  static void
  lookup_recursive_calls (struct cgraph_node *node, struct cgraph_node *where,
! 			fibheap_t heap)
  {
+   static int priority;
    struct cgraph_edge *e;
    for (e = where->callees; e; e = e->next_callee)
      if (e->callee == node)
        {
! 	/* FIXME: Once counts and frequencies are available we should drive the
! 	   order by these.  For now force the order to be a simple queue,
! 	   since we get an order dependent on recursion depth for free this
! 	   way.  */
!         fibheap_insert (heap, priority++, e);
        }
    for (e = where->callees; e; e = e->next_callee)
      if (!e->inline_failed)
!       lookup_recursive_calls (node, e->callee, heap);
  }
  
  /* Decide on recursive inlining: in the case function has recursive calls,
     inline until body size reaches given argument.  */
  
! static bool
  cgraph_decide_recursive_inlining (struct cgraph_node *node)
  {
    int limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO);
    int max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH_AUTO);
!   fibheap_t heap;
    struct cgraph_edge *e;
    struct cgraph_node *master_clone;
    int depth = 0;
*************** cgraph_decide_recursive_inlining (struct
*** 411,460 ****
    if (!max_depth
        || node->global.insns < INSNS_PER_CALL
        || cgraph_estimate_size_after_inlining (1, node, node)  >= limit)
!     return;
!   lookup_recursive_calls (node, node, &first_call, &last_call);
!   if (!first_call)
!     return;
  
    if (dump_file)
      fprintf (dump_file, 
! 	     "\nPerforming recursive inlining on %s\n",
  	     cgraph_node_name (node));
  
    /* We need original clone to copy around.  */
!   master_clone = cgraph_clone_node (node);
    master_clone->needed = true;
    for (e = master_clone->callees; e; e = e->next_callee)
      if (!e->inline_failed)
        cgraph_clone_inlined_nodes (e, true);
  
    /* Do the inlining and update list of recursive call during process.  */
!   last_in_current_depth = last_call;
!   while (first_call
  	 && cgraph_estimate_size_after_inlining (1, node, master_clone) <= limit)
      {
!       struct cgraph_edge *curr = first_call;
  
!       first_call = first_call->aux;
!       curr->aux = NULL;
  
        cgraph_redirect_edge_callee (curr, master_clone);
        cgraph_mark_inline_edge (curr);
!       lookup_recursive_calls (node, curr->callee, &first_call, &last_call);
! 
!       if (last_in_current_depth
! 	  && ++depth >= max_depth)
! 	break;
        n++;
      }
  
!   /* Cleanup queue pointers.  */
!   while (first_call)
!     {
!       struct cgraph_edge *next = first_call->aux;
!       first_call->aux = NULL;
!       first_call = next;
!     }
    if (dump_file)
      fprintf (dump_file, 
  	     "\n   Inlined %i times, body grown from %i to %i insns\n", n,
--- 457,508 ----
    if (!max_depth
        || node->global.insns < INSNS_PER_CALL
        || cgraph_estimate_size_after_inlining (1, node, node)  >= limit)
!     return false;
!   heap = fibheap_new ();
!   lookup_recursive_calls (node, node, heap);
!   if (fibheap_empty (heap))
!     {
!       fibheap_delete (heap);
!       return false;
!     }
  
    if (dump_file)
      fprintf (dump_file, 
! 	     "  Performing recursive inlining on %s\n",
  	     cgraph_node_name (node));
  
    /* We need original clone to copy around.  */
!   master_clone = cgraph_clone_node (node, 0);
    master_clone->needed = true;
    for (e = master_clone->callees; e; e = e->next_callee)
      if (!e->inline_failed)
        cgraph_clone_inlined_nodes (e, true);
  
    /* Do the inlining and update list of recursive call during process.  */
!   while (!fibheap_empty (heap)
  	 && cgraph_estimate_size_after_inlining (1, node, master_clone) <= limit)
      {
!       struct cgraph_edge *curr = fibheap_extract_min (heap);
!       struct cgraph_node *node;
  
!       depth = 0;
!       for (node = curr->caller;
! 	   node; node = node->global.inlined_to)
! 	if (node->decl == curr->callee->decl)
! 	  depth++;
!       if (depth > max_depth)
! 	continue;
  
+       if (dump_file)
+ 	fprintf (dump_file, 
+ 		 "   Inlining call of depth %i\n", depth);
        cgraph_redirect_edge_callee (curr, master_clone);
        cgraph_mark_inline_edge (curr);
!       lookup_recursive_calls (node, curr->callee, heap);
        n++;
      }
  
!   fibheap_delete (heap);
    if (dump_file)
      fprintf (dump_file, 
  	     "\n   Inlined %i times, body grown from %i to %i insns\n", n,
*************** cgraph_decide_recursive_inlining (struct
*** 468,473 ****
--- 516,522 ----
      if (node->global.inlined_to == master_clone)
        cgraph_remove_node (node);
    cgraph_remove_node (master_clone);
+   return true;
  }
  
  /* Set inline_failed for all callers of given function to REASON.  */
*************** cgraph_set_inline_failed (struct cgraph_
*** 484,623 ****
        e->inline_failed = reason;
  }
  
- /* Return true if the call can be hot.  */
- static bool
- cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
- {
-   if (profile_info && flag_branch_probabilities
-       && (edge->count
- 	  < profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)))
-     return false;
-   return true;
- }
- 
- /* Given a call graph edge EDGE, return the "desirability" for inlining
-    the callee into the caller.
-    The desirability of a function body or a CALL_EXPR is a unit-less
-    measure of how beneficial it would be to inline.  Generally, smaller
-    bodies are more desirable, and CALL_EXPRs with higher block-counts are
-    more desirable.  This function computes the "desirability" of a
-    function body.  */
- 
- static gcov_type
- cgraph_desirability (struct cgraph_edge *edge)
- {
-   HOST_WIDEST_INT desire;
-   HOST_WIDEST_INT denominator;
-   struct cgraph_node *callee = edge->callee;
-   
-   if (!callee->local.inlinable)
-     return 0;
-   
-   /* FIXME: Ideally we'd replace this with a more sophisticated
-      "temperature" sort of metric.  */
-   denominator = edge->callee->global.insns - INSNS_PER_CALL;
-   if (denominator > 0)
-     desire = cgraph_maybe_hot_edge_p (edge) ? edge->count / denominator : 0;
-   else
-     desire = profile_info->sum_max;
-   return desire;
- }
- 
- /* Given a caller represented by NODE, find the edge that points to the
-    callee that is the most desirable to inline in the caller.  */
- 
- static struct cgraph_edge *
- cgraph_pick_most_desirable_edge (struct cgraph_node *node)
- {
-   struct cgraph_edge *step_edge;
-   struct cgraph_edge *most_desirable_edge;
-   HOST_WIDEST_INT highest_desire;
-   
-   highest_desire = 0;
-   most_desirable_edge = (struct cgraph_edge *)0;
-   for (step_edge = node->callees;
-        step_edge;
-        step_edge = step_edge->next_callee)
-     {
-       if (step_edge->inline_failed && !step_edge->undesirable
- 	  && (step_edge->desirability > highest_desire))
- 	{
- 	  highest_desire = step_edge->desirability;
- 	  most_desirable_edge = step_edge;
- 	}
-     }
-   return most_desirable_edge;
- }
- 
- /* Decide what to inline using profile information.
-    ???  Needs more explaining, htf does this work?!
-    ???  This algorithm i *way* too expensive, can be done
- 	much cheaper, no doubt.  */
- 
- static void
- cgraph_profile_driven_inlining (void)
- {
-   struct cgraph_node *step_node;
-   struct cgraph_edge *step_edge;
-   HOST_WIDEST_INT highest_desire;
-   struct cgraph_edge *most_desirable_edge;
-   struct cgraph_node *most_desirable_node;
-   
-   /* Pass 1: compute desirability of all CALL_EXPRs.  */
-   for (step_node = cgraph_nodes;
-        step_node;
-        step_node = step_node->next)
-     {
-       highest_desire = 0;
-       most_desirable_edge = (struct cgraph_edge *)0;
-       for (step_edge = step_node->callees;
- 	   step_edge;
- 	   step_edge = step_edge->next_callee)
- 	if (step_edge->inline_failed)
- 	  {
- 	    step_edge->desirability = cgraph_desirability (step_edge);
- 	    if (step_edge->desirability > highest_desire)
- 	      {
- 		highest_desire = step_edge->desirability;
- 		most_desirable_edge = step_edge;
- 	      }
- 	  }
-       step_node->most_desirable = most_desirable_edge;
-     }
-   /* Pass 2: pick edges to inline.  */
-   while (overall_insns <= max_insns)
-     {
-       highest_desire = 0;
-       most_desirable_node = (struct cgraph_node *)0;
-       for (step_node = cgraph_nodes;
- 	   step_node;
- 	   step_node = step_node->next)
- 	{
- 	  if (step_node->most_desirable
- 	      && (step_node->most_desirable->desirability > highest_desire))
- 	    {
- 	      highest_desire = step_node->most_desirable->desirability;
- 	      most_desirable_node = step_node;
- 	    }
- 	}
-       /* If we found a suitable CALL_EXPR, record our choice, else
- 	 abandon the search.  */
-       if (most_desirable_node)
- 	{
- 	  struct cgraph_edge *e = most_desirable_node->most_desirable;
- 	  if (e->inline_failed
- 	      && cgraph_check_inline_limits (e->caller, e->callee,
- 			  		     &e->inline_failed))
- 	    cgraph_mark_inline_edge (e);
- 	  else
- 	    e->undesirable = true;
- 	  most_desirable_node->most_desirable = cgraph_pick_most_desirable_edge (most_desirable_node);
- 	}
-       else
-         break;
-     }
- }
- 
  /* We use greedy algorithm for inlining of small functions:
     All inline candidates are put into prioritized heap based on estimated
     growth of the overall number of instructions and then update the estimates.
--- 533,538 ----
*************** static void
*** 629,637 ****
  cgraph_decide_inlining_of_small_functions (void)
  {
    struct cgraph_node *node;
    fibheap_t heap = fibheap_new ();
!   struct fibnode **heap_node =
!     xcalloc (cgraph_max_uid, sizeof (struct fibnode *));
  
    /* Put all inline candidates into the heap.  */
  
--- 544,555 ----
  cgraph_decide_inlining_of_small_functions (void)
  {
    struct cgraph_node *node;
+   struct cgraph_edge *edge;
    fibheap_t heap = fibheap_new ();
!   bitmap updated_nodes = BITMAP_XMALLOC ();
! 
!   if (dump_file)
!     fprintf (dump_file, "\nDeciding on smaller functions:\n");
  
    /* Put all inline candidates into the heap.  */
  
*************** cgraph_decide_inlining_of_small_function
*** 640,726 ****
        if (!node->local.inlinable || !node->callers
  	  || node->local.disregard_inline_limits)
  	continue;
  
        if (!cgraph_default_inline_p (node))
  	{
  	  cgraph_set_inline_failed (node,
  	    N_("--param max-inline-insns-single limit reached"));
  	  continue;
  	}
-       heap_node[node->uid] =
- 	fibheap_insert (heap, cgraph_estimate_growth (node), node);
-     }
  
!   if (dump_file)
!     fprintf (dump_file, "\nDeciding on smaller functions:\n");
!   while (overall_insns <= max_insns && (node = fibheap_extract_min (heap)))
      {
-       struct cgraph_edge *e, *next;
        int old_insns = overall_insns;
  
-       heap_node[node->uid] = NULL;
        if (dump_file)
- 	fprintf (dump_file, 
- 		 "\nConsidering %s with %i insns\n"
- 		 " Estimated growth is %+i insns.\n",
- 		 cgraph_node_name (node), node->global.insns,
- 		 cgraph_estimate_growth (node));
-       if (!cgraph_default_inline_p (node))
  	{
! 	  cgraph_set_inline_failed (node,
! 	    N_("--param max-inline-insns-single limit reached after inlining into the callee"));
  	  continue;
  	}
!       for (e = node->callers; e; e = next)
  	{
! 	  next = e->next_caller;
! 	  if (e->inline_failed)
  	    {
- 	      struct cgraph_node *where;
- 
- 	      if (cgraph_recursive_inlining_p (e->caller, e->callee,
- 				      	       &e->inline_failed)
- 		  || !cgraph_check_inline_limits (e->caller, e->callee,
- 			  			  &e->inline_failed))
- 		{
- 		  if (dump_file)
- 		    fprintf (dump_file, " Not inlining into %s:%s.\n",
- 			     cgraph_node_name (e->caller), e->inline_failed);
- 		  continue;
- 		}
- 	      next = cgraph_mark_inline (e);
- 	      where = e->caller;
- 	      if (where->global.inlined_to)
- 		where = where->global.inlined_to;
- 
- 	      if (heap_node[where->uid])
- 		fibheap_replace_key (heap, heap_node[where->uid],
- 				     cgraph_estimate_growth (where));
- 
  	      if (dump_file)
! 		fprintf (dump_file, 
! 			 " Inlined into %s which now has %i insns.\n",
! 			 cgraph_node_name (e->caller),
! 			 e->caller->global.insns);
  	    }
  	}
  
!       cgraph_decide_recursive_inlining (node);
! 
!       /* Similarly all functions called by the function we just inlined
!          are now called more times; update keys.  */
!       update_callee_keys (heap, heap_node, node);
! 
        if (dump_file)
  	fprintf (dump_file, 
  		 " Inlined for a net change of %+i insns.\n",
  		 overall_insns - old_insns);
      }
!   while ((node = fibheap_extract_min (heap)) != NULL)
!     if (!node->local.disregard_inline_limits)
!       cgraph_set_inline_failed (node, N_("--param inline-unit-growth limit reached"));
    fibheap_delete (heap);
!   free (heap_node);
  }
  
  /* Decide on the inlining.  We do so in the topological order to avoid
--- 558,690 ----
        if (!node->local.inlinable || !node->callers
  	  || node->local.disregard_inline_limits)
  	continue;
+       if (dump_file)
+ 	fprintf (dump_file, "Considering inline candidate %s.\n", cgraph_node_name (node));
  
+       node->global.estimated_growth = INT_MIN;
        if (!cgraph_default_inline_p (node))
  	{
  	  cgraph_set_inline_failed (node,
  	    N_("--param max-inline-insns-single limit reached"));
  	  continue;
  	}
  
!       for (edge = node->callers; edge; edge = edge->next_caller)
! 	if (edge->inline_failed)
! 	  {
! 	    gcc_assert (!edge->aux);
! 	    edge->aux = fibheap_insert (heap, cgraph_edge_badness (edge), edge);
! 	  }
!     }
!   while (overall_insns <= max_insns && (edge = fibheap_extract_min (heap)))
      {
        int old_insns = overall_insns;
+       struct cgraph_node *where;
+       int growth =
+ 	cgraph_estimate_size_after_inlining (1, edge->caller, edge->callee);
+ 
+       growth -= edge->caller->global.insns;
  
        if (dump_file)
  	{
! 	  fprintf (dump_file, 
! 		   "\nConsidering %s with %i insns to be inlined into %s\n"
! 		   " Estimated growth after inlined into all callees is %+i insns.\n"
! 		   " Estimated badness is %i.\n",
! 		   cgraph_node_name (edge->callee),
! 		   edge->callee->global.insns,
! 		   cgraph_node_name (edge->caller),
! 		   cgraph_estimate_growth (edge->callee),
! 		   cgraph_edge_badness (edge));
! 	  if (edge->count)
! 	    fprintf (dump_file," Called "HOST_WIDEST_INT_PRINT_DEC"x\n", edge->count);
! 	}
!       gcc_assert (edge->aux);
!       edge->aux = NULL;
!       if (!edge->inline_failed)
! 	continue;
! 
!       if (!cgraph_maybe_hot_edge_p (edge) && growth > 0)
! 	{
!           if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				            &edge->inline_failed))
! 	    {
! 	      edge->inline_failed = 
! 		N_("call is unlikely");
! 	      if (dump_file)
! 		fprintf (dump_file, " inline_failed:%s.\n", edge->inline_failed);
! 	    }
  	  continue;
  	}
!       if (!cgraph_default_inline_p (edge->callee))
  	{
!           if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				            &edge->inline_failed))
! 	    {
! 	      edge->inline_failed = 
! 		N_("--param max-inline-insns-single limit reached after inlining into the callee");
! 	      if (dump_file)
! 		fprintf (dump_file, " inline_failed:%s.\n", edge->inline_failed);
! 	    }
! 	  continue;
! 	}
!       if (cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				       &edge->inline_failed))
! 	{
! 	  where = edge->caller;
! 	  if (where->global.inlined_to)
! 	    where = where->global.inlined_to;
! 	  if (!cgraph_decide_recursive_inlining (where))
! 	    continue;
!           update_callee_keys (heap, where, updated_nodes);
! 	}
!       else
! 	{
! 	  if (!cgraph_check_inline_limits (edge->caller, edge->callee,
! 					   &edge->inline_failed))
  	    {
  	      if (dump_file)
! 		fprintf (dump_file, " Not inlining into %s:%s.\n",
! 			 cgraph_node_name (edge->caller), edge->inline_failed);
! 	      continue;
  	    }
+ 	  cgraph_mark_inline_edge (edge);
+          update_callee_keys (heap, edge->callee, updated_nodes);
  	}
+       where = edge->caller;
+       if (where->global.inlined_to)
+ 	where = where->global.inlined_to;
+ 
+       /* Our profitability metric can depend on local properties
+ 	 such as number of inlinable calls and size of the function body.
+ 	 After inlining these properties might change for the function we
+ 	 inlined into (since its body size changed) and for the functions
+ 	 called by the function we inlined (since the number of its inlinable
+ 	 callers
+ 	 might change).  */
+       update_caller_keys (heap, where, updated_nodes);
+       bitmap_clear (updated_nodes);
  
!       if (dump_file)
! 	fprintf (dump_file, 
! 		 " Inlined into %s which now has %i insns.\n",
! 		 cgraph_node_name (edge->caller),
! 		 edge->caller->global.insns);
        if (dump_file)
  	fprintf (dump_file, 
  		 " Inlined for a net change of %+i insns.\n",
  		 overall_insns - old_insns);
      }
!   while ((edge = fibheap_extract_min (heap)) != NULL)
!     {
!       gcc_assert (edge->inline_failed && edge->aux);
!       edge->aux = NULL;
!       if (!edge->callee->local.disregard_inline_limits && edge->inline_failed
!           && !cgraph_recursive_inlining_p (edge->caller, edge->callee,
! 				           &edge->inline_failed))
! 	edge->inline_failed = N_("--param inline-unit-growth limit reached");
!     }
    fibheap_delete (heap);
!   BITMAP_XFREE (updated_nodes);
  }
  
  /* Decide on the inlining.  We do so in the topological order to avoid
*************** cgraph_decide_inlining (void)
*** 736,743 ****
    int old_insns = 0;
    int i;
  
    for (node = cgraph_nodes; node; node = node->next)
!     initial_insns += node->local.self_insns;
    overall_insns = initial_insns;
  
    max_insns = ((HOST_WIDEST_INT) overall_insns
--- 700,715 ----
    int old_insns = 0;
    int i;
  
+   timevar_push (TV_INLINE_HEURISTICS);
+   max_count = 0;
    for (node = cgraph_nodes; node; node = node->next)
!     {
!       struct cgraph_edge *e;
!       initial_insns += node->local.self_insns;
!       for (e = node->callees; e; e = e->next_callee)
! 	if (max_count < e->count)
! 	  max_count = e->count;
!     }
    overall_insns = initial_insns;
  
    max_insns = ((HOST_WIDEST_INT) overall_insns
*************** cgraph_decide_inlining (void)
*** 794,804 ****
  
    if (!flag_really_no_inline)
      {
!       /* If we have profiling information, use it to choose inlining candidates.  */
!       if (profile_info)
! 	cgraph_profile_driven_inlining ();
!       else
! 	cgraph_decide_inlining_of_small_functions ();
  
        if (dump_file)
  	fprintf (dump_file, "\nDeciding on functions called once:\n");
--- 766,772 ----
  
    if (!flag_really_no_inline)
      {
!       cgraph_decide_inlining_of_small_functions ();
  
        if (dump_file)
  	fprintf (dump_file, "\nDeciding on functions called once:\n");
*************** cgraph_decide_inlining (void)
*** 875,880 ****
--- 843,849 ----
  	     ncalls_inlined, nfunctions_inlined, initial_insns,
  	     overall_insns);
    free (order);
+   timevar_pop (TV_INLINE_HEURISTICS);
  }
  
  /* Decide on the inlining.  We do so in the topological order to avoid
Index: rtl.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/rtl.h,v
retrieving revision 1.362.2.34.2.15
diff -c -3 -p -r1.362.2.34.2.15 rtl.h
*** rtl.h	3 Dec 2004 03:58:45 -0000	1.362.2.34.2.15
--- rtl.h	5 Dec 2004 16:58:43 -0000
*************** enum reg_note
*** 730,738 ****
    REG_NOTE_MAX
  };
  
- /* The base value for branch probability notes.  */
- #define REG_BR_PROB_BASE  10000
- 
  /* Define macros to extract and insert the reg-note kind in an EXPR_LIST.  */
  #define REG_NOTE_KIND(LINK) ((enum reg_note) GET_MODE (LINK))
  #define PUT_REG_NOTE_KIND(LINK, KIND) \
--- 730,735 ----
Index: timevar.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/timevar.def,v
retrieving revision 1.14.2.35.2.9
diff -c -3 -p -r1.14.2.35.2.9 timevar.def
*** timevar.def	28 Nov 2004 17:59:35 -0000	1.14.2.35.2.9
--- timevar.def	5 Dec 2004 16:58:43 -0000
*************** DEFTIMEVAR (TV_CPP		     , "preprocessin
*** 61,66 ****
--- 61,67 ----
  DEFTIMEVAR (TV_LEX		     , "lexical analysis")
  DEFTIMEVAR (TV_PARSE                 , "parser")
  DEFTIMEVAR (TV_NAME_LOOKUP           , "name lookup")
+ DEFTIMEVAR (TV_INLINE_HEURISTICS     , "inline heuristics")
  DEFTIMEVAR (TV_INTEGRATION           , "integration")
  DEFTIMEVAR (TV_TREE_GIMPLIFY	     , "tree gimplify")
  DEFTIMEVAR (TV_TREE_EH		     , "tree eh")
Index: tree-inline.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-inline.c,v
retrieving revision 1.26.2.83.2.18
diff -c -3 -p -r1.26.2.83.2.18 tree-inline.c
*** tree-inline.c	3 Dec 2004 03:58:48 -0000	1.26.2.83.2.18
--- tree-inline.c	5 Dec 2004 16:58:43 -0000
*************** static GTY (()) varray_type eh_dst;
*** 159,167 ****
  
  static tree declare_return_variable (inline_data *, tree, tree, tree *);
  static tree copy_body_r (tree *, int *, void *);
! static tree copy_cfg_body (inline_data *);
  static tree copy_generic_body (inline_data *);
! static tree copy_body (inline_data *);
  static bool expand_call_inline (tree *, int *, void *);
  static bool gimple_expand_calls_inline (tree *, inline_data *);
  static bool inlinable_function_p (tree);
--- 159,167 ----
  
  static tree declare_return_variable (inline_data *, tree, tree, tree *);
  static tree copy_body_r (tree *, int *, void *);
! static tree copy_cfg_body (inline_data *, gcov_type, int);
  static tree copy_generic_body (inline_data *);
! static tree copy_body (inline_data *, gcov_type, int);
  static bool expand_call_inline (tree *, int *, void *);
  static bool gimple_expand_calls_inline (tree *, inline_data *);
  static bool inlinable_function_p (tree);
*************** copy_body_r (tree *tp, int *walk_subtree
*** 540,556 ****
  	 ...)"); just toss the entire RETURN_EXPR.  */
        if (assignment && TREE_CODE (assignment) == MODIFY_EXPR)
  	{
  	  /* Replace the RETURN_EXPR with (a copy of) the
  	     MODIFY_EXPR hangning underneath.  */
  	  *tp = copy_node (assignment);
  
  	  /* If we're working on CFGs, add an outgoing CFG edge to the
  	     return block.  */
! 	  make_edge (id->copy_basic_block, id->return_block,
! 		     EDGE_FALLTHRU);
  	}
        else /* Else the RETURN_EXPR returns no value.  */
  	{
  	  /* Since we're not returning anything, just delete the
  	     RETURN_EXPR.  This is spooky, as we're altering a
  	     tree_stmt_iterator owned by our caller, who is using
--- 540,561 ----
  	 ...)"); just toss the entire RETURN_EXPR.  */
        if (assignment && TREE_CODE (assignment) == MODIFY_EXPR)
  	{
+ 	  edge new;
+ 
  	  /* Replace the RETURN_EXPR with (a copy of) the
  	     MODIFY_EXPR hanging underneath.  */
  	  *tp = copy_node (assignment);
  
  	  /* If we're working on CFGs, add an outgoing CFG edge to the
  	     return block.  */
! 	  new = make_edge (id->copy_basic_block, id->return_block,
! 		           EDGE_FALLTHRU);
! 	  new->count = id->copy_basic_block->count;
! 	  new->probability = REG_BR_PROB_BASE;
  	}
        else /* Else the RETURN_EXPR returns no value.  */
  	{
+ 	  edge new;
  	  /* Since we're not returning anything, just delete the
  	     RETURN_EXPR.  This is spooky, as we're altering a
  	     tree_stmt_iterator owned by our caller, who is using
*************** copy_body_r (tree *tp, int *walk_subtree
*** 560,567 ****
  
  	  /* If we're working on CFGs, add an outgoing CFG edge to the
  	     return block.  */
! 	  make_edge (id->copy_basic_block, id->return_block,
! 		     EDGE_FALLTHRU);
  	  /* Return something to stop iterating.  */
  	  return (void *)1;
  	}
--- 565,574 ----
  
  	  /* If we're working on CFGs, add an outgoing CFG edge to the
  	     return block.  */
! 	  new = make_edge (id->copy_basic_block, id->return_block,
! 		           EDGE_FALLTHRU);
! 	  new->count = id->copy_basic_block->count;
! 	  new->probability = REG_BR_PROB_BASE;
  	  /* Return something to stop iterating.  */
  	  return (void *)1;
  	}
*************** copy_body_r (tree *tp, int *walk_subtree
*** 708,714 ****
  	      edge = cgraph_edge (id->current_node, old_node);
  
  	      if (edge)
! 	        cgraph_clone_edge (edge, id->node, *tp);
  	    }
  	}
        else if (TREE_CODE (*tp) == RESX_EXPR)
--- 715,721 ----
  	      edge = cgraph_edge (id->current_node, old_node);
  
  	      if (edge)
! 	        cgraph_clone_edge (edge, id->node, *tp, REG_BR_PROB_BASE);
  	    }
  	}
        else if (TREE_CODE (*tp) == RESX_EXPR)
*************** copy_body_r (tree *tp, int *walk_subtree
*** 773,779 ****
     another function.  Walks FN via CFG, returns new fndecl.  */
  
  static tree
! copy_cfg_body (inline_data *id)
  {
    tree callee_fndecl = id->callee;
    tree caller_fndecl = id->caller;
--- 780,786 ----
     another function.  Walks FN via CFG, returns new fndecl.  */
  
  static tree
! copy_cfg_body (inline_data *id, gcov_type count, int frequency)
  {
    tree callee_fndecl = id->callee;
    tree caller_fndecl = id->caller;
*************** copy_cfg_body (inline_data *id)
*** 800,805 ****
--- 807,825 ----
    tree orig_stmt;
    tree_stmt_iterator tsi, tsi_copy, tsi_end;
    int current_bb_index = 0;
+   int count_scale, frequency_scale;
+ 
+   if (ENTRY_BLOCK_PTR_FOR_FUNCTION (callee_cfun)->count)
+     count_scale = (REG_BR_PROB_BASE * count
+ 		   / ENTRY_BLOCK_PTR_FOR_FUNCTION (callee_cfun)->count);
+   else
+     count_scale = 1;
+ 
+   if (ENTRY_BLOCK_PTR_FOR_FUNCTION (callee_cfun)->frequency)
+     frequency_scale = (REG_BR_PROB_BASE * frequency
+ 		       / ENTRY_BLOCK_PTR_FOR_FUNCTION (callee_cfun)->frequency);
+   else
+     frequency_scale = count_scale;
  
    /* Register specific tree functions.  */
    tree_register_cfg_hooks ();
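
The two scale factors above are fixed-point ratios with REG_BR_PROB_BASE
as the base: every count and frequency copied later in copy_cfg_body is
multiplied by the scale and divided by REG_BR_PROB_BASE, and a scale of
REG_BR_PROB_BASE (presumably what the cgraph_clone_edge call in the
earlier hunk relies on) leaves a value unchanged.  A minimal standalone
sketch of the arithmetic (the typedef, helper names and sample numbers
are made up for illustration, not part of the patch):

    #include <stdio.h>

    typedef long long gcov_type;	/* stand-in for GCC's gcov_type */
    #define REG_BR_PROB_BASE 10000

    /* Fixed-point scale: how much of the callee's profile belongs to
       this call site.  */
    static int
    compute_count_scale (gcov_type call_site_count, gcov_type callee_entry_count)
    {
      if (callee_entry_count)
        return REG_BR_PROB_BASE * call_site_count / callee_entry_count;
      /* A scale of 1 out of REG_BR_PROB_BASE effectively zeroes the
         copied counts instead of dividing by zero.  */
      return 1;
    }

    /* Apply the scale to one copied count or frequency.  */
    static gcov_type
    scale_value (gcov_type value, int scale)
    {
      return value * scale / REG_BR_PROB_BASE;
    }

    int
    main (void)
    {
      /* Callee entered 200 times, 50 of them from this call site, so a
         block executed 80 times in the callee gets count 20 in the copy.  */
      int scale = compute_count_scale (50, 200);	/* 2500 */
      printf ("%lld\n", scale_value (80, scale));	/* prints 20 */
      return 0;
    }
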
*************** copy_cfg_body (inline_data *id)
*** 872,882 ****
--- 892,912 ----
    n_edges = 0;
    ENTRY_BLOCK_PTR = ggc_alloc_cleared (sizeof (*ENTRY_BLOCK_PTR));
    ENTRY_BLOCK_PTR->index = ENTRY_BLOCK;
+   ENTRY_BLOCK_PTR->count = (ENTRY_BLOCK_PTR_FOR_FUNCTION (callee_cfun)->count
+ 		            * count_scale / REG_BR_PROB_BASE);
+   ENTRY_BLOCK_PTR->frequency
+     = (ENTRY_BLOCK_PTR_FOR_FUNCTION (callee_cfun)->frequency
+        * frequency_scale / REG_BR_PROB_BASE);
    EXIT_BLOCK_PTR = ggc_alloc_cleared (sizeof (*EXIT_BLOCK_PTR));
    EXIT_BLOCK_PTR->index = EXIT_BLOCK;
    ENTRY_BLOCK_PTR->next_bb = EXIT_BLOCK_PTR;
    EXIT_BLOCK_PTR->prev_bb = ENTRY_BLOCK_PTR;
    id->copy_basic_block = ENTRY_BLOCK_PTR;
+   EXIT_BLOCK_PTR->count = (EXIT_BLOCK_PTR_FOR_FUNCTION (callee_cfun)->count
+ 		           * count_scale / REG_BR_PROB_BASE);
+   EXIT_BLOCK_PTR->frequency
+     = (EXIT_BLOCK_PTR_FOR_FUNCTION (callee_cfun)->frequency
+        * frequency_scale / REG_BR_PROB_BASE);
  
    /* Standard tree walk doesn't iterate over CFG-built basic_blocks
       yet.  */
*************** copy_cfg_body (inline_data *id)
*** 886,891 ****
--- 916,924 ----
  	 basic_block_info automatically.  */
        id->copy_basic_block = create_basic_block (bb->stmt_list, 
  			    (void*)0, id->copy_basic_block);
+       id->copy_basic_block->count = bb->count * count_scale / REG_BR_PROB_BASE;
+       id->copy_basic_block->frequency = (bb->frequency
+ 		      			 * frequency_scale / REG_BR_PROB_BASE);
        /* Number the blocks.  When we duplicate the edges of this
  	 function body, we need an easy way to map edges
  	 from the original body into our newly-copied body.  We could
*************** copy_cfg_body (inline_data *id)
*** 938,943 ****
--- 971,978 ----
        old_src_index = bb->index;
        FOR_EACH_EDGE (old_edge, ei, bb->succs)
  	{
+ 	  edge new;
+ 
  	  old_dest_index = old_edge->dest->index;
  	  /* Ignore edges that reach ENTRY and EXIT blocks
  	     (bb->index < 0).  The first non-ENTRY block will be spliced
*************** copy_cfg_body (inline_data *id)
*** 947,955 ****
  	     (e.g. noreturn functions) should be redirected to the
  	     next exit block.  */
  	  if (old_src_index >= 0 && old_dest_index >= 0)
! 	    make_edge (new_bb,
! 		       VARRAY_BB (new_bb_info, old_dest_index),
! 		       old_edge->flags);
  	}
  
        tsi_end = tsi_last (bb->stmt_list);
--- 982,994 ----
  	     (e.g. noreturn functions) should be redirected to the
  	     next exit block.  */
  	  if (old_src_index >= 0 && old_dest_index >= 0)
! 	    {
! 	      new = make_edge (new_bb,
! 			       VARRAY_BB (new_bb_info, old_dest_index),
! 			       old_edge->flags);
! 	      new->count = old_edge->count * count_scale / REG_BR_PROB_BASE;
! 	      new->probability = old_edge->probability;
! 	    }
  	}
  
        tsi_end = tsi_last (bb->stmt_list);
*************** copy_cfg_body (inline_data *id)
*** 1023,1031 ****
      }
    if (saving_or_cloning)
      {
!       make_edge (ENTRY_BLOCK_PTR, BASIC_BLOCK (0), EDGE_FALLTHRU);
!       make_edge (BASIC_BLOCK (last_basic_block-1), EXIT_BLOCK_PTR,
! 		 EDGE_FALLTHRU);
      }
    id->copy_basic_block = (basic_block)0;
    if (!saving_or_cloning)
--- 1062,1084 ----
      }
    if (saving_or_cloning)
      {
!       edge new;
!       int exit_probability = REG_BR_PROB_BASE;
!       gcov_type exit_count = BASIC_BLOCK (last_basic_block - 1)->count;
!       edge_iterator ei;
! 
!       new = make_edge (ENTRY_BLOCK_PTR, BASIC_BLOCK (0), EDGE_FALLTHRU);
!       new->probability = REG_BR_PROB_BASE;
!       new->count = ENTRY_BLOCK_PTR->count;
!       FOR_EACH_EDGE (old_edge, ei, BASIC_BLOCK (last_basic_block-1)->succs)
! 	{
! 	  exit_probability -= old_edge->probability;
! 	  exit_count -= old_edge->count;
! 	}
!       new = make_edge (BASIC_BLOCK (last_basic_block-1), EXIT_BLOCK_PTR,
! 		       EDGE_FALLTHRU);
!       new->probability = exit_probability;
!       new->count = exit_count;
      }
    id->copy_basic_block = (basic_block)0;
    if (!saving_or_cloning)
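
In the cloning path above, the new fallthru edge to EXIT receives
whatever probability and count the last block has not already spent on
its other successor edges, so the outgoing probabilities still sum to
REG_BR_PROB_BASE.  A self-contained sketch of that residual computation
(edge_stub and residual_exit_edge are illustrative names, not GCC API):

    typedef long long gcov_type;
    #define REG_BR_PROB_BASE 10000

    struct edge_stub
    {
      int probability;		/* out of REG_BR_PROB_BASE */
      gcov_type count;
    };

    /* Probability and count left over for a new fallthru edge to EXIT
       once the block's N existing successor edges are subtracted.  */
    static void
    residual_exit_edge (const struct edge_stub *succs, int n,
    			gcov_type bb_count, int *prob, gcov_type *count)
    {
      int i;

      *prob = REG_BR_PROB_BASE;
      *count = bb_count;
      for (i = 0; i < n; i++)
        {
          *prob -= succs[i].probability;
          *count -= succs[i].count;
        }
    }

If the existing successors already account for the whole
REG_BR_PROB_BASE, the new edge ends up with probability (and count) 0,
which is what we want for a block that never falls through.
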
*************** copy_generic_body (inline_data *id)
*** 1067,1080 ****
  }
  
  static tree
! copy_body (inline_data *id)
  {
    tree fndecl = id->callee;
    tree body;
  
    /* If this body has a CFG, walk CFG and copy.  */
    gcc_assert (ENTRY_BLOCK_PTR_FOR_FUNCTION (DECL_STRUCT_FUNCTION (fndecl)));
!   body = copy_cfg_body (id);
  
    return body;
  }
--- 1120,1133 ----
  }
  
  static tree
! copy_body (inline_data *id, gcov_type count, int frequency)
  {
    tree fndecl = id->callee;
    tree body;
  
    /* If this body has a CFG, walk CFG and copy.  */
    gcc_assert (ENTRY_BLOCK_PTR_FOR_FUNCTION (DECL_STRUCT_FUNCTION (fndecl)));
!   body = copy_cfg_body (id, count, frequency);
  
    return body;
  }
*************** expand_call_inline (tree *tp, int *walk_
*** 1939,1944 ****
--- 1992,1998 ----
    struct function *my_cfun;
    edge e_step;
    edge_iterator ei;
+   edge e;
    tree_stmt_iterator test_iter;
    block_stmt_iterator bsi;
    tree tmp_stmt_list;
*************** expand_call_inline (tree *tp, int *walk_
*** 2129,2135 ****
       function in any way before this point, as this CALL_EXPR may be
       a self-referential call; if we're calling ourselves, we need to
       duplicate our body before altering anything.  */
!   dup_fndecl = copy_body (id);
    id->current_node = old_node;
    my_cfun = DECL_STRUCT_FUNCTION (dup_fndecl);
    /* Append copied EH information to that of current function.  */
--- 2183,2189 ----
       function in any way before this point, as this CALL_EXPR may be
       a self-referential call; if we're calling ourselves, we need to
       duplicate our body before altering anything.  */
!   dup_fndecl = copy_body (id, id->oic_basic_block->count, id->oic_basic_block->frequency);
    id->current_node = old_node;
    my_cfun = DECL_STRUCT_FUNCTION (dup_fndecl);
    /* Append copied EH information to that of current function.  */
*************** expand_call_inline (tree *tp, int *walk_
*** 2137,2145 ****
  
    if (arg_inits)
      {
        head_copied_body = create_basic_block (arg_inits, (void*)0, ENTRY_BLOCK_PTR_FOR_FUNCTION (my_cfun));
        set_bb_for_stmt (arg_inits, head_copied_body);
!       make_edge (head_copied_body, head_copied_body->next_bb, EDGE_FALLTHRU);
      }
    else
      head_copied_body = ENTRY_BLOCK_PTR_FOR_FUNCTION (my_cfun)->next_bb;
--- 2191,2204 ----
  
    if (arg_inits)
      {
+       edge new;
        head_copied_body = create_basic_block (arg_inits, (void*)0, ENTRY_BLOCK_PTR_FOR_FUNCTION (my_cfun));
+       head_copied_body->count = id->oic_basic_block->count;
+       head_copied_body->frequency = id->oic_basic_block->frequency;
        set_bb_for_stmt (arg_inits, head_copied_body);
!       new = make_edge (head_copied_body, head_copied_body->next_bb, EDGE_FALLTHRU);
!       new->probability = REG_BR_PROB_BASE;
!       new->count = id->oic_basic_block->count;
      }
    else
      head_copied_body = ENTRY_BLOCK_PTR_FOR_FUNCTION (my_cfun)->next_bb;
*************** expand_call_inline (tree *tp, int *walk_
*** 2160,2166 ****
       callee body will be inserted between these, and the CALL_EXPR
       will be replaced with a reference to the return variable (if
       any).  tsi_split_statement_list_before() loses if the CALL_EXPR
!      is the first statement.  */
    if (!tsi_end_p (test_iter))
      {
        second_half_bb
--- 2219,2226 ----
       callee body will be inserted between these, and the CALL_EXPR
       will be replaced with a reference to the return variable (if
       any).  tsi_split_statement_list_before() loses if the CALL_EXPR
!      is the first statement.
!      FIXME: Is this a hand-rolled implementation of split_basic_block?  */
    if (!tsi_end_p (test_iter))
      {
        second_half_bb
*************** expand_call_inline (tree *tp, int *walk_
*** 2177,2182 ****
--- 2237,2247 ----
        first_half_bb->stmt_list = tmp_stmt_list;
        set_bb_for_stmt (first_half_bb->stmt_list, first_half_bb);
      }
+   /* FIXME: The function call might not return every time.  We could scale
+      the fallthrough BB's frequency by the ratio of the exit and entry block
+      frequencies of the inlined function.  */
+   second_half_bb->count = first_half_bb->count;
+   second_half_bb->frequency = first_half_bb->frequency;
  
    /* The existing id->return_block is just a place to collect the
       edges created from RETURN_EXPRs.  Move those edges, if any, to
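
One way to resolve the FIXME above: instead of copying the call block's
profile verbatim, scale it by the fraction of callee entries that reach
the callee's exit, since that is how often the call actually returns.
A hedged sketch of that adjustment (bb_stub and scale_after_call_bb are
hypothetical, not part of the patch):

    typedef long long gcov_type;

    struct bb_stub
    {
      gcov_type count;
      int frequency;
    };

    /* Hypothetical refinement of the FIXME: the block after the call
       executes only as often as the callee returns, estimated by the
       callee's exit/entry frequency ratio.  */
    static void
    scale_after_call_bb (struct bb_stub *second_half,
    			 const struct bb_stub *first_half,
    			 int callee_entry_freq, int callee_exit_freq)
    {
      if (callee_entry_freq > 0)
        {
          second_half->count
            = first_half->count * callee_exit_freq / callee_entry_freq;
          second_half->frequency
            = first_half->frequency * callee_exit_freq / callee_entry_freq;
        }
      else
        {
          /* No frequency data; fall back to copying the profile
             unchanged, as the patch does today.  */
          second_half->count = first_half->count;
          second_half->frequency = first_half->frequency;
        }
    }
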
*************** expand_call_inline (tree *tp, int *walk_
*** 2220,2226 ****
    /* We'll fix this when we splice in the new body.  */
    first_half_bb->succs = NULL;
  
!   make_edge (first_half_bb, head_copied_body, EDGE_FALLTHRU);
  
    n_edges += n_edges_for_function (my_cfun);
  
--- 2285,2293 ----
    /* We'll fix this when we splice in the new body.  */
    first_half_bb->succs = NULL;
  
!   e = make_edge (first_half_bb, head_copied_body, EDGE_FALLTHRU);
!   e->count = first_half_bb->count;
!   e->probability = REG_BR_PROB_BASE;
  
    n_edges += n_edges_for_function (my_cfun);
  
*************** save_body (tree fn, tree *arg_copy, tree
*** 2561,2567 ****
    /* Actually copy the body, including a new (struct function *) and CFG.
       EH info is also duplicated so its labels point into the copied
       CFG, not the original.  */
!   body = copy_body (&id);
    DECL_STRUCT_FUNCTION (fn)->saved_cfg = DECL_STRUCT_FUNCTION (body)->cfg;
    DECL_STRUCT_FUNCTION (fn)->saved_eh = DECL_STRUCT_FUNCTION (body)->eh;
  
--- 2628,2636 ----
    /* Actually copy the body, including a new (struct function *) and CFG.
       EH info is also duplicated so its labels point into the copied
       CFG, not the original.  */
!   body = copy_body (&id,
!       		    ENTRY_BLOCK_PTR_FOR_FUNCTION (DECL_STRUCT_FUNCTION (fn))->count,
!       		    ENTRY_BLOCK_PTR_FOR_FUNCTION (DECL_STRUCT_FUNCTION (fn))->frequency);
    DECL_STRUCT_FUNCTION (fn)->saved_cfg = DECL_STRUCT_FUNCTION (body)->cfg;
    DECL_STRUCT_FUNCTION (fn)->saved_eh = DECL_STRUCT_FUNCTION (body)->eh;
  
Index: tree-optimize.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-optimize.c,v
retrieving revision 1.1.4.122.2.27
diff -c -3 -p -r1.1.4.122.2.27 tree-optimize.c
*** tree-optimize.c	29 Nov 2004 02:39:31 -0000	1.1.4.122.2.27
--- tree-optimize.c	5 Dec 2004 16:58:44 -0000
*************** tree_rest_of_compilation (tree fndecl)
*** 773,779 ****
  	  struct cgraph_edge *e;
  
  	  node = cgraph_node (current_function_decl);
! 	  saved_node = cgraph_clone_node (node);
  	  for (e = saved_node->callees; e; e = e->next_callee)
  	    if (!e->inline_failed)
  	      cgraph_clone_inlined_nodes (e, true);
--- 773,779 ----
  	  struct cgraph_edge *e;
  
  	  node = cgraph_node (current_function_decl);
! 	  saved_node = cgraph_clone_node (node, node->count);
  	  for (e = saved_node->callees; e; e = e->next_callee)
  	    if (!e->inline_failed)
  	      cgraph_clone_inlined_nodes (e, true);

