[PATCH] New C++ inliner

Nathan Sidwell nathan@codesourcery.com
Fri Jul 13 08:27:00 GMT 2001


Hi,
here's a patch which changes the C++ AST inliner. There are a number of
changes, and a number of pieces of future work indicated. However, the
highlights are
a) no horrible compile-time & memory degradation.
b) the produced object code is faster and smaller.

This algorithm does bottom-up inlining. What I do is
1) do non-inline optimizations on the current function
2) gather a list of all the inlinable call sites in the current
function
3) fully optimize each of those callees. This includes the step of
inlining into the body of the callee. There is a new flag for
FUNCTION_DECLs indicating whether it has already undergone this
optimization step.
4) Once that is done, I perform the inlining of each callee into the
current function -- provided it still meets the inlinability criteria
(it might have got bigger due to inlining into its own body).
5) If any inlining happened, repeat the non-inlining optimizations.
(A toy sketch of this loop is below.)
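
The sketch is a self-contained toy model of the walk in C. It uses a
tiny explicit call graph rather than real trees, so everything in it
(struct fn, optimize_fn, the threshold of 50) is illustrative and is
not code from the patch. Note also that the toy breaks recursion by
simply marking a function early, whereas the patch keeps a stack and
breaks cycles at the largest function (see the recursion Q&A below).

  #include <stdio.h>

  #define MAX_CALLEES 4

  struct fn
  {
    const char *name;
    int optimized;                  /* Plays the role of DECL_OPTIMIZED_P. */
    int size;                       /* Plays the role of DECL_NUM_STMTS. */
    struct fn *callee[MAX_CALLEES]; /* The inlinable call sites. */
  };

  static void
  optimize_fn (struct fn *f)
  {
    int i;

    if (f->optimized)
      return;                /* Each body is optimized only once. */
    f->optimized = 1;        /* Toy recursion break; not the patch's. */

    /* Steps 1 & 2 (non-inline optimizations, gathering the call
       sites) would go here.  Step 3: fully optimize each callee.  */
    for (i = 0; i < MAX_CALLEES && f->callee[i]; i++)
      optimize_fn (f->callee[i]);

    /* Step 4: inline each callee that is still small enough.  */
    for (i = 0; i < MAX_CALLEES && f->callee[i]; i++)
      if (f->callee[i]->size < 50)       /* Toy size criterion. */
        f->size += f->callee[i]->size;   /* "Inline" the body. */

    /* Step 5 (repeat the non-inline optimizations) would go here.  */
    printf ("%s optimized, size now %d\n", f->name, f->size);
  }

  int
  main (void)
  {
    struct fn h = { "h", 0, 10, { 0 } };
    struct fn g = { "g", 0, 30, { &h } };
    struct fn f = { "f", 0, 20, { &g } };

    optimize_fn (&f);
    return 0;
  }

Running it prints h, then g, then f -- bottom up -- and shows that g's
size must be re-checked after h has been inlined into it, which is why
step 4 re-tests the criteria.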

There are a number of issues with this algorithm.

Q) Doesn't this mean there are lots of fully expanded ASTs hanging
around for inline functions?
A) Yes, but this does not appear to be a problem. The memory usage is
still lower than it was before, and we have a compile time advantage in
that we only optimize the body of each function once, rather than
multiple times for each of its inlined copies.

Q) What about mutual & direct recursion?
A) Recursion is detected in step 3 by keeping a stack of the functions
being optimized. We check for a loop. If one is found, we have to decide
where to break the recursion, i.e. which inlinable function we do _not_
inline. I pick the largest function, as that has the least chance
of being inlinable. [Actually, I should tweak that to highest cost.]
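
In the patch, this is the check at the top of optimize_function_real.
Condensed slightly, it walks the stack of in-progress functions,
tracking the biggest body seen so far:

  HOST_WIDE_INT biggest = 0;

  for (ix = VARRAY_ACTIVE_SIZE (id->fns); ix--;)
    {
      tree nested_fn = VARRAY_TREE (id->fns, ix);

      if (biggest < DECL_NUM_STMTS (nested_fn))
        biggest = DECL_NUM_STMTS (nested_fn);
      if (fn == nested_fn)
        {
          if (biggest == DECL_NUM_STMTS (fn))
            /* FN is the biggest function in the recursive loop, so it
               stands the least chance of being inlined; break the
               recursion here by not optimizing it.  */
            return;
          /* Otherwise we _do_ want to optimize FN now.  */
          break;
        }
    }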

Q) What other non-inline optimizations do you do?
A) None. I've marked the place where they should go with XXX, and
commented on the sort of thing I'd expect to go there.

Q) How is the inlining cost measured?
A) The inlining cost is based on the size of the function in terms
of AST nodes. There are two parameters, max-inline-ast and
arg-inline-ast. If the size of the function is < max-inline-ast +
arg-inline-ast * num_args, we inline it.
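
Concretely, that test is the patch's inlinable_size_p predicate:

  static bool
  inlinable_size_p (fn)
       tree fn;
  {
    /* Inline FN if its statement count is under the base limit plus
       the per-argument allowance.  */
    return DECL_NUM_STMTS (fn)
      < (PARAM_VALUE (PARAM_MAX_INLINE_AST)
         + (PARAM_VALUE (PARAM_ARG_INLINE_AST)
            * type_num_arguments (TREE_TYPE (fn))));
  }

So with the defaults chosen below (50 and 8), a three-argument
function must have fewer than 50 + 3 * 8 = 74 statements to be
inlined.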

Q) Why is there no threshold on the size of the function being inlined
into?
A) This is irrelevant. If a function is worth inlining, it's worth
inlining into a small function just as much as into a big function. N.B.
I tried with such a limit, and it had no significant effect. The code to
do this kind of stuff is still in the patch, but maybe I should remove
it (the qsort). Also I discovered something that was surprising, but in
hindsight should be expected: if the inlining parameters are correct, the
object code gets faster AND smaller.

Q) What about profile information?
A) I had thought this would be important, but see the previous answer
as to why it isn't. The usage information machinery is still in the patch,
but I think perhaps I should remove it (it currently estimates '1' for
any node, anyway).

Q) What about inlining static functions with only one caller?
A) I still do inlining on demand -- there is no postponement until the
end of the TU, and therefore no complete call graph information. We
do generate (implicitly) the call graph rooted at a particular function,
but not that for the whole TU. So I can answer the question 'what
functions does this function call?', but not 'what functions call this
function?'. So I don't have the information to make that decision.

Q) Doesn't DECL_NUM_STMTS count the number of statements in a
function, not the number of tree nodes?
A) Yes, it does; we need to change that. Also, we'll need to recalculate
it when other AST optimizations start mangling the tree.
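
A recount could be a simple walk_tree pass over the body. The
following is only a sketch of that idea; count_tree_node is an
illustrative callback, not a function from the patch:

  /* Illustrative only -- not part of the patch.  Called via
     walk_tree; bump the node count for every tree node visited.  */
  static tree
  count_tree_node (tp, walk_subtrees, data)
       tree *tp ATTRIBUTE_UNUSED;
       int *walk_subtrees ATTRIBUTE_UNUSED;
       void *data;
  {
    ++*(HOST_WIDE_INT *) data;
    return NULL_TREE;          /* NULL means keep walking.  */
  }

  /* ... then, after an optimization pass over FN ... */
  HOST_WIDE_INT num_nodes = 0;
  walk_tree_without_duplicates (&DECL_SAVED_TREE (fn),
                                count_tree_node, &num_nodes);
  DECL_NUM_STMTS (fn) = num_nodes;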

I've set max-inline-ast to 50 and arg-inline-ast to 8, based on
experiment. I'll present that data next week, as I've not got time to do
it with pretty 3D graphs before end of play today [this weekend is the
building's electricity supply maintenance, and all power is off].

These parameters will probably need retuning once other AST
optimizations go in. I tried other cost functions; these were
1) allow inlining of a function if it is no bigger than X% of the
original caller's size.
2) allow inlining of a function if it will not increase the caller's
size to more than X% of its original size. Pick smaller functions first.

I also experimented with a zero cost threshold -- functions smaller than
a certain size were presumed to have zero cost. I rejected this as merely
a stopgap pending other AST optimizations.

Gerald's horrible test case previously required 270MB of core and took
1133 seconds on my machine (192MB of memory, so it went into swap). Now it
takes about 102MB and 80 seconds at -O3. At -O2 it took about
80MB and 75 seconds. [These numbers were gathered with untuned parameters
that gave more inlining than turned out to be desirable -- tuned
parameters should give better results.]

Bootstrapped and tested on i686-pc-linux-gnu.

nathan

-- 
Dr Nathan Sidwell   ::   http://www.codesourcery.com   ::   CodeSourcery LLC
         'But that's a lie.' - 'Yes it is. What's your point?'
nathan@codesourcery.com : http://www.cs.bris.ac.uk/~nathan/ : nathan@acm.org
optimize4.patch

-------------- next part --------------
2001-07-13  Nathan Sidwell  <nathan@codesourcery.com>

	* params.def (PARAM_MAX_INLINE_AST): New parameter.
	(PARAM_ARG_INLINE_AST): New parameter.

doc:
2001-07-13  Nathan Sidwell  <nathan@codesourcery.com>
	* invoke.texi (max-inline-ast, arg-inline-ast): Document parameters.

cp:
2001-07-13  Nathan Sidwell  <nathan@codesourcery.com>

	Change tree optimizer to bottom up inlining.
	* cp-tree.h (lang_decl_flags): Add optimized_p, optimized_partial_p.
	(lang_decl): Remove inlined_fns.
	(DECL_OPTIMIZED_P): New macro.
	(DECL_OPTIMIZED_PARTIAL_P): New macro.
	(DECL_INLINED_FNS): Remove.
	* pt.c (instantiate_decl): If we're given a cloned function,
	return the instantiated clone.
	* decl.c (lang_mark_tree): Remove inlined_fns member.
	* optimize.c (struct call_site_data): New structure.
	(struct inline_data): Remove first_inlined_fn, inlined_fns,
	inlined_stmts. Add first_fn, fn, inlinable_call_sites, usage.
	(remap_decl): Use id->fn. Don't walk DECL_SIZE and
	DECL_SIZE_UNIT. Only copy TREE_TYPE of a variable length
	array.
	(remap_block): Use id->fn.
	(copy_body_r): Allow meeting the return label. Inhibit subwalk
	of remapped local decls.
	(initialize_inlined_parameters): Use id->fn. Don't fold
	argument values.
	(declare_return_variable): Use id->fn.
	(inlinable_function_p): Remove.
	(expand_call_inline): Change parameters, adjust for bottom up.
	(expand_calls_inline): Remove.
	(inlinable_size_p): New function.
	(find_inlinable_call_sites): New function.
	(compare_inlinable_call_sites): Likewise.
	(optimize_inline_calls): Add ID parameter. Adjust.
	(optimize_function_real): New function, broken out of ...
	(optimize_function): ... here. Set up ID.
	(maybe_clone_body): Set up id.fn.
	(dump_function): Return stream, take flag pointer parameter.

Index: params.def
===================================================================
RCS file: /cvs/gcc/egcs/gcc/params.def,v
retrieving revision 1.6
diff -c -3 -p -r1.6 params.def
*** params.def	2001/06/27 14:35:21	1.6
--- params.def	2001/07/12 14:51:18
*************** DEFPARAM(PARAM_MAX_GCSE_PASSES,
*** 79,84 ****
--- 79,97 ----
  	"max-gcse-passes",
  	"The maximum number of passes to make when doing GCSE",
  	1)
+ 
+ /* The maximum number of tree nodes in a function eligible for tree
+    based inlining. */
+ DEFPARAM(PARAM_MAX_INLINE_AST,
+ 	 "max-inline-ast",
+ 	 "The maximum number of tree nodes in a function eligible for inlining",
+ 	 50)
+ /* The number of additional tree nodes allowable per argument. */
+ DEFPARAM (PARAM_ARG_INLINE_AST,
+ 	  "arg-inline-ast",
+ 	  "Additional tree nodes per argument allowed for an inlinable function",
+ 	  8)
+ 
  /*
  Local variables:
  mode:c
Index: doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/egcs/gcc/doc/invoke.texi,v
retrieving revision 1.35
diff -c -3 -p -r1.35 invoke.texi
*** invoke.texi	2001/07/06 07:57:39	1.35
--- invoke.texi	2001/07/12 14:51:23
*************** If an function contains more than this m
*** 3811,3816 ****
--- 3811,3825 ----
  will not be inlined.  This option is precisely equivalent to
  @option{-finline-limit}.
  
+ @item max-inline-ast @r{(C++ only)}
+ When inlining on abstract syntax trees (AST), this sets the limit on a
+ function's AST size, above which it will not be inlined into its
+ callers.
+ 
+ @item arg-inline-ast @r{(C++ only)}
+ When inlining on abstract syntax trees, this sets the additional
+ size an inlinable function is permitted per argument.
+ 
  @end table
  @end table
  
Index: cp/cp-tree.h
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cp/cp-tree.h,v
retrieving revision 1.623
diff -c -3 -p -r1.623 cp-tree.h
*** cp-tree.h	2001/07/06 06:36:47	1.623
--- cp-tree.h	2001/07/12 14:51:24
*************** struct lang_decl_flags
*** 1799,1805 ****
    unsigned global_dtor_p : 1;
    unsigned assignment_operator_p : 1;
    unsigned anticipated_p : 1;
!   /* Three unused bits.  */
  
    union {
      /* In a FUNCTION_DECL, VAR_DECL, TYPE_DECL, or TEMPLATE_DECL, this
--- 1799,1807 ----
    unsigned global_dtor_p : 1;
    unsigned assignment_operator_p : 1;
    unsigned anticipated_p : 1;
!   unsigned optimized_p : 1;
!   unsigned optimized_partial_p : 1;
!   /* One unused bit.  */
  
    union {
      /* In a FUNCTION_DECL, VAR_DECL, TYPE_DECL, or TEMPLATE_DECL, this
*************** struct lang_decl
*** 1842,1851 ****
    /* In a FUNCTION_DECL, this is DECL_CLONED_FUNCTION.  */
    tree cloned_function;
  
-   /* In a FUNCTION_DECL, these are function data which is to be kept
-      as long as FUNCTION_DECL is kept.  */
-   tree inlined_fns;
- 
    union
    {
      tree sorted_fields;
--- 1844,1849 ----
*************** struct lang_decl
*** 1972,1981 ****
  #define DECL_CLONED_FUNCTION(NODE) \
    (DECL_LANG_SPECIFIC (NODE)->cloned_function)
  
- /* List of FUNCION_DECLs inlined into this function's body.  */
- #define DECL_INLINED_FNS(NODE) \
-   (DECL_LANG_SPECIFIC (NODE)->inlined_fns)
- 
  /* Nonzero if NODE has DECL_DISCRIMINATOR and not DECL_ACCESS.  */
  #define DECL_DISCRIMINATOR_P(NODE)	\
    (TREE_CODE (NODE) == VAR_DECL		\
--- 1970,1975 ----
*************** extern int flag_new_for_scope;
*** 2461,2466 ****
--- 2455,2470 ----
     not yet seen a prototype for that function.  */
  #define DECL_ANTICIPATED(NODE) \
    (DECL_LANG_SPECIFIC (DECL_CHECK (NODE))->decl_flags.anticipated_p)
+ 
+ /* Nonzero if NODE is a FUNCTION_DECL whose DECL_SAVED_TREE has been
+    optimized. */
+ #define DECL_OPTIMIZED_P(NODE) \
+   (DECL_LANG_SPECIFIC (DECL_CHECK (NODE))->decl_flags.optimized_p)
+ 
+ /* Nonzero if NODE is a FUNCTION_DECL whose DECL_SAVED_TREE has been
+    partially optimized (up to the inlining point). */
+ #define DECL_OPTIMIZED_PARTIAL_P(NODE) \
+   (DECL_LANG_SPECIFIC (DECL_CHECK (NODE))->decl_flags.optimized_partial_p)
  
  /* Record whether a typedef for type `int' was actually `signed int'.  */
  #define C_TYPEDEF_EXPLICITLY_SIGNED(exp) DECL_LANG_FLAG_1 ((exp))
Index: cp/decl.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cp/decl.c,v
retrieving revision 1.795
diff -c -3 -p -r1.795 decl.c
*** decl.c	2001/06/26 18:09:24	1.795
--- decl.c	2001/07/12 14:51:29
*************** lang_mark_tree (t)
*** 14528,14534 ****
  	      ggc_mark_tree (ld->befriending_classes);
  	      ggc_mark_tree (ld->context);
  	      ggc_mark_tree (ld->cloned_function);
- 	      ggc_mark_tree (ld->inlined_fns);
  	      if (TREE_CODE (t) == TYPE_DECL)
  		ggc_mark_tree (ld->u.sorted_fields);
  	      else if (TREE_CODE (t) == FUNCTION_DECL
--- 14528,14533 ----
Index: cp/optimize.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cp/optimize.c,v
retrieving revision 1.74
diff -c -3 -p -r1.74 optimize.c
*** optimize.c	2001/07/02 12:16:57	1.74
--- optimize.c	2001/07/12 14:51:29
*************** Software Foundation, 59 Temple Place - S
*** 45,64 ****
     o Provide heuristics to clamp inlining of recursive template
       calls?  */
  
  /* Data required for function inlining.  */
  
  typedef struct inline_data
  {
!   /* A stack of the functions we are inlining.  For example, if we are
!      compiling `f', which calls `g', which calls `h', and we are
!      inlining the body of `h', the stack will contain, `h', followed
!      by `g', followed by `f'.  The first few elements of the stack may
!      contain other functions that we know we should not recurse into,
!      even though they are not directly being inlined.  */
    varray_type fns;
    /* The index of the first element of FNS that really represents an
       inlined function.  */
!   unsigned first_inlined_fn;
    /* The label to jump to when a return statement is encountered.  If
       this value is NULL, then return statements will simply be
       remapped as return statements, rather than as jumps.  */
--- 45,79 ----
     o Provide heuristics to clamp inlining of recursive template
       calls?  */
  
+ typedef struct call_site_data
+ {
+   tree *site_ptr;		/* Pointer to the slot containing
+ 				   the CALL_EXPR.  */
+   tree callee;			/* Callee. */
+   tree target_expr;             /* Containing TARGET_EXPR. */
+   unsigned in_target_cleanup_p;	/* We're inside a TARGET_EXPR
+ 				   cleanup. */
+   HOST_WIDE_INT usage;		/* Use estimate. */
+ } call_site_data;
+ 
  /* Data required for function inlining.  */
  
  typedef struct inline_data
  {
!   /* A stack of the functions we are inlining. This is used to detect
!      and inhibit recursive inlining. The first few entries (below
!      first_fn) are functions that are not yet complete (this happens
!      for nested functions). The function we are inlining is
!      VARRAY_TOP_TREE.  */
    varray_type fns;
    /* The index of the first element of FNS that really represents an
       inlined function.  */
!   unsigned first_fn;
!   /* The function we are inlining into. */
!   tree fn;
!   /* Array of pointers to CALL_SITE_DATA for CALL_EXPRs that can be
!      inlined. */
!   varray_type inlinable_call_sites;
    /* The label to jump to when a return statement is encountered.  If
       this value is NULL, then return statements will simply be
       remapped as return statements, rather than as jumps.  */
*************** typedef struct inline_data
*** 71,81 ****
    int in_target_cleanup_p;
    /* A stack of the TARGET_EXPRs that we are currently processing.  */
    varray_type target_exprs;
-   /* A list of the functions current function has inlined.  */
-   varray_type inlined_fns;
-   /* The approximate number of statements we have inlined in the
-      current call stack.  */
-   int inlined_stmts;
    /* We use the same mechanism to build clones that we do to perform
       inlining.  However, there are a few places where we need to
       distinguish between those two situations.  This flag is true if
--- 86,91 ----
*************** typedef struct inline_data
*** 84,89 ****
--- 94,106 ----
    /* Hash table used to prevent walk_tree from visiting the same node
       umpteen million times.  */
    htab_t tree_pruner;
+   /* Relative estimate of number of times we reach this point.
+ 
+      XXX We should divide this by (say) N when doing branches of an
+      N-way if or switch, and multiply by (say) 10 when doing the body
+      of a loop. The tree walker is not sophisticated enough to achieve
+      that yet. Best would be to extract real profile data.  */
+   HOST_WIDE_INT usage;
  } inline_data;
  
  /* Prototypes.  */
*************** static tree initialize_inlined_parameter
*** 92,112 ****
  static tree declare_return_variable PARAMS ((inline_data *, tree *));
  static tree copy_body_r PARAMS ((tree *, int *, void *));
  static tree copy_body PARAMS ((inline_data *));
- static tree expand_call_inline PARAMS ((tree *, int *, void *));
- static void expand_calls_inline PARAMS ((tree *, inline_data *));
- static int inlinable_function_p PARAMS ((tree, inline_data *));
  static tree remap_decl PARAMS ((tree, inline_data *));
  static void remap_block PARAMS ((tree, tree, inline_data *));
  static void copy_scope_stmt PARAMS ((tree *, int *, inline_data *));
! static void optimize_inline_calls PARAMS ((tree));
  static tree calls_setjmp_r PARAMS ((tree *, int *, void *));
  static void update_cloned_parm PARAMS ((tree, tree));
! static void dump_function PARAMS ((enum tree_dump_index, tree));
! 
! /* The approximate number of instructions per statement.  This number
!    need not be particularly accurate; it is used only to make
!    decisions about when a function is too big to inline.  */
! #define INSNS_PER_STMT (10)
  
  /* Remap DECL during the copying of the BLOCK tree for the function.  */
  
--- 109,126 ----
  static tree declare_return_variable PARAMS ((inline_data *, tree *));
  static tree copy_body_r PARAMS ((tree *, int *, void *));
  static tree copy_body PARAMS ((inline_data *));
  static tree remap_decl PARAMS ((tree, inline_data *));
  static void remap_block PARAMS ((tree, tree, inline_data *));
  static void copy_scope_stmt PARAMS ((tree *, int *, inline_data *));
! static void expand_call_inline PARAMS ((call_site_data *, inline_data *));
! static bool inlinable_size_p PARAMS ((tree));
! static tree find_inlinable_call_sites PARAMS ((tree *, int *, void *));
! static int compare_inlinable_call_sites PARAMS ((const void *, const void *));
! static int optimize_inline_calls PARAMS ((tree, inline_data *));
! static void optimize_function_real PARAMS ((tree, inline_data *));
  static tree calls_setjmp_r PARAMS ((tree *, int *, void *));
  static void update_cloned_parm PARAMS ((tree, tree));
! static FILE *dump_function PARAMS ((enum tree_dump_index, tree, int *));
  
  /* Remap DECL during the copying of the BLOCK tree for the function.  */
  
*************** remap_decl (decl, id)
*** 132,153 ****
        tree t;
  
        /* Make a copy of the variable or label.  */
!       t = copy_decl_for_inlining (decl, fn,
! 				  VARRAY_TREE (id->fns, 0));
  
        /* The decl T could be a dynamic array or other variable size type,
  	 in which case some fields need to be remapped because they may
! 	 contain SAVE_EXPRs.  */
!       walk_tree (&DECL_SIZE (t), copy_body_r, id, NULL);
!       walk_tree (&DECL_SIZE_UNIT (t), copy_body_r, id, NULL);
        if (TREE_TYPE (t) && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE
! 	  && TYPE_DOMAIN (TREE_TYPE (t)))
  	{
  	  TREE_TYPE (t) = copy_node (TREE_TYPE (t));
  	  TYPE_DOMAIN (TREE_TYPE (t))
  	    = copy_node (TYPE_DOMAIN (TREE_TYPE (t)));
  	  walk_tree (&TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (t))),
! 		     copy_body_r, id, NULL);
  	}
  
        if (!DECL_NAME (t) && TREE_TYPE (t)
--- 146,169 ----
        tree t;
  
        /* Make a copy of the variable or label.  */
!       t = copy_decl_for_inlining (decl, fn, id->fn);
  
        /* The decl T could be a dynamic array or other variable size type,
  	 in which case some fields need to be remapped because they may
! 	 contain SAVE_EXPRs. We don't need to deal with DECL_SIZE and
! 	 DECL_SIZE_UNIT here, as walk_tree will do that for us because
! 	 the first location of this _DECL will be in a
! 	 DECL_STMT. Walking twice is wrong.  We have to do the type,
! 	 as that won't be covered by walk_tree.  */
        if (TREE_TYPE (t) && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE
! 	  && TYPE_DOMAIN (TREE_TYPE (t))
! 	  && !TREE_CONSTANT (TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (t)))))
  	{
  	  TREE_TYPE (t) = copy_node (TREE_TYPE (t));
  	  TYPE_DOMAIN (TREE_TYPE (t))
  	    = copy_node (TYPE_DOMAIN (TREE_TYPE (t)));
  	  walk_tree (&TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (t))),
! 		     copy_body_r, id, id->tree_pruner);
  	}
  
        if (!DECL_NAME (t) && TREE_TYPE (t)
*************** remap_block (scope_stmt, decls, id)
*** 210,216 ****
        tree old_block;
        tree new_block;
        tree old_var;
-       tree fn;
  
        /* Make the new block.  */
        old_block = SCOPE_STMT_BLOCK (scope_stmt);
--- 226,231 ----
*************** remap_block (scope_stmt, decls, id)
*** 243,249 ****
  	}
        /* We put the BLOCK_VARS in reverse order; fix that now.  */
        BLOCK_VARS (new_block) = nreverse (BLOCK_VARS (new_block));
-       fn = VARRAY_TREE (id->fns, 0);
        if (id->cloning_p)
  	/* We're building a clone; DECL_INITIAL is still
  	   error_mark_node, and current_binding_level is the parm
--- 258,263 ----
*************** remap_block (scope_stmt, decls, id)
*** 255,264 ****
  	     function into which this block is being inlined.  In
  	     rest_of_compilation we will straighten out the BLOCK tree.  */
  	  tree *first_block;
! 	  if (DECL_INITIAL (fn))
! 	    first_block = &BLOCK_CHAIN (DECL_INITIAL (fn));
  	  else
! 	    first_block = &DECL_INITIAL (fn);
  	  BLOCK_CHAIN (new_block) = *first_block;
  	  *first_block = new_block;
  	}
--- 269,278 ----
  	     function into which this block is being inlined.  In
  	     rest_of_compilation we will straighten out the BLOCK tree.  */
  	  tree *first_block;
! 	  if (DECL_INITIAL (id->fn))
! 	    first_block = &BLOCK_CHAIN (DECL_INITIAL (id->fn));
  	  else
! 	    first_block = &DECL_INITIAL (id->fn);
  	  BLOCK_CHAIN (new_block) = *first_block;
  	  *first_block = new_block;
  	}
*************** copy_body_r (tp, walk_subtrees, data)
*** 358,380 ****
       variables.  We don't want to copy static variables; there's only
       one of those, no matter how many times we inline the containing
       function.  */
!   else if (nonstatic_local_decl_p (*tp) && DECL_CONTEXT (*tp) == fn)
      {
!       tree new_decl;
  
!       /* Remap the declaration.  */
!       new_decl = remap_decl (*tp, id);
!       my_friendly_assert (new_decl != NULL_TREE, 19991203);
!       /* Replace this variable with the copy.  */
!       STRIP_TYPE_NOPS (new_decl);
!       *tp = new_decl;
!     }
!   else if (nonstatic_local_decl_p (*tp)
! 	   && DECL_CONTEXT (*tp) != VARRAY_TREE (id->fns, 0))
!     my_friendly_abort (0);
    else if (TREE_CODE (*tp) == SAVE_EXPR)
!     remap_save_expr (tp, id->decl_map, VARRAY_TREE (id->fns, 0),
! 		     walk_subtrees);
    else if (TREE_CODE (*tp) == UNSAVE_EXPR)
      /* UNSAVE_EXPRs should not be generated until expansion time.  */
      my_friendly_abort (19991113);
--- 372,397 ----
       variables.  We don't want to copy static variables; there's only
       one of those, no matter how many times we inline the containing
       function.  */
!   else if (nonstatic_local_decl_p (*tp))
      {
!       if (*tp != id->ret_label)
! 	{
! 	  tree new_decl;
  
! 	  my_friendly_assert (DECL_CONTEXT (*tp) == fn, 20010702);
!       
! 	  /* Remap the declaration.  */
! 	  new_decl = remap_decl (*tp, id);
! 	  my_friendly_assert (new_decl != NULL_TREE, 19991203);
!           /* Replace this variable with the copy.  */
! 	  STRIP_TYPE_NOPS (new_decl);
! 	  *tp = new_decl;
! 	  /* The new decl might be an expression - don't walk it. */
! 	  *walk_subtrees = 0;
! 	}
!     }
    else if (TREE_CODE (*tp) == SAVE_EXPR)
!     remap_save_expr (tp, id->decl_map, id->fn, walk_subtrees);
    else if (TREE_CODE (*tp) == UNSAVE_EXPR)
      /* UNSAVE_EXPRs should not be generated until expansion time.  */
      my_friendly_abort (19991113);
*************** copy_body (id)
*** 436,442 ****
    return body;
  }
  
! /* Generate code to initialize the parameters of the function at the
     top of the stack in ID from the ARGS (presented as a TREE_LIST).  */
  
  static tree
--- 453,461 ----
    return body;
  }
  
! /* ARGS is a TREE_LIST of argument expressions for a call to FN.
!    We must use the expression trees within ARGS directly.
!    Generate code to initialize the parameters of the function at the
     top of the stack in ID from the ARGS (presented as a TREE_LIST).  */
  
  static tree
*************** initialize_inlined_parameters (id, args,
*** 473,481 ****
  	  && !TREE_ADDRESSABLE (p)
  	  && !TREE_SIDE_EFFECTS (value))
  	{
! 	  /* Simplify the value, if possible.  */
! 	  value = fold (decl_constant_value (value));
! 
  	  /* We can't risk substituting complex expressions.  They
  	     might contain variables that will be assigned to later.
  	     Theoretically, we could check the expression to see if
--- 492,499 ----
  	  && !TREE_ADDRESSABLE (p)
  	  && !TREE_SIDE_EFFECTS (value))
  	{
! 	  my_friendly_assert (TREE_CODE (value) != CALL_EXPR, 20010702);
! 	  
  	  /* We can't risk substituting complex expressions.  They
  	     might contain variables that will be assigned to later.
  	     Theoretically, we could check the expression to see if
*************** initialize_inlined_parameters (id, args,
*** 483,489 ****
  	     read-only, but we don't bother.  */
  	  if (TREE_CONSTANT (value) || TREE_READONLY_DECL_P (value))
  	    {
! 	      /* If this is a declaration, wrap it a NOP_EXPR so that
  		 we don't try to put the VALUE on the list of
  		 BLOCK_VARS.  */
  	      if (DECL_P (value))
--- 501,507 ----
  	     read-only, but we don't bother.  */
  	  if (TREE_CONSTANT (value) || TREE_READONLY_DECL_P (value))
  	    {
! 	      /* If this is a _DECL, wrap it a NOP_EXPR so that
  		 we don't try to put the VALUE on the list of
  		 BLOCK_VARS.  */
  	      if (DECL_P (value))
*************** initialize_inlined_parameters (id, args,
*** 497,503 ****
  	}
  
        /* Make an equivalent VAR_DECL.  */
!       var = copy_decl_for_inlining (p, fn, VARRAY_TREE (id->fns, 0));
        /* Register the VAR_DECL as the equivalent for the PARM_DECL;
  	 that way, when the PARM_DECL is encountered, it will be
  	 automatically replaced by the VAR_DECL.  */
--- 515,521 ----
  	}
  
        /* Make an equivalent VAR_DECL.  */
!       var = copy_decl_for_inlining (p, fn, id->fn);
        /* Register the VAR_DECL as the equivalent for the PARM_DECL;
  	 that way, when the PARM_DECL is encountered, it will be
  	 automatically replaced by the VAR_DECL.  */
*************** declare_return_variable (id, use_stmt)
*** 592,598 ****
      }
    /* Otherwise, make an appropriate copy.  */
    else
!     var = copy_decl_for_inlining (result, fn, VARRAY_TREE (id->fns, 0));
  
    /* Register the VAR_DECL as the equivalent for the RESULT_DECL; that
       way, when the RESULT_DECL is encountered, it will be
--- 610,616 ----
      }
    /* Otherwise, make an appropriate copy.  */
    else
!     var = copy_decl_for_inlining (result, fn, id->fn);
  
    /* Register the VAR_DECL as the equivalent for the RESULT_DECL; that
       way, when the RESULT_DECL is encountered, it will be
*************** declare_return_variable (id, use_stmt)
*** 614,781 ****
      return NULL_TREE;
  }
  
! /* Returns non-zero if FN is a function that can be inlined.  */
  
! static int
! inlinable_function_p (fn, id)
!      tree fn;
       inline_data *id;
  {
!   int inlinable;
! 
!   /* If we've already decided this function shouldn't be inlined,
!      there's no need to check again.  */
!   if (DECL_UNINLINABLE (fn))
!     return 0;
! 
!   /* Assume it is not inlinable.  */
!   inlinable = 0;
! 
!   /* If we're not inlining things, then nothing is inlinable.  */
!   if (!flag_inline_trees)
!     ;
!   /* If the function was not declared `inline', then we don't inline
!      it.  */
!   else if (!DECL_INLINE (fn))
!     ;
!   /* We can't inline varargs functions.  */
!   else if (varargs_function_p (fn))
!     ;
!   /* We can't inline functions that are too big.  */
!   else if (DECL_NUM_STMTS (fn) * INSNS_PER_STMT > MAX_INLINE_INSNS)
!     ;
!   /* All is well.  We can inline this function.  Traditionally, GCC
!      has refused to inline functions using alloca, or functions whose
!      values are returned in a PARALLEL, and a few other such obscure
!      conditions.  We are not equally constrained at the tree level.  */
!   else
!     inlinable = 1;
! 
!   /* Squirrel away the result so that we don't have to check again.  */
!   DECL_UNINLINABLE (fn) = !inlinable;
! 
!   /* Even if this function is not itself too big to inline, it might
!      be that we've done so much inlining already that we don't want to
!      risk inlining any more.  */
!   if ((DECL_NUM_STMTS (fn) + id->inlined_stmts) * INSNS_PER_STMT 
!       > MAX_INLINE_INSNS)
!     inlinable = 0;
! 
!   /* We can inline a template instantiation only if it's fully
!      instantiated.  */
!   if (inlinable
!       && DECL_TEMPLATE_INFO (fn)
!       && TI_PENDING_TEMPLATE_FLAG (DECL_TEMPLATE_INFO (fn)))
!     {
!       fn = instantiate_decl (fn, /*defer_ok=*/0);
!       inlinable = !TI_PENDING_TEMPLATE_FLAG (DECL_TEMPLATE_INFO (fn));
!     }
! 
!   /* If we don't have the function body available, we can't inline
!      it.  */
!   if (!DECL_SAVED_TREE (fn))
!     inlinable = 0;
! 
!   /* Don't do recursive inlining, either.  We don't record this in
!      DECL_UNINLINABLE; we may be able to inline this function later.  */
!   if (inlinable)
!     {
!       size_t i;
! 
!       for (i = 0; i < VARRAY_ACTIVE_SIZE (id->fns); ++i)
! 	if (VARRAY_TREE (id->fns, i) == fn)
! 	  return 0;
! 
!       if (inlinable && DECL_LANG_SPECIFIC (fn) && DECL_INLINED_FNS (fn))
! 	{
! 	  int j;
! 	  tree inlined_fns = DECL_INLINED_FNS (fn);
! 
! 	  for (j = 0; j < TREE_VEC_LENGTH (inlined_fns); ++j)
! 	    if (TREE_VEC_ELT (inlined_fns, j) == VARRAY_TREE (id->fns, 0))
! 	      return 0;
! 	}
!     }
! 
!   /* Return the result.  */
!   return inlinable;
! }
! 
! /* If *TP is a CALL_EXPR, replace it with its inline expansion.  */
! 
! static tree
! expand_call_inline (tp, walk_subtrees, data)
!      tree *tp;
!      int *walk_subtrees;
!      void *data;
! {
!   inline_data *id;
!   tree t;
    tree expr;
    tree chain;
-   tree fn;
    tree scope_stmt;
    tree use_stmt;
    tree arg_inits;
    tree *inlined_body;
    splay_tree st;
- 
-   /* See what we've got.  */
-   id = (inline_data *) data;
-   t = *tp;
- 
-   /* Recurse, but letting recursive invocations know that we are
-      inside the body of a TARGET_EXPR.  */
-   if (TREE_CODE (*tp) == TARGET_EXPR)
-     {
-       int i, len = first_rtl_op (TARGET_EXPR);
- 
-       /* We're walking our own subtrees.  */
-       *walk_subtrees = 0;
- 
-       /* Push *TP on the stack of pending TARGET_EXPRs.  */
-       VARRAY_PUSH_TREE (id->target_exprs, *tp);
- 
-       /* Actually walk over them.  This loop is the body of
- 	 walk_trees, omitting the case where the TARGET_EXPR
- 	 itself is handled.  */
-       for (i = 0; i < len; ++i)
- 	{
- 	  if (i == 2)
- 	    ++id->in_target_cleanup_p;
- 	  walk_tree (&TREE_OPERAND (*tp, i), expand_call_inline, data,
- 		     id->tree_pruner);
- 	  if (i == 2)
- 	    --id->in_target_cleanup_p;
- 	}
- 
-       /* We're done with this TARGET_EXPR now.  */
-       VARRAY_POP (id->target_exprs);
- 
-       return NULL_TREE;
-     }
- 
-   if (TYPE_P (t))
-     /* Because types were not copied in copy_body, CALL_EXPRs beneath
-        them should not be expanded.  This can happen if the type is a
-        dynamic array type, for example.  */
-     *walk_subtrees = 0;
- 
-   /* From here on, we're only interested in CALL_EXPRs.  */
-   if (TREE_CODE (t) != CALL_EXPR)
-     return NULL_TREE;
- 
-   /* First, see if we can figure out what function is being called.
-      If we cannot, then there is no hope of inlining the function.  */
-   fn = get_callee_fndecl (t);
-   if (!fn)
-     return NULL_TREE;
- 
-   /* Don't try to inline functions that are not well-suited to
-      inlining.  */
-   if (!inlinable_function_p (fn, id))
-     return NULL_TREE;
  
    /* Set the current filename and line number to the function we are
       inlining so that when we create new _STMT nodes here they get
       line numbers corresponding to the function we are calling.  We
--- 632,666 ----
      return NULL_TREE;
  }
  
! /* CALL_SITE describes an inlinable function call in the body of
!    ID->FN. Expand that function call inline. Because of the way
!    function calling is expanded, we must be careful not to lose the
!    argument exprs as they may contain as yet unexpanded function
!    calls. */
  
! static void
! expand_call_inline (call_site, id)
!      call_site_data *call_site;
       inline_data *id;
  {
!   tree t = *call_site->site_ptr;
!   tree fn = call_site->callee;
    tree expr;
    tree chain;
    tree scope_stmt;
    tree use_stmt;
    tree arg_inits;
    tree *inlined_body;
    splay_tree st;
  
+   my_friendly_assert (TREE_CODE (t) == CALL_EXPR, 20010702);
+   id->fn = VARRAY_TOP_TREE (id->fns);
+   VARRAY_PUSH_TREE (id->fns, fn);
+   
+   id->in_target_cleanup_p = call_site->in_target_cleanup_p;
+   if (call_site->target_expr)
+     VARRAY_PUSH_TREE (id->target_exprs, call_site->target_expr);
+   
    /* Set the current filename and line number to the function we are
       inlining so that when we create new _STMT nodes here they get
       line numbers corresponding to the function we are calling.  We
*************** expand_call_inline (tp, walk_subtrees, d
*** 798,832 ****
  
    /* Initialize the parameters.  */
    arg_inits = initialize_inlined_parameters (id, TREE_OPERAND (t, 1), fn);
-   /* Expand any inlined calls in the initializers.  Do this before we
-      push FN on the stack of functions we are inlining; we want to
-      inline calls to FN that appear in the initializers for the
-      parameters.  */
-   expand_calls_inline (&arg_inits, id);
    /* And add them to the tree.  */
    STMT_EXPR_STMT (expr) = chainon (STMT_EXPR_STMT (expr), arg_inits);
  
-   /* Record the function we are about to inline so that we can avoid
-      recursing into it.  */
-   VARRAY_PUSH_TREE (id->fns, fn);
- 
-   /* Record the function we are about to inline if optimize_function
-      has not been called on it yet and we don't have it in the list.  */
-   if (DECL_LANG_SPECIFIC (fn) && !DECL_INLINED_FNS (fn))
-     {
-       int i;
- 
-       for (i = VARRAY_ACTIVE_SIZE (id->inlined_fns) - 1; i >= 0; i--)
- 	if (VARRAY_TREE (id->inlined_fns, i) == fn)
- 	  break;
-       if (i < 0)
- 	VARRAY_PUSH_TREE (id->inlined_fns, fn);
-     }
- 
    /* Return statements in the function body will be replaced by jumps
       to the RET_LABEL.  */
    id->ret_label = build_decl (LABEL_DECL, NULL_TREE, NULL_TREE);
!   DECL_CONTEXT (id->ret_label) = VARRAY_TREE (id->fns, 0);
  
    /* Create a block to put the parameters in.  We have to do this
       after the parameters have been remapped because remapping
--- 683,695 ----
  
    /* Initialize the parameters.  */
    arg_inits = initialize_inlined_parameters (id, TREE_OPERAND (t, 1), fn);
    /* And add them to the tree.  */
    STMT_EXPR_STMT (expr) = chainon (STMT_EXPR_STMT (expr), arg_inits);
  
    /* Return statements in the function body will be replaced by jumps
       to the RET_LABEL.  */
    id->ret_label = build_decl (LABEL_DECL, NULL_TREE, NULL_TREE);
!   DECL_CONTEXT (id->ret_label) = id->fn;
  
    /* Create a block to put the parameters in.  We have to do this
       after the parameters have been remapped because remapping
*************** expand_call_inline (tp, walk_subtrees, d
*** 886,1010 ****
    /* Replace the call by the inlined body.  Wrap it in an
       EXPR_WITH_FILE_LOCATION so that we'll get debugging line notes
       pointing to the right place.  */
!   chain = TREE_CHAIN (*tp);
!   *tp = build_expr_wfl (expr, DECL_SOURCE_FILE (fn), DECL_SOURCE_LINE (fn),
! 			/*col=*/0);
!   EXPR_WFL_EMIT_LINE_NOTE (*tp) = 1;
!   TREE_CHAIN (*tp) = chain;
    pop_srcloc ();
  
    /* If the value of the new expression is ignored, that's OK.  We
       don't warn about this for CALL_EXPRs, so we shouldn't warn about
       the equivalent inlined version either.  */
!   TREE_USED (*tp) = 1;
  
!   /* Our function now has more statements than it did before.  */
!   DECL_NUM_STMTS (VARRAY_TREE (id->fns, 0)) += DECL_NUM_STMTS (fn);
!   id->inlined_stmts += DECL_NUM_STMTS (fn);
! 
!   /* Recurse into the body of the just inlined function.  */
!   expand_calls_inline (inlined_body, id);
    VARRAY_POP (id->fns);
! 
!   /* If we've returned to the top level, clear out the record of how
!      much inlining has been done.  */
!   if (VARRAY_ACTIVE_SIZE (id->fns) == id->first_inlined_fn)
!     id->inlined_stmts = 0;
  
!   /* Don't walk into subtrees.  We've already handled them above.  */
!   *walk_subtrees = 0;
  
!   /* Keep iterating.  */
!   return NULL_TREE;
  }
  
! /* Walk over the entire tree *TP, replacing CALL_EXPRs with inline
!    expansions as appropriate.  */
  
! static void
! expand_calls_inline (tp, id)
       tree *tp;
!      inline_data *id;
  {
!   /* Search through *TP, replacing all calls to inline functions by
!      appropriate equivalents.  Use walk_tree in no-duplicates mode
!      to avoid exponential time complexity.  (We can't just use
!      walk_tree_without_duplicates, because of the special TARGET_EXPR
!      handling in expand_calls.  The hash table is set up in
!      optimize_function.  */
!   walk_tree (tp, expand_call_inline, id, id->tree_pruner);
! }
  
! /* Expand calls to inline functions in the body of FN.  */
  
! static void
! optimize_inline_calls (fn)
!      tree fn;
! {
!   inline_data id;
!   tree prev_fn;
!   struct saved_scope *s;
    
!   /* Clear out ID.  */
!   memset (&id, 0, sizeof (id));
  
!   /* Don't allow recursion into FN.  */
!   VARRAY_TREE_INIT (id.fns, 32, "fns");
!   VARRAY_PUSH_TREE (id.fns, fn);
!   /* Or any functions that aren't finished yet.  */
!   prev_fn = NULL_TREE;
!   if (current_function_decl)
      {
!       VARRAY_PUSH_TREE (id.fns, current_function_decl);
!       prev_fn = current_function_decl;
      }
!   for (s = scope_chain; s; s = s->prev)
!     if (s->function_decl && s->function_decl != prev_fn)
!       {
! 	VARRAY_PUSH_TREE (id.fns, s->function_decl);
! 	prev_fn = s->function_decl;
!       }
    
!   /* Create the stack of TARGET_EXPRs.  */
!   VARRAY_TREE_INIT (id.target_exprs, 32, "target_exprs");
  
!   /* Create the list of functions this call will inline.  */
!   VARRAY_TREE_INIT (id.inlined_fns, 32, "inlined_fns");
  
!   /* Keep track of the low-water mark, i.e., the point where the first
!      real inlining is represented in ID.FNS.  */
!   id.first_inlined_fn = VARRAY_ACTIVE_SIZE (id.fns);
  
!   /* Replace all calls to inline functions with the bodies of those
!      functions.  */
!   id.tree_pruner = htab_create (37, htab_hash_pointer,
! 				htab_eq_pointer, NULL);
!   expand_calls_inline (&DECL_SAVED_TREE (fn), &id);
  
!   /* Clean up.  */
!   htab_delete (id.tree_pruner);
!   VARRAY_FREE (id.fns);
!   VARRAY_FREE (id.target_exprs);
!   if (DECL_LANG_SPECIFIC (fn))
      {
!       tree ifn = make_tree_vec (VARRAY_ACTIVE_SIZE (id.inlined_fns));
        
!       memcpy (&TREE_VEC_ELT (ifn, 0), &VARRAY_TREE (id.inlined_fns, 0),
! 	      VARRAY_ACTIVE_SIZE (id.inlined_fns) * sizeof (tree));
!       DECL_INLINED_FNS (fn) = ifn;
      }
!   VARRAY_FREE (id.inlined_fns);
    
!   dump_function (TDI_inlined, fn);
  }
  
! /* Optimize the body of FN. */
  
! void
! optimize_function (fn)
       tree fn;
  {
!   dump_function (TDI_original, fn);
  
    /* While in this function, we may choose to go off and compile
       another function.  For example, we might instantiate a function
--- 749,1110 ----
    /* Replace the call by the inlined body.  Wrap it in an
       EXPR_WITH_FILE_LOCATION so that we'll get debugging line notes
       pointing to the right place.  */
!   chain = TREE_CHAIN (*call_site->site_ptr);
!   *call_site->site_ptr = build_expr_wfl (expr, DECL_SOURCE_FILE (fn),
! 					 DECL_SOURCE_LINE (fn), /*col=*/0);
!   EXPR_WFL_EMIT_LINE_NOTE (*call_site->site_ptr) = 1;
!   TREE_CHAIN (*call_site->site_ptr) = chain;
    pop_srcloc ();
  
    /* If the value of the new expression is ignored, that's OK.  We
       don't warn about this for CALL_EXPRs, so we shouldn't warn about
       the equivalent inlined version either.  */
!   TREE_USED (*call_site->site_ptr) = 1;
  
!   id->in_target_cleanup_p = 0;
!   if (call_site->target_expr)
!     VARRAY_POP (id->target_exprs);
    VARRAY_POP (id->fns);
!   id->fn = NULL_TREE;
! }
  
! /* Return true if FN is inlinable considering its size. */
  
! static bool
! inlinable_size_p (fn)
!      tree fn;
! {
!   return DECL_NUM_STMTS (fn)
!     < (PARAM_VALUE (PARAM_MAX_INLINE_AST)
!        + (PARAM_VALUE (PARAM_ARG_INLINE_AST)
! 	  * type_num_arguments (TREE_TYPE (fn))));
  }
  
! /* Called via walk_trees, DATA is an inline_data. If *TP is a call to
!    a potentially inlinable function, remember it. Also deal with tracking
!    TARGET_EXPRs.  */
  
! static tree
! find_inlinable_call_sites (tp, walk_subtrees, data)
       tree *tp;
!      int *walk_subtrees;
!      void *data;
  {
!   inline_data *id = (inline_data *) data;
!   tree t = *tp;
!   tree fn;
!   call_site_data *call_site;
!   
!   if (TREE_CODE (*tp) == TARGET_EXPR)
!     {
!       /* Recurse, but letting recursive invocations know that we are
!      	 inside the body of a TARGET_EXPR.  */
!       int i, len = first_rtl_op (TARGET_EXPR);
  
!       /* Push T on the stack of pending TARGET_EXPRs.  */
!       VARRAY_PUSH_TREE (id->target_exprs, t);
  
!       /* Actually walk over them.  This loop is the body of
! 	 walk_trees, omitting the case where the TARGET_EXPR
! 	 itself is handled.  */
!       for (i = 0; i < len; ++i)
! 	{
! 	  if (i == 2)
! 	    ++id->in_target_cleanup_p;
! 	  walk_tree (&TREE_OPERAND (t, i), find_inlinable_call_sites, data,
! 		     id->tree_pruner);
! 	  if (i == 2)
! 	    --id->in_target_cleanup_p;
! 	}
! 
!       /* We're done with this TARGET_EXPR now.  */
!       VARRAY_POP (id->target_exprs);
! 
!       /* We walked our own subtrees. */
!       *walk_subtrees = 0;
! 
!       return NULL_TREE;
!     }
! 
!   if (TYPE_P (t))
!     /* Because types were not copied in copy_body, CALL_EXPRs beneath
!        them should not be expanded.  This can happen if the type is a
!        dynamic array type, for example.  */
!     *walk_subtrees = 0;
! 
!   /* From here on, we're only interested in CALL_EXPRs.  */
!   if (TREE_CODE (t) != CALL_EXPR)
!     return NULL_TREE;
! 
!   /* First, see if we can figure out what function is being called.
!      If we cannot, then there is no hope of inlining the function.  */
!   fn = get_callee_fndecl (t);
!   if (!fn)
!     return NULL_TREE;
! 
!   if (DECL_UNINLINABLE (fn))
!     return NULL_TREE;
    
!   if (!DECL_INLINE (fn))
!     /* Not an inline function. */
!     return NULL_TREE;
  
!   /* We can inline a template instantiation only if it's fully
!      instantiated.  */
!   if (DECL_TEMPLATE_INFO (fn)
!       && TI_PENDING_TEMPLATE_FLAG (DECL_TEMPLATE_INFO (fn)))
      {
!       fn = instantiate_decl (fn, /*defer_ok=*/0);
!       if (TI_PENDING_TEMPLATE_FLAG (DECL_TEMPLATE_INFO (fn)))
! 	return NULL_TREE;
      }
! 
!   /* If we don't have the function body available, we can't inline
!      it.  */
!   if (!DECL_SAVED_TREE (fn))
!     return NULL_TREE;
! 
!   /* We can't check yet if the function cannot be inlined due to size,
!      because it might not have been optimized. That can make it
!      smaller in some cases. */
    
!   /* Remember this call site. */
!   call_site = xmalloc (sizeof (call_site_data));
!   call_site->site_ptr = tp;
!   call_site->callee = fn;
!   call_site->in_target_cleanup_p = id->in_target_cleanup_p;
!   call_site->target_expr = (VARRAY_ACTIVE_SIZE (id->target_exprs)
! 			    ? VARRAY_TOP_TREE (id->target_exprs) : NULL_TREE);
!   call_site->usage = id->usage;
  
!   VARRAY_PUSH_GENERIC_PTR (id->inlinable_call_sites, call_site);
!   
!   return NULL_TREE;
! }
  
! /* Called via qsort. A_ and B_ are pointers into an array of
!    call_site_data pointers.  We want the largest, most unused function
!    at the bottom and the smallest most used function at the top.
!    Return <0 if A_ is better than B_, >0 if B_ is better than A_. If
!    they are equal, order first by function size, then by function. That
!    way equally weighted calls to the same function will be
!    consecutive.
! 
!    We treat the usage information as a dynamic count to weight the
!    function's statement count. If the dynamic count has a large error
!    margin, we want to reduce its impact by moving it towards its
!    initial value. I.e. usage' = usage_init * (usage/usage_init)^(1/N),
!    where N gets bigger the more uncertain the estimate is.  */
  
! static int
! compare_inlinable_call_sites (a_, b_)
!      const void *a_;
!      const void *b_;
! {
!   call_site_data *a = *(call_site_data **)a_;
!   call_site_data *b = *(call_site_data **)b_;
!   HOST_WIDE_INT ua = DECL_NUM_STMTS (a->callee) * b->usage;
!   HOST_WIDE_INT ub = DECL_NUM_STMTS (b->callee) * a->usage;
! 
!   /* Order by dynamic size. */
!   if (ua - ub)
!     return ua - ub;
!   
!   /* Ordered by static size. */
!   ua = DECL_NUM_STMTS (a->callee);
!   ub = DECL_NUM_STMTS (b->callee);
!   if (ua - ub)
!     return ua - ub;
! 
!   /* Ordered (arbitrarily) by uid, don't use the address as that is
!      essentially random, the UID indicates something about source ordering. */
!   ua = DECL_UID (a->callee);
!   ub = DECL_UID (b->callee);
!   return ua - ub;
! }
  
! /* Expand calls to inline functions in the body of FN.  We do
!    bottom-up inlining, the idea being that expanding already optimized
!    functions is better than top down inlining of unoptimized
!    functions.  This is more memory intensive, as we have the expanded
!    bodies of inlinable functions lying about. The approach taken is to
!    walk the body of FN, collecting the inlinable function call
!    sites. We optimize each inlinable function - so we have a good
!    estimate of their sizes. That optimization will do inlining into
!    the body of those functions. We have to protect that inlining from
!    recursion -- see the check at the start of
!    optimize_function_real. We sort the inlinable call sites by size &
!    frequency.  Then we perform inlining from smallest to largest,
!    stopping when we're about to exceed the inline statement limit.
!    Returns a count of call sites we inlined.  */
! 
! static int
! optimize_inline_calls (fn, id)
!      tree fn;
!      inline_data *id;
! {
!   unsigned first_inlinable = VARRAY_ACTIVE_SIZE (id->inlinable_call_sites);
!   unsigned ix;
!   int flags;
!   int count = 0;
!   FILE *stream;
!   HOST_WIDE_INT orig_stmts, stmts;
!   
!   VARRAY_PUSH_TREE (id->fns, fn);
!   
!   /* Locate all the inlinable call sites. */
!   id->usage = 65536;
!   id->tree_pruner = htab_create (37, htab_hash_pointer,
! 	 			 htab_eq_pointer, NULL);
!   walk_tree (&DECL_SAVED_TREE (fn), &find_inlinable_call_sites,
! 	     id, id->tree_pruner);
!   htab_delete (id->tree_pruner);
!   id->tree_pruner = 0;
! 
!   /* Make sure each is optimized. */
!   for (ix = first_inlinable;
!        ix != VARRAY_ACTIVE_SIZE (id->inlinable_call_sites);
!        ix++)
      {
!       call_site_data *call_site =
! 	(call_site_data *)VARRAY_GENERIC_PTR (id->inlinable_call_sites, ix);
        
!       optimize_function_real (call_site->callee, id);
      }
!   if (DECL_OPTIMIZED_P (fn))
!     /* We recursively optimized ourselves.  This happens when we
!        mutually recurse via some function that is larger than us.  */
!     goto cleanup;
! 
!   /* Sort them from best to worst. */
!   if (first_inlinable != VARRAY_ACTIVE_SIZE (id->inlinable_call_sites))
!     qsort (&VARRAY_GENERIC_PTR (id->inlinable_call_sites, first_inlinable),
! 	   VARRAY_ACTIVE_SIZE (id->inlinable_call_sites) - first_inlinable,
! 	   sizeof (PTR), &compare_inlinable_call_sites);
! 
!   count = 0;
!   stmts = orig_stmts = DECL_NUM_STMTS (fn);
    
!   /* Now do the actual inlining.  */
!   for (ix = first_inlinable;
!        ix != VARRAY_ACTIVE_SIZE (id->inlinable_call_sites);
!        ix++)
!     {
!       call_site_data *call_site = (call_site_data *) VARRAY_GENERIC_PTR
! 					(id->inlinable_call_sites, ix);
!       
!       if (!DECL_OPTIMIZED_P (call_site->callee)
! 	  || DECL_UNINLINABLE (call_site->callee))
! 	/* We failed to optimize this, or we discovered that it is
! 	   uninlinable. */
! 	continue;
! 
!       if (!inlinable_size_p (call_site->callee))
! 	break;
!       
!       stmts += DECL_NUM_STMTS (call_site->callee);
!       expand_call_inline (call_site, id);
!       count++;
!     }
!       
!   stream = dump_function (TDI_inlined, fn, &flags);
!   if (stream)
!     {
!       fprintf (stream, "\n");
!       fprintf (stream, ";; expanded from "
! 	       HOST_WIDE_INT_PRINT_DEC " to "
! 	       HOST_WIDE_INT_PRINT_DEC " statements\n",
! 	       DECL_NUM_STMTS (fn), stmts);
!       
!       if (!(flags & TDF_SLIM))
! 	{
! 	  unsigned jx;
!       
! 	  for (jx = first_inlinable;
! 	       jx != VARRAY_ACTIVE_SIZE (id->inlinable_call_sites);
! 	       jx++)
! 	    {
! 	      call_site_data *call_site = (call_site_data *) VARRAY_GENERIC_PTR
! 						(id->inlinable_call_sites, jx);
! 	      const char *reason = NULL;
! 	      
! 	      if (jx >= ix)
! 		reason = "too big";
! 	      else if (!DECL_OPTIMIZED_P (call_site->callee))
! 		reason = "not optimized";
! 	      else if (DECL_UNINLINABLE (call_site->callee))
! 		reason = "not inlinable";
! 	  
! 	      fprintf (stream, ";; %sinlined %s ", reason ? "not " : "",
! 		       decl_as_string (call_site->callee,
! 				       TFF_DECL_SPECIFIERS));
! 	      fprintf (stream, "(%s) " HOST_WIDE_INT_PRINT_DEC,
! 		       decl_as_string
! 		       		(DECL_ASSEMBLER_NAME (call_site->callee), 0),
! 		       DECL_NUM_STMTS (call_site->callee));
! 	      if (reason)
! 		fprintf (stream , " %s", reason);
! 	      fprintf (stream, "\n");
! 	    }
! 	}
!       dump_end (TDI_inlined, stream);
!     }
!   DECL_NUM_STMTS (fn) = stmts;
!   
! cleanup:;
!   while (VARRAY_ACTIVE_SIZE (id->inlinable_call_sites) != first_inlinable)
!     {
!       free (VARRAY_TOP_GENERIC_PTR (id->inlinable_call_sites));
!       VARRAY_POP (id->inlinable_call_sites);
!     }
!   
!   VARRAY_POP (id->fns);
! 
!   return count;
  }
  
! /* Optimize the body of FN. The function stack in ID is used to detect
!    inlining recursion. When we encounter that, we skip the
!    optimization of the largest function in the loop.  */
  
! static void
! optimize_function_real (fn, id)
       tree fn;
+      inline_data *id;
  {
!   unsigned ix;
!   HOST_WIDE_INT biggest = 0;
!   
!   my_friendly_assert ((!DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn)
! 		       && !DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (fn))
! 		      || DECL_CLONED_FUNCTION_P (fn), 20010702);
! 
!   if (DECL_OPTIMIZED_P (fn))
!     /* We've already done this function. */
!     return;
!   
!   for (ix = VARRAY_ACTIVE_SIZE (id->fns); ix--;)
!     {
!       tree nested_fn = VARRAY_TREE (id->fns, ix);
!       
!       if (biggest < DECL_NUM_STMTS (nested_fn))
! 	biggest = DECL_NUM_STMTS (nested_fn);
!       if (fn == nested_fn)
! 	{
! 	  if (ix < id->first_fn)
! 	    /* This is an incomplete function. */
! 	    return;
! 	  if (biggest == DECL_NUM_STMTS (fn))
! 	    /* We have mutual recursion, and this is the biggest
!        	       function in the recursive loop. This stands the least
!        	       chance of being inlined within the loop, so break the
!        	       recursion here. */
! 	    return;
! 
! 	  /* We _do_ want to optimize now. */
! 	  break;
! 	}
!     }
  
    /* While in this function, we may choose to go off and compile
       another function.  For example, we might instantiate a function
*************** optimize_function (fn)
*** 1017,1034 ****
       of the function.  */
    ++function_depth;
  
    if (flag_inline_trees
        /* We do not inline thunks, as (a) the backend tries to optimize
           the call to the thunkee, (b) tree based inlining breaks that
           optimization, (c) virtual functions are rarely inlineable,
           and (d) ASM_OUTPUT_MI_THUNK is there to DTRT anyway.  */
!       && !DECL_THUNK_P (fn))
!     optimize_inline_calls (fn);
    
!   /* Undo the call to ggc_push_context above.  */
    --function_depth;
    
!   dump_function (TDI_optimized, fn);
  }
  
  /* Called from calls_setjmp_p via walk_tree.  */
--- 1117,1220 ----
       of the function.  */
    ++function_depth;
  
+   
+   if (!DECL_OPTIMIZED_PARTIAL_P (fn))
+     {
+       dump_function (TDI_original, fn, NULL);
+       
+       /* XXX Do preliminary optimization here, like constant
+ 	 propagation, unreachable code elimination, return value
+ 	 optimization, etc. */
+ 
+       /* We can't inline varargs functions. (Well actually we could if
+      	 they never refer to the variadic arguments. We could have
+      	 checked for that in the preceding optimizations.)  */
+       if (varargs_function_p (fn))
+ 	DECL_UNINLINABLE (fn) = 1;
+   
+       DECL_OPTIMIZED_PARTIAL_P (fn) = 1;
+     }
+ 
+   if (VARRAY_ACTIVE_SIZE (id->fns) > id->first_fn)
+     {
+       /* We're optimizing this function because we want to inline it
+          into the function at the top of id->fns.  Check our current
+          size does not prevent that.  If it does, we abort the
+          optimization now, because it is unlikely that this function
+          will get smaller when inlining into its body. Of course,
+          we'll do inline expansion in this function when we come to
+          really emit it. */
+       if (!inlinable_size_p (fn))
+ 	goto cleanup;
+     }
+ 
+   
    if (flag_inline_trees
        /* We do not inline thunks, as (a) the backend tries to optimize
           the call to the thunkee, (b) tree based inlining breaks that
           optimization, (c) virtual functions are rarely inlineable,
           and (d) ASM_OUTPUT_MI_THUNK is there to DTRT anyway.  */
!       && !DECL_THUNK_P (fn)
!       && optimize_inline_calls (fn, id))
!     {
!       /* We managed to inline something. */
!       
!       /* XXX Repeat some optimizations, like constant propagation,
! 	 unreachable code elimination, etc. */
!     }
! 
!   if (!inlinable_size_p (fn))
!     DECL_UNINLINABLE (fn) = 1;
!   
!   DECL_OPTIMIZED_P (fn) = 1;
!   dump_function (TDI_optimized, fn, NULL);
    
! cleanup:
    --function_depth;
+ }
+ 
+ /* Optimize the body of FN. This will cause the optimization of any
+    inlinable calls made in FN (recursively).  */
+ 
+ void
+ optimize_function (fn)
+      tree fn;
+ {
+   inline_data id;
+   tree prev_fn;
+   struct saved_scope *s;
+   
+   /* Clear out ID.  */
+   memset (&id, 0, sizeof (id));
+   VARRAY_TREE_INIT (id.fns, 32, "fns");
+   VARRAY_TREE_INIT (id.target_exprs, 32, "target_exprs");
+ 
+   /* Don't allow optimization of functions that aren't finished yet.  */
+   prev_fn = NULL_TREE;
+   if (current_function_decl && current_function_decl != fn)
+     {
+       VARRAY_PUSH_TREE (id.fns, current_function_decl);
+       prev_fn = current_function_decl;
+     }
+   for (s = scope_chain; s; s = s->prev)
+     if (s->function_decl && s->function_decl != prev_fn)
+       {
+ 	VARRAY_PUSH_TREE (id.fns, s->function_decl);
+ 	prev_fn = s->function_decl;
+       }
+   /* Keep track of the low-water mark, i.e., the point where the first
+      real inlining is represented in ID.FNS.  */
+   id.first_fn = VARRAY_ACTIVE_SIZE (id.fns);
+ 
+   /* Create the list of call sites we can inline inside this function.  */
+   VARRAY_GENERIC_PTR_INIT (id.inlinable_call_sites, 32,
+ 			   "inlinable_call_sites");
+ 
+   optimize_function_real (fn, &id);
    
!   VARRAY_FREE (id.target_exprs);
!   VARRAY_FREE (id.inlinable_call_sites);
!   VARRAY_FREE (id.fns);
  }
  
  /* Called from calls_setjmp_p via walk_tree.  */
*************** maybe_clone_body (fn)
*** 1165,1170 ****
--- 1351,1357 ----
        memset (&id, 0, sizeof (id));
        VARRAY_TREE_INIT (id.fns, 2, "fns");
        VARRAY_PUSH_TREE (id.fns, clone);
+       id.fn = clone;
        VARRAY_PUSH_TREE (id.fns, fn);
  
        /* Cloning is treated slightly differently from inlining.  Set
*************** maybe_clone_body (fn)
*** 1245,1258 ****
    return 1;
  }
  
! /* Dump FUNCTION_DECL FN as tree dump PHASE. */
  
! static void
! dump_function (phase, fn)
       enum tree_dump_index phase;
       tree fn;
  {
!   FILE *stream;
    int flags;
  
    stream = dump_begin (phase, &flags);
--- 1432,1449 ----
    return 1;
  }
  
! /* Dump FUNCTION_DECL FN as tree dump PHASE.  If FLAG_PTR is non-NULL,
!    then return any user supplied flags in there and return the open
!    stream after dumping the function.  The caller is expected to
!    append more information and then call dump_end. */
  
! static FILE *
! dump_function (phase, fn, flag_ptr)
       enum tree_dump_index phase;
       tree fn;
+      int *flag_ptr;
  {
!   FILE *stream = NULL;
    int flags;
  
    stream = dump_begin (phase, &flags);
*************** dump_function (phase, fn)
*** 1264,1271 ****
  	       decl_as_string (DECL_ASSEMBLER_NAME (fn), 0));
        fprintf (stream, ";; enabled by -%s\n", dump_flag_name (phase));
        fprintf (stream, "\n");
!       
        dump_node (fn, TDF_SLIM | flags, stream);
!       dump_end (phase, stream);
      }
  }
--- 1455,1469 ----
  	       decl_as_string (DECL_ASSEMBLER_NAME (fn), 0));
        fprintf (stream, ";; enabled by -%s\n", dump_flag_name (phase));
        fprintf (stream, "\n");
! 
        dump_node (fn, TDF_SLIM | flags, stream);
!       if (flag_ptr)
! 	*flag_ptr = flags;
!       else
! 	{
! 	  dump_end (phase, stream);
! 	  stream = NULL;
! 	}
      }
+   return stream;
  }
Index: cp/pt.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cp/pt.c,v
retrieving revision 1.538
diff -c -3 -p -r1.538 pt.c
*** pt.c	2001/07/06 06:36:46	1.538
--- pt.c	2001/07/12 14:51:32
*************** instantiate_decl (d, defer_ok)
*** 9732,9737 ****
--- 9732,9738 ----
    tree code_pattern;
    tree spec;
    tree gen_tmpl;
+   tree clone_name = NULL_TREE;
    int pattern_defined;
    int line = lineno;
    int need_push;
*************** instantiate_decl (d, defer_ok)
*** 9745,9751 ****
    /* Don't instantiate cloned functions.  Instead, instantiate the
       functions they cloned.  */
    if (TREE_CODE (d) == FUNCTION_DECL && DECL_CLONED_FUNCTION_P (d))
!     d = DECL_CLONED_FUNCTION (d);
  
    if (DECL_TEMPLATE_INSTANTIATED (d))
      /* D has already been instantiated.  It might seem reasonable to
--- 9746,9756 ----
    /* Don't instantiate cloned functions.  Instead, instantiate the
       functions they cloned.  */
    if (TREE_CODE (d) == FUNCTION_DECL && DECL_CLONED_FUNCTION_P (d))
!     {
!       /* Remember the clone so we can locate the instantiated clone. */
!       clone_name = DECL_NAME (d);
!       d = DECL_CLONED_FUNCTION (d);
!     }
  
    if (DECL_TEMPLATE_INSTANTIATED (d))
      /* D has already been instantiated.  It might seem reasonable to
*************** instantiate_decl (d, defer_ok)
*** 9753,9759 ****
         stop here.  But when an explicit instantiation is deferred
         until the end of the compilation, DECL_EXPLICIT_INSTANTIATION
         is set, even though we still need to do the instantiation.  */
!     return d;
  
    /* If we already have a specialization of this declaration, then
       there's no reason to instantiate it.  Note that
--- 9758,9764 ----
         stop here.  But when an explicit instantiation is deferred
         until the end of the compilation, DECL_EXPLICIT_INSTANTIATION
         is set, even though we still need to do the instantiation.  */
!     goto find_clone;
  
    /* If we already have a specialization of this declaration, then
       there's no reason to instantiate it.  Note that
*************** instantiate_decl (d, defer_ok)
*** 9763,9773 ****
    gen_tmpl = most_general_template (tmpl);
    spec = retrieve_specialization (gen_tmpl, args);
    if (spec != NULL_TREE && DECL_TEMPLATE_SPECIALIZATION (spec))
!     return spec;
  
    /* This needs to happen before any tsubsting.  */
    if (! push_tinst_level (d))
!     return d;
  
    timevar_push (TV_PARSE);
  
--- 9768,9781 ----
    gen_tmpl = most_general_template (tmpl);
    spec = retrieve_specialization (gen_tmpl, args);
    if (spec != NULL_TREE && DECL_TEMPLATE_SPECIALIZATION (spec))
!     {
!       d = spec;
!       goto find_clone;
!     }
  
    /* This needs to happen before any tsubsting.  */
    if (! push_tinst_level (d))
!     goto find_clone;
  
    timevar_push (TV_PARSE);
  
*************** out:
*** 9978,9983 ****
--- 9986,10007 ----
  
    timevar_pop (TV_PARSE);
  
+ find_clone:
+   if (clone_name)
+     {
+       tree d_clone;
+ 
+       for (d_clone = TREE_CHAIN (d);
+ 	   d_clone && DECL_CLONED_FUNCTION_P (d_clone);
+ 	   d_clone = TREE_CHAIN (d_clone))
+ 	if (DECL_NAME (d_clone) == clone_name)
+ 	  {
+ 	    d = d_clone;
+ 	    break;
+ 	  }
+       my_friendly_assert (DECL_CLONED_FUNCTION_P (d), 20010702);
+     }
+   
    return d;
  }
  

