This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Please try out new inlining heuristics


Hi,
I am attaching a combined unit-at-a-time/new inlining patch relative to
current CVS.  I tested it with SPEC2000 and the results are positive.
GCC now bootstraps 2% faster with unit-at-a-time enabled by default, the
integer and Fortran benchmarks are unaffected, and eon code size is
reduced by 14% while it gets 3% faster.  I also got noticeable
compile-time speedups on some C++ sources, as reported in previous
emails.

Also, compilation time scales linearly with the inlining limit, and
performance improves slightly until it hits cache limits.  This is as
it should be :)

However, Gerald has kindly tested my patch on his C++ template-heavy
code and the results are not as great:

                 Build time:    Size:    Stripped:

   mainline              5:21   2327839  1073852
   +Jan's patch          3:48   2077397   774396
   +Jan's with "600"     4:02   2064289   819580
   +Jan's with "1000"    4:20   2097513   873852

                        mainline      default         600          1000
                    +-------------+-------------+-------------+--------------+
      STRATCOMP1-ALL|  6.32 (0.02)|  6.49 (0.01)|  6.35 (0.01)|  6.23 (0.01) |
   STRATCOMP-770.2-Q|  0.68 (0.00)|  0.91 (0.00)|  0.72 (0.00)|  0.68 (0.00) |
               2QBF1| 18.86 (0.04)| 25.19 (0.05)| 22.33 (0.04)| 20.12 (0.04) |
          PRIMEIMPL2|  9.82 (0.14)| 18.79 (0.01)| 14.63 (0.02)| 11.91 (0.00) |
            ANCESTOR|104.14 (0.12)|138.62 (1.75)|104.43 (0.11)|101.19 (1.28) |
       3COL-SIMPLEX1|  5.88 (0.01)|  7.48 (0.01)|  5.88 (0.01)|  5.38 (0.00) |
         3COL-LADDER|117.14 (0.14)|140.94 (0.17)|117.84 (0.44)|108.32 (0.13) |
       3COL-N-LADDER|  2.38 (0.02)|  2.64 (0.01)|  2.51 (0.00)|  2.47 (0.00) |
        3COL-RANDOM1| 10.75 (0.03)| 18.49 (0.00)| 15.48 (0.03)| 12.36 (0.02) |
          HP-RANDOM1|  7.52 (0.03)|  9.79 (0.02)|  8.52 (0.00)|  7.08 (0.01) |
       HAMCYCLE-FREE|  1.56 (0.00)|  2.17 (0.00)|  1.82 (0.00)|  1.28 (0.01) |
             DECOMP2| 12.96 (0.01)| 20.14 (0.03)| 14.15 (0.00)| 13.64 (0.00) |
        BW-P4-Esra-a| 74.13 (0.10)| 98.25 (0.05)| 86.62 (0.06)| 78.32 (0.08) |
        BW-P5-nopush|  7.05 (0.22)|  9.30 (0.01)|  8.26 (0.01)|  7.27 (0.02) |
       BW-P5-pushbin|  5.58 (0.04)|  7.96 (0.07)|  6.79 (0.00)|  5.94 (0.00) |
     BW-P5-nopushbin|  1.77 (0.00)|  2.49 (0.01)|  2.09 (0.00)|  1.84 (0.00) |
              3SAT-1| 30.50 (0.07)| 58.10 (0.04)| 44.85 (0.04)| 37.27 (0.02) |
   3SAT-1-CONSTRAINT| 18.57 (0.00)| 34.52 (0.01)| 28.18 (0.01)| 21.94 (0.02) |
        HANOI-Towers|  3.36 (0.00)|  4.46 (0.03)|  3.45 (0.02)|  3.28 (0.02) |
              RAMSEY|  6.95 (0.01)|  9.44 (0.01)|  7.47 (0.00)|  6.57 (0.02) |
             CRISTAL|  6.96 (0.02)|  9.14 (0.05)|  7.07 (0.01)|  6.54 (0.03) |
             HANOI-K| 37.55 (0.05)| 62.50 (0.44)| 51.36 (0.18)| 44.88 (0.03) |
           21-QUEENS|  8.38 (0.04)| 15.57 (0.06)| 12.97 (0.04)|  9.83 (0.04) |
   MSTDir[V=13,A=40]| 18.04 (0.01)| 25.27 (0.11)| 22.18 (0.02)| 18.83 (0.01) |
   MSTDir[V=15,A=40]| 18.31 (0.00)| 25.28 (0.04)| 22.35 (0.01)| 18.95 (0.03) |
 MSTUndir[V=13,A=40]|  9.46 (0.00)| 14.29 (0.91)| 11.98 (0.00)|  9.80 (0.01) |
 MSTUndir[V=15,A=40]|154.03 (0.13)|218.24 (0.03)|193.55 (0.40)|158.01 (0.03) |
         TIMETABLING|  8.96 (0.01)| 12.34 (0.00)|  9.42 (0.01)|  8.69 (0.00) |

The situation where the old inliner inlines significantly more is where
there is deep nesting of various functions.  It looks like the
abstraction penalty is high and one needs to inline far enough to get it
eliminated, so I added an abstraction penalty parameter.  That parameter
is used to scale the function body size after it is inlined (so the given
percentage of the body is expected to be eliminated by the optimizer).
With a setting of 50% I get a smaller resulting .s than with a setting of
0 (the original heuristics) for the 8162 testcase, without too much cost
at compile time.  With a setting of 100% one gets the old inlining
heuristics from before additional tricks were added to throttle it down.
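
To make the effect of the penalty concrete, here is a minimal standalone
sketch of the size estimate, modelled on
cgraph_estimate_size_after_inlining in the attached patch.  The function
name and the plain int arguments are illustrative stand-ins for the
cgraph node fields, not part of the patch; INSNS_PER_CALL is 10 as in
the patch.

```c
#include <assert.h>

/* Cost charged for a call, as in the attached patch.  */
#define INSNS_PER_CALL 10

/* Hypothetical standalone version of the estimate.  TO_INSNS and
   WHAT_INSNS stand in for the global.insns fields of the caller and
   the callee, TIMES is the number of call sites being inlined, and
   PENALTY_PERCENT is the value of inline-abstraction-penalty.  */
static int
estimate_size_after_inlining (int times, int to_insns, int what_insns,
                              int penalty_percent)
{
  int size;

  assert (penalty_percent >= 0 && penalty_percent <= 100);

  /* Each inlined copy of the callee replaces one call.  */
  size = (what_insns - INSNS_PER_CALL) * times;
  /* The given percentage of the inlined body is expected to be
     eliminated by the optimizer.  */
  size -= size * penalty_percent / 100;
  return to_insns + size;
}
```

For example, inlining a 110-insn callee once into a 200-insn caller is
estimated at 300 insns with a penalty of 0, at 250 with a penalty of 50,
and at 200 with a penalty of 100.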

The results with the default inlining limit set to 300 insns are:

                 Build time:    Size:    Stripped:

   mainline              5:18   2325423  1072156
   +patch with "25"      3:50   2072927   774588
   +patch with "75"      4:28   2086049   868700
   +patch with "90"      5:23   2228940  1019356

                    |   mainline  |     "25"    |    "75"     |    "90"      |
                    +-------------+-------------+-------------+--------------+
      STRATCOMP1-ALL|  6.20 (0.02)|  6.29 (0.00)|  6.01 (0.01)|  6.12 (0.00) |
   STRATCOMP-770.2-Q|  0.69 (0.00)|  0.82 (0.01)|  0.73 (0.00)|  0.72 (0.00) |
               2QBF1| 18.90 (0.04)| 22.91 (0.01)| 20.39 (0.01)| 18.92 (0.02) |
          PRIMEIMPL2|  9.69 (0.01)| 15.62 (0.01)| 12.35 (0.01)|  9.66 (0.01) |
            ANCESTOR|106.63 (0.16)|121.86 (1.92)|109.81 (0.03)|116.49 (0.14) |
       3COL-SIMPLEX1|  6.00 (0.01)|  6.84 (0.01)|  6.39 (0.01)|  6.42 (0.01) |
         3COL-LADDER|119.01 (0.01)|135.48 (0.03)|126.90 (0.05)|128.36 (0.13) |
       3COL-N-LADDER|  2.48 (0.08)|  2.46 (0.00)|  2.39 (0.01)|  2.37 (0.01) |
        3COL-RANDOM1| 10.68 (0.02)| 15.83 (0.01)| 12.28 (0.03)| 10.73 (0.06) |
          HP-RANDOM1|  7.52 (0.01)|  9.10 (0.04)|  7.55 (0.02)|  6.63 (0.01) |
       HAMCYCLE-FREE|  1.57 (0.04)|  1.98 (0.00)|  1.58 (0.00)|  1.19 (0.00) |
             DECOMP2| 13.06 (0.01)| 16.22 (0.07)| 14.59 (0.07)| 14.03 (0.04) |
        BW-P4-Esra-a| 74.51 (0.02)| 90.86 (0.05)| 80.16 (0.06)| 74.27 (0.18) |
        BW-P5-nopush|  6.89 (0.01)|  8.56 (0.04)|  7.19 (0.01)|  6.91 (0.01) |
       BW-P5-pushbin|  5.54 (0.01)|  7.20 (0.02)|  5.90 (0.01)|  5.55 (0.00) |
     BW-P5-nopushbin|  1.76 (0.00)|  2.24 (0.00)|  1.89 (0.00)|  1.78 (0.01) |
              3SAT-1| 30.40 (0.01)| 47.85 (0.00)| 38.82 (0.02)| 30.54 (0.03) |
   3SAT-1-CONSTRAINT| 18.53 (0.02)| 29.79 (0.02)| 21.71 (0.02)| 18.98 (0.12) |
        HANOI-Towers|  3.23 (0.00)|  3.79 (0.01)|  3.39 (0.01)|  3.29 (0.00) |
              RAMSEY|  6.96 (0.00)|  8.60 (0.00)|  7.57 (0.01)|  7.53 (0.00) |
             CRISTAL|  6.97 (0.02)|  8.23 (0.02)|  7.66 (0.00)|  7.66 (0.03) |
             HANOI-K| 37.04 (0.93)| 55.03 (0.23)| 39.35 (0.09)| 37.02 (0.10) |
           21-QUEENS|  8.45 (0.02)| 13.15 (0.05)|  9.86 (0.02)|  8.90 (0.03) |
   MSTDir[V=13,A=40]| 18.18 (0.02)| 23.35 (0.01)| 18.81 (0.01)| 17.13 (0.02) |
   MSTDir[V=15,A=40]| 18.31 (0.10)| 23.40 (0.01)| 18.73 (0.00)| 16.92 (0.01) |
 MSTUndir[V=13,A=40]|  9.55 (0.09)| 12.48 (0.00)|  9.82 (0.00)|  8.70 (0.00) |
 MSTUndir[V=15,A=40]|157.19 (2.03)|201.73 (0.03)|158.00 (0.50)|141.97 (0.13) |
         TIMETABLING|  9.05 (0.00)| 10.88 (0.01)|  9.94 (0.01)|  9.69 (0.02) |

So it is still hit or miss.  I hope that raising both limits (500/50)
will work better, but I would like to see results from other
applications too.

The inline limits are set via
--param max-inline-insns-auto=X --param max-inline-insns-single=X
The abstraction penalty (in percent) is set via
--param inline-abstraction-penalty=25
Additionally there are --param large-function-insns (defaulting to 30000)
and --param large-function-growth (defaulting to 200%) used to avoid
huge function bodies.
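
As a rough illustration of how these limits interact, the sketch below
mirrors the checks in cgraph_default_inline_p and
cgraph_check_inline_limits from the attached patch.  The struct
inline_params and both function names are hypothetical; in the patch the
values come from the params machinery and the cgraph nodes.  The
auto_inlined flag is my reading of DID_INLINE_FUNC (the compiler, not
the user, marked the function inline), so treat that as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the --param values discussed above.  */
struct inline_params
{
  int max_inline_insns_auto;    /* --param max-inline-insns-auto */
  int max_inline_insns_single;  /* --param max-inline-insns-single */
  int large_function_insns;     /* --param large-function-insns */
  int large_function_growth;    /* --param large-function-growth, in % */
};

/* True when a callee with CALLEE_INSNS estimated insns is small enough
   to inline.  AUTO_INLINED selects which of the two limits applies.  */
static bool
small_enough_to_inline (const struct inline_params *p, int callee_insns,
                        bool auto_inlined)
{
  assert (p != 0);
  if (auto_inlined)
    return callee_insns < p->max_inline_insns_auto;
  return callee_insns < p->max_inline_insns_single;
}

/* False when the caller would become both larger than
   large-function-insns and larger than large-function-growth percent
   of its size before inlining (SELF_INSNS).  */
static bool
within_growth_limits (const struct inline_params *p, int new_insns,
                      int self_insns)
{
  assert (p != 0);
  return !(new_insns > p->large_function_insns
           && new_insns > self_insns * p->large_function_growth / 100);
}
```

With large-function-insns=30000 and large-function-growth=200, a caller
that would grow from 10000 to 40000 insns is rejected, while one that
would grow from 25000 to 40000 is still accepted because it stays under
200% of its original size.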

If you have some CPU time to spare, I would be interested in the
performance of various settings of inline-abstraction-penalty
(20/40/60/80) and of max-inline-insns-auto+max-inline-insns-single
(150/300/600/900).
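
For anyone scripting such a sweep, here is a small sketch that generates
the option strings for all sixteen combinations.  The helper names are
made up for illustration, and it assumes both inline limits are set to
the same value in each run, as in the X/X example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format the --param options for one configuration into BUF.  Returns
   the value snprintf returns, i.e. the length of the full string.  */
static int
format_inline_params (char *buf, size_t size, int limit, int penalty)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size,
                   "--param max-inline-insns-auto=%d"
                   " --param max-inline-insns-single=%d"
                   " --param inline-abstraction-penalty=%d",
                   limit, limit, penalty);
}

/* Print one option string per combination of the suggested values.  */
static void
print_inline_param_grid (FILE *out)
{
  static const int limits[] = { 150, 300, 600, 900 };
  static const int penalties[] = { 20, 40, 60, 80 };
  char buf[160];
  size_t i, j;

  for (i = 0; i < sizeof limits / sizeof limits[0]; i++)
    for (j = 0; j < sizeof penalties / sizeof penalties[0]; j++)
      {
        format_inline_params (buf, sizeof buf, limits[i], penalties[j]);
        fprintf (out, "%s\n", buf);
      }
}
```

Calling print_inline_param_grid (stdout) writes one line per
configuration; each line can be appended to the usual compiler command
line for the application being measured.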

In general, I hope to get the code into a shape where it is still easy
and works reasonably well for most uses.

Index: builtins.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/builtins.c,v
retrieving revision 1.222
diff -c -3 -p -r1.222 builtins.c
*** builtins.c	25 Jun 2003 03:09:02 -0000	1.222
--- builtins.c	25 Jun 2003 21:52:26 -0000
*************** expand_builtin_mathfn (tree exp, rtx tar
*** 1771,1795 ****
    if (! flag_errno_math || ! HONOR_NANS (mode))
      errno_set = false;
  
!   /* Stabilize and compute the argument.  */
!   if (errno_set)
!     switch (TREE_CODE (arg))
!       {
!       case VAR_DECL:
!       case PARM_DECL:
!       case SAVE_EXPR:
!       case REAL_CST:
! 	break;
! 
!       default:
! 	/* Wrap the computation of the argument in a SAVE_EXPR, as we
! 	   need to expand the argument again in expand_errno_check.  This
! 	   way, we will not perform side-effects more the once.  */
! 	arg = save_expr (arg);
! 	arglist = build_tree_list (NULL_TREE, arg);
! 	exp = build_function_call_expr (fndecl, arglist);
! 	break;
!       }
  
    op0 = expand_expr (arg, subtarget, VOIDmode, 0);
  
--- 1771,1793 ----
    if (! flag_errno_math || ! HONOR_NANS (mode))
      errno_set = false;
  
!   /* Always stabilize the argument list.  */
!   switch (TREE_CODE (arg))
!     {
!     case VAR_DECL:
!     case PARM_DECL:
!     case SAVE_EXPR:
!     case REAL_CST:
!       break;
! 
!     default:
!       /* Wrap the computation of the argument in a SAVE_EXPR, as we may
! 	 need to expand the argument again.  This way, we will not perform
! 	 side-effects more than once.  */
!       arg = save_expr (arg);
!       arglist = build_tree_list (NULL_TREE, arg);
!       exp = build_function_call_expr (fndecl, arglist);
!     }
  
    op0 = expand_expr (arg, subtarget, VOIDmode, 0);
  
*************** expand_builtin_mathfn (tree exp, rtx tar
*** 1800,1812 ****
       Set TARGET to wherever the result comes back.  */
    target = expand_unop (mode, builtin_optab, op0, target, 0);
  
!   /* If we were unable to expand via the builtin, stop the
!      sequence (without outputting the insns) and return 0, causing
!      a call to the library function.  */
    if (target == 0)
      {
        end_sequence ();
!       return 0;
      }
  
    if (errno_set)
--- 1798,1810 ----
       Set TARGET to wherever the result comes back.  */
    target = expand_unop (mode, builtin_optab, op0, target, 0);
  
!   /* If we were unable to expand via the builtin, stop the sequence
!      (without outputting the insns) and call to the library function
!      with the stabilized argument list.  */
    if (target == 0)
      {
        end_sequence ();
!       return expand_call (exp, target, target == const0_rtx);
      }
  
    if (errno_set)
*************** expand_builtin_mathfn_2 (tree exp, rtx t
*** 1866,1911 ****
    if (! flag_errno_math || ! HONOR_NANS (mode))
      errno_set = false;
  
!   /* Stabilize the arguments.  */
!   if (errno_set)
      {
!       switch (TREE_CODE (arg1))
! 	{
! 	case VAR_DECL:
! 	case PARM_DECL:
! 	case SAVE_EXPR:
! 	case REAL_CST:
! 	  temp = TREE_CHAIN (arglist);
! 	  break;
! 
! 	default:
! 	  stable = false;
! 	  arg1 = save_expr (arg1);
! 	  temp = build_tree_list (NULL_TREE, arg1);
! 	  break;
!         }
! 
!       switch (TREE_CODE (arg0))
! 	{
! 	case VAR_DECL:
! 	case PARM_DECL:
! 	case SAVE_EXPR:
! 	case REAL_CST:
! 	  if (! stable)
! 	    arglist = tree_cons (NULL_TREE, arg0, temp);
! 	  break;
! 
! 	default:
! 	  stable = false;
! 	  arg0 = save_expr (arg0);
! 	  arglist = tree_cons (NULL_TREE, arg0, temp);
! 	  break;
! 	}
  
        if (! stable)
! 	exp = build_function_call_expr (fndecl, arglist);
      }
  
    op0 = expand_expr (arg0, subtarget, VOIDmode, 0);
    op1 = expand_expr (arg1, 0, VOIDmode, 0);
  
--- 1864,1906 ----
    if (! flag_errno_math || ! HONOR_NANS (mode))
      errno_set = false;
  
!   /* Always stabilize the argument list.  */
!   switch (TREE_CODE (arg1))
      {
!     case VAR_DECL:
!     case PARM_DECL:
!     case SAVE_EXPR:
!     case REAL_CST:
!       temp = TREE_CHAIN (arglist);
!       break;
  
+     default:
+       stable = false;
+       arg1 = save_expr (arg1);
+       temp = build_tree_list (NULL_TREE, arg1);
+       break;
+     }
+ 
+   switch (TREE_CODE (arg0))
+     {
+     case VAR_DECL:
+     case PARM_DECL:
+     case SAVE_EXPR:
+     case REAL_CST:
        if (! stable)
!         arglist = tree_cons (NULL_TREE, arg0, temp);
!       break;
! 
!     default:
!       stable = false;
!       arg0 = save_expr (arg0);
!       arglist = tree_cons (NULL_TREE, arg0, temp);
!       break;
      }
  
+   if (! stable)
+     exp = build_function_call_expr (fndecl, arglist);
+ 
    op0 = expand_expr (arg0, subtarget, VOIDmode, 0);
    op1 = expand_expr (arg1, 0, VOIDmode, 0);
  
*************** expand_builtin_mathfn_2 (tree exp, rtx t
*** 1917,1929 ****
    target = expand_binop (mode, builtin_optab, op0, op1,
  			 target, 0, OPTAB_DIRECT);
  
!   /* If we were unable to expand via the builtin, stop the
!      sequence (without outputting the insns) and return 0, causing
!      a call to the library function.  */
    if (target == 0)
      {
        end_sequence ();
!       return 0;
      }
  
    if (errno_set)
--- 1912,1924 ----
    target = expand_binop (mode, builtin_optab, op0, op1,
  			 target, 0, OPTAB_DIRECT);
  
!   /* If we were unable to expand via the builtin, stop the sequence
!      (without outputting the insns) and call to the library function
!      with the stabilized argument list.  */
    if (target == 0)
      {
        end_sequence ();
!       return expand_call (exp, target, target == const0_rtx);
      }
  
    if (errno_set)
Index: cgraph.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.c,v
retrieving revision 1.13
diff -c -3 -p -r1.13 cgraph.c
*** cgraph.c	24 Jun 2003 16:50:26 -0000	1.13
--- cgraph.c	25 Jun 2003 21:52:26 -0000
*************** create_edge (caller, callee)
*** 166,171 ****
--- 166,185 ----
       struct cgraph_node *caller, *callee;
  {
    struct cgraph_edge *edge = xmalloc (sizeof (struct cgraph_edge));
+   struct cgraph_edge *edge2;
+ 
+   edge->inline_call = false;
+   /* At the moment we don't associate calls with specific CALL_EXPRs
+      as we probably ought to, so we must preserve inline_call flags to
+      be the same in all copies of the same edge.  */
+   if (cgraph_global_info_ready)
+     for (edge2 = caller->callees; edge2; edge2 = edge2->next_callee)
+       if (edge2->callee == callee)
+ 	{
+ 	  edge->inline_call = edge2->inline_call;
+ 	  break;
+ 	}
+ 
  
    edge->caller = caller;
    edge->callee = callee;
Index: cgraph.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.h,v
retrieving revision 1.6
diff -c -3 -p -r1.6 cgraph.h
*** cgraph.h	24 Jun 2003 16:50:26 -0000	1.6
--- cgraph.h	25 Jun 2003 21:52:26 -0000
*************** struct cgraph_local_info
*** 30,40 ****
    /* Set when function function is visiable in current compilation unit only
       and it's address is never taken.  */
    bool local;
!   /* Set when function is small enought to be inlinable many times.  */
!   bool inline_many;
!   /* Set when function can be inlined once (false only for functions calling
!      alloca, using varargs and so on).  */
!   bool can_inline_once;
  };
  
  /* Information about the function that needs to be computed globally
--- 30,42 ----
    /* Set when function function is visiable in current compilation unit only
       and it's address is never taken.  */
    bool local;
! 
!   /* False when there is something making inlining impossible (such as va_arg) */
!   bool inlinable;
!   /* True when function should be inlined independently of its size.  */
!   bool disgread_inline_limits;
!   /* Size of the function before inlining.  */
!   int self_insns;
  };
  
  /* Information about the function that needs to be computed globally
*************** struct cgraph_global_info
*** 44,49 ****
--- 46,54 ----
  {
    /* Set when the function will be inlined exactly once.  */
    bool inline_once;
+ 
+   /* Estimated size of the function after inlining.  */
+   int insns;
  };
  
  /* Information about the function that is propagated by the RTL backend.
*************** struct cgraph_edge
*** 95,100 ****
--- 100,106 ----
    struct cgraph_node *caller, *callee;
    struct cgraph_edge *next_caller;
    struct cgraph_edge *next_callee;
+   bool inline_call;
  };
  
  /* The cgraph_varpool data strutcture.
*************** void cgraph_finalize_compilation_unit	PA
*** 147,151 ****
--- 153,158 ----
  void cgraph_create_edges		PARAMS ((tree, tree));
  void cgraph_optimize			PARAMS ((void));
  void cgraph_mark_needed_node		PARAMS ((struct cgraph_node *, int));
+ bool cgraph_inline_p			PARAMS ((tree, tree));
  
  #endif  /* GCC_CGRAPH_H  */
Index: cgraphunit.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraphunit.c,v
retrieving revision 1.7
diff -c -3 -p -r1.7 cgraphunit.c
*** cgraphunit.c	24 Jun 2003 16:50:28 -0000	1.7
--- cgraphunit.c	25 Jun 2003 21:52:26 -0000
*************** Software Foundation, 59 Temple Place - S
*** 34,46 ****
  #include "target.h"
  #include "cgraph.h"
  #include "diagnostic.h"
  
  static void cgraph_expand_functions PARAMS ((void));
  static void cgraph_mark_functions_to_output PARAMS ((void));
  static void cgraph_expand_function PARAMS ((struct cgraph_node *));
  static tree record_call_1 PARAMS ((tree *, int *, void *));
  static void cgraph_mark_local_functions PARAMS ((void));
- static void cgraph_mark_functions_to_inline_once PARAMS ((void));
  static void cgraph_optimize_function PARAMS ((struct cgraph_node *));
  
  /* Analyze function once it is parsed.  Set up the local information
--- 34,50 ----
  #include "target.h"
  #include "cgraph.h"
  #include "diagnostic.h"
+ #include "c-common.h"
+ #include "params.h"
+ 
+ #define INSNS_PER_STMT 10
+ #define INSNS_PER_CALL 10
  
  static void cgraph_expand_functions PARAMS ((void));
  static void cgraph_mark_functions_to_output PARAMS ((void));
  static void cgraph_expand_function PARAMS ((struct cgraph_node *));
  static tree record_call_1 PARAMS ((tree *, int *, void *));
  static void cgraph_mark_local_functions PARAMS ((void));
  static void cgraph_optimize_function PARAMS ((struct cgraph_node *));
  
  /* Analyze function once it is parsed.  Set up the local information
*************** cgraph_finalize_function (decl, body)
*** 74,87 ****
        cgraph_mark_needed_node (node, 1);
      }
  
!   if (!node->needed && !DECL_COMDAT (node->decl))
!     node->local.can_inline_once = tree_inlinable_function_p (decl, 1);
!   else
!     node->local.can_inline_once = 0;
!   if (flag_inline_trees)
!     node->local.inline_many = tree_inlinable_function_p (decl, 0);
!   else
!     node->local.inline_many = 0;
  
    (*debug_hooks->deferred_inline_function) (decl);
  }
--- 78,88 ----
        cgraph_mark_needed_node (node, 1);
      }
  
!   node->local.inlinable = tree_inlinable_function_p (decl, 1);
!   node->local.self_insns = DECL_NUM_STMTS (decl) * INSNS_PER_STMT;
!   if (node->local.inlinable)
!     node->local.disgread_inline_limits
!       = (*lang_hooks.tree_inlining.disregard_inline_limits) (decl);
  
    (*debug_hooks->deferred_inline_function) (decl);
  }
*************** cgraph_mark_functions_to_output ()
*** 230,245 ****
    for (node = cgraph_nodes; node; node = node->next)
      {
        tree decl = node->decl;
  
        /* We need to output all local functions that are used and not
  	 always inlined, as well as those that are reachable from
  	 outside the current compilation unit.  */
        if (DECL_SAVED_TREE (decl)
  	  && (node->needed
! 	      || (!node->local.inline_many && !node->global.inline_once
! 		  && node->reachable)
! 	      || (DECL_ASSEMBLER_NAME_SET_P (decl)
! 	          && TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl))))
  	  && !TREE_ASM_WRITTEN (decl) && !node->origin
  	  && !DECL_EXTERNAL (decl))
  	node->output = 1;
--- 231,250 ----
    for (node = cgraph_nodes; node; node = node->next)
      {
        tree decl = node->decl;
+       struct cgraph_edge *e;
+       if (node->output)
+ 	abort ();
+ 
+       for (e = node->callers; e; e = e->next_caller)
+ 	if (!e->inline_call)
+ 	  break;
  
        /* We need to output all local functions that are used and not
  	 always inlined, as well as those that are reachable from
  	 outside the current compilation unit.  */
        if (DECL_SAVED_TREE (decl)
  	  && (node->needed
! 	      || (e && node->reachable))
  	  && !TREE_ASM_WRITTEN (decl) && !node->origin
  	  && !DECL_EXTERNAL (decl))
  	node->output = 1;
*************** cgraph_optimize_function (node)
*** 254,259 ****
--- 259,266 ----
  {
    tree decl = node->decl;
  
+   /* optimize_inline_calls avoids inlining of current_function_decl.  */
+   current_function_decl = 0;
    if (flag_inline_trees)
      optimize_inline_calls (decl);
    if (node->nested)
*************** cgraph_expand_function (node)
*** 270,275 ****
--- 277,283 ----
       struct cgraph_node *node;
  {
    tree decl = node->decl;
+   struct cgraph_edge *e;
  
    announce_function (decl);
  
*************** cgraph_expand_function (node)
*** 279,319 ****
       via lang_expand_decl_stmt.  */
    (*lang_hooks.callgraph.expand_function) (decl);
  
!   /* When we decided to inline the function once, we never ever should
!      need to output it separately.  */
!   if (node->global.inline_once)
!     abort ();
!   if (!node->local.inline_many
!       || !node->callers)
      DECL_SAVED_TREE (decl) = NULL;
    current_function_decl = NULL;
  }
  
! 
! /* Expand all functions that must be output. 
!   
!    Attempt to topologically sort the nodes so function is output when
!    all called functions are already assembled to allow data to be
!    propagated accross the callgraph.  Use a stack to get smaller distance
!    between a function and it's callees (later we may choose to use a more
!    sophisticated algorithm for function reordering; we will likely want
!    to use subsections to make the output functions appear in top-down
!    order.  */
! 
! static void
! cgraph_expand_functions ()
  {
    struct cgraph_node *node, *node2;
-   struct cgraph_node **stack =
-     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
-   struct cgraph_node **order =
-     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
    int stack_size = 0;
    int order_pos = 0;
    struct cgraph_edge *edge, last;
-   int i;
  
!   cgraph_mark_functions_to_output ();
  
    /* We have to deal with cycles nicely, so use a depth first traversal
       output algorithm.  Ignore the fact that some functions won't need
--- 287,312 ----
       via lang_expand_decl_stmt.  */
    (*lang_hooks.callgraph.expand_function) (decl);
  
!   for (e = node->callers; e; e = e->next_caller)
!     if (e->inline_call)
!       break;
!   if (!e)
      DECL_SAVED_TREE (decl) = NULL;
    current_function_decl = NULL;
  }
  
! /* Fill array order with all nodes with output flag set in the reverse
!    topological order.  */
! static int
! cgraph_postorder (struct cgraph_node **order)
  {
    struct cgraph_node *node, *node2;
    int stack_size = 0;
    int order_pos = 0;
    struct cgraph_edge *edge, last;
  
!   struct cgraph_node **stack =
!     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
  
    /* We have to deal with cycles nicely, so use a depth first traversal
       output algorithm.  Ignore the fact that some functions won't need
*************** cgraph_expand_functions ()
*** 359,486 ****
  	      }
  	  }
        }
!   for (i = order_pos - 1; i >= 0; i--)
      {
!       node = order[i];
!       if (node->output)
  	{
! 	  if (!node->reachable)
! 	    abort ();
! 	  node->output = 0;
! 	  cgraph_expand_function (node);
  	}
      }
    free (stack);
!   free (order);
  }
  
! /* Mark all local functions.
!    We can not use node->needed directly as it is modified during
!    execution of cgraph_optimize.  */
  
  static void
! cgraph_mark_local_functions ()
  {
!   struct cgraph_node *node;
  
!   if (!quiet_flag)
!     fprintf (stderr, "\n\nMarking local functions:");
  
!   /* Figure out functions we want to assemble.  */
!   for (node = cgraph_nodes; node; node = node->next)
!     {
!       node->local.local = (!node->needed
! 		           && DECL_SAVED_TREE (node->decl)
! 			   && !DECL_COMDAT (node->decl)
! 		           && !TREE_PUBLIC (node->decl));
!       if (node->local.local)
! 	announce_function (node->decl);
      }
  }
  
! /* Decide what function should be inlined because they are invoked once
!    (so inlining won't result in duplication of the code).  */
  
  static void
! cgraph_mark_functions_to_inline_once ()
  {
!   struct cgraph_node *node, *node1;
  
!   if (!quiet_flag)
!     fprintf (stderr, "\n\nMarking functions to inline once:");
  
-   /* Now look for function called only once and mark them to inline.
-      From this point number of calls to given function won't grow.  */
    for (node = cgraph_nodes; node; node = node->next)
      {
        if (node->callers && !node->callers->next_caller && !node->needed
! 	  && node->local.can_inline_once)
  	{
  	  bool ok = true;
  
  	  /* Verify that we won't duplicate the caller.  */
  	  for (node1 = node->callers->caller;
! 	       node1->local.inline_many
! 	       && node1->callers
! 	       && ok;
! 	       node1 = node1->callers->caller)
  	    if (node1->callers->next_caller || node1->needed)
  	      ok = false;
  	  if (ok)
  	    {
! 	      node->global.inline_once = true;
! 	      announce_function (node->decl);
  	    }
  	}
      }
  }
  
  
  /* Perform simple optimizations based on callgraph.  */
  
  void
  cgraph_optimize ()
  {
-   struct cgraph_node *node;
-   bool changed = true;
- 
    cgraph_mark_local_functions ();
  
!   cgraph_mark_functions_to_inline_once ();
  
    cgraph_global_info_ready = true;
    if (!quiet_flag)
      fprintf (stderr, "\n\nAssembling functions:");
  
!   /* Output everything.  
!      ??? Our inline heuristic may decide to not inline functions previously
!      marked as inlinable thus adding new function bodies that must be output.
!      Later we should move all inlining decisions to callgraph code to make
!      this impossible.  */
    cgraph_expand_functions ();
-   if (!quiet_flag)
-     fprintf (stderr, "\n\nAssembling functions that failed to inline:");
-   while (changed && !errorcount && !sorrycount)
-     {
-       changed = false;
-       for (node = cgraph_nodes; node; node = node->next)
- 	{
- 	  tree decl = node->decl;
- 	  if (!node->origin
- 	      && !TREE_ASM_WRITTEN (decl)
- 	      && DECL_SAVED_TREE (decl)
- 	      && !DECL_EXTERNAL (decl))
- 	    {
- 	      struct cgraph_edge *edge;
- 
- 	      for (edge = node->callers; edge; edge = edge->next_caller)
- 		if (TREE_ASM_WRITTEN (edge->caller->decl))
- 		  {
- 		    changed = true;
- 		    cgraph_expand_function (node);
- 		    break;
- 		  }
- 	    }
- 	}
-     }
  }
--- 352,756 ----
  	      }
  	  }
        }
!   free (stack);
!   return order_pos;
! }
! 
! #define INLINED_TIMES(node) ((size_t)(node)->aux)
! #define SET_INLINED_TIMES(node,times) ((node)->aux = (void *)(times))
! 
! /* Return list of nodes we decided to inline NODE into, set their output
!    flag and compute INLINED_TIMES. 
! 
!    We do simple backtracking to get INLINED_TIMES right.  This should not be
!    expensive as we limit the amount of inlining.  Alternatively we may first
!    discover set of nodes, topologically sort these and propagate
!    INLINED_TIMES  */
! 
! static int
! cgraph_inlined_into (struct cgraph_node *node, struct cgraph_node **array)
! {
!   int nfound = 0;
!   struct cgraph_edge **stack;
!   struct cgraph_edge *e, *e1;
!   int sp;
!   int i;
! 
!   /* Fast path: since we traverse in mostly topological order, we will likely
!      find no edges.  */
!   for (e = node->callers; e; e = e->next_caller)
!     if (e->inline_call)
!       break;
! 
!   if (!e)
!     return 0;
! 
!   /* Allocate stack for back-tracking up callgraph.  */
!   stack = xmalloc ((cgraph_n_nodes + 1) * sizeof (struct cgraph_edge *));
!   sp = 0;
! 
!   /* Push the first edge on to the stack.  */
!   stack[sp++] = e;
! 
!   while (sp)
      {
!       struct cgraph_node *caller;
! 
!       /* Look at the edge on the top of the stack.  */
!       e = stack[sp - 1];
!       caller = e->caller;
! 
!       /* Check if the caller destination has been visited yet.  */
!       if (!caller->output)
  	{
! 	  array[nfound++] = e->caller;
! 	  /* Mark that we have visited the destination.  */
! 	  caller->output = true;
! 	  SET_INLINED_TIMES (caller, 0);
! 	}
!       SET_INLINED_TIMES (caller, INLINED_TIMES (caller) + 1);
! 
!       for (e1 = caller->callers; e1; e1 = e1->next_caller)
! 	if (e1->inline_call)
! 	  break;
!       if (e1)
! 	stack[sp++] = e1;
!       else
! 	{
! 	  while (true)
! 	    {
! 	      for (e1 = e->next_caller; e1; e1 = e1->next_caller)
! 		if (e1->inline_call)
! 		  break;
! 
! 	      if (e1)
! 		{
! 		  stack[sp - 1] = e1;
! 		  break;
! 		}
! 	      else
! 		{
! 		  sp--;
! 		  if (!sp)
! 		    break;
! 		  e = stack[sp - 1];
! 		}
! 	    }
  	}
      }
+ 
    free (stack);
! 
! 
!   if (!quiet_flag)
!     {
!       fprintf (stderr, "\nFound inline predecessors of ");
!       announce_function (node->decl);
!       fprintf (stderr, ":");
!       for (i = 0; i < nfound; i++)
! 	{
! 	  announce_function (array[i]->decl);
! 	  if (INLINED_TIMES (array[i]) != 1)
! 	    fprintf (stderr, " (%i times)", (int) INLINED_TIMES (array[i]));
! 	}
!       fprintf (stderr, "\n");
!     }
! 
!   return nfound;
  }
  
! /* Estimate size of the function after inlining WHAT into TO.  */
! static int
! cgraph_estimate_size_after_inlining (int times,
! 				     struct cgraph_node *to,
! 				     struct cgraph_node *what)
! {
!   int size = (what->global.insns - INSNS_PER_CALL) * times;
!   size -= size * PARAM_VALUE (PARAM_INLINE_ABSTRACTION_PENALTY) / 100;
!   return to->global.insns + size;
! }
  
+ /* Update insn sizes after inlining WHAT into TO that is already inlined into
+    all nodes in INLINED array.  */
  static void
! cgraph_mark_inline (struct cgraph_node *to,
! 		    struct cgraph_node *what,
! 		    struct cgraph_node **inlined, int ninlined)
  {
!   int i;
!   int times = 0;
!   struct cgraph_edge *e;
  
!   for (e = to->callees; e; e = e->next_callee)
!     if (e->callee == what)
!       {
! 	if (e->inline_call)
! 	  abort ();
! 	e->inline_call = true;
! 	times++;
!       }
!   if (!times)
!     abort ();
  
!   to->global.insns = cgraph_estimate_size_after_inlining (times, to, what);
!   for (i = 0; i < ninlined; i++)
!     inlined[i]->global.insns =
!       cgraph_estimate_size_after_inlining (INLINED_TIMES (inlined[i]) * times,
! 					   inlined[i], what);
! }
! 
! /* Return false when inlining WHAT into TO is not good idea as it would cause
!    too large growth of function bodies.  */
! static bool
! cgraph_check_inline_limits (struct cgraph_node *to,
! 			    struct cgraph_node *what,
! 			    struct cgraph_node **inlined, int ninlined)
! {
!   int i;
!   int times = 0;
!   struct cgraph_edge *e;
!   int newsize;
! 
!   for (e = to->callees; e; e = e->next_callee)
!     if (e->callee == what)
!       times++;
! 
!   newsize = cgraph_estimate_size_after_inlining (times, to, what);
!   if (newsize > PARAM_VALUE (PARAM_LARGE_FUNCTION_INSNS)
!       && newsize > to->local.self_insns * PARAM_VALUE (PARAM_LARGE_FUNCTION_GROWTH) / 100)
!     return false;
!   for (i = 0; i < ninlined; i++)
!     {
!       newsize =
! 	cgraph_estimate_size_after_inlining (INLINED_TIMES (inlined[i]) *
! 					     times, inlined[i], what);
!       if (newsize > PARAM_VALUE (PARAM_LARGE_FUNCTION_INSNS)
! 	  && newsize >
! 	  inlined[i]->local.self_insns * PARAM_VALUE (PARAM_LARGE_FUNCTION_GROWTH)  / 100)
! 	return false;
      }
+   return true;
  }
  
! /* Return true when function N is small enough to be inlined.  */
! static bool
! cgraph_default_inline_p (struct cgraph_node *n)
! {
!   if (!DECL_INLINE (n->decl))
!     return false;
!   if (DID_INLINE_FUNC (n->decl))
!     return n->global.insns < MAX_INLINE_INSNS_AUTO;
!   else
!     return n->global.insns < MAX_INLINE_INSNS_SINGLE;
! }
  
  static void
! cgraph_decide_inlining (void)
  {
!   struct cgraph_node *node;
!   int nnodes;
!   struct cgraph_node **order =
!     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
!   struct cgraph_node **inlined =
!     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
!   int ninlined;
!   int i, y;
  
!   for (node = cgraph_nodes; node; node = node->next)
!     node->global.insns = node->local.self_insns;
! 
!   nnodes = cgraph_postorder (order);
  
    for (node = cgraph_nodes; node; node = node->next)
+     node->aux = 0;
+ 
+   /* In the first pass mark all always_inline edges.  Do this with priority
+      so that none of our later decisions makes it impossible.  */
+   for (i = nnodes - 1; i >= 0; i--)
      {
+       struct cgraph_edge *e;
+ 
+       node = order[i];
+ 
+       for (e = node->callees; e; e = e->next_callee)
+ 	if (e->callee->local.disgread_inline_limits)
+ 	  break;
+       if (!e)
+ 	continue;
+       ninlined = cgraph_inlined_into (order[i], inlined);
+       for (; e; e = e->next_callee)
+ 	{
+ 	  if (e->inline_call || !e->callee->local.disgread_inline_limits)
+ 	    continue;
+ 	  if (e->callee->output || e->callee == node)
+ 	    continue;
+ 	  cgraph_mark_inline (node, e->callee, inlined, ninlined);
+ 	}
+       for (y = 0; y < ninlined; y++)
+ 	inlined[y]->output = 0, node->aux = 0;
+     }
+ 
+   /* Now inline small functions.  */
+ 
+   if (!quiet_flag)
+     fprintf (stderr, "\n\nDeciding on inlining:");
+   for (i = nnodes - 1; i >= 0; i--)
+     {
+       struct cgraph_edge *e;
+ 
+       node = order[i];
+ 
+       for (e = node->callees; e; e = e->next_callee)
+ 	if (!e->inline_call && cgraph_default_inline_p (e->callee))
+ 	  break;
+       if (!e)
+ 	continue;
+       ninlined = cgraph_inlined_into (order[i], inlined);
+       for (; e; e = e->next_callee)
+ 	{
+ 	  if (e->inline_call || !e->callee->local.inlinable
+ 	      || !cgraph_default_inline_p (e->callee))
+ 	    continue;
+ 	  if (e->callee == node)
+ 	    continue;
+ 	  if (e->callee->output)
+ 	    continue;
+ 	  if (!cgraph_check_inline_limits
+ 	      (node, e->callee, inlined, ninlined))
+ 	    continue;
+ 	  cgraph_mark_inline (node, e->callee, inlined, ninlined);
+ 	}
+       for (y = 0; y < ninlined; y++)
+ 	inlined[y]->output = 0, node->aux = 0;
+     }
+ 
+   /* And finally decide what functions are called once.  */
+ 
+   for (i = nnodes - 1; i >= 0; i--)
+     {
+       node = order[i];
+ 
        if (node->callers && !node->callers->next_caller && !node->needed
! 	  && node->local.inlinable && !node->callers->inline_call
! 	  && !DECL_EXTERNAL (node->decl) && !DECL_COMDAT (node->decl))
  	{
  	  bool ok = true;
+ 	  struct cgraph_node *node1;
  
  	  /* Verify that we won't duplicate the caller.  */
  	  for (node1 = node->callers->caller;
! 	       node1->callers && node1->callers->inline_call
! 	       && ok; node1 = node1->callers->caller)
  	    if (node1->callers->next_caller || node1->needed)
  	      ok = false;
  	  if (ok)
  	    {
! 	      ninlined = cgraph_inlined_into (node, inlined);
! 	      if (cgraph_check_inline_limits
! 		  (node->callers->caller, node, inlined, ninlined))
! 		{
! 		  cgraph_mark_inline (node->callers->caller, node, inlined,
! 				      ninlined);
! 		  announce_function (node->decl);
! 		}
! 	      for (y = 0; y < ninlined; y++)
! 		inlined[y]->output = 0, node->aux = 0;
  	    }
  	}
      }
+ 
+   free (order);
+   free (inlined);
+ }
+ 
+ /* Return true when CALLEE_DECL should be inlined into CALLER_DECL.  */
+ bool
+ cgraph_inline_p (tree caller_decl, tree callee_decl)
+ {
+   struct cgraph_node *caller = cgraph_node (caller_decl);
+   struct cgraph_node *callee = cgraph_node (callee_decl);
+   struct cgraph_edge *e;
+ 
+   for (e = caller->callees; e; e = e->next_callee)
+     if (e->callee == callee)
+       return e->inline_call;
+   abort ();
+ }
+ 
+ /* Expand all functions that must be output. 
+   
+    Attempt to topologically sort the nodes so a function is output when
+    all called functions are already assembled to allow data to be
+    propagated across the callgraph.  Use a stack to get smaller distance
+    between a function and its callees (later we may choose to use a more
+    sophisticated algorithm for function reordering; we will likely want
+    to use subsections to make the output functions appear in top-down
+    order).  */
+ 
+ static void
+ cgraph_expand_functions ()
+ {
+   struct cgraph_node *node;
+   struct cgraph_node **order =
+     xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
+   int order_pos = 0;
+   int i;
+ 
+   cgraph_mark_functions_to_output ();
+ 
+   order_pos = cgraph_postorder (order);
+ 
+   for (i = order_pos - 1; i >= 0; i--)
+     {
+       node = order[i];
+       if (node->output)
+ 	{
+ 	  if (!node->reachable)
+ 	    abort ();
+ 	  node->output = 0;
+ 	  cgraph_expand_function (node);
+ 	}
+     }
+   free (order);
  }
  
+ /* Mark all local functions.
+    We cannot use node->needed directly as it is modified during
+    execution of cgraph_optimize.  */
+ 
+ static void
+ cgraph_mark_local_functions ()
+ {
+   struct cgraph_node *node;
+ 
+   if (!quiet_flag)
+     fprintf (stderr, "\n\nMarking local functions:");
+ 
+   /* Figure out functions we want to assemble.  */
+   for (node = cgraph_nodes; node; node = node->next)
+     {
+       node->local.local = (!node->needed
+ 		           && DECL_SAVED_TREE (node->decl)
+ 			   && !DECL_COMDAT (node->decl)
+ 		           && !TREE_PUBLIC (node->decl));
+       if (node->local.local)
+ 	announce_function (node->decl);
+     }
+ }
  
  /* Perform simple optimizations based on callgraph.  */
  
  void
  cgraph_optimize ()
  {
    cgraph_mark_local_functions ();
  
!   cgraph_decide_inlining ();
  
    cgraph_global_info_ready = true;
    if (!quiet_flag)
      fprintf (stderr, "\n\nAssembling functions:");
  
!   /* Output everything.   */
    cgraph_expand_functions ();
  }
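For readers skimming the cgraphunit.c changes above: the inlining decision is stored per call edge (the inline_call flag), set by cgraph_mark_inline and read back later by cgraph_inline_p. The following is only a minimal sketch of that bookkeeping, not part of the patch. The struct and function names (node, edge, mark_inline, inline_p) are hypothetical stand-ins for the real cgraph types, and the lookup returns false where the patch calls abort ().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the cgraph structures.  */
struct node;

struct edge
{
  struct node *callee;
  struct edge *next_callee;
  bool inline_call;
};

struct node
{
  struct edge *callees;
};

/* Mark every edge from TO to WHAT as inlined and return how many
   edges were marked.  */
static int
mark_inline (struct node *to, struct node *what)
{
  int times = 0;
  struct edge *e;

  for (e = to->callees; e; e = e->next_callee)
    if (e->callee == what)
      {
        e->inline_call = true;
        times++;
      }
  return times;
}

/* Return the decision recorded on the first edge CALLER -> CALLEE,
   or false when no such edge exists (the patch aborts instead).  */
static bool
inline_p (struct node *caller, struct node *callee)
{
  struct edge *e;

  for (e = caller->callees; e; e = e->next_callee)
    if (e->callee == callee)
      return e->inline_call;
  return false;
}
```

Because all edges between the same pair of functions are marked together, looking at the first matching edge is enough in this sketch.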
Index: params.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/params.def,v
retrieving revision 1.25
diff -c -3 -p -r1.25 params.def
*** params.def	4 Jun 2003 07:51:41 -0000	1.25
--- params.def	25 Jun 2003 21:52:26 -0000
*************** DEFPARAM(PARAM_MAX_PENDING_LIST_LENGTH,
*** 152,157 ****
--- 152,170 ----
  	 "The maximum length of scheduling's pending operations list",
  	 32)
  
+ DEFPARAM(PARAM_LARGE_FUNCTION_INSNS,
+ 	 "large-function-insns",
+ 	 "The size of function body to be considered large",
+ 	 30000)
+ DEFPARAM(PARAM_LARGE_FUNCTION_GROWTH,
+ 	 "large-function-growth",
+ 	 "Maximal growth due to inlining of large function (in percents)",
+ 	 200)
+ DEFPARAM(PARAM_INLINE_ABSTRACTION_PENALTY,
+ 	 "inline-abstraction-penalty",
+ 	 "How much of the function body is expected to be eliminated by optimizing after inlining (in percents)",
+ 	 50)
+ 
  /* The GCSE optimization will be disabled if it would require
     significantly more memory than this value.  */
  DEFPARAM(PARAM_MAX_GCSE_MEMORY,
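To illustrate how the two new large-function parameters interact in cgraph_check_inline_limits: growth is rejected only when the estimated size exceeds both the absolute threshold and the allowed percentage of the original body. This is a minimal sketch, not part of the patch. The helper name growth_within_limits is hypothetical, the constants simply copy the defaults from the params.def hunk above, and the size estimate is passed in rather than computed.

```c
#include <stdbool.h>

/* Defaults copied from the params.def hunk above.  */
#define LARGE_FUNCTION_INSNS 30000
#define LARGE_FUNCTION_GROWTH 200	/* percent of the original size */

/* Hypothetical helper.  NEWSIZE is the estimated size after inlining,
   SELF_INSNS the size of the function body before any inlining.
   Return false only when both limits are exceeded.  */
static bool
growth_within_limits (int newsize, int self_insns)
{
  if (newsize > LARGE_FUNCTION_INSNS
      && newsize > self_insns * LARGE_FUNCTION_GROWTH / 100)
    return false;
  return true;
}
```

With these defaults, a function of 40000 insns may still grow to 50000 (under 200% of its own size), while a function of 10000 insns may not, since 50000 is above both 30000 and 20000.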
Index: toplev.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/toplev.c,v
retrieving revision 1.787
diff -c -3 -p -r1.787 toplev.c
*** toplev.c	25 Jun 2003 20:43:10 -0000	1.787
--- toplev.c	25 Jun 2003 21:52:26 -0000
*************** parse_options_and_default_flags (int arg
*** 4685,4690 ****
--- 4685,4691 ----
        flag_delete_null_pointer_checks = 1;
        flag_reorder_blocks = 1;
        flag_reorder_functions = 1;
+       flag_unit_at_a_time = 1;
      }
  
    if (optimize >= 3)
*************** parse_options_and_default_flags (int arg
*** 4692,4698 ****
        flag_inline_functions = 1;
        flag_rename_registers = 1;
        flag_unswitch_loops = 1;
-       flag_unit_at_a_time = 1;
      }
  
    if (optimize < 2 || optimize_size)
--- 4693,4698 ----
Index: tree-inline.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-inline.c,v
retrieving revision 1.63
diff -c -3 -p -r1.63 tree-inline.c
*** tree-inline.c	20 Jun 2003 19:55:23 -0000	1.63
--- tree-inline.c	25 Jun 2003 21:52:27 -0000
*************** typedef struct inline_data
*** 106,111 ****
--- 106,112 ----
    htab_t tree_pruner;
    /* Decl of function we are inlining into.  */
    tree decl;
+   tree current_decl;
  } inline_data;
  
  /* Prototypes.  */
*************** expand_call_inline (tp, walk_subtrees, d
*** 1197,1205 ****
  
    /* Don't try to inline functions that are not well-suited to
       inlining.  */
!   if ((!flag_unit_at_a_time || !DECL_SAVED_TREE (fn)
!        || !cgraph_global_info (fn)->inline_once)
!       && !inlinable_function_p (fn, id, 0))
      {
        if (warn_inline && DECL_INLINE (fn) && !DID_INLINE_FUNC (fn)
  	  && !DECL_IN_SYSTEM_HEADER (fn))
--- 1198,1206 ----
  
    /* Don't try to inline functions that are not well-suited to
       inlining.  */
!   if (!DECL_SAVED_TREE (fn)
!       || (flag_unit_at_a_time && !cgraph_inline_p (id->current_decl, fn))
!       || (!flag_unit_at_a_time && !inlinable_function_p (fn, id, 0)))
      {
        if (warn_inline && DECL_INLINE (fn) && !DID_INLINE_FUNC (fn)
  	  && !DECL_IN_SYSTEM_HEADER (fn))
*************** expand_call_inline (tp, walk_subtrees, d
*** 1441,1447 ****
      }
  
    /* Recurse into the body of the just inlined function.  */
!   expand_calls_inline (inlined_body, id);
    VARRAY_POP (id->fns);
  
    /* If we've returned to the top level, clear out the record of how
--- 1442,1453 ----
      }
  
    /* Recurse into the body of the just inlined function.  */
!   {
!     tree old_decl = id->current_decl;
!     id->current_decl = fn;
!     expand_calls_inline (inlined_body, id);
!     id->current_decl = old_decl;
!   }
    VARRAY_POP (id->fns);
  
    /* If we've returned to the top level, clear out the record of how
*************** optimize_inline_calls (fn)
*** 1487,1492 ****
--- 1493,1499 ----
    memset (&id, 0, sizeof (id));
  
    id.decl = fn;
+   id.current_decl = fn;
    /* Don't allow recursion into FN.  */
    VARRAY_TREE_INIT (id.fns, 32, "fns");
    VARRAY_PUSH_TREE (id.fns, fn);
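The current_decl field added above is saved and restored around the recursive walk so that each call is looked up against the function whose body is actually being scanned, not against the outermost function. A minimal sketch of that save/restore pattern follows, not part of the patch. The types fn and walk_data and the function walk_calls are hypothetical, the call graph is assumed to be acyclic, and a trace string stands in for the cgraph_inline_p query.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature of a function: a name plus a NULL-terminated
   array of the functions it calls (or NULL when it calls nothing).  */
struct fn
{
  const char *name;
  struct fn **calls;
};

struct walk_data
{
  struct fn *current;	/* function whose body is being walked */
  char trace[128];	/* "caller>callee " pairs, in visiting order */
};

/* Walk the calls in BODY, recursing into each callee as if it had
   just been inlined.  Assumes the call graph has no cycles.  */
static void
walk_calls (struct fn *body, struct walk_data *id)
{
  size_t i;

  for (i = 0; body->calls && body->calls[i]; i++)
    {
      struct fn *callee = body->calls[i];
      struct fn *old = id->current;
      size_t used = strlen (id->trace);

      /* Record which function this call is attributed to.  */
      snprintf (id->trace + used, sizeof id->trace - used, "%s>%s ",
		id->current->name, callee->name);

      /* Save, switch, recurse, restore.  */
      id->current = callee;
      walk_calls (callee, id);
      id->current = old;
    }
}
```

Without the restore step, calls that follow an inlined body would be attributed to the callee instead of the enclosing function.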
Index: cp/Make-lang.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/Make-lang.in,v
retrieving revision 1.154
diff -c -3 -p -r1.154 Make-lang.in
*** cp/Make-lang.in	23 Jun 2003 20:52:12 -0000	1.154
--- cp/Make-lang.in	25 Jun 2003 21:52:27 -0000
*************** cp/decl.o: cp/decl.c $(CXX_TREE_H) $(TM_
*** 241,247 ****
    cp/operators.def $(TM_P_H) tree-inline.h diagnostic.h c-pragma.h \
    debug.h gt-cp-decl.h gtype-cp.h timevar.h
  cp/decl2.o: cp/decl2.c $(CXX_TREE_H) $(TM_H) flags.h cp/lex.h cp/decl.h $(EXPR_H) \
!   output.h except.h toplev.h $(RTL_H) c-common.h gt-cp-decl2.h
  cp/typeck2.o: cp/typeck2.c $(CXX_TREE_H) $(TM_H) flags.h toplev.h output.h $(TM_P_H) \
     diagnostic.h
  cp/typeck.o: cp/typeck.c $(CXX_TREE_H) $(TM_H) flags.h $(RTL_H) $(EXPR_H) toplev.h \
--- 241,247 ----
    cp/operators.def $(TM_P_H) tree-inline.h diagnostic.h c-pragma.h \
    debug.h gt-cp-decl.h gtype-cp.h timevar.h
  cp/decl2.o: cp/decl2.c $(CXX_TREE_H) $(TM_H) flags.h cp/lex.h cp/decl.h $(EXPR_H) \
!   output.h except.h toplev.h $(RTL_H) c-common.h gt-cp-decl2.h cgraph.h
  cp/typeck2.o: cp/typeck2.c $(CXX_TREE_H) $(TM_H) flags.h toplev.h output.h $(TM_P_H) \
     diagnostic.h
  cp/typeck.o: cp/typeck.c $(CXX_TREE_H) $(TM_H) flags.h $(RTL_H) $(EXPR_H) toplev.h \
*************** cp/repo.o: cp/repo.c $(CXX_TREE_H) $(TM_
*** 272,278 ****
    gt-cp-repo.h
  cp/semantics.o: cp/semantics.c $(CXX_TREE_H) $(TM_H) cp/lex.h except.h toplev.h \
    flags.h debug.h output.h $(RTL_H) $(TIMEVAR_H) $(EXPR_H) \
!   tree-inline.h
  cp/dump.o: cp/dump.c $(CXX_TREE_H) $(TM_H) tree-dump.h
  cp/optimize.o: cp/optimize.c $(CXX_TREE_H) $(TM_H) rtl.h integrate.h insn-config.h \
    input.h $(PARAMS_H) debug.h tree-inline.h
--- 272,278 ----
    gt-cp-repo.h
  cp/semantics.o: cp/semantics.c $(CXX_TREE_H) $(TM_H) cp/lex.h except.h toplev.h \
    flags.h debug.h output.h $(RTL_H) $(TIMEVAR_H) $(EXPR_H) \
!   tree-inline.h cgraph.h
  cp/dump.o: cp/dump.c $(CXX_TREE_H) $(TM_H) tree-dump.h
  cp/optimize.o: cp/optimize.c $(CXX_TREE_H) $(TM_H) rtl.h integrate.h insn-config.h \
    input.h $(PARAMS_H) debug.h tree-inline.h
Index: cp/cp-lang.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/cp-lang.c,v
retrieving revision 1.51
diff -c -3 -p -r1.51 cp-lang.c
*** cp/cp-lang.c	24 Jun 2003 11:54:00 -0000	1.51
--- cp/cp-lang.c	25 Jun 2003 21:52:27 -0000
*************** static bool cp_var_mod_type_p (tree);
*** 148,153 ****
--- 148,158 ----
  #undef LANG_HOOKS_PREPARE_ASSEMBLE_VARIABLE 
  #define LANG_HOOKS_PREPARE_ASSEMBLE_VARIABLE prepare_assemble_variable
  
+ #undef LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION
+ #define LANG_HOOKS_CALLGRAPH_EXPAND_FUNCTION really_expand_body
+ #undef LANG_HOOKS_CALLGRAPH_LOWER_FUNCTION
+ #define LANG_HOOKS_CALLGRAPH_LOWER_FUNCTION lower_function
+ 
  #undef LANG_HOOKS_MAKE_TYPE
  #define LANG_HOOKS_MAKE_TYPE cxx_make_type
  #undef LANG_HOOKS_TYPE_FOR_MODE
Index: cp/cp-tree.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/cp-tree.h,v
retrieving revision 1.859
diff -c -3 -p -r1.859 cp-tree.h
*** cp/cp-tree.h	24 Jun 2003 11:54:01 -0000	1.859
--- cp/cp-tree.h	25 Jun 2003 21:52:27 -0000
*************** struct lang_decl GTY(())
*** 1746,1752 ****
    ((at_eof && TREE_PUBLIC (DECL) && !DECL_COMDAT (DECL))	\
     || (DECL_ASSEMBLER_NAME_SET_P (DECL)				\
         && TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (DECL)))	\
!    || (flag_syntax_only && TREE_USED (DECL)))
  
  /* For a FUNCTION_DECL or a VAR_DECL, the language linkage for the
     declaration.  Some entities (like a member function in a local
--- 1746,1752 ----
    ((at_eof && TREE_PUBLIC (DECL) && !DECL_COMDAT (DECL))	\
     || (DECL_ASSEMBLER_NAME_SET_P (DECL)				\
         && TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (DECL)))	\
!    || (((flag_syntax_only || flag_unit_at_a_time) && TREE_USED (DECL))))
  
  /* For a FUNCTION_DECL or a VAR_DECL, the language linkage for the
     declaration.  Some entities (like a member function in a local
*************** extern tree get_guard (tree);
*** 3799,3804 ****
--- 3799,3805 ----
  extern tree get_guard_cond (tree);
  extern tree set_guard (tree);
  extern void prepare_assemble_variable (tree);
+ extern void lower_function (tree);
  
  extern void cp_error_at		(const char *msgid, ...);
  extern void cp_warning_at	(const char *msgid, ...);
*************** extern void clear_out_block             
*** 4153,4158 ****
--- 4154,4160 ----
  extern tree begin_global_stmt_expr              (void);
  extern tree finish_global_stmt_expr             (tree);
  extern tree check_template_template_default_arg (tree);
+ extern void really_expand_body			(tree);
  
  /* in tree.c */
  extern void lang_check_failed			(const char *, int,
Index: cp/decl2.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/decl2.c,v
retrieving revision 1.632
diff -c -3 -p -r1.632 decl2.c
*** cp/decl2.c	24 Jun 2003 11:54:01 -0000	1.632
--- cp/decl2.c	25 Jun 2003 21:52:27 -0000
*************** Boston, MA 02111-1307, USA.  */
*** 46,51 ****
--- 46,53 ----
  #include "cpplib.h"
  #include "target.h"
  #include "c-common.h"
+ #include "cgraph.h"
+ #include "tree-inline.h"
  extern cpp_reader *parse_in;
  
  /* This structure contains information about the initializations
*************** defer_fn (tree fn)
*** 1192,1197 ****
--- 1194,1200 ----
    if (DECL_DEFERRED_FN (fn))
      return;
    DECL_DEFERRED_FN (fn) = 1;
+   DECL_DEFER_OUTPUT (fn) = 1;
    if (!deferred_fns)
      VARRAY_TREE_INIT (deferred_fns, 32, "deferred_fns");
  
*************** mark_vtable_entries (tree decl)
*** 1419,1424 ****
--- 1422,1429 ----
  void
  comdat_linkage (tree decl)
  {
+   bool needed = TREE_PUBLIC (decl) && !DECL_COMDAT (decl);
+ 
    if (flag_weak)
      make_decl_one_only (decl);
    else if (TREE_CODE (decl) == FUNCTION_DECL 
*************** comdat_linkage (tree decl)
*** 1463,1468 ****
--- 1468,1482 ----
  
    if (DECL_LANG_SPECIFIC (decl))
      DECL_COMDAT (decl) = 1;
+ 
+   /* Explicit instantiation:  inform middle end that the decl is needed
+      even though it is COMDAT.
+      ??? This would be cleaner if we had a flag expressing that decl
+      is weak, but it is not COMDAT in a sense that it can be emitted
+      when not needed.  */
+   if (flag_unit_at_a_time && at_eof && needed
+       && TREE_CODE (decl) == VAR_DECL)
+     cgraph_varpool_mark_needed_node (cgraph_varpool_node (decl));
  }
  
  /* For win32 we also want to put explicit instantiations in
*************** prepare_assemble_variable (tree vars)
*** 1610,1615 ****
--- 1624,1649 ----
  {
    tree parent;
    rtx child_rtx, parent_rtx;
+   tree vtbl= NULL;
+ 
+   if (!flag_vtable_gc)
+     return;
+   /* Recognize virtual tables.  
+      Both VTTs and vtables are array types with context set to the parent
+      class.  */
+   if (TREE_CODE (TREE_TYPE (vars)) != ARRAY_TYPE
+       || !DECL_CONTEXT (vars)
+       || !CLASS_TYPE_P (DECL_CONTEXT (vars)))
+     return;
+   /* Look through the list of VTTs and vtables of the class to see whether
+      we are seeing one.  */
+   for (vtbl = CLASSTYPE_VTABLES (DECL_CONTEXT (vars));
+        vtbl; vtbl = TREE_CHAIN (vtbl))
+     if (vtbl == vars)
+       break;
+ 
+   if (!vtbl)
+     return;
  
    if (!flag_vtable_gc || TREE_CODE (vars) != VAR_DECL
        || !DECL_VTABLE_OR_VTT_P (vars))
*************** generate_ctor_and_dtor_functions_for_pri
*** 2555,2560 ****
--- 2589,2614 ----
    return 0;
  }
  
+ /* Callgraph code does not understand the member pointers.  Mark the methods
+    referenced as used.  */
+ static tree
+ mark_member_pointers (tree *tp, int *walk_subtrees ATTRIBUTE_UNUSED,
+ 		      void *data ATTRIBUTE_UNUSED)
+ {
+   if (TREE_CODE (*tp) == PTRMEM_CST)
+     cgraph_mark_needed_node (cgraph_node (PTRMEM_CST_MEMBER (*tp)), 1);
+   return 0;
+ }
+ 
+ /* Called via LANG_HOOKS_CALLGRAPH_LOWER_FUNCTION.  It is supposed to lower
+    frontend specific constructs that would otherwise confuse the middle end.  */
+ void
+ lower_function (tree fn)
+ {
+   walk_tree_without_duplicates (&DECL_SAVED_TREE (fn), mark_member_pointers,
+ 				NULL);
+ }
+ 
  /* This routine is called from the last rule in yyparse ().
     Its job is to create all the code needed to initialize and
     destroy the global aggregates.  We do the destruction
*************** finish_file ()
*** 2765,2771 ****
  	     instantiation "static", which will result in errors about
  	     the use of undefined functions if there is no body for
  	     the function.  */
! 	  if (!DECL_SAVED_TREE (decl))
  	    continue;
  
  	  import_export_decl (decl);
--- 2819,2825 ----
  	     instantiation "static", which will result in errors about
  	     the use of undefined functions if there is no body for
  	     the function.  */
! 	  if (!DECL_SAVED_TREE (decl) || !DECL_DEFER_OUTPUT (decl))
  	    continue;
  
  	  import_export_decl (decl);
*************** finish_file ()
*** 2794,2811 ****
  	      && DECL_SAVED_TREE (decl)
  	      && !TREE_ASM_WRITTEN (decl))
  	    {
! 	      int saved_not_really_extern;
! 
! 	      /* When we call finish_function in expand_body, it will
! 		 try to reset DECL_NOT_REALLY_EXTERN so we save and
! 		 restore it here.  */
! 	      saved_not_really_extern = DECL_NOT_REALLY_EXTERN (decl);
  	      /* Generate RTL for this function now that we know we
  		 need it.  */
  	      expand_body (decl);
- 	      /* Undo the damage done by finish_function.  */
- 	      DECL_EXTERNAL (decl) = 0;
- 	      DECL_NOT_REALLY_EXTERN (decl) = saved_not_really_extern;
  	      /* If we're compiling -fsyntax-only pretend that this
  		 function has been written out so that we don't try to
  		 expand it again.  */
--- 2848,2859 ----
  	      && DECL_SAVED_TREE (decl)
  	      && !TREE_ASM_WRITTEN (decl))
  	    {
! 	      /* We will output the function; no longer consider it in this
! 		 loop.  */
! 	      DECL_DEFER_OUTPUT (decl) = 0;
  	      /* Generate RTL for this function now that we know we
  		 need it.  */
  	      expand_body (decl);
  	      /* If we're compiling -fsyntax-only pretend that this
  		 function has been written out so that we don't try to
  		 expand it again.  */
*************** finish_file ()
*** 2815,2824 ****
  	    }
  	}
  
-       if (deferred_fns_used
- 	  && wrapup_global_declarations (&VARRAY_TREE (deferred_fns, 0),
- 					 deferred_fns_used))
- 	reconsider = true;
        if (walk_namespaces (wrapup_globals_for_namespace, /*data=*/0))
  	reconsider = true;
  
--- 2863,2868 ----
*************** finish_file ()
*** 2887,2892 ****
--- 2931,2942 ----
    /* We're done with static constructors, so we can go back to "C++"
       linkage now.  */
    pop_lang_context ();
+ 
+   if (flag_unit_at_a_time)
+     {
+       cgraph_finalize_compilation_unit ();
+       cgraph_optimize ();
+     }
  
    /* Now, issue warnings about static, but not defined, functions,
       etc., and emit debugging information.  */
Index: cp/semantics.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cp/semantics.c,v
retrieving revision 1.314
diff -c -3 -p -r1.314 semantics.c
*** cp/semantics.c	24 Jun 2003 15:40:05 -0000	1.314
--- cp/semantics.c	25 Jun 2003 21:52:28 -0000
***************
*** 41,46 ****
--- 41,47 ----
  #include "output.h"
  #include "timevar.h"
  #include "debug.h"
+ #include "cgraph.h"
  
  /* There routines provide a modular interface to perform many parsing
     operations.  They may therefore be used during actual parsing, or
*************** emit_associated_thunks (tree fn)
*** 2275,2347 ****
  /* Generate RTL for FN.  */
  
  void
! expand_body (tree fn)
  {
    location_t saved_loc;
    tree saved_function;
! 
!   /* When the parser calls us after finishing the body of a template
!      function, we don't really want to expand the body.  When we're
!      processing an in-class definition of an inline function,
!      PROCESSING_TEMPLATE_DECL will no longer be set here, so we have
!      to look at the function itself.  */
!   if (processing_template_decl
!       || (DECL_LANG_SPECIFIC (fn) 
! 	  && DECL_TEMPLATE_INFO (fn)
! 	  && uses_template_parms (DECL_TI_ARGS (fn))))
!     {
!       /* Normally, collection only occurs in rest_of_compilation.  So,
! 	 if we don't collect here, we never collect junk generated
! 	 during the processing of templates until we hit a
! 	 non-template function.  */
!       ggc_collect ();
!       return;
!     }
! 
!   /* Replace AGGR_INIT_EXPRs with appropriate CALL_EXPRs.  */
!   walk_tree_without_duplicates (&DECL_SAVED_TREE (fn),
! 				simplify_aggr_init_exprs_r,
! 				NULL);
! 
!   /* If this is a constructor or destructor body, we have to clone
!      it.  */
!   if (maybe_clone_body (fn))
!     {
!       /* We don't want to process FN again, so pretend we've written
! 	 it out, even though we haven't.  */
!       TREE_ASM_WRITTEN (fn) = 1;
!       return;
!     }
! 
!   /* There's no reason to do any of the work here if we're only doing
!      semantic analysis; this code just generates RTL.  */
!   if (flag_syntax_only)
!     return;
! 
!   /* If possible, avoid generating RTL for this function.  Instead,
!      just record it as an inline function, and wait until end-of-file
!      to decide whether to write it out or not.  */
!   if (/* We have to generate RTL if it's not an inline function.  */
!       (DECL_INLINE (fn) || DECL_COMDAT (fn))
!       /* Or if we have to emit code for inline functions anyhow.  */
!       && !flag_keep_inline_functions
!       /* Or if we actually have a reference to the function.  */
!       && !DECL_NEEDED_P (fn))
!     {
!       /* Set DECL_EXTERNAL so that assemble_external will be called as
! 	 necessary.  We'll clear it again in finish_file.  */
!       if (!DECL_EXTERNAL (fn))
! 	{
! 	  DECL_NOT_REALLY_EXTERN (fn) = 1;
! 	  DECL_EXTERNAL (fn) = 1;
! 	}
!       /* Remember this function.  In finish_file we'll decide if
! 	 we actually need to write this function out.  */
!       defer_fn (fn);
!       /* Let the back-end know that this function exists.  */
!       (*debug_hooks->deferred_inline_function) (fn);
!       return;
!     }
  
    /* Compute the appropriate object-file linkage for inline
       functions.  */
--- 2276,2288 ----
  /* Generate RTL for FN.  */
  
  void
! really_expand_body (tree fn)
  {
    location_t saved_loc;
    tree saved_function;
!   
!   if (flag_unit_at_a_time && !cgraph_global_info_ready)
!     abort ();
  
    /* Compute the appropriate object-file linkage for inline
       functions.  */
*************** expand_body (tree fn)
*** 2411,2416 ****
--- 2352,2459 ----
  
    /* Emit any thunks that should be emitted at the same time as FN.  */
    emit_associated_thunks (fn);
+ }
+ 
+ /* Generate RTL for FN.  */
+ 
+ void
+ expand_body (fn)
+      tree fn;
+ {
+   /* When the parser calls us after finishing the body of a template
+      function, we don't really want to expand the body.  When we're
+      processing an in-class definition of an inline function,
+      PROCESSING_TEMPLATE_DECL will no longer be set here, so we have
+      to look at the function itself.  */
+   if (processing_template_decl
+       || (DECL_LANG_SPECIFIC (fn) 
+ 	  && DECL_TEMPLATE_INFO (fn)
+ 	  && uses_template_parms (DECL_TI_ARGS (fn))))
+     {
+       /* Normally, collection only occurs in rest_of_compilation.  So,
+ 	 if we don't collect here, we never collect junk generated
+ 	 during the processing of templates until we hit a
+ 	 non-template function.  */
+       ggc_collect ();
+       return;
+     }
+ 
+   /* Replace AGGR_INIT_EXPRs with appropriate CALL_EXPRs.  */
+   walk_tree_without_duplicates (&DECL_SAVED_TREE (fn),
+ 				simplify_aggr_init_exprs_r,
+ 				NULL);
+ 
+   /* If this is a constructor or destructor body, we have to clone
+      it.  */
+   if (maybe_clone_body (fn))
+     {
+       /* We don't want to process FN again, so pretend we've written
+ 	 it out, even though we haven't.  */
+       TREE_ASM_WRITTEN (fn) = 1;
+       return;
+     }
+ 
+   /* There's no reason to do any of the work here if we're only doing
+      semantic analysis; this code just generates RTL.  */
+   if (flag_syntax_only)
+     return;
+ 
+   if (flag_unit_at_a_time && cgraph_global_info_ready)
+     abort ();
+ 
+   if (flag_unit_at_a_time && !cgraph_global_info_ready)
+     {
+       if (at_eof)
+ 	{
+ 	  /* Compute the appropriate object-file linkage for inline
+ 	     functions.  */
+ 	  if (DECL_DECLARED_INLINE_P (fn))
+ 	    import_export_decl (fn);
+ 	  cgraph_finalize_function (fn, DECL_SAVED_TREE (fn));
+ 	}
+       else
+ 	{
+ 	  if (!DECL_EXTERNAL (fn))
+ 	    {
+ 	      DECL_NOT_REALLY_EXTERN (fn) = 1;
+ 	      DECL_EXTERNAL (fn) = 1;
+ 	    }
+ 	  /* Remember this function.  In finish_file we'll decide if
+ 	     we actually need to write this function out.  */
+ 	  defer_fn (fn);
+ 	  /* Let the back-end know that this function exists.  */
+ 	  (*debug_hooks->deferred_inline_function) (fn);
+ 	}
+       return;
+     }
+ 
+ 
+   /* If possible, avoid generating RTL for this function.  Instead,
+      just record it as an inline function, and wait until end-of-file
+      to decide whether to write it out or not.  */
+   if (/* We have to generate RTL if it's not an inline function.  */
+       (DECL_INLINE (fn) || DECL_COMDAT (fn))
+       /* Or if we have to emit code for inline functions anyhow.  */
+       && !flag_keep_inline_functions
+       /* Or if we actually have a reference to the function.  */
+       && !DECL_NEEDED_P (fn))
+     {
+       /* Set DECL_EXTERNAL so that assemble_external will be called as
+ 	 necessary.  We'll clear it again in finish_file.  */
+       if (!DECL_EXTERNAL (fn))
+ 	{
+ 	  DECL_NOT_REALLY_EXTERN (fn) = 1;
+ 	  DECL_EXTERNAL (fn) = 1;
+ 	}
+       /* Remember this function.  In finish_file we'll decide if
+ 	 we actually need to write this function out.  */
+       defer_fn (fn);
+       /* Let the back-end know that this function exists.  */
+       (*debug_hooks->deferred_inline_function) (fn);
+       return;
+     }
+ 
+   really_expand_body (fn);
  }
  
  /* Helper function for walk_tree, used by finish_function to override all

