This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
[RFC/patch] Callgraph based inlining heuristics
- From: Jan Hubicka <jh at suse dot cz>
- To: gcc at gcc dot gnu dot org, mark at codesourcery dot com, rth at redhat dot com,aj at suse dot de
- Date: Mon, 23 Jun 2003 01:20:32 +0200
- Subject: [RFC/patch] Callgraph based inlining heuristics
Hi,
I finally got C++ unit-at-a-time into shape so that it doesn't consume all my free
time, and got to implementing a first version of my proposed inlining heuristics,
which use the callgraph to decide on inlining top-down while performing the
inlining itself with the current bottom-up code, hopefully getting the best of
both worlds.
I am attaching a somewhat raw implementation of the idea. At the moment I am
using cgraph edges to flag whether a particular inlining will happen, so a
function can be inlined into one caller but not into another; the decisions
must, however, be transitive.
(This is not the case for the proposed leafify attribute: implementing the
datastructure in full generality, so that each particular walk in the callgraph
can be either collapsed or not, is probably too expensive, so the leafify
attribute can be implemented as a special case.)
I sort the functions topologically and then do 3 stages. In the first stage I
set the inline flag on all callers of always_inline functions, watching for
cycles. After the flag is set, the size of the caller's body is updated by
adding the size of the inlined body minus 1. (This is the same estimate as in
tree-inline.c, and not a very good one. We could consider choosing functions
that may be inlined and analyzing them harder; in the extreme case we could
even produce RTL, profile it, and compute the average number of insns
executed.)
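The size bookkeeping can be modelled as a standalone sketch (the constants and the formula mirror the patch's INSNS_PER_STMT/INSNS_PER_CALL and cgraph_estimate_size_after_inlining; the function names here are otherwise hypothetical):

```c
#include <assert.h>

/* Constants as chosen in the patch.  */
#define INSNS_PER_STMT 10
#define INSNS_PER_CALL 10

/* Initial size estimate: the statement count scaled to insns,
   matching node->local.self_insns in the patch.  */
static int
estimate_self_insns (int num_stmts)
{
  return num_stmts * INSNS_PER_STMT;
}

/* Estimated size of a caller of TO_INSNS insns after inlining a
   callee of WHAT_INSNS insns into it TIMES times: the body is
   added and each inlined call insn goes away, as in the patch's
   cgraph_estimate_size_after_inlining.  */
static int
estimate_size_after_inlining (int times, int to_insns, int what_insns)
{
  return to_insns + (what_insns - INSNS_PER_CALL * times);
}
```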
In the second stage I proceed topologically again, inlining small functions and
updating the body sizes, and in the third stage I decide on functions to be
inlined once.
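The topological processing of the stages above can be sketched with a toy depth-first postorder walk (a minimal standalone model with made-up structure names; the patch's cgraph_postorder does the equivalent over struct cgraph_node, with cycle handling omitted here):

```c
#include <assert.h>

#define MAX_CALLEES 8

/* Toy callgraph node: an adjacency list of callees plus a
   visited mark.  */
struct toy_node
{
  int ncallees;
  int callee[MAX_CALLEES];
  int visited;
};

/* Emit nodes reachable from I in postorder, so every callee
   appears in ORDER before its callers; iterating ORDER backwards
   then gives the top-down order the heuristics use.  */
static void
postorder_visit (struct toy_node *nodes, int i, int *order, int *pos)
{
  int j;
  if (nodes[i].visited)
    return;
  nodes[i].visited = 1;
  for (j = 0; j < nodes[i].ncallees; j++)
    postorder_visit (nodes, nodes[i].callee[j], order, pos);
  order[(*pos)++] = i;
}

/* Walk the call chain 0 -> 1 -> 2 and return the resulting order
   packed as decimal digits; 210 means order = {2, 1, 0}.  */
static int
demo_chain_postorder (void)
{
  struct toy_node nodes[3] = {
    { 1, { 1 }, 0 },
    { 1, { 2 }, 0 },
    { 0, { 0 }, 0 },
  };
  int order[3], pos = 0;
  postorder_visit (nodes, 0, order, &pos);
  return order[0] * 100 + order[1] * 10 + order[2];
}
```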
I've added a new constant specifying large functions (30000 insns) and I limit
the growth of large functions to a factor of 2. This avoids the quadratic
bottleneck seen on insn-recog during bootstrap.
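The growth cap works out as follows (a sketch using the patch's LARGE_FUNCTION_INSNS and LARGE_FUNCTION_GROWTH constants; the helper name is made up): inlining is refused only when the estimated result is both large in absolute terms and more than twice the function's pre-inlining size.

```c
#include <assert.h>

/* Constants from the patch: the "large" function threshold and
   the permitted growth in percent (200% = factor of 2).  */
#define LARGE_FUNCTION_INSNS 30000
#define LARGE_FUNCTION_GROWTH 200

/* Mirror of the test in the patch's cgraph_check_inline_limits:
   allow the inlining unless the new size is both above the
   large-function threshold and more than LARGE_FUNCTION_GROWTH
   percent of the function's own pre-inlining size.  */
static int
growth_within_limits (int newsize, int self_insns)
{
  return !(newsize > LARGE_FUNCTION_INSNS
           && newsize > self_insns * LARGE_FUNCTION_GROWTH / 100);
}
```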
I tested libstdc++ compilation with -O2 and got a 10% reduction in the size of
the .so library (with debug info included) and about a 5% compiler speedup. I
also tried the PR 8362 testcase.
With -O2 -fno-unit-at-a-time I get:
garbage collection : 14.43 (11%) usr 0.03 ( 1%) sys 14.49 (11%) wall
cfg construction : 1.83 ( 1%) usr 0.00 ( 0%) sys 1.83 ( 1%) wall
cfg cleanup : 2.79 ( 2%) usr 0.01 ( 0%) sys 2.80 ( 2%) wall
trivially dead code : 1.95 ( 1%) usr 0.00 ( 0%) sys 1.95 ( 1%) wall
life analysis : 4.47 ( 3%) usr 0.00 ( 0%) sys 4.47 ( 3%) wall
life info update : 1.91 ( 1%) usr 0.00 ( 0%) sys 1.91 ( 1%) wall
alias analysis : 3.03 ( 2%) usr 0.05 ( 2%) sys 3.08 ( 2%) wall
register scan : 1.16 ( 1%) usr 0.01 ( 0%) sys 1.17 ( 1%) wall
rebuild jump labels : 0.80 ( 1%) usr 0.00 ( 0%) sys 0.81 ( 1%) wall
preprocessing : 0.39 ( 0%) usr 0.06 ( 2%) sys 0.44 ( 0%) wall
parser : 17.02 (13%) usr 1.23 (43%) sys 18.32 (13%) wall
name lookup : 8.08 ( 6%) usr 0.84 (29%) sys 8.92 ( 6%) wall
expand : 13.09 (10%) usr 0.12 ( 4%) sys 13.26 (10%) wall
varconst : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall
integration : 4.55 ( 3%) usr 0.03 ( 1%) sys 4.60 ( 3%) wall
jump : 0.76 ( 1%) usr 0.01 ( 0%) sys 0.77 ( 1%) wall
CSE : 13.96 (10%) usr 0.01 ( 0%) sys 14.01 (10%) wall
global CSE : 7.38 ( 5%) usr 0.01 ( 0%) sys 7.39 ( 5%) wall
loop analysis : 1.00 ( 1%) usr 0.01 ( 0%) sys 1.01 ( 1%) wall
bypass jumps : 1.12 ( 1%) usr 0.01 ( 0%) sys 1.13 ( 1%) wall
CSE 2 : 5.69 ( 4%) usr 0.01 ( 0%) sys 5.72 ( 4%) wall
branch prediction : 1.45 ( 1%) usr 0.01 ( 0%) sys 1.46 ( 1%) wall
flow analysis : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall
combiner : 3.54 ( 3%) usr 0.00 ( 0%) sys 3.55 ( 3%) wall
if-conversion : 0.37 ( 0%) usr 0.00 ( 0%) sys 0.37 ( 0%) wall
regmove : 0.84 ( 1%) usr 0.00 ( 0%) sys 0.84 ( 1%) wall
local alloc : 2.53 ( 2%) usr 0.02 ( 1%) sys 2.55 ( 2%) wall
global alloc : 5.47 ( 4%) usr 0.05 ( 2%) sys 5.52 ( 4%) wall
reload CSE regs : 2.24 ( 2%) usr 0.00 ( 0%) sys 2.24 ( 2%) wall
flow 2 : 0.64 ( 0%) usr 0.00 ( 0%) sys 0.64 ( 0%) wall
if-conversion 2 : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
peephole 2 : 0.45 ( 0%) usr 0.00 ( 0%) sys 0.45 ( 0%) wall
rename registers : 3.18 ( 2%) usr 0.01 ( 0%) sys 3.19 ( 2%) wall
scheduling 2 : 3.23 ( 2%) usr 0.10 ( 3%) sys 3.34 ( 2%) wall
reorder blocks : 0.53 ( 0%) usr 0.00 ( 0%) sys 0.53 ( 0%) wall
shorten branches : 0.56 ( 0%) usr 0.02 ( 1%) sys 0.58 ( 0%) wall
final : 1.02 ( 1%) usr 0.16 ( 6%) sys 1.18 ( 1%) wall
symout : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
rest of compilation : 2.53 ( 2%) usr 0.04 ( 1%) sys 2.58 ( 2%) wall
TOTAL : 134.67 2.86 137.79
With -O2 -funit-at-a-time I get:
Execution times (seconds)
garbage collection : 10.21 (11%) usr 0.05 ( 2%) sys 10.29 (11%) wall
cfg construction : 0.69 ( 1%) usr 0.00 ( 0%) sys 0.69 ( 1%) wall
cfg cleanup : 1.38 ( 1%) usr 0.01 ( 0%) sys 1.39 ( 1%) wall
trivially dead code : 1.03 ( 1%) usr 0.01 ( 0%) sys 1.04 ( 1%) wall
life analysis : 3.29 ( 4%) usr 0.00 ( 0%) sys 3.35 ( 3%) wall
life info update : 1.52 ( 2%) usr 0.01 ( 0%) sys 1.53 ( 2%) wall
alias analysis : 1.76 ( 2%) usr 0.01 ( 0%) sys 1.77 ( 2%) wall
register scan : 0.64 ( 1%) usr 0.00 ( 0%) sys 0.64 ( 1%) wall
rebuild jump labels : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.27 ( 0%) wall
preprocessing : 0.42 ( 0%) usr 0.05 ( 2%) sys 0.49 ( 1%) wall
parser : 18.16 (20%) usr 1.20 (38%) sys 19.48 (20%) wall
name lookup : 7.03 ( 8%) usr 1.08 (34%) sys 8.14 ( 8%) wall
expand : 4.37 ( 5%) usr 0.05 ( 2%) sys 4.43 ( 5%) wall
varconst : 2.86 ( 3%) usr 0.43 (13%) sys 3.31 ( 3%) wall
integration : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
jump : 0.37 ( 0%) usr 0.01 ( 0%) sys 0.37 ( 0%) wall
CSE : 8.19 ( 9%) usr 0.00 ( 0%) sys 8.21 ( 9%) wall
global CSE : 3.19 ( 3%) usr 0.00 ( 0%) sys 3.20 ( 3%) wall
loop analysis : 0.50 ( 1%) usr 0.00 ( 0%) sys 0.50 ( 1%) wall
bypass jumps : 0.78 ( 1%) usr 0.00 ( 0%) sys 0.79 ( 1%) wall
CSE 2 : 4.07 ( 4%) usr 0.00 ( 0%) sys 4.07 ( 4%) wall
branch prediction : 0.87 ( 1%) usr 0.00 ( 0%) sys 0.87 ( 1%) wall
flow analysis : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
combiner : 2.15 ( 2%) usr 0.01 ( 0%) sys 2.16 ( 2%) wall
if-conversion : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall
regmove : 0.52 ( 1%) usr 0.00 ( 0%) sys 0.52 ( 1%) wall
local alloc : 1.79 ( 2%) usr 0.00 ( 0%) sys 1.79 ( 2%) wall
global alloc : 4.15 ( 4%) usr 0.01 ( 0%) sys 4.16 ( 4%) wall
reload CSE regs : 1.81 ( 2%) usr 0.00 ( 0%) sys 2.17 ( 2%) wall
flow 2 : 0.40 ( 0%) usr 0.01 ( 0%) sys 0.41 ( 0%) wall
if-conversion 2 : 0.22 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
peephole 2 : 0.43 ( 0%) usr 0.00 ( 0%) sys 0.43 ( 0%) wall
rename registers : 3.05 ( 3%) usr 0.00 ( 0%) sys 3.05 ( 3%) wall
scheduling 2 : 2.87 ( 3%) usr 0.01 ( 0%) sys 2.88 ( 3%) wall
reorder blocks : 0.36 ( 0%) usr 0.00 ( 0%) sys 0.36 ( 0%) wall
shorten branches : 0.28 ( 0%) usr 0.02 ( 1%) sys 0.30 ( 0%) wall
reg stack : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
final : 0.93 ( 1%) usr 0.19 ( 6%) sys 1.13 ( 1%) wall
symout : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
rest of compilation : 1.33 ( 1%) usr 0.00 ( 0%) sys 1.33 ( 1%) wall
TOTAL : 92.61 3.19 96.50
And I get 24% fewer instructions produced.
I also tested it on some of the C SPEC2000 sources and it does not seem to
suppress any important inlining. I hope to do some benchmarking tomorrow, as I
need some sleep now.
So :) any comments are welcome. I would also like to know about other
testcases where the inliner behaves badly.
Honza
*** cgraph.c.in Sun Jun 22 23:47:35 2003
--- cgraph.c Mon Jun 23 00:26:38 2003
*************** create_edge (caller, callee)
*** 160,165 ****
--- 160,179 ----
struct cgraph_node *caller, *callee;
{
struct cgraph_edge *edge = xmalloc (sizeof (struct cgraph_edge));
+ struct cgraph_edge *edge2;
+
+ edge->inline_call = false;
+ /* At the moment we don't associate calls with specific CALL_EXPRs
+ as we probably ought to, so we must preserve inline_call flags to
+ be the same in all copies of the same edge. */
+ if (cgraph_global_info_ready)
+ for (edge2 = caller->callees; edge2; edge2 = edge2->next_caller)
+ if (edge2->callee == callee)
+ {
+ edge->inline_call = edge2->inline_call;
+ break;
+ }
+
edge->caller = caller;
edge->callee = callee;
*** cgraph.h.in Sun Jun 22 21:56:33 2003
--- cgraph.h Mon Jun 23 01:36:16 2003
*************** struct cgraph_local_info
*** 30,40 ****
/* Set when function function is visiable in current compilation unit only
and it's address is never taken. */
bool local;
! /* Set when function is small enought to be inlinable many times. */
! bool inline_many;
! /* Set when function can be inlined once (false only for functions calling
! alloca, using varargs and so on). */
! bool can_inline_once;
};
/* Information about the function that needs to be computed globally
--- 30,42 ----
/* Set when function function is visiable in current compilation unit only
and it's address is never taken. */
bool local;
!
! /* False when there is something making inlining impossible (such as va_arg) */
! bool inlinable;
! /* True when function should be inlined independently on it's size. */
! bool disgread_inline_limits;
! /* Size of the function before inlining. */
! int self_insns;
};
/* Information about the function that needs to be computed globally
*************** struct cgraph_global_info
*** 44,49 ****
--- 46,54 ----
{
/* Set when the function will be inlined exactly once. */
bool inline_once;
+
+ /* Estimated size of the function after inlining. */
+ int insns;
};
/* Information about the function that is propagated by the RTL backend.
*************** struct cgraph_edge
*** 95,100 ****
--- 100,106 ----
struct cgraph_node *caller, *callee;
struct cgraph_edge *next_caller;
struct cgraph_edge *next_callee;
+ bool inline_call;
};
extern struct cgraph_node *cgraph_nodes;
*************** void cgraph_finalize_compilation_unit PA
*** 120,124 ****
--- 126,131 ----
void cgraph_create_edges PARAMS ((tree, tree));
void cgraph_optimize PARAMS ((void));
void cgraph_mark_needed_node PARAMS ((struct cgraph_node *, int));
+ bool cgraph_inline_p PARAMS ((tree, tree));
#endif /* GCC_CGRAPH_H */
*** cgraphunit.c.in Sun Jun 22 21:56:26 2003
--- cgraphunit.c Mon Jun 23 01:38:47 2003
*************** Software Foundation, 59 Temple Place - S
*** 35,47 ****
#include "cgraph.h"
#include "varpool.h"
#include "diagnostic.h"
static void cgraph_expand_functions PARAMS ((void));
static void cgraph_mark_functions_to_output PARAMS ((void));
static void cgraph_expand_function PARAMS ((struct cgraph_node *));
static tree record_call_1 PARAMS ((tree *, int *, void *));
static void cgraph_mark_local_functions PARAMS ((void));
- static void cgraph_mark_functions_to_inline_once PARAMS ((void));
static void cgraph_optimize_function PARAMS ((struct cgraph_node *));
/* Analyze function once it is parsed. Set up the local information
--- 35,53 ----
#include "cgraph.h"
#include "varpool.h"
#include "diagnostic.h"
+ #include "c-common.h"
+ #include "params.h"
+
+ #define INSNS_PER_STMT 10
+ #define INSNS_PER_CALL 10
+ #define LARGE_FUNCTION_INSNS 30000
+ #define LARGE_FUNCTION_GROWTH 200
static void cgraph_expand_functions PARAMS ((void));
static void cgraph_mark_functions_to_output PARAMS ((void));
static void cgraph_expand_function PARAMS ((struct cgraph_node *));
static tree record_call_1 PARAMS ((tree *, int *, void *));
static void cgraph_mark_local_functions PARAMS ((void));
static void cgraph_optimize_function PARAMS ((struct cgraph_node *));
/* Analyze function once it is parsed. Set up the local information
*************** cgraph_finalize_function (decl, body)
*** 75,88 ****
cgraph_mark_needed_node (node, 1);
}
! if (!node->needed && !DECL_COMDAT (node->decl))
! node->local.can_inline_once = tree_inlinable_function_p (decl, 1);
! else
! node->local.can_inline_once = 0;
! if (flag_inline_trees)
! node->local.inline_many = tree_inlinable_function_p (decl, 0);
! else
! node->local.inline_many = 0;
(*debug_hooks->deferred_inline_function) (decl);
}
--- 81,91 ----
cgraph_mark_needed_node (node, 1);
}
! node->local.inlinable = tree_inlinable_function_p (decl, 1);
! node->local.self_insns = DECL_NUM_STMTS (decl) * INSNS_PER_STMT;
! if (node->local.inlinable)
! node->local.disgread_inline_limits
! = (*lang_hooks.tree_inlining.disregard_inline_limits) (decl);
(*debug_hooks->deferred_inline_function) (decl);
}
*************** cgraph_mark_functions_to_output ()
*** 231,244 ****
for (node = cgraph_nodes; node; node = node->next)
{
tree decl = node->decl;
/* We need to output all local functions that are used and not
always inlined, as well as those that are reachable from
outside the current compilation unit. */
if (DECL_SAVED_TREE (decl)
&& (node->needed
! || (!node->local.inline_many && !node->global.inline_once
! && node->reachable)
|| (DECL_ASSEMBLER_NAME_SET_P (decl)
&& TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl))))
&& !TREE_ASM_WRITTEN (decl) && !node->origin
--- 234,251 ----
for (node = cgraph_nodes; node; node = node->next)
{
tree decl = node->decl;
+ struct cgraph_edge *e;
+
+ for (e = node->callers; e; e = e->next_caller)
+ if (!e->inline_call)
+ break;
/* We need to output all local functions that are used and not
always inlined, as well as those that are reachable from
outside the current compilation unit. */
if (DECL_SAVED_TREE (decl)
&& (node->needed
! || (e && node->reachable)
|| (DECL_ASSEMBLER_NAME_SET_P (decl)
&& TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl))))
&& !TREE_ASM_WRITTEN (decl) && !node->origin
*************** cgraph_optimize_function (node)
*** 255,260 ****
--- 262,269 ----
{
tree decl = node->decl;
+ /* optimize_inline_calls avoid inlining of current_function_decl. */
+ current_function_decl = 0;
if (flag_inline_trees)
optimize_inline_calls (decl);
if (node->nested)
*************** cgraph_expand_function (node)
*** 271,276 ****
--- 280,286 ----
struct cgraph_node *node;
{
tree decl = node->decl;
+ struct cgraph_edge *e;
announce_function (decl);
*************** cgraph_expand_function (node)
*** 280,320 ****
via lang_expand_decl_stmt. */
(*lang_hooks.callgraph.expand_function) (decl);
! /* When we decided to inline the function once, we never ever should
! need to output it separately. */
! if (node->global.inline_once)
! abort ();
! if (!node->local.inline_many
! || !node->callers)
DECL_SAVED_TREE (decl) = NULL;
current_function_decl = NULL;
}
!
! /* Expand all functions that must be output.
!
! Attempt to topologically sort the nodes so function is output when
! all called functions are already assembled to allow data to be
! propagated accross the callgraph. Use a stack to get smaller distance
! between a function and it's callees (later we may choose to use a more
! sophisticated algorithm for function reordering; we will likely want
! to use subsections to make the output functions appear in top-down
! order. */
!
! static void
! cgraph_expand_functions ()
{
struct cgraph_node *node, *node2;
- struct cgraph_node **stack =
- xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
- struct cgraph_node **order =
- xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
int stack_size = 0;
int order_pos = 0;
struct cgraph_edge *edge, last;
- int i;
! cgraph_mark_functions_to_output ();
/* We have to deal with cycles nicely, so use a depth first traversal
output algorithm. Ignore the fact that some functions won't need
--- 290,315 ----
via lang_expand_decl_stmt. */
(*lang_hooks.callgraph.expand_function) (decl);
! for (e = node->callers; e; e = e->next_caller)
! if (e->inline_call)
! break;
! if (e)
DECL_SAVED_TREE (decl) = NULL;
current_function_decl = NULL;
}
! /* Fill array order with all nodes with output flag set in the reverse
! topological order. */
! static int
! cgraph_postorder (struct cgraph_node **order)
{
struct cgraph_node *node, *node2;
int stack_size = 0;
int order_pos = 0;
struct cgraph_edge *edge, last;
! struct cgraph_node **stack =
! xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
/* We have to deal with cycles nicely, so use a depth first traversal
output algorithm. Ignore the fact that some functions won't need
*************** cgraph_expand_functions ()
*** 360,487 ****
}
}
}
! for (i = order_pos - 1; i >= 0; i--)
{
! node = order[i];
! if (node->output)
{
! if (!node->reachable)
! abort ();
! node->output = 0;
! cgraph_expand_function (node);
}
}
free (stack);
! free (order);
}
! /* Mark all local functions.
! We can not use node->needed directly as it is modified during
! execution of cgraph_optimize. */
static void
! cgraph_mark_local_functions ()
{
struct cgraph_node *node;
! if (!quiet_flag)
! fprintf (stderr, "\n\nMarking local functions:");
- /* Figure out functions we want to assemble. */
for (node = cgraph_nodes; node; node = node->next)
{
! node->local.local = (!node->needed
! && DECL_SAVED_TREE (node->decl)
! && !DECL_COMDAT (node->decl)
! && !TREE_PUBLIC (node->decl));
! if (node->local.local)
! announce_function (node->decl);
}
! }
! /* Decide what function should be inlined because they are invoked once
! (so inlining won't result in duplication of the code). */
! static void
! cgraph_mark_functions_to_inline_once ()
! {
! struct cgraph_node *node, *node1;
if (!quiet_flag)
fprintf (stderr, "\n\nMarking functions to inline once:");
! /* Now look for function called only once and mark them to inline.
! From this point number of calls to given function won't grow. */
! for (node = cgraph_nodes; node; node = node->next)
{
if (node->callers && !node->callers->next_caller && !node->needed
! && node->local.can_inline_once)
{
bool ok = true;
/* Verify that we won't duplicate the caller. */
for (node1 = node->callers->caller;
! node1->local.inline_many
! && node1->callers
! && ok;
! node1 = node1->callers->caller)
if (node1->callers->next_caller || node1->needed)
ok = false;
if (ok)
{
! node->global.inline_once = true;
! announce_function (node->decl);
}
}
}
}
/* Perform simple optimizations based on callgraph. */
void
cgraph_optimize ()
{
- struct cgraph_node *node;
- bool changed = true;
-
cgraph_mark_local_functions ();
! cgraph_mark_functions_to_inline_once ();
cgraph_global_info_ready = true;
if (!quiet_flag)
fprintf (stderr, "\n\nAssembling functions:");
! /* Output everything.
! ??? Our inline heuristic may decide to not inline functions previously
! marked as inlinable thus adding new function bodies that must be output.
! Later we should move all inlining decisions to callgraph code to make
! this impossible. */
cgraph_expand_functions ();
- if (!quiet_flag)
- fprintf (stderr, "\n\nAssembling functions that failed to inline:");
- while (changed && !errorcount && !sorrycount)
- {
- changed = false;
- for (node = cgraph_nodes; node; node = node->next)
- {
- tree decl = node->decl;
- if (!node->origin
- && !TREE_ASM_WRITTEN (decl)
- && DECL_SAVED_TREE (decl)
- && !DECL_EXTERNAL (decl))
- {
- struct cgraph_edge *edge;
-
- for (edge = node->callers; edge; edge = edge->next_caller)
- if (TREE_ASM_WRITTEN (edge->caller->decl))
- {
- changed = true;
- cgraph_expand_function (node);
- break;
- }
- }
- }
- }
}
--- 355,818 ----
}
}
}
! free (stack);
! return order_pos;
! }
!
! #define INLINED_TIMES(node) ((size_t)(node)->aux)
! #define SET_INLINED_TIMES(node,times) ((node)->aux = (void *)(times))
!
! /* Return list of nodes we decided to inline NODE into, set their output
! flag and compute INLINED_TIMES.
!
! We do simple backgracing to get INLINED_TIMES right. This should not be
! expensive as we limit amount of inlining. Alternatively we may first
! discover set of nodes, to topological sort on these and propagate
! information. */
! static int
! cgraph_inlined_into (struct cgraph_node *node, struct cgraph_node **array)
! {
! int nfound = 0;
! struct cgraph_edge **stack;
! struct cgraph_edge *e, *e1;
! int sp;
! int i;
!
! /* Fast path: since we traverse in mostly topological order, we will likely
! find no edges. */
! for (e = node->callers; e; e = e->next_caller)
! if (e->inline_call)
! break;
!
! if (!e)
! return 0;
!
! /* Allocate stack for back-tracking up callgraph. */
! stack = xmalloc ((cgraph_n_nodes + 1) * sizeof (struct cgraph_edge));
! sp = 0;
!
! /* Push the first edge on to the stack. */
! stack[sp++] = e;
!
! while (sp)
{
! struct cgraph_node *caller;
!
! /* Look at the edge on the top of the stack. */
! e = stack[sp - 1];
! caller = e->caller;
!
! /* Check if the caller destination has been visited yet. */
! if (!caller->output)
{
! array[nfound++] = e->caller;
! /* Mark that we have visited the destination. */
! caller->output = true;
! SET_INLINED_TIMES (caller, 0);
! }
! SET_INLINED_TIMES (caller, INLINED_TIMES (caller) + 1);
!
! for (e1 = caller->callers; e1; e1 = e1->next_caller)
! if (e1->inline_call)
! break;
! if (e1)
! stack[sp++] = e1;
! else
! {
! while (true)
! {
! for (e1 = e->next_caller; e1; e1 = e1->next_caller)
! if (e1->inline_call)
! break;
!
! if (e1)
! {
! stack[sp - 1] = e1;
! break;
! }
! else
! {
! sp--;
! if (!sp)
! break;
! e = stack[sp - 1];
! }
! }
}
}
+
free (stack);
!
!
! if (!quiet_flag)
! {
! fprintf (stderr, "\nFound inline predecesors of ");
! announce_function (node->decl);
! fprintf (stderr, ":");
! for (i = 0; i < nfound; i++)
! {
! announce_function (array[i]->decl);
! if (INLINED_TIMES (array[i]) != 1)
! fprintf (stderr, " (%i times)", INLINED_TIMES (array[i]));
! }
! fprintf (stderr, "\n");
! }
!
! return nfound;
}
! /* Estimate size of the function after inlining WHAT into TO. */
! static int
! cgraph_estimate_size_after_inlining (int times,
! struct cgraph_node *to,
! struct cgraph_node *what)
! {
! return to->global.insns + (what->global.insns - INSNS_PER_CALL * times);
! }
+ /* Update insn sizes after inlining WHAT into TO that is already inlined into
+ all nodes in INLINED array. */
static void
! cgraph_mark_inline (struct cgraph_node *to,
! struct cgraph_node *what,
! struct cgraph_node **inlined, int ninlined)
! {
! int i;
! int times = 0;
! struct cgraph_edge *e;
!
! for (e = to->callees; e; e = e->next_callee)
! if (e->callee == what)
! {
! if (e->inline_call)
! abort ();
! e->inline_call = true;
! times++;
! }
! if (!times)
! abort ();
!
! to->global.insns = cgraph_estimate_size_after_inlining (times, to, what);
! for (i = 0; i < ninlined; i++)
! inlined[i]->global.insns =
! cgraph_estimate_size_after_inlining (INLINED_TIMES (inlined[i]) * times,
! inlined[i], what);
! }
!
! /* Return false when inlining WHAT into TO is not good idea as it would cause
! too large growth of function bodies. */
! static bool
! cgraph_check_inline_limits (struct cgraph_node *to,
! struct cgraph_node *what,
! struct cgraph_node **inlined, int ninlined)
! {
! int i;
! int times = 0;
! struct cgraph_edge *e;
! int newsize;
!
! for (e = to->callees; e; e = e->next_callee)
! if (e->callee == what)
! times++;
!
! to->global.insns = cgraph_estimate_size_after_inlining (times, to, what);
! newsize = cgraph_estimate_size_after_inlining (times, to, what);
! if (newsize > LARGE_FUNCTION_INSNS
! && newsize > to->local.self_insns * LARGE_FUNCTION_GROWTH / 100)
! return false;
! for (i = 0; i < ninlined; i++)
! {
! newsize =
! cgraph_estimate_size_after_inlining (INLINED_TIMES (inlined[i]) *
! times, inlined[i], what);
! if (newsize > LARGE_FUNCTION_INSNS
! && newsize >
! inlined[i]->local.self_insns * LARGE_FUNCTION_GROWTH / 100)
! return false;
! }
! return true;
! }
!
! /* Return true when function N is small enought to be inlined. */
! static bool
! cgraph_default_inline_p (struct cgraph_node *n)
! {
! if (!DECL_INLINE (n->decl))
! return false;
! if (DID_INLINE_FUNC (n->decl))
! return n->global.insns < MAX_INLINE_INSNS_AUTO;
! else
! return n->global.insns < MAX_INLINE_INSNS_SINGLE;
! }
!
! static void
! cgraph_decide_inlining (void)
{
struct cgraph_node *node;
+ int nnodes;
+ struct cgraph_node **order =
+ xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
+ struct cgraph_node **inlined =
+ xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
+ int ninlined;
+ int i, y;
! /* Cgraph_postorder ignore nodes with no node->output set. */
! for (node = cgraph_nodes; node; node = node->next)
! {
! node->global.insns = node->local.self_insns;
! if (DECL_SAVED_TREE (node->decl))
! node->output = 1;
! }
!
! nnodes = cgraph_postorder (order);
for (node = cgraph_nodes; node; node = node->next)
+ node->output = 0;
+
+ /* In the first pass mark all always_inline edges. Do this with a priority
+ so no our decisions makes this impossible. */
+ if (!quiet_flag)
+ fprintf (stderr, "\n\nDeciding on always_inline functions:");
+ for (i = nnodes - 1; i >= 0; i--)
{
! struct cgraph_edge *e;
!
! node = order[i];
!
! for (e = node->callees; e; e = e->next_callee)
! if (e->callee->local.disgread_inline_limits)
! break;
! if (!e)
! continue;
! ninlined = cgraph_inlined_into (order[i], inlined);
! for (; e; e = e->next_callee)
! {
! if (e->inline_call || !e->callee->local.disgread_inline_limits)
! continue;
! if (e->callee->output || e->callee == node)
! {
! current_function_decl = e->callee->decl;
! warning_with_decl (node->decl,
! "Always inline flag ignored in respect to `%s' to avoid cycle");
! continue;
! }
! if (!quiet_flag)
! {
! fprintf (stderr, "Always inlining");
! announce_function (e->callee->decl);
! fprintf (stderr, " to");
! announce_function (node->decl);
! fprintf (stderr, "\n");
! }
! cgraph_mark_inline (node, e->callee, inlined, ninlined);
! }
! for (y = 0; y < ninlined; y++)
! inlined[y]->output = 0;
}
! /* In the first pass mark all always_inline edges. Do this with a priority
! so no our decisions makes this impossible. */
! if (!quiet_flag)
! fprintf (stderr, "\n\nDeciding on inlining:");
! for (i = nnodes - 1; i >= 0; i--)
! {
! struct cgraph_edge *e;
! node = order[i];
! for (e = node->callees; e; e = e->next_callee)
! if (!e->inline_call && cgraph_default_inline_p (e->callee))
! break;
! if (!e)
! continue;
! ninlined = cgraph_inlined_into (order[i], inlined);
! for (; e; e = e->next_callee)
! {
! if (e->inline_call || !cgraph_default_inline_p (e->callee))
! continue;
! if (e->callee == node)
! continue;
! if (e->callee->output)
! {
! if (warn_inline && DECL_INLINE (e->callee->decl))
! {
! current_function_decl = e->callee->decl;
! warning_with_decl (node->decl,
! "Not inlining to `%s' to avoid cycle");
! }
! continue;
! }
! if (!cgraph_check_inline_limits
! (node, e->callee, inlined, ninlined))
! {
! if (!quiet_flag)
! {
! fprintf (stderr, "Not inlining");
! announce_function (e->callee->decl);
! fprintf (stderr, " to");
! announce_function (node->decl);
! fprintf (stderr, "as inline limits exceeds\n");
! }
! if (warn_inline && DECL_INLINE (e->callee->decl))
! {
! current_function_decl = e->callee->decl;
! warning_with_decl (node->decl,
! "Not inlining to `%s' to avoid too large function bodies");
! }
! continue;
! }
! if (!quiet_flag)
! {
! fprintf (stderr, "Inlining");
! announce_function (e->callee->decl);
! fprintf (stderr, " to");
! announce_function (node->decl);
! fprintf (stderr, "\n");
! }
! cgraph_mark_inline (node, e->callee, inlined, ninlined);
! }
! for (y = 0; y < ninlined; y++)
! inlined[y]->output = 0;
! }
if (!quiet_flag)
fprintf (stderr, "\n\nMarking functions to inline once:");
! for (i = nnodes - 1; i >= 0; i--)
{
+ node = order[i];
+
if (node->callers && !node->callers->next_caller && !node->needed
! && node->local.inlinable && !node->callers->inline_call)
{
bool ok = true;
+ struct cgraph_node *node1;
/* Verify that we won't duplicate the caller. */
for (node1 = node->callers->caller;
! node1->callers && node1->callers->inline_call
! && ok; node1 = node1->callers->caller)
if (node1->callers->next_caller || node1->needed)
ok = false;
if (ok)
{
! ninlined = cgraph_inlined_into (node, inlined);
! if (!cgraph_check_inline_limits
! (node->callers->caller, node, inlined, ninlined))
! {
! if (!quiet_flag)
! {
! fprintf (stderr, "Not inlining once");
! announce_function (node->decl);
! fprintf (stderr, " to");
! announce_function (node1->decl);
! fprintf (stderr, "as inline limits exceeds\n");
! }
! }
! else
! {
! cgraph_mark_inline (node->callers->caller, node, inlined,
! ninlined);
! announce_function (node->decl);
! }
! for (y = 0; y < ninlined; y++)
! inlined[y]->output = 0;
}
}
}
+
+ free (order);
+ free (inlined);
+ }
+
+ /* Return true when CALLER_DECL should be inlined into CALLEE_DECL. */
+ bool
+ cgraph_inline_p (tree caller_decl, tree callee_decl)
+ {
+ struct cgraph_node *caller = cgraph_node (caller_decl);
+ struct cgraph_node *callee = cgraph_node (callee_decl);
+ struct cgraph_edge *e;
+
+ for (e = caller->callees; e; e = e->next_callee)
+ if (e->callee == callee)
+ return e->inline_call;
+ abort ();
+ }
+
+ /* Expand all functions that must be output.
+
+ Attempt to topologically sort the nodes so function is output when
+ all called functions are already assembled to allow data to be
+ propagated accross the callgraph. Use a stack to get smaller distance
+ between a function and it's callees (later we may choose to use a more
+ sophisticated algorithm for function reordering; we will likely want
+ to use subsections to make the output functions appear in top-down
+ order. */
+
+ static void
+ cgraph_expand_functions ()
+ {
+ struct cgraph_node *node;
+ struct cgraph_node **order =
+ xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
+ int order_pos = 0;
+ int i;
+
+ cgraph_mark_functions_to_output ();
+
+ order_pos = cgraph_postorder (order);
+
+ for (i = order_pos - 1; i >= 0; i--)
+ {
+ node = order[i];
+ if (node->output)
+ {
+ if (!node->reachable)
+ abort ();
+ node->output = 0;
+ cgraph_expand_function (node);
+ }
+ }
+ free (order);
}
+ /* Mark all local functions.
+ We can not use node->needed directly as it is modified during
+ execution of cgraph_optimize. */
+
+ static void
+ cgraph_mark_local_functions ()
+ {
+ struct cgraph_node *node;
+
+ if (!quiet_flag)
+ fprintf (stderr, "\n\nMarking local functions:");
+
+ /* Figure out functions we want to assemble. */
+ for (node = cgraph_nodes; node; node = node->next)
+ {
+ node->local.local = (!node->needed
+ && DECL_SAVED_TREE (node->decl)
+ && !DECL_COMDAT (node->decl)
+ && !TREE_PUBLIC (node->decl));
+ if (node->local.local)
+ announce_function (node->decl);
+ }
+ }
/* Perform simple optimizations based on callgraph. */
void
cgraph_optimize ()
{
cgraph_mark_local_functions ();
! cgraph_decide_inlining ();
cgraph_global_info_ready = true;
if (!quiet_flag)
fprintf (stderr, "\n\nAssembling functions:");
! /* Output everything. */
cgraph_expand_functions ();
}
*** tree-inline.c.in Mon Jun 23 00:50:08 2003
--- tree-inline.c Mon Jun 23 01:24:02 2003
*************** typedef struct inline_data
*** 106,111 ****
--- 106,112 ----
htab_t tree_pruner;
/* Decl of function we are inlining into. */
tree decl;
+ tree current_decl;
} inline_data;
/* Prototypes. */
*************** expand_call_inline (tp, walk_subtrees, d
*** 1204,1212 ****
/* Don't try to inline functions that are not well-suited to
inlining. */
! if ((!flag_unit_at_a_time || !DECL_SAVED_TREE (fn)
! || !cgraph_global_info (fn)->inline_once)
! && !inlinable_function_p (fn, id, 0))
{
if (warn_inline && DECL_INLINE (fn))
{
--- 1205,1213 ----
/* Don't try to inline functions that are not well-suited to
inlining. */
! if (!DECL_SAVED_TREE (fn)
! || (flag_unit_at_a_time && !cgraph_inline_p (id->current_decl, fn))
! || (!flag_unit_at_a_time && !inlinable_function_p (fn, id, 0)))
{
if (warn_inline && DECL_INLINE (fn))
{
*************** expand_call_inline (tp, walk_subtrees, d
*** 1451,1457 ****
}
/* Recurse into the body of the just inlined function. */
! expand_calls_inline (inlined_body, id);
VARRAY_POP (id->fns);
/* If we've returned to the top level, clear out the record of how
--- 1452,1463 ----
}
/* Recurse into the body of the just inlined function. */
! {
! tree old_decl = id->current_decl;
! id->current_decl = fn;
! expand_calls_inline (inlined_body, id);
! id->current_decl = old_decl;
! }
VARRAY_POP (id->fns);
/* If we've returned to the top level, clear out the record of how
*************** optimize_inline_calls (fn)
*** 1497,1502 ****
--- 1503,1509 ----
memset (&id, 0, sizeof (id));
id.decl = fn;
+ id.current_decl = fn;
/* Don't allow recursion into FN. */
VARRAY_TREE_INIT (id.fns, 32, "fns");
VARRAY_PUSH_TREE (id.fns, fn);