This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC/patch] Callgraph based inlining heuristics
> On Mon, 23 Jun 2003, Jan Hubicka wrote:
> > I tested libstdc++ compilation with -O2 and get 10% savings of the .so
> > library size (with debug info in) and about 5% compiler speedup. I also
> > tried PR/8362 testcase. With -O2 -fno-unit-at-a-time I get:
>
> That's 8361, right?
Yes... it is that screen trick again (I tend to pres ^a^n in my vi and
it increase nearest integer when screen is not running. Sometimes
anoying bug to discover when you do it to random source)
>
> > TOTAL : 134.67 2.86 137.79
> >
> > With -O2 -funit-at-a-time I get:
> >
> > TOTAL : 92.61 3.19 96.50
> >
> > And I get 24% fewer instructions produced.
>
> Wow, that's impressive! How's the situation for -O3 (because only for
> -O3 do we full inlining, as far as I understood)?
Just tiny increase in compile time
Execution times (seconds)
garbage collection : 11.31 (12%) usr 0.07 ( 2%) sys 11.53 (11%) wall
cfg construction : 0.96 ( 1%) usr 0.02 ( 1%) sys 0.98 ( 1%) wall
cfg cleanup : 1.54 ( 2%) usr 0.04 ( 1%) sys 1.59 ( 2%) wall
trivially dead code : 0.86 ( 1%) usr 0.00 ( 0%) sys 0.86 ( 1%) wall
life analysis : 3.18 ( 3%) usr 0.00 ( 0%) sys 3.21 ( 3%) wall
life info update : 1.42 ( 1%) usr 0.00 ( 0%) sys 1.42 ( 1%) wall
alias analysis : 1.55 ( 2%) usr 0.06 ( 2%) sys 1.62 ( 2%) wall
register scan : 0.58 ( 1%) usr 0.00 ( 0%) sys 0.58 ( 1%) wall
rebuild jump labels : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.34 ( 0%) wall
preprocessing : 0.34 ( 0%) usr 0.06 ( 2%) sys 0.41 ( 0%) wall
parser : 18.22 (19%) usr 1.19 (33%) sys 19.89 (20%) wall
name lookup : 7.33 ( 8%) usr 1.05 (29%) sys 8.45 ( 8%) wall
expand : 5.35 ( 6%) usr 0.16 ( 4%) sys 5.57 ( 5%) wall
varconst : 3.18 ( 3%) usr 0.64 (18%) sys 3.90 ( 4%) wall
integration : 0.47 ( 0%) usr 0.00 ( 0%) sys 0.47 ( 0%) wall
jump : 0.48 ( 0%) usr 0.00 ( 0%) sys 0.48 ( 0%) wall
CSE : 9.10 ( 9%) usr 0.02 ( 1%) sys 9.15 ( 9%) wall
global CSE : 3.07 ( 3%) usr 0.01 ( 0%) sys 3.08 ( 3%) wall
loop analysis : 1.22 ( 1%) usr 0.00 ( 0%) sys 1.23 ( 1%) wall
bypass jumps : 0.72 ( 1%) usr 0.02 ( 1%) sys 0.74 ( 1%) wall
CSE 2 : 4.13 ( 4%) usr 0.00 ( 0%) sys 4.15 ( 4%) wall
branch prediction : 0.98 ( 1%) usr 0.00 ( 0%) sys 0.98 ( 1%) wall
flow analysis : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall
combiner : 1.97 ( 2%) usr 0.03 ( 1%) sys 2.01 ( 2%) wall
if-conversion : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
regmove : 0.49 ( 1%) usr 0.00 ( 0%) sys 0.49 ( 0%) wall
local alloc : 1.75 ( 2%) usr 0.01 ( 0%) sys 1.76 ( 2%) wall
global alloc : 3.59 ( 4%) usr 0.05 ( 1%) sys 3.64 ( 4%) wall
reload CSE regs : 1.44 ( 1%) usr 0.00 ( 0%) sys 1.44 ( 1%) wall
flow 2 : 0.56 ( 1%) usr 0.01 ( 0%) sys 0.57 ( 1%) wall
if-conversion 2 : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
peephole 2 : 0.49 ( 1%) usr 0.00 ( 0%) sys 0.49 ( 0%) wall
rename registers : 3.70 ( 4%) usr 0.00 ( 0%) sys 3.75 ( 4%) wall
scheduling 2 : 2.97 ( 3%) usr 0.06 ( 2%) sys 3.03 ( 3%) wall
reorder blocks : 0.27 ( 0%) usr 0.00 ( 0%) sys 0.27 ( 0%) wall
shorten branches : 0.36 ( 0%) usr 0.01 ( 0%) sys 0.37 ( 0%) wall
reg stack : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
final : 1.01 ( 1%) usr 0.10 ( 3%) sys 1.11 ( 1%) wall
symout : 0.03 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall
rest of compilation : 1.43 ( 1%) usr 0.01 ( 0%) sys 1.47 ( 1%) wall
TOTAL : 96.95 3.65 101.66
and proportional increse of code size (it is still smaller than without
unit-at-a-time and -O2). The compile time also seems to scale better
with increasing inline limites. With set it to 1000 instead of 300
I get with unit-at-a-time:
Execution times (seconds)
garbage collection : 14.64 (12%) usr 0.06 ( 2%) sys 14.71 (11%) wall
cfg construction : 1.38 ( 1%) usr 0.02 ( 1%) sys 1.40 ( 1%) wall
cfg cleanup : 2.80 ( 2%) usr 0.02 ( 1%) sys 2.82 ( 2%) wall
trivially dead code : 1.62 ( 1%) usr 0.00 ( 0%) sys 1.62 ( 1%) wall
life analysis : 3.92 ( 3%) usr 0.00 ( 0%) sys 3.92 ( 3%) wall
life info update : 1.91 ( 2%) usr 0.00 ( 0%) sys 1.91 ( 1%) wall
alias analysis : 2.42 ( 2%) usr 0.18 ( 5%) sys 2.60 ( 2%) wall
register scan : 1.02 ( 1%) usr 0.00 ( 0%) sys 1.02 ( 1%) wall
rebuild jump labels : 0.51 ( 0%) usr 0.00 ( 0%) sys 0.51 ( 0%) wall
preprocessing : 0.55 ( 0%) usr 0.10 ( 3%) sys 0.66 ( 1%) wall
parser : 18.12 (15%) usr 1.26 (34%) sys 19.38 (15%) wall
name lookup : 7.29 ( 6%) usr 0.88 (24%) sys 8.21 ( 6%) wall
expand : 9.25 ( 7%) usr 0.14 ( 4%) sys 9.39 ( 7%) wall
varconst : 4.92 ( 4%) usr 0.52 (14%) sys 5.44 ( 4%) wall
integration : 0.79 ( 1%) usr 0.00 ( 0%) sys 0.79 ( 1%) wall
jump : 0.64 ( 1%) usr 0.01 ( 0%) sys 0.65 ( 1%) wall
CSE : 13.38 (11%) usr 0.03 ( 1%) sys 13.41 (10%) wall
global CSE : 5.34 ( 4%) usr 0.01 ( 0%) sys 5.35 ( 4%) wall
loop analysis : 1.84 ( 1%) usr 0.01 ( 0%) sys 1.86 ( 1%) wall
bypass jumps : 1.03 ( 1%) usr 0.02 ( 1%) sys 1.05 ( 1%) wall
CSE 2 : 5.37 ( 4%) usr 0.01 ( 0%) sys 5.38 ( 4%) wall
branch prediction : 1.21 ( 1%) usr 0.00 ( 0%) sys 1.21 ( 1%) wall
flow analysis : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
combiner : 3.13 ( 3%) usr 0.02 ( 1%) sys 3.16 ( 2%) wall
if-conversion : 0.33 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall
regmove : 0.69 ( 1%) usr 0.00 ( 0%) sys 0.69 ( 1%) wall
local alloc : 2.05 ( 2%) usr 0.04 ( 1%) sys 2.09 ( 2%) wall
global alloc : 4.46 ( 4%) usr 0.02 ( 1%) sys 4.68 ( 4%) wall
reload CSE regs : 1.82 ( 1%) usr 0.02 ( 1%) sys 1.84 ( 1%) wall
flow 2 : 0.52 ( 0%) usr 0.00 ( 0%) sys 0.52 ( 0%) wall
if-conversion 2 : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
peephole 2 : 0.41 ( 0%) usr 0.00 ( 0%) sys 0.41 ( 0%) wall
rename registers : 3.78 ( 3%) usr 0.06 ( 2%) sys 3.86 ( 3%) wall
scheduling 2 : 3.03 ( 2%) usr 0.07 ( 2%) sys 3.10 ( 2%) wall
reorder blocks : 0.47 ( 0%) usr 0.00 ( 0%) sys 0.47 ( 0%) wall
shorten branches : 0.52 ( 0%) usr 0.02 ( 1%) sys 0.54 ( 0%) wall
final : 1.04 ( 1%) usr 0.13 ( 4%) sys 1.42 ( 1%) wall
symout : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
rest of compilation : 2.25 ( 2%) usr 0.01 ( 0%) sys 2.26 ( 2%) wall
TOTAL : 124.77 3.68 129.00
Without I get:
Execution times (seconds)
garbage collection : 34.31 ( 7%) usr 0.09 ( 1%) sys 34.41 ( 7%) wall
cfg construction : 6.70 ( 1%) usr 0.18 ( 2%) sys 6.88 ( 1%) wall
cfg cleanup : 15.77 ( 3%) usr 0.27 ( 3%) sys 16.04 ( 3%) wall
trivially dead code : 7.48 ( 2%) usr 0.00 ( 0%) sys 7.48 ( 2%) wall
life analysis : 13.09 ( 3%) usr 0.04 ( 0%) sys 13.13 ( 3%) wall
life info update : 9.73 ( 2%) usr 0.01 ( 0%) sys 9.74 ( 2%) wall
alias analysis : 9.57 ( 2%) usr 0.54 ( 5%) sys 10.14 ( 2%) wall
register scan : 3.76 ( 1%) usr 0.01 ( 0%) sys 3.77 ( 1%) wall
rebuild jump labels : 2.92 ( 1%) usr 0.00 ( 0%) sys 2.93 ( 1%) wall
preprocessing : 0.57 ( 0%) usr 0.08 ( 1%) sys 0.65 ( 0%) wall
parser : 17.02 ( 4%) usr 1.33 (13%) sys 18.38 ( 4%) wall
name lookup : 7.49 ( 2%) usr 0.87 ( 8%) sys 8.35 ( 2%) wall
expand : 81.62 (17%) usr 1.25 (12%) sys 83.00 (17%) wall
varconst : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall
integration : 11.94 ( 3%) usr 0.35 ( 3%) sys 12.30 ( 3%) wall
jump : 6.41 ( 1%) usr 0.99 (10%) sys 7.79 ( 2%) wall
CSE : 52.91 (11%) usr 0.09 ( 1%) sys 53.01 (11%) wall
global CSE : 76.72 (16%) usr 1.41 (14%) sys 78.20 (16%) wall
loop analysis : 9.77 ( 2%) usr 0.93 ( 9%) sys 10.71 ( 2%) wall
bypass jumps : 3.46 ( 1%) usr 0.25 ( 2%) sys 3.72 ( 1%) wall
CSE 2 : 17.70 ( 4%) usr 0.02 ( 0%) sys 17.72 ( 4%) wall
branch prediction : 4.69 ( 1%) usr 0.06 ( 1%) sys 4.75 ( 1%) wall
flow analysis : 0.62 ( 0%) usr 0.00 ( 0%) sys 0.62 ( 0%) wall
combiner : 8.90 ( 2%) usr 0.07 ( 1%) sys 8.99 ( 2%) wall
if-conversion : 1.06 ( 0%) usr 0.00 ( 0%) sys 1.06 ( 0%) wall
regmove : 2.50 ( 1%) usr 0.01 ( 0%) sys 2.51 ( 1%) wall
local alloc : 7.84 ( 2%) usr 0.25 ( 2%) sys 8.09 ( 2%) wall
global alloc : 18.89 ( 4%) usr 0.17 ( 2%) sys 19.07 ( 4%) wall
reload CSE regs : 5.16 ( 1%) usr 0.18 ( 2%) sys 5.35 ( 1%) wall
flow 2 : 1.77 ( 0%) usr 0.01 ( 0%) sys 1.78 ( 0%) wall
if-conversion 2 : 0.40 ( 0%) usr 0.01 ( 0%) sys 0.41 ( 0%) wall
peephole 2 : 1.10 ( 0%) usr 0.00 ( 0%) sys 1.10 ( 0%) wall
rename registers : 9.39 ( 2%) usr 0.08 ( 1%) sys 9.49 ( 2%) wall
scheduling 2 : 9.71 ( 2%) usr 0.37 ( 4%) sys 10.10 ( 2%) wall
reorder blocks : 1.41 ( 0%) usr 0.02 ( 0%) sys 1.43 ( 0%) wall
shorten branches : 1.25 ( 0%) usr 0.05 ( 0%) sys 1.30 ( 0%) wall
final : 2.19 ( 0%) usr 0.21 ( 2%) sys 2.42 ( 0%) wall
symout : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
rest of compilation : 9.21 ( 2%) usr 0.06 ( 1%) sys 9.51 ( 2%) wall
TOTAL : 475.30 10.28 486.64
>
> > I also tested it on some of C SPEC2000 sources and it does not seem to
> > supress any important inlining. I hope to do some behchmarking tomorrow
> > as I need some sleep now.
>
> If you send me (privately) the full set of patches required for this, I'll
> happily perform testsuite runs and benchmarks for the full application
> which contains the code of PR/8361.
Will try to do so. I've managed to break my local tree again, so I will
prepare cleaner version this afternoon. (I just downloaded fresh tree
and will apply my patches these one by one)
Note that the patch from yesterday has also two important bugs - it does
not inline functions that are output separately (as I simply release
them by mistake after expanding and always expanding before expanding
the callers) and it over estimate the size of function body after
inlining.
Fixing these two causes decrese in number of call instruction on the
testcase (without unit-at-a-time I get 6000, same I get with the
yersterday path, with today patch I get 4000) With limit set to 1000 I
get 4300 calls. Call instructions account 8%, 9% and 6% of code size
respectivly.
I now get only about 7% decrease on libstdc++ code size with maintaining
the same call density.
I attached updated patch.
>
> This looks very exciting!
Hope it is real :) Still it is possible that I did simple thinko
somewhere, but few SPEC benchmarks works just fine (on the other hand
there are just few places in the benchmarks that rely on inlining and
all of them are very easy to decide on)
Honza
>
> Gerald
*** cgraph.c.in Sun Jun 22 23:47:35 2003
--- cgraph.c Mon Jun 23 00:26:38 2003
*************** create_edge (caller, callee)
*** 160,165 ****
--- 160,179 ----
struct cgraph_node *caller, *callee;
{
struct cgraph_edge *edge = xmalloc (sizeof (struct cgraph_edge));
+ struct cgraph_edge *edge2;
+
+ edge->inline_call = false;
+ /* At the moment we don't associate calls with specific CALL_EXPRs
+ as we probably ought to, so we must preserve inline_call flags to
+ be the same in all copies of the same edge. */
+ if (cgraph_global_info_ready)
+ for (edge2 = caller->callees; edge2; edge2 = edge2->next_caller)
+ if (edge2->callee == callee)
+ {
+ edge->inline_call = edge2->inline_call;
+ break;
+ }
+
edge->caller = caller;
edge->callee = callee;
*** cgraph.h.in Sun Jun 22 21:56:33 2003
--- cgraph.h Mon Jun 23 01:36:16 2003
*************** struct cgraph_local_info
*** 30,40 ****
/* Set when function function is visiable in current compilation unit only
and it's address is never taken. */
bool local;
! /* Set when function is small enought to be inlinable many times. */
! bool inline_many;
! /* Set when function can be inlined once (false only for functions calling
! alloca, using varargs and so on). */
! bool can_inline_once;
};
/* Information about the function that needs to be computed globally
--- 30,42 ----
/* Set when function function is visiable in current compilation unit only
and it's address is never taken. */
bool local;
!
! /* False when there is something making inlining impossible (such as va_arg) */
! bool inlinable;
! /* True when function should be inlined independently on it's size. */
! bool disgread_inline_limits;
! /* Size of the function before inlining. */
! int self_insns;
};
/* Information about the function that needs to be computed globally
*************** struct cgraph_global_info
*** 44,49 ****
--- 46,54 ----
{
/* Set when the function will be inlined exactly once. */
bool inline_once;
+
+ /* Estimated size of the function after inlining. */
+ int insns;
};
/* Information about the function that is propagated by the RTL backend.
*************** struct cgraph_edge
*** 95,100 ****
--- 100,106 ----
struct cgraph_node *caller, *callee;
struct cgraph_edge *next_caller;
struct cgraph_edge *next_callee;
+ bool inline_call;
};
extern struct cgraph_node *cgraph_nodes;
*************** void cgraph_finalize_compilation_unit PA
*** 120,124 ****
--- 126,131 ----
void cgraph_create_edges PARAMS ((tree, tree));
void cgraph_optimize PARAMS ((void));
void cgraph_mark_needed_node PARAMS ((struct cgraph_node *, int));
+ bool cgraph_inline_p PARAMS ((tree, tree));
#endif /* GCC_CGRAPH_H */
*** cgraphunit.c.in Sun Jun 22 21:56:26 2003
--- cgraphunit.c Mon Jun 23 14:53:24 2003
*************** Software Foundation, 59 Temple Place - S
*** 35,47 ****
#include "cgraph.h"
#include "varpool.h"
#include "diagnostic.h"
static void cgraph_expand_functions PARAMS ((void));
static void cgraph_mark_functions_to_output PARAMS ((void));
static void cgraph_expand_function PARAMS ((struct cgraph_node *));
static tree record_call_1 PARAMS ((tree *, int *, void *));
static void cgraph_mark_local_functions PARAMS ((void));
- static void cgraph_mark_functions_to_inline_once PARAMS ((void));
static void cgraph_optimize_function PARAMS ((struct cgraph_node *));
/* Analyze function once it is parsed. Set up the local information
--- 35,53 ----
#include "cgraph.h"
#include "varpool.h"
#include "diagnostic.h"
+ #include "c-common.h"
+ #include "params.h"
+
+ #define INSNS_PER_STMT 10
+ #define INSNS_PER_CALL 10
+ #define LARGE_FUNCTION_INSNS 30000
+ #define LARGE_FUNCTION_GROWTH 200
static void cgraph_expand_functions PARAMS ((void));
static void cgraph_mark_functions_to_output PARAMS ((void));
static void cgraph_expand_function PARAMS ((struct cgraph_node *));
static tree record_call_1 PARAMS ((tree *, int *, void *));
static void cgraph_mark_local_functions PARAMS ((void));
static void cgraph_optimize_function PARAMS ((struct cgraph_node *));
/* Analyze function once it is parsed. Set up the local information
*************** cgraph_finalize_function (decl, body)
*** 75,88 ****
cgraph_mark_needed_node (node, 1);
}
! if (!node->needed && !DECL_COMDAT (node->decl))
! node->local.can_inline_once = tree_inlinable_function_p (decl, 1);
! else
! node->local.can_inline_once = 0;
! if (flag_inline_trees)
! node->local.inline_many = tree_inlinable_function_p (decl, 0);
! else
! node->local.inline_many = 0;
(*debug_hooks->deferred_inline_function) (decl);
}
--- 81,91 ----
cgraph_mark_needed_node (node, 1);
}
! node->local.inlinable = tree_inlinable_function_p (decl, 1);
! node->local.self_insns = DECL_NUM_STMTS (decl) * INSNS_PER_STMT;
! if (node->local.inlinable)
! node->local.disgread_inline_limits
! = (*lang_hooks.tree_inlining.disregard_inline_limits) (decl);
(*debug_hooks->deferred_inline_function) (decl);
}
*************** cgraph_mark_functions_to_output ()
*** 231,246 ****
for (node = cgraph_nodes; node; node = node->next)
{
tree decl = node->decl;
/* We need to output all local functions that are used and not
always inlined, as well as those that are reachable from
outside the current compilation unit. */
if (DECL_SAVED_TREE (decl)
&& (node->needed
! || (!node->local.inline_many && !node->global.inline_once
! && node->reachable)
! || (DECL_ASSEMBLER_NAME_SET_P (decl)
! && TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl))))
&& !TREE_ASM_WRITTEN (decl) && !node->origin
&& !DECL_EXTERNAL (decl))
node->output = 1;
--- 234,253 ----
for (node = cgraph_nodes; node; node = node->next)
{
tree decl = node->decl;
+ struct cgraph_edge *e;
+ if (node->output)
+ abort ();
+
+ for (e = node->callers; e; e = e->next_caller)
+ if (!e->inline_call)
+ break;
/* We need to output all local functions that are used and not
always inlined, as well as those that are reachable from
outside the current compilation unit. */
if (DECL_SAVED_TREE (decl)
&& (node->needed
! || (e && node->reachable))
&& !TREE_ASM_WRITTEN (decl) && !node->origin
&& !DECL_EXTERNAL (decl))
node->output = 1;
*************** cgraph_optimize_function (node)
*** 255,260 ****
--- 262,269 ----
{
tree decl = node->decl;
+ /* optimize_inline_calls avoid inlining of current_function_decl. */
+ current_function_decl = 0;
if (flag_inline_trees)
optimize_inline_calls (decl);
if (node->nested)
*************** cgraph_expand_function (node)
*** 271,276 ****
--- 280,286 ----
struct cgraph_node *node;
{
tree decl = node->decl;
+ struct cgraph_edge *e;
announce_function (decl);
*************** cgraph_expand_function (node)
*** 280,320 ****
via lang_expand_decl_stmt. */
(*lang_hooks.callgraph.expand_function) (decl);
! /* When we decided to inline the function once, we never ever should
! need to output it separately. */
! if (node->global.inline_once)
! abort ();
! if (!node->local.inline_many
! || !node->callers)
DECL_SAVED_TREE (decl) = NULL;
current_function_decl = NULL;
}
!
! /* Expand all functions that must be output.
!
! Attempt to topologically sort the nodes so function is output when
! all called functions are already assembled to allow data to be
! propagated accross the callgraph. Use a stack to get smaller distance
! between a function and it's callees (later we may choose to use a more
! sophisticated algorithm for function reordering; we will likely want
! to use subsections to make the output functions appear in top-down
! order. */
!
! static void
! cgraph_expand_functions ()
{
struct cgraph_node *node, *node2;
- struct cgraph_node **stack =
- xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
- struct cgraph_node **order =
- xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
int stack_size = 0;
int order_pos = 0;
struct cgraph_edge *edge, last;
- int i;
! cgraph_mark_functions_to_output ();
/* We have to deal with cycles nicely, so use a depth first traversal
output algorithm. Ignore the fact that some functions won't need
--- 290,315 ----
via lang_expand_decl_stmt. */
(*lang_hooks.callgraph.expand_function) (decl);
! for (e = node->callers; e; e = e->next_caller)
! if (e->inline_call)
! break;
! if (!e)
DECL_SAVED_TREE (decl) = NULL;
current_function_decl = NULL;
}
! /* Fill array order with all nodes with output flag set in the reverse
! topological order. */
! static int
! cgraph_postorder (struct cgraph_node **order)
{
struct cgraph_node *node, *node2;
int stack_size = 0;
int order_pos = 0;
struct cgraph_edge *edge, last;
! struct cgraph_node **stack =
! xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
/* We have to deal with cycles nicely, so use a depth first traversal
output algorithm. Ignore the fact that some functions won't need
*************** cgraph_expand_functions ()
*** 323,329 ****
for (node = cgraph_nodes; node; node = node->next)
node->aux = NULL;
for (node = cgraph_nodes; node; node = node->next)
! if (node->output && !node->aux)
{
node2 = node;
if (!node->callers)
--- 318,324 ----
for (node = cgraph_nodes; node; node = node->next)
node->aux = NULL;
for (node = cgraph_nodes; node; node = node->next)
! if (!node->aux)
{
node2 = node;
if (!node->callers)
*************** cgraph_expand_functions ()
*** 360,487 ****
}
}
}
! for (i = order_pos - 1; i >= 0; i--)
{
! node = order[i];
! if (node->output)
{
! if (!node->reachable)
! abort ();
! node->output = 0;
! cgraph_expand_function (node);
}
}
free (stack);
! free (order);
}
! /* Mark all local functions.
! We can not use node->needed directly as it is modified during
! execution of cgraph_optimize. */
static void
! cgraph_mark_local_functions ()
{
struct cgraph_node *node;
! if (!quiet_flag)
! fprintf (stderr, "\n\nMarking local functions:");
- /* Figure out functions we want to assemble. */
for (node = cgraph_nodes; node; node = node->next)
{
! node->local.local = (!node->needed
! && DECL_SAVED_TREE (node->decl)
! && !DECL_COMDAT (node->decl)
! && !TREE_PUBLIC (node->decl));
! if (node->local.local)
! announce_function (node->decl);
}
! }
! /* Decide what function should be inlined because they are invoked once
! (so inlining won't result in duplication of the code). */
! static void
! cgraph_mark_functions_to_inline_once ()
! {
! struct cgraph_node *node, *node1;
if (!quiet_flag)
fprintf (stderr, "\n\nMarking functions to inline once:");
! /* Now look for function called only once and mark them to inline.
! From this point number of calls to given function won't grow. */
! for (node = cgraph_nodes; node; node = node->next)
{
if (node->callers && !node->callers->next_caller && !node->needed
! && node->local.can_inline_once)
{
bool ok = true;
/* Verify that we won't duplicate the caller. */
for (node1 = node->callers->caller;
! node1->local.inline_many
! && node1->callers
! && ok;
! node1 = node1->callers->caller)
if (node1->callers->next_caller || node1->needed)
ok = false;
if (ok)
{
! node->global.inline_once = true;
! announce_function (node->decl);
}
}
}
}
/* Perform simple optimizations based on callgraph. */
void
cgraph_optimize ()
{
- struct cgraph_node *node;
- bool changed = true;
-
cgraph_mark_local_functions ();
! cgraph_mark_functions_to_inline_once ();
cgraph_global_info_ready = true;
if (!quiet_flag)
fprintf (stderr, "\n\nAssembling functions:");
! /* Output everything.
! ??? Our inline heuristic may decide to not inline functions previously
! marked as inlinable thus adding new function bodies that must be output.
! Later we should move all inlining decisions to callgraph code to make
! this impossible. */
cgraph_expand_functions ();
- if (!quiet_flag)
- fprintf (stderr, "\n\nAssembling functions that failed to inline:");
- while (changed && !errorcount && !sorrycount)
- {
- changed = false;
- for (node = cgraph_nodes; node; node = node->next)
- {
- tree decl = node->decl;
- if (!node->origin
- && !TREE_ASM_WRITTEN (decl)
- && DECL_SAVED_TREE (decl)
- && !DECL_EXTERNAL (decl))
- {
- struct cgraph_edge *edge;
-
- for (edge = node->callers; edge; edge = edge->next_caller)
- if (TREE_ASM_WRITTEN (edge->caller->decl))
- {
- changed = true;
- cgraph_expand_function (node);
- break;
- }
- }
- }
- }
}
--- 355,813 ----
}
}
}
! free (stack);
! return order_pos;
! }
!
! #define INLINED_TIMES(node) ((size_t)(node)->aux)
! #define SET_INLINED_TIMES(node,times) ((node)->aux = (void *)(times))
!
! /* Return list of nodes we decided to inline NODE into, set their output
! flag and compute INLINED_TIMES.
!
! We do simple backgracing to get INLINED_TIMES right. This should not be
! expensive as we limit amount of inlining. Alternatively we may first
! discover set of nodes, to topological sort on these and propagate
! information. */
! static int
! cgraph_inlined_into (struct cgraph_node *node, struct cgraph_node **array)
! {
! int nfound = 0;
! struct cgraph_edge **stack;
! struct cgraph_edge *e, *e1;
! int sp;
! int i;
!
! /* Fast path: since we traverse in mostly topological order, we will likely
! find no edges. */
! for (e = node->callers; e; e = e->next_caller)
! if (e->inline_call)
! break;
!
! if (!e)
! return 0;
!
! /* Allocate stack for back-tracking up callgraph. */
! stack = xmalloc ((cgraph_n_nodes + 1) * sizeof (struct cgraph_edge));
! sp = 0;
!
! /* Push the first edge on to the stack. */
! stack[sp++] = e;
!
! while (sp)
{
! struct cgraph_node *caller;
!
! /* Look at the edge on the top of the stack. */
! e = stack[sp - 1];
! caller = e->caller;
!
! /* Check if the caller destination has been visited yet. */
! if (!caller->output)
{
! array[nfound++] = e->caller;
! /* Mark that we have visited the destination. */
! caller->output = true;
! SET_INLINED_TIMES (caller, 0);
! }
! SET_INLINED_TIMES (caller, INLINED_TIMES (caller) + 1);
!
! for (e1 = caller->callers; e1; e1 = e1->next_caller)
! if (e1->inline_call)
! break;
! if (e1)
! stack[sp++] = e1;
! else
! {
! while (true)
! {
! for (e1 = e->next_caller; e1; e1 = e1->next_caller)
! if (e1->inline_call)
! break;
!
! if (e1)
! {
! stack[sp - 1] = e1;
! break;
! }
! else
! {
! sp--;
! if (!sp)
! break;
! e = stack[sp - 1];
! }
! }
}
}
+
free (stack);
!
!
! if (!quiet_flag)
! {
! fprintf (stderr, "\nFound inline predecesors of ");
! announce_function (node->decl);
! fprintf (stderr, ":");
! for (i = 0; i < nfound; i++)
! {
! announce_function (array[i]->decl);
! if (INLINED_TIMES (array[i]) != 1)
! fprintf (stderr, " (%i times)", INLINED_TIMES (array[i]));
! }
! fprintf (stderr, "\n");
! }
!
! return nfound;
}
! /* Estimate size of the function after inlining WHAT into TO. */
! static int
! cgraph_estimate_size_after_inlining (int times,
! struct cgraph_node *to,
! struct cgraph_node *what)
! {
! return to->global.insns + (what->global.insns - INSNS_PER_CALL) * times;
! }
+ /* Update insn sizes after inlining WHAT into TO that is already inlined into
+ all nodes in INLINED array. */
static void
! cgraph_mark_inline (struct cgraph_node *to,
! struct cgraph_node *what,
! struct cgraph_node **inlined, int ninlined)
! {
! int i;
! int times = 0;
! struct cgraph_edge *e;
!
! for (e = to->callees; e; e = e->next_callee)
! if (e->callee == what)
! {
! if (e->inline_call)
! abort ();
! e->inline_call = true;
! times++;
! }
! if (!times)
! abort ();
!
! to->global.insns = cgraph_estimate_size_after_inlining (times, to, what);
! for (i = 0; i < ninlined; i++)
! inlined[i]->global.insns =
! cgraph_estimate_size_after_inlining (INLINED_TIMES (inlined[i]) * times,
! inlined[i], what);
! }
!
! /* Return false when inlining WHAT into TO is not good idea as it would cause
! too large growth of function bodies. */
! static bool
! cgraph_check_inline_limits (struct cgraph_node *to,
! struct cgraph_node *what,
! struct cgraph_node **inlined, int ninlined)
! {
! int i;
! int times = 0;
! struct cgraph_edge *e;
! int newsize;
!
! for (e = to->callees; e; e = e->next_callee)
! if (e->callee == what)
! times++;
!
! newsize = cgraph_estimate_size_after_inlining (times, to, what);
! if (newsize > LARGE_FUNCTION_INSNS
! && newsize > to->local.self_insns * LARGE_FUNCTION_GROWTH / 100)
! return false;
! for (i = 0; i < ninlined; i++)
! {
! newsize =
! cgraph_estimate_size_after_inlining (INLINED_TIMES (inlined[i]) *
! times, inlined[i], what);
! if (newsize > LARGE_FUNCTION_INSNS
! && newsize >
! inlined[i]->local.self_insns * LARGE_FUNCTION_GROWTH / 100)
! return false;
! }
! return true;
! }
!
! /* Return true when function N is small enought to be inlined. */
! static bool
! cgraph_default_inline_p (struct cgraph_node *n)
! {
! if (!DECL_INLINE (n->decl))
! return false;
! if (DID_INLINE_FUNC (n->decl))
! return n->global.insns < MAX_INLINE_INSNS_AUTO;
! else
! return n->global.insns < MAX_INLINE_INSNS_SINGLE;
! }
!
! static void
! cgraph_decide_inlining (void)
{
struct cgraph_node *node;
+ int nnodes;
+ struct cgraph_node **order =
+ xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
+ struct cgraph_node **inlined =
+ xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
+ int ninlined;
+ int i, y;
! for (node = cgraph_nodes; node; node = node->next)
! node->global.insns = node->local.self_insns;
!
! nnodes = cgraph_postorder (order);
for (node = cgraph_nodes; node; node = node->next)
+ node->aux = 0;
+
+ /* In the first pass mark all always_inline edges. Do this with a priority
+ so no our decisions makes this impossible. */
+ if (!quiet_flag)
+ fprintf (stderr, "\n\nDeciding on always_inline functions:");
+ for (i = nnodes - 1; i >= 0; i--)
{
! struct cgraph_edge *e;
!
! node = order[i];
!
! for (e = node->callees; e; e = e->next_callee)
! if (e->callee->local.disgread_inline_limits)
! break;
! if (!e)
! continue;
! ninlined = cgraph_inlined_into (order[i], inlined);
! for (; e; e = e->next_callee)
! {
! if (e->inline_call || !e->callee->local.disgread_inline_limits)
! continue;
! if (e->callee->output || e->callee == node)
! {
! current_function_decl = e->callee->decl;
! warning_with_decl (node->decl,
! "Always inline flag ignored in respect to `%s' to avoid cycle");
! continue;
! }
! if (!quiet_flag)
! {
! fprintf (stderr, "Always inlining");
! announce_function (e->callee->decl);
! fprintf (stderr, " to");
! announce_function (node->decl);
! fprintf (stderr, "\n");
! }
! cgraph_mark_inline (node, e->callee, inlined, ninlined);
! }
! for (y = 0; y < ninlined; y++)
! inlined[y]->output = 0, node->aux = 0;
}
! /* In the first pass mark all always_inline edges. Do this with a priority
! so no our decisions makes this impossible. */
! if (!quiet_flag)
! fprintf (stderr, "\n\nDeciding on inlining:");
! for (i = nnodes - 1; i >= 0; i--)
! {
! struct cgraph_edge *e;
! node = order[i];
! for (e = node->callees; e; e = e->next_callee)
! if (!e->inline_call && cgraph_default_inline_p (e->callee))
! break;
! if (!e)
! continue;
! ninlined = cgraph_inlined_into (order[i], inlined);
! for (; e; e = e->next_callee)
! {
! if (e->inline_call || !cgraph_default_inline_p (e->callee))
! continue;
! if (e->callee == node)
! continue;
! if (e->callee->output)
! {
! if (warn_inline && DECL_INLINE (e->callee->decl))
! {
! current_function_decl = e->callee->decl;
! warning_with_decl (node->decl,
! "Not inlining to `%s' to avoid cycle");
! }
! continue;
! }
! if (!cgraph_check_inline_limits
! (node, e->callee, inlined, ninlined))
! {
! if (!quiet_flag)
! {
! fprintf (stderr, "Not inlining");
! announce_function (e->callee->decl);
! fprintf (stderr, " to");
! announce_function (node->decl);
! fprintf (stderr, "as inline limits exceeds\n");
! }
! if (warn_inline && DECL_INLINE (e->callee->decl))
! {
! current_function_decl = e->callee->decl;
! warning_with_decl (node->decl,
! "Not inlining to `%s' to avoid too large function bodies");
! }
! continue;
! }
! if (!quiet_flag)
! {
! fprintf (stderr, "Inlining");
! announce_function (e->callee->decl);
! fprintf (stderr, " to");
! announce_function (node->decl);
! fprintf (stderr, "\n");
! }
! cgraph_mark_inline (node, e->callee, inlined, ninlined);
! }
! for (y = 0; y < ninlined; y++)
! inlined[y]->output = 0, node->aux = 0;
! }
if (!quiet_flag)
fprintf (stderr, "\n\nMarking functions to inline once:");
! for (i = nnodes - 1; i >= 0; i--)
{
+ node = order[i];
+
if (node->callers && !node->callers->next_caller && !node->needed
! && node->local.inlinable && !node->callers->inline_call
! && !DECL_EXTERNAL (node->decl) && !DECL_COMDAT (node->decl))
{
bool ok = true;
+ struct cgraph_node *node1;
/* Verify that we won't duplicate the caller. */
for (node1 = node->callers->caller;
! node1->callers && node1->callers->inline_call
! && ok; node1 = node1->callers->caller)
if (node1->callers->next_caller || node1->needed)
ok = false;
if (ok)
{
! ninlined = cgraph_inlined_into (node, inlined);
! if (!cgraph_check_inline_limits
! (node->callers->caller, node, inlined, ninlined))
! {
! if (!quiet_flag)
! {
! fprintf (stderr, "\nNot inlining once");
! announce_function (node->decl);
! fprintf (stderr, " to");
! announce_function (node1->decl);
! fprintf (stderr, "as inline limits exceeds\n");
! }
! }
! else
! {
! cgraph_mark_inline (node->callers->caller, node, inlined,
! ninlined);
! announce_function (node->decl);
! }
! for (y = 0; y < ninlined; y++)
! inlined[y]->output = 0, node->aux = 0;
}
}
}
+
+ free (order);
+ free (inlined);
}
+ /* Return true when CALLER_DECL should be inlined into CALLEE_DECL. */
+ bool
+ cgraph_inline_p (tree caller_decl, tree callee_decl)
+ {
+ struct cgraph_node *caller = cgraph_node (caller_decl);
+ struct cgraph_node *callee = cgraph_node (callee_decl);
+ struct cgraph_edge *e;
+
+ for (e = caller->callees; e; e = e->next_callee)
+ if (e->callee == callee)
+ return e->inline_call;
+ abort ();
+ }
+
+ /* Expand all functions that must be output.
+
+ Attempt to topologically sort the nodes so function is output when
+ all called functions are already assembled to allow data to be
+ propagated accross the callgraph. Use a stack to get smaller distance
+ between a function and it's callees (later we may choose to use a more
+ sophisticated algorithm for function reordering; we will likely want
+ to use subsections to make the output functions appear in top-down
+ order. */
+
+ static void
+ cgraph_expand_functions ()
+ {
+ struct cgraph_node *node;
+ struct cgraph_node **order =
+ xcalloc (sizeof (struct cgraph_node *), cgraph_n_nodes);
+ int order_pos = 0;
+ int i;
+
+ cgraph_mark_functions_to_output ();
+
+ order_pos = cgraph_postorder (order);
+
+ for (i = order_pos - 1; i >= 0; i--)
+ {
+ node = order[i];
+ if (node->output)
+ {
+ if (!node->reachable)
+ abort ();
+ node->output = 0;
+ cgraph_expand_function (node);
+ }
+ }
+ free (order);
+ }
+
+ /* Mark all local functions.
+ We can not use node->needed directly as it is modified during
+ execution of cgraph_optimize. */
+
+ static void
+ cgraph_mark_local_functions ()
+ {
+ struct cgraph_node *node;
+
+ if (!quiet_flag)
+ fprintf (stderr, "\n\nMarking local functions:");
+
+ /* Figure out functions we want to assemble. */
+ for (node = cgraph_nodes; node; node = node->next)
+ {
+ node->local.local = (!node->needed
+ && DECL_SAVED_TREE (node->decl)
+ && !DECL_COMDAT (node->decl)
+ && !TREE_PUBLIC (node->decl));
+ if (node->local.local)
+ announce_function (node->decl);
+ }
+ }
/* Perform simple optimizations based on callgraph. */
void
cgraph_optimize ()
{
cgraph_mark_local_functions ();
! cgraph_decide_inlining ();
cgraph_global_info_ready = true;
if (!quiet_flag)
fprintf (stderr, "\n\nAssembling functions:");
! /* Output everything. */
cgraph_expand_functions ();
}
*** tree-inline.c.in Mon Jun 23 00:50:08 2003
--- tree-inline.c Mon Jun 23 01:24:02 2003
*************** typedef struct inline_data
*** 106,111 ****
--- 106,112 ----
htab_t tree_pruner;
/* Decl of function we are inlining into. */
tree decl;
+ tree current_decl;
} inline_data;
/* Prototypes. */
*************** expand_call_inline (tp, walk_subtrees, d
*** 1204,1212 ****
/* Don't try to inline functions that are not well-suited to
inlining. */
! if ((!flag_unit_at_a_time || !DECL_SAVED_TREE (fn)
! || !cgraph_global_info (fn)->inline_once)
! && !inlinable_function_p (fn, id, 0))
{
if (warn_inline && DECL_INLINE (fn))
{
--- 1205,1213 ----
/* Don't try to inline functions that are not well-suited to
inlining. */
! if (!DECL_SAVED_TREE (fn)
! || (flag_unit_at_a_time && !cgraph_inline_p (id->current_decl, fn))
! || (!flag_unit_at_a_time && !inlinable_function_p (fn, id, 0)))
{
if (warn_inline && DECL_INLINE (fn))
{
*************** expand_call_inline (tp, walk_subtrees, d
*** 1451,1457 ****
}
/* Recurse into the body of the just inlined function. */
! expand_calls_inline (inlined_body, id);
VARRAY_POP (id->fns);
/* If we've returned to the top level, clear out the record of how
--- 1452,1463 ----
}
/* Recurse into the body of the just inlined function. */
! {
! tree old_decl = id->current_decl;
! id->current_decl = fn;
! expand_calls_inline (inlined_body, id);
! id->current_decl = old_decl;
! }
VARRAY_POP (id->fns);
/* If we've returned to the top level, clear out the record of how
*************** optimize_inline_calls (fn)
*** 1497,1502 ****
--- 1503,1509 ----
memset (&id, 0, sizeof (id));
id.decl = fn;
+ id.current_decl = fn;
/* Don't allow recursion into FN. */
VARRAY_TREE_INIT (id.fns, 32, "fns");
VARRAY_PUSH_TREE (id.fns, fn);