This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Rewrite of inlining heuristics
- From: Jan Hubicka <jh at suse dot cz>
- To: gcc-patches at gcc dot gnu dot org
- Date: Tue, 24 May 2005 21:00:39 +0200
- Subject: Rewrite of inlining heuristics
Hi,
this patch merges the inlining heuristics rewrite from tree-profiling-branch.
The main goal was to make profile-driven inlining easy, but it affects the
default heuristics too.  Basically, the old heuristics ordered all inline
candidates (functions) in a priority queue keyed by the estimated cost of
inlining, and inlined as many of them as possible until reaching the code
growth limits.
The rewrite orders the call sites (cgraph edges) instead of the functions, so
different calls of the same function can have different priorities.  Without
profiling it currently takes the loop depth into account, which seems to bring
some improvement on gap and on Gerald's application benchmarks.
Profile-driven inlining is in too.  It requires -ftree-based-profiling to be
active, and if you want RTL-land profile-guided optimizations, you need
-fno-loop-optimize.  I plan to benchmark it, and if this combination turns out
to be a win (I hope so), I will change -fprofile-generate/-fprofile-use to
imply these flags.
I am still running some benchmarks, so I will post them later today or
tomorrow, but I am pushing the patch out now so people can try it or comment
on it before I commit it.
The SPEC scores at -O3 are:
Size of binaries:
164.gzip: Base: 71377 bytes
164.gzip: Peak: 71377 bytes
176.gcc: Base: 1886381 bytes
176.gcc: Peak: 1830505 bytes
181.mcf: Base: 30495 bytes
181.mcf: Peak: 26879 bytes
186.crafty: Base: 237732 bytes
186.crafty: Peak: 237732 bytes
197.parser: Base: 203166 bytes
197.parser: Peak: 187870 bytes
253.perlbmk: Base: 748770 bytes
253.perlbmk: Peak: 747612 bytes
254.gap: Base: 610495 bytes
254.gap: Peak: 606207 bytes
255.vortex: Base: 684446 bytes
255.vortex: Peak: 684446 bytes
256.bzip2: Base: 63018 bytes
256.bzip2: Peak: 63018 bytes
300.twolf: Base: 239745 bytes
300.twolf: Peak: 239745 bytes
=============================
Total: Base: 4775625 bytes
Total: Peak: 4695391 bytes
Total time for base compilation: 275 s
Total time for peak compilation: 270 s
GCC was configured as: configure --enable-threads=posix --enable-languages="c,c++,f95"
GCC bootstrap times for 'make -j1 bootstrap && make install':
Base compiler: 2519 s
Peak compiler: 2516 s
Benchmark    Ref.time  Base time  Base ratio   Ref.time  Peak time  Peak ratio
164.gzip 1400 156 900 * 1400 156 899 *
175.vpr X X
176.gcc 1100 101 1089 * 1100 101 1086 *
181.mcf 1800 431 417 * 1800 432 417 *
186.crafty 1000 63.5 1574 * 1000 63.7 1569 *
197.parser 1800 267 673 * 1800 262 688 *
252.eon X X
253.perlbmk 1800 166 1086 * 1800 165 1088 *
254.gap 1100 128 856 * 1100 125 878 *
255.vortex 1900 147 1294 * 1900 146 1304 *
256.bzip2 1500 176 852 * 1500 176 853 *
300.twolf 3000 324 927 * 3000 325 922 *
Est. SPECint_base2000 915
Est. SPECint2000 919
Size of binaries:
168.wupwise: Base: 45287 bytes
168.wupwise: Peak: 45279 bytes
171.swim: Base: 23114 bytes
171.swim: Peak: 23106 bytes
172.mgrid: Base: 27538 bytes
172.mgrid: Peak: 27530 bytes
173.applu: Base: 92251 bytes
173.applu: Peak: 92243 bytes
177.mesa: Base: 639871 bytes
177.mesa: Peak: 635785 bytes
179.art: Base: 31472 bytes
179.art: Peak: 31472 bytes
183.equake: Base: 42823 bytes
183.equake: Peak: 42823 bytes
187.facerec: Base: 83208 bytes
187.facerec: Peak: 83200 bytes
188.ammp: Base: 174610 bytes
188.ammp: Peak: 174642 bytes
189.lucas: Base: 84715 bytes
189.lucas: Peak: 84707 bytes
191.fma3d: Base: 1286905 bytes
191.fma3d: Peak: 1286897 bytes
200.sixtrack: Base: 1048817 bytes
200.sixtrack: Peak: 1048809 bytes
301.apsi: Base: 150880 bytes
301.apsi: Peak: 150872 bytes
=============================
Total: Base: 3731491 bytes
Total: Peak: 3727365 bytes
Total time for base compilation: 276 s
Total time for peak compilation: 272 s
Benchmark    Ref.time  Base time  Base ratio   Ref.time  Peak time  Peak ratio
168.wupwise X X
171.swim X X
172.mgrid X X
173.applu X X
177.mesa 1400 113 1238 * 1400 113 1235 *
178.galgel X X
179.art 2600 341 762 * 2600 344 756 *
183.equake 1300 161 807 * 1300 161 808 *
187.facerec X X
188.ammp 2200 231 954 * 2200 231 954 *
189.lucas X X
191.fma3d X X
200.sixtrack X X
301.apsi X X
Est. SPECfp_base2000 923
Est. SPECfp2000 921
Of course the patch is most interesting with profile feedback and IMA.
Unfortunately mainline is quite broken there, so only a few benchmarks
actually work, and those are the more trivial and thus less interesting ones.
On tree-profiling the speedups are somewhere in the 2-3% range, at
considerable code size savings (and I am testing a fix for the gzip failure;
it is an unrelated latent bug).
164.gzip: Base: 80321 bytes
181.mcf: Base: 33875 bytes
181.mcf: Peak: 29779 bytes
186.crafty: Base: 296462 bytes
186.crafty: Peak: 267790 bytes
197.parser: Base: 246457 bytes
197.parser: Peak: 180857 bytes
254.gap: Base: 665521 bytes
254.gap: Peak: 559098 bytes
256.bzip2: Base: 71182 bytes
256.bzip2: Peak: 58766 bytes
=============================
Total: Base: 1393818 bytes
Total: Peak: 1096290 bytes
Total time for base compilation: 380 s
Total time for peak compilation: 370 s
Benchmark    Ref.time  Base time  Base ratio   Ref.time  Peak time  Peak ratio
164.gzip 1400 150 934 *
175.vpr X X
176.gcc X X
181.mcf 1800 427 422 * 1800 427 422 *
186.crafty 1000 64.6 1549 * 1000 62.2 1608 *
197.parser 1800 222 810 * 1800 222 810 *
252.eon X X
253.perlbmk X X
254.gap 1100 126 875 * 1100 123 895 *
255.vortex X X
256.bzip2 1500 181 827 * 1500 182 826 *
300.twolf X X
Est. SPECint_base2000 843
Est. SPECint2000 835
More benchmarks to come ;)
The testcase works only with Janis's tree-profiling testsuite bits.
/* { dg-options "-O2 -fdump-tree-optimized -fdump-ipa-inline -ftree-based-profiling" } */
int a;
int b[100];
void abort (void);
inline void
cold_function ()
{
int i;
for (i = 0; i < 99; i++)
if (b[i] / (b[i+1] + 1))
abort ();
}
inline void
hot_function ()
{
int i;
for (i = 0; i < 99; i++)
if (b[i] / (b[i+1] + 1))
abort ();
}
int
main ()
{
if (a)
cold_function ();
else
hot_function ();
}
/* With the profile, the hot function should be inlined, while the cold one
   should not.  Look for the "cold_function () [tail call];" call statement,
   not the declaration or other appearances of the string in the dump.  */
/* { dg-final-use { scan-tree-dump "cold_function .. .tail" "optimized"} } */
/* { dg-final-use { scan-tree-dump-not "hot_function .. .tail" "optimized"} } */
2005-05-24 Jan Hubicka <jh@suse.cz>
* Makefile.in (ipa-inline.o): Add COVERAGE_H dependency.
* cgraph.c (cgraph_create_node): Reset estimated_growth.
* cgraph.h (cgraph_global_info): Add estimated_growth.
* ipa-inline.c: Include coverage.h.
(max_insns, max_count): New static variables.
(cgraph_estimate_size_after_inlining): Sanity check the computed size.
(cgraph_estimate_growth): Cache the result.
* passes.c (rest_of_clean_state): Kill coverage_end_function.
* timevar.def (TV_INLINE_HEURISTICS): New timevar.
* tree-optimize.c (init_tree_optimization_passes): Move profiling before
inlining.
(ipa_passes): Initialize bitmaps.
Index: Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1489
diff -c -3 -p -r1.1489 Makefile.in
*** Makefile.in 20 May 2005 21:17:40 -0000 1.1489
--- Makefile.in 23 May 2005 09:49:04 -0000
*************** cgraphunit.o : cgraphunit.c $(CONFIG_H)
*** 2063,2069 ****
ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H)
ipa-inline.o : ipa-inline.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
$(TREE_H) langhooks.h tree-inline.h $(FLAGS_H) $(CGRAPH_H) intl.h \
! $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h
coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h \
$(TM_H) $(RTL_H) $(TREE_H) $(FLAGS_H) output.h $(REGS_H) $(EXPR_H) \
function.h toplev.h $(GGC_H) langhooks.h $(COVERAGE_H) gt-coverage.h \
--- 2063,2070 ----
ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H)
ipa-inline.o : ipa-inline.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
$(TREE_H) langhooks.h tree-inline.h $(FLAGS_H) $(CGRAPH_H) intl.h \
! $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \
! $(COVERAGE_H)
coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h \
$(TM_H) $(RTL_H) $(TREE_H) $(FLAGS_H) output.h $(REGS_H) $(EXPR_H) \
function.h toplev.h $(GGC_H) langhooks.h $(COVERAGE_H) gt-coverage.h \
Index: cgraph.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.c,v
retrieving revision 1.72
diff -c -3 -p -r1.72 cgraph.c
*** cgraph.c 19 May 2005 10:38:37 -0000 1.72
--- cgraph.c 23 May 2005 09:49:04 -0000
*************** cgraph_create_node (void)
*** 169,174 ****
--- 169,175 ----
if (cgraph_nodes)
cgraph_nodes->previous = node;
node->previous = NULL;
+ node->global.estimated_growth = INT_MIN;
cgraph_nodes = node;
cgraph_n_nodes++;
return node;
Index: cgraph.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraph.h,v
retrieving revision 1.53
diff -c -3 -p -r1.53 cgraph.h
*** cgraph.h 19 May 2005 10:38:37 -0000 1.53
--- cgraph.h 23 May 2005 09:49:04 -0000
*************** struct cgraph_global_info GTY(())
*** 69,74 ****
--- 69,77 ----
/* Estimated size of the function after inlining. */
int insns;
+ /* Estimated growth after inlining. INT_MIN if not computed. */
+ int estimated_growth;
+
/* Set iff the function has been inlined at least once. */
bool inlined;
};
Index: ipa-inline.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/ipa-inline.c,v
retrieving revision 2.4
diff -c -3 -p -r2.4 ipa-inline.c
*** ipa-inline.c 19 May 2005 10:38:38 -0000 2.4
--- ipa-inline.c 23 May 2005 09:49:04 -0000
*************** Software Foundation, 59 Temple Place - S
*** 78,89 ****
--- 78,92 ----
#include "fibheap.h"
#include "intl.h"
#include "tree-pass.h"
+ #include "coverage.h"
/* Statistics we collect about inlining algorithm. */
static int ncalls_inlined;
static int nfunctions_inlined;
static int initial_insns;
static int overall_insns;
+ static int max_insns;
+ static gcov_type max_count;
/* Estimate size of the function after inlining WHAT into TO. */
*************** static int
*** 91,102 ****
cgraph_estimate_size_after_inlining (int times, struct cgraph_node *to,
struct cgraph_node *what)
{
! tree fndecl = what->decl;
! tree arg;
int call_insns = PARAM_VALUE (PARAM_INLINE_CALL_COST);
for (arg = DECL_ARGUMENTS (fndecl); arg; arg = TREE_CHAIN (arg))
call_insns += estimate_move_cost (TREE_TYPE (arg));
! return (what->global.insns - call_insns) * times + to->global.insns;
}
/* E is expected to be an edge being inlined. Clone destination node of
--- 94,108 ----
cgraph_estimate_size_after_inlining (int times, struct cgraph_node *to,
struct cgraph_node *what)
{
! int size;
! tree fndecl = what->decl, arg;
int call_insns = PARAM_VALUE (PARAM_INLINE_CALL_COST);
+
for (arg = DECL_ARGUMENTS (fndecl); arg; arg = TREE_CHAIN (arg))
call_insns += estimate_move_cost (TREE_TYPE (arg));
! size = (what->global.insns - call_insns) * times + to->global.insns;
! gcc_assert (size >= 0);
! return size;
}
/* E is expected to be an edge being inlined. Clone destination node of
*************** cgraph_estimate_growth (struct cgraph_no
*** 209,214 ****
--- 215,222 ----
{
int growth = 0;
struct cgraph_edge *e;
+ if (node->global.estimated_growth != INT_MIN)
+ return node->global.estimated_growth;
for (e = node->callers; e; e = e->next_caller)
if (e->inline_failed)
*************** cgraph_estimate_growth (struct cgraph_no
*** 221,226 ****
--- 229,235 ----
if (!node->needed && !DECL_EXTERNAL (node->decl))
growth -= node->global.insns;
+ node->global.estimated_growth = growth;
return growth;
}
*************** cgraph_recursive_inlining_p (struct cgra
*** 298,349 ****
return recursive;
}
! /* Recompute heap nodes for each of callees. */
static void
! update_callee_keys (fibheap_t heap, struct fibnode **heap_node,
! struct cgraph_node *node)
{
struct cgraph_edge *e;
for (e = node->callees; e; e = e->next_callee)
! if (e->inline_failed && heap_node[e->callee->uid])
! fibheap_replace_key (heap, heap_node[e->callee->uid],
! cgraph_estimate_growth (e->callee));
else if (!e->inline_failed)
! update_callee_keys (heap, heap_node, e->callee);
}
! /* Enqueue all recursive calls from NODE into queue linked via aux pointers
! in between FIRST and LAST. WHERE is used for bookkeeping while looking
! int calls inlined within NODE. */
static void
lookup_recursive_calls (struct cgraph_node *node, struct cgraph_node *where,
! struct cgraph_edge **first, struct cgraph_edge **last)
{
struct cgraph_edge *e;
for (e = where->callees; e; e = e->next_callee)
if (e->callee == node)
{
! if (!*first)
! *first = e;
! else
! (*last)->aux = e;
! *last = e;
}
for (e = where->callees; e; e = e->next_callee)
if (!e->inline_failed)
! lookup_recursive_calls (node, e->callee, first, last);
}
/* Decide on recursive inlining: in the case function has recursive calls,
inline until body size reaches given argument. */
! static void
cgraph_decide_recursive_inlining (struct cgraph_node *node)
{
int limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO);
int max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH_AUTO);
! struct cgraph_edge *first_call = NULL, *last_call = NULL;
! struct cgraph_edge *last_in_current_depth;
struct cgraph_edge *e;
struct cgraph_node *master_clone;
int depth = 0;
--- 307,451 ----
return recursive;
}
! /* Return true if the call can be hot. */
! static bool
! cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
! {
! if (profile_info && flag_branch_probabilities
! && (edge->count
! <= profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)))
! return false;
! return true;
! }
!
! /* A cost model driving the inlining heuristics so that the edges with the
! smallest badness are inlined first.  After each inlining, the costs of all
! caller edges of the affected nodes are recomputed, so the metric may
! accurately depend on values such as the number of inlinable callers of the
! function or the function body size.
!
! For the moment we use the estimated growth caused by inlining the callee
! into all its callers to drive the inlining, but once we have loop depth or
! frequency information readily available, we should do better.
!
! With profiling we use the number of executions of each edge to drive the
! cost.  We should also distinguish hot and cold calls, where the cold calls
! are inlined only when the overall code size improves.
!
! Value INT_MAX can be returned to prevent the function from being inlined.
! */
!
! static int
! cgraph_edge_badness (struct cgraph_edge *edge)
! {
! if (max_count)
! {
! int growth =
! cgraph_estimate_size_after_inlining (1, edge->caller, edge->callee);
! growth -= edge->caller->global.insns;
!
! /* Always prefer inlining that saves code size.  */
! if (growth <= 0)
! return INT_MIN - growth;
! return ((int)((double)edge->count * INT_MIN / max_count)) / growth;
! }
! else
! {
! int nest = MIN (edge->loop_nest, 8);
! int badness = cgraph_estimate_growth (edge->callee) * 256;
!
! badness >>= nest;
!
! /* Make recursive inlining always happen after other inlining is done.  */
! if (cgraph_recursive_inlining_p (edge->caller, edge->callee, NULL))
! return badness + 1;
! else
! return badness;
! }
! }
!
! /* Recompute heap nodes for each caller edge.  */
!
static void
! update_caller_keys (fibheap_t heap, struct cgraph_node *node,
! bitmap updated_nodes)
! {
! struct cgraph_edge *edge;
!
! if (!node->local.inlinable || node->local.disregard_inline_limits
! || node->global.inlined_to)
! return;
! if (bitmap_bit_p (updated_nodes, node->uid))
! return;
! bitmap_set_bit (updated_nodes, node->uid);
!
! for (edge = node->callers; edge; edge = edge->next_caller)
! if (edge->inline_failed)
! {
! int badness = cgraph_edge_badness (edge);
! if (edge->aux)
! {
! fibnode_t n = edge->aux;
! gcc_assert (n->data == edge);
! if (n->key == badness)
! continue;
!
! /* fibheap_replace_key only increases the keys.  */
! if (fibheap_replace_key (heap, n, badness))
! continue;
! fibheap_delete_node (heap, edge->aux);
! }
! edge->aux = fibheap_insert (heap, badness, edge);
! }
! }
!
! /* Recompute heap nodes for the caller edges of each callee.  */
!
! static void
! update_callee_keys (fibheap_t heap, struct cgraph_node *node,
! bitmap updated_nodes)
{
struct cgraph_edge *e;
+ node->global.estimated_growth = INT_MIN;
for (e = node->callees; e; e = e->next_callee)
! if (e->inline_failed)
! update_caller_keys (heap, e->callee, updated_nodes);
else if (!e->inline_failed)
! update_callee_keys (heap, e->callee, updated_nodes);
}
! /* Enqueue all recursive calls from NODE into a priority queue depending on
! how eagerly we want to recursively inline the call.  */
!
static void
lookup_recursive_calls (struct cgraph_node *node, struct cgraph_node *where,
! fibheap_t heap)
{
+ static int priority;
struct cgraph_edge *e;
for (e = where->callees; e; e = e->next_callee)
if (e->callee == node)
{
! /* FIXME: Once counts and frequencies are available we should drive the
! order by them.  For now force a simple queue order, which gives us
! ordering by recursion depth for free.  */
! fibheap_insert (heap, priority++, e);
}
for (e = where->callees; e; e = e->next_callee)
if (!e->inline_failed)
! lookup_recursive_calls (node, e->callee, heap);
}
/* Decide on recursive inlining: in the case function has recursive calls,
inline until body size reaches given argument. */
!
! static bool
cgraph_decide_recursive_inlining (struct cgraph_node *node)
{
int limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO);
int max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH_AUTO);
! fibheap_t heap;
struct cgraph_edge *e;
struct cgraph_node *master_clone;
int depth = 0;
*************** cgraph_decide_recursive_inlining (struct
*** 358,371 ****
/* Make sure that function is small enough to be considered for inlining. */
if (!max_depth
|| cgraph_estimate_size_after_inlining (1, node, node) >= limit)
! return;
! lookup_recursive_calls (node, node, &first_call, &last_call);
! if (!first_call)
! return;
if (dump_file)
fprintf (dump_file,
! "\nPerforming recursive inlining on %s\n",
cgraph_node_name (node));
/* We need original clone to copy around. */
--- 460,477 ----
/* Make sure that function is small enough to be considered for inlining. */
if (!max_depth
|| cgraph_estimate_size_after_inlining (1, node, node) >= limit)
! return false;
! heap = fibheap_new ();
! lookup_recursive_calls (node, node, heap);
! if (fibheap_empty (heap))
! {
! fibheap_delete (heap);
! return false;
! }
if (dump_file)
fprintf (dump_file,
! " Performing recursive inlining on %s\n",
cgraph_node_name (node));
/* We need original clone to copy around. */
*************** cgraph_decide_recursive_inlining (struct
*** 376,407 ****
cgraph_clone_inlined_nodes (e, true);
/* Do the inlining and update list of recursive call during process. */
! last_in_current_depth = last_call;
! while (first_call
&& cgraph_estimate_size_after_inlining (1, node, master_clone) <= limit)
{
! struct cgraph_edge *curr = first_call;
! first_call = first_call->aux;
! curr->aux = NULL;
cgraph_redirect_edge_callee (curr, master_clone);
cgraph_mark_inline_edge (curr);
! lookup_recursive_calls (node, curr->callee, &first_call, &last_call);
!
! if (last_in_current_depth
! && ++depth >= max_depth)
! break;
n++;
}
! /* Cleanup queue pointers. */
! while (first_call)
! {
! struct cgraph_edge *next = first_call->aux;
! first_call->aux = NULL;
! first_call = next;
! }
if (dump_file)
fprintf (dump_file,
"\n Inlined %i times, body grown from %i to %i insns\n", n,
--- 482,511 ----
cgraph_clone_inlined_nodes (e, true);
/* Do the inlining and update list of recursive call during process. */
! while (!fibheap_empty (heap)
&& cgraph_estimate_size_after_inlining (1, node, master_clone) <= limit)
{
! struct cgraph_edge *curr = fibheap_extract_min (heap);
! struct cgraph_node *node;
! depth = 0;
! for (node = curr->caller;
! node; node = node->global.inlined_to)
! if (node->decl == curr->callee->decl)
! depth++;
! if (depth > max_depth)
! continue;
+ if (dump_file)
+ fprintf (dump_file,
+ " Inlining call of depth %i\n", depth);
cgraph_redirect_edge_callee (curr, master_clone);
cgraph_mark_inline_edge (curr);
! lookup_recursive_calls (node, curr->callee, heap);
n++;
}
! fibheap_delete (heap);
if (dump_file)
fprintf (dump_file,
"\n Inlined %i times, body grown from %i to %i insns\n", n,
*************** cgraph_decide_recursive_inlining (struct
*** 415,420 ****
--- 519,525 ----
if (node->global.inlined_to == master_clone)
cgraph_remove_node (node);
cgraph_remove_node (master_clone);
+ return true;
}
/* Set inline_failed for all callers of given function to REASON. */
*************** static void
*** 442,452 ****
cgraph_decide_inlining_of_small_functions (void)
{
struct cgraph_node *node;
fibheap_t heap = fibheap_new ();
! struct fibnode **heap_node =
! xcalloc (cgraph_max_uid, sizeof (struct fibnode *));
! int max_insns = ((HOST_WIDEST_INT) initial_insns
! * (100 + PARAM_VALUE (PARAM_INLINE_UNIT_GROWTH)) / 100);
/* Put all inline candidates into the heap. */
--- 547,558 ----
cgraph_decide_inlining_of_small_functions (void)
{
struct cgraph_node *node;
+ struct cgraph_edge *edge;
fibheap_t heap = fibheap_new ();
! bitmap updated_nodes = BITMAP_ALLOC (NULL);
!
! if (dump_file)
! fprintf (dump_file, "\nDeciding on smaller functions:\n");
/* Put all inline candidates into the heap. */
*************** cgraph_decide_inlining_of_small_function
*** 455,541 ****
if (!node->local.inlinable || !node->callers
|| node->local.disregard_inline_limits)
continue;
if (!cgraph_default_inline_p (node))
{
cgraph_set_inline_failed (node,
N_("--param max-inline-insns-single limit reached"));
continue;
}
- heap_node[node->uid] =
- fibheap_insert (heap, cgraph_estimate_growth (node), node);
- }
! if (dump_file)
! fprintf (dump_file, "\nDeciding on smaller functions:\n");
! while (overall_insns <= max_insns && (node = fibheap_extract_min (heap)))
{
- struct cgraph_edge *e, *next;
int old_insns = overall_insns;
- heap_node[node->uid] = NULL;
if (dump_file)
- fprintf (dump_file,
- "\nConsidering %s with %i insns\n"
- " Estimated growth is %+i insns.\n",
- cgraph_node_name (node), node->global.insns,
- cgraph_estimate_growth (node));
- if (!cgraph_default_inline_p (node))
{
! cgraph_set_inline_failed (node,
! N_("--param max-inline-insns-single limit reached after inlining into the callee"));
! continue;
}
! for (e = node->callers; e; e = next)
{
! next = e->next_caller;
! if (e->inline_failed)
{
! struct cgraph_node *where;
!
! if (cgraph_recursive_inlining_p (e->caller, e->callee,
! &e->inline_failed)
! || !cgraph_check_inline_limits (e->caller, e->callee,
! &e->inline_failed))
! {
! if (dump_file)
! fprintf (dump_file, " Not inlining into %s:%s.\n",
! cgraph_node_name (e->caller), e->inline_failed);
! continue;
! }
! next = cgraph_mark_inline (e);
! where = e->caller;
! if (where->global.inlined_to)
! where = where->global.inlined_to;
!
! if (heap_node[where->uid])
! fibheap_replace_key (heap, heap_node[where->uid],
! cgraph_estimate_growth (where));
!
if (dump_file)
! fprintf (dump_file,
! " Inlined into %s which now has %i insns.\n",
! cgraph_node_name (e->caller),
! e->caller->global.insns);
}
}
! cgraph_decide_recursive_inlining (node);
!
! /* Similarly all functions called by the function we just inlined
! are now called more times; update keys. */
! update_callee_keys (heap, heap_node, node);
if (dump_file)
fprintf (dump_file,
" Inlined for a net change of %+i insns.\n",
overall_insns - old_insns);
}
! while ((node = fibheap_extract_min (heap)) != NULL)
! if (!node->local.disregard_inline_limits)
! cgraph_set_inline_failed (node, N_("--param inline-unit-growth limit reached"));
fibheap_delete (heap);
! free (heap_node);
}
/* Decide on the inlining. We do so in the topological order to avoid
--- 561,721 ----
if (!node->local.inlinable || !node->callers
|| node->local.disregard_inline_limits)
continue;
+ if (dump_file)
+ fprintf (dump_file, "Considering inline candidate %s.\n", cgraph_node_name (node));
+ node->global.estimated_growth = INT_MIN;
if (!cgraph_default_inline_p (node))
{
cgraph_set_inline_failed (node,
N_("--param max-inline-insns-single limit reached"));
continue;
}
! for (edge = node->callers; edge; edge = edge->next_caller)
! if (edge->inline_failed)
! {
! gcc_assert (!edge->aux);
! edge->aux = fibheap_insert (heap, cgraph_edge_badness (edge), edge);
! }
! }
! while (overall_insns <= max_insns && (edge = fibheap_extract_min (heap)))
{
int old_insns = overall_insns;
+ struct cgraph_node *where;
+ int growth =
+ cgraph_estimate_size_after_inlining (1, edge->caller, edge->callee);
+
+ growth -= edge->caller->global.insns;
if (dump_file)
{
! fprintf (dump_file,
! "\nConsidering %s with %i insns to be inlined into %s\n"
! " Estimated growth after inlined into all callees is %+i insns.\n"
! " Estimated badness is %i.\n",
! cgraph_node_name (edge->callee),
! edge->callee->global.insns,
! cgraph_node_name (edge->caller),
! cgraph_estimate_growth (edge->callee),
! cgraph_edge_badness (edge));
! if (edge->count)
! fprintf (dump_file," Called "HOST_WIDEST_INT_PRINT_DEC"x\n", edge->count);
}
! gcc_assert (edge->aux);
! edge->aux = NULL;
! if (!edge->inline_failed)
! continue;
!
! /* When profile info is not available, we do not in any way weight the
! position of the call within the procedure itself.  This means that if a
! call of function A from function B seems profitable to inline, the
! recursive call of A in the inline copy of A in B will look profitable too,
! and we end up inlining until we reach the maximal function growth.  This
! is not a good idea, so prohibit the recursive inlining.
!
! ??? When the frequencies are taken into account we might not need this
! restriction. */
! if (!max_count)
{
! where = edge->caller;
! while (where->global.inlined_to)
{
! if (where->decl == edge->callee->decl)
! break;
! where = where->callers->caller;
! }
! if (where->global.inlined_to)
! {
! edge->inline_failed
! = (edge->callee->local.disregard_inline_limits ? N_("recursive inlining") : "");
if (dump_file)
! fprintf (dump_file, " inline_failed:Recursive inlining performed only for function itself.\n");
! continue;
}
}
! if (!cgraph_maybe_hot_edge_p (edge) && growth > 0)
! {
! if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
! &edge->inline_failed))
! {
! edge->inline_failed =
! N_("call is unlikely");
! if (dump_file)
! fprintf (dump_file, " inline_failed:%s.\n", edge->inline_failed);
! }
! continue;
! }
! if (!cgraph_default_inline_p (edge->callee))
! {
! if (!cgraph_recursive_inlining_p (edge->caller, edge->callee,
! &edge->inline_failed))
! {
! edge->inline_failed =
! N_("--param max-inline-insns-single limit reached after inlining into the callee");
! if (dump_file)
! fprintf (dump_file, " inline_failed:%s.\n", edge->inline_failed);
! }
! continue;
! }
! if (cgraph_recursive_inlining_p (edge->caller, edge->callee,
! &edge->inline_failed))
! {
! where = edge->caller;
! if (where->global.inlined_to)
! where = where->global.inlined_to;
! if (!cgraph_decide_recursive_inlining (where))
! continue;
! update_callee_keys (heap, where, updated_nodes);
! }
! else
! {
! if (!cgraph_check_inline_limits (edge->caller, edge->callee,
! &edge->inline_failed))
! {
! if (dump_file)
! fprintf (dump_file, " Not inlining into %s:%s.\n",
! cgraph_node_name (edge->caller), edge->inline_failed);
! continue;
! }
! cgraph_mark_inline_edge (edge);
! update_callee_keys (heap, edge->callee, updated_nodes);
! }
! where = edge->caller;
! if (where->global.inlined_to)
! where = where->global.inlined_to;
!
! /* Our profitability metric can depend on local properties
! such as the number of inlinable calls and the size of the function body.
! After inlining these properties might change for the function we
! inlined into (since its body size changed) and for the functions
! called by the function we inlined (since the number of their inlinable
! callers might change).  */
! update_caller_keys (heap, where, updated_nodes);
! bitmap_clear (updated_nodes);
if (dump_file)
fprintf (dump_file,
+ " Inlined into %s which now has %i insns.\n",
+ cgraph_node_name (edge->caller),
+ edge->caller->global.insns);
+ if (dump_file)
+ fprintf (dump_file,
" Inlined for a net change of %+i insns.\n",
overall_insns - old_insns);
}
! while ((edge = fibheap_extract_min (heap)) != NULL)
! {
! gcc_assert (edge->aux);
! edge->aux = NULL;
! if (!edge->callee->local.disregard_inline_limits && edge->inline_failed
! && !cgraph_recursive_inlining_p (edge->caller, edge->callee,
! &edge->inline_failed))
! edge->inline_failed = N_("--param inline-unit-growth limit reached");
! }
fibheap_delete (heap);
! BITMAP_FREE (updated_nodes);
}
/* Decide on the inlining. We do so in the topological order to avoid
*************** cgraph_decide_inlining (void)
*** 551,559 ****
int old_insns = 0;
int i;
for (node = cgraph_nodes; node; node = node->next)
! initial_insns += node->local.self_insns;
overall_insns = initial_insns;
nnodes = cgraph_postorder (order);
--- 731,751 ----
int old_insns = 0;
int i;
+ timevar_push (TV_INLINE_HEURISTICS);
+ max_count = 0;
for (node = cgraph_nodes; node; node = node->next)
! {
! struct cgraph_edge *e;
! initial_insns += node->local.self_insns;
! for (e = node->callees; e; e = e->next_callee)
! if (max_count < e->count)
! max_count = e->count;
! }
overall_insns = initial_insns;
+ gcc_assert (!max_count || (profile_info && flag_branch_probabilities));
+
+ max_insns = ((HOST_WIDEST_INT) overall_insns
+ * (100 + PARAM_VALUE (PARAM_INLINE_UNIT_GROWTH)) / 100);
nnodes = cgraph_postorder (order);
*************** cgraph_decide_inlining (void)
*** 668,673 ****
--- 860,866 ----
/* We will never output extern functions we didn't inline.
??? Perhaps we can prevent accounting of growth of external
inline functions. */
+
cgraph_remove_unreachable_nodes (false, dump_file);
if (dump_file)
*************** cgraph_decide_inlining (void)
*** 677,682 ****
--- 870,876 ----
ncalls_inlined, nfunctions_inlined, initial_insns,
overall_insns);
free (order);
+ timevar_pop (TV_INLINE_HEURISTICS);
}
/* Decide on the inlining. We do so in the topological order to avoid
Index: passes.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/passes.c,v
retrieving revision 2.90
diff -c -3 -p -r2.90 passes.c
*** passes.c 27 Apr 2005 21:35:20 -0000 2.90
--- passes.c 23 May 2005 09:49:05 -0000
*************** static void
*** 1427,1433 ****
rest_of_clean_state (void)
{
rtx insn, next;
- coverage_end_function ();
/* It is very important to decompose the RTL instruction chain here:
debug information keeps pointing into CODE_LABEL insns inside the function
--- 1427,1432 ----
Index: profile.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/profile.c,v
retrieving revision 1.157
diff -c -3 -p -r1.157 profile.c
*** profile.c 21 Apr 2005 09:17:19 -0000 1.157
--- profile.c 23 May 2005 09:49:05 -0000
*************** branch_prob (void)
*** 1141,1146 ****
--- 1141,1147 ----
free_edge_list (el);
if (flag_branch_probabilities)
profile_status = PROFILE_READ;
+ coverage_end_function ();
}
/* Union find algorithm implementation for the basic blocks using
Index: timevar.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/timevar.def,v
retrieving revision 1.48
diff -c -3 -p -r1.48 timevar.def
*** timevar.def 17 May 2005 20:28:22 -0000 1.48
--- timevar.def 23 May 2005 09:49:05 -0000
*************** DEFTIMEVAR (TV_CPP , "preprocessin
*** 60,65 ****
--- 60,66 ----
DEFTIMEVAR (TV_LEX , "lexical analysis")
DEFTIMEVAR (TV_PARSE , "parser")
DEFTIMEVAR (TV_NAME_LOOKUP , "name lookup")
+ DEFTIMEVAR (TV_INLINE_HEURISTICS , "inline heuristics")
DEFTIMEVAR (TV_INTEGRATION , "integration")
DEFTIMEVAR (TV_TREE_GIMPLIFY , "tree gimplify")
DEFTIMEVAR (TV_TREE_EH , "tree eh")
Index: tree-optimize.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-optimize.c,v
retrieving revision 2.101
diff -c -3 -p -r2.101 tree-optimize.c
*** tree-optimize.c 19 May 2005 10:38:42 -0000 2.101
--- tree-optimize.c 23 May 2005 09:49:05 -0000
*************** init_tree_optimization_passes (void)
*** 377,387 ****
NEXT_PASS (pass_build_cfg);
NEXT_PASS (pass_pre_expand);
NEXT_PASS (pass_warn_function_return);
*p = NULL;
p = &all_passes;
NEXT_PASS (pass_fixup_cfg);
- NEXT_PASS (pass_tree_profile);
NEXT_PASS (pass_init_datastructures);
NEXT_PASS (pass_all_optimizations);
NEXT_PASS (pass_warn_function_noreturn);
--- 377,387 ----
NEXT_PASS (pass_build_cfg);
NEXT_PASS (pass_pre_expand);
NEXT_PASS (pass_warn_function_return);
+ NEXT_PASS (pass_tree_profile);
*p = NULL;
p = &all_passes;
NEXT_PASS (pass_fixup_cfg);
NEXT_PASS (pass_init_datastructures);
NEXT_PASS (pass_all_optimizations);
NEXT_PASS (pass_warn_function_noreturn);
*************** tree_lowering_passes (tree fn)
*** 682,688 ****
void
ipa_passes (void)
{
! execute_pass_list (all_ipa_passes);
}
--- 682,690 ----
void
ipa_passes (void)
{
! bitmap_obstack_initialize (NULL);
! execute_pass_list (all_ipa_passes);
! bitmap_obstack_release (NULL);
}