This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC/patch] Callgraph based inlining heuristics
- From: Jan Hubicka <jh at suse dot cz>
- To: Steven Bosscher <s dot bosscher at student dot tudelft dot nl>
- Cc: Jan Hubicka <jh at suse dot cz>, gcc at gcc dot gnu dot org, mark at codesourcery dot com,rth at redhat dot com, aj at suse dot de
- Date: Mon, 23 Jun 2003 15:51:48 +0200
- Subject: Re: [RFC/patch] Callgraph based inlining heuristics
- References: <20030622232032.GC3266@kam.mff.cuni.cz> <3EF6FCF2.6080301@student.tudelft.nl>
> Jan Hubicka wrote:
>
> >Hi,
> >I finally got C++ unit-at-a-time into shape so it don't consume all my free
> >time and got into implementing first try of my proposed inlining heuristics
> >that uses callgraph to decide on inlining top-down while inline using
> >current
> >code bottom-up hopefully getting better from the both worlds.
> >
> >I am attaching somewhat raw implementation of the idea. At the moment I am
> >using cgraph edges to flag on whether particular inlining will happen so
> >function can be inlinined into once caller but not inlined into another,
> >but
> >the decisions must be transitive.
> >
> I am not sure what exactly you're saying in the last sentence here.
Assume that function A calls B that calls C.
If we decide to make A leaf, we must also inline all calls to C into B
in the other incarnations of B as there is only one edge in between A
and B.
>
> >(that is not the case of attribute leafify proposed - doing the
> >datastructure
> >in full generality allowing each particular walk in the callgraph to be
> >either
> >collapsed or not is probably too expensive so attribute leafify can be
> >implemented as a special case).
> >
> Assuming we really want such an attribute...
It would be better to do it wihout, yes :)
I am thinking about making the inline limits less strict when doing so
would make function leaf and making less strict limit for inlining leaf
functions.
I implemented the second and it seems to reduce call density.
> >After flag is set the size of function bodies is update by adding the size
> >of
> >inlined body minus 1. (this is the same as estimation in tree-inline.c.
> >Not
> >very good. We can consider choosing functions that may inline and trying
> >to
> >analyze them harder, in the extreme case even produce rtl and profile and
> >copmpute average amount of insns executed).
> >
> If you use the same estimate for the function size as in tree-inline.c,
> can't you just share that code with tree-inline? I was thinking about
> splitting up inlinable_function_p() into a function that looks for
> inline inhibiting functions and a function to estimate the size of the
> inlinable function body (to get rid of that ugly "nolimit" flag). Then
> if we come up with a better way to estimate the function body size, we
> only have to rework the code in one place.
Actually I am dreaming about simply droipping the current heuristics in
tree-inline, making inlinable_function_p to always do "nolimit" so it
just return whether function X can be inlined and moving everything to
unit-at-a-time code.
>
> (BTW tree-inline.c needs to be cleaned up in other ways too, e.g. move
> walk_tree out to tree.c...)
>
> >In the second stage I proceed topologically again inline small functions
> >updating the size of bodies and in the third stage I decide on inlining
> >once.
> >
> Does that mean we no longer inline large functions into small ones if
> the large function is called more than once? That is one of the things
> people have suggested many times to reduce the compile time cost of
> inlining.
I do something like this.
When X is marked always_inline, inline it to all callers but avoid
cycles
when X is small *after* we inline into it, inline it to all callers that
don't grow up too much (to avoid cases where relatively small function
is inlined 1000000 times into another function).
when X is called just once, it is local, then inline it into caller but
also watch for the size of caller.
>
> >I've added new constant specifying large functions (30000 statements) and
> >I am
> >limiting growth of the large functions by factor of 2. This avoids
> >quadratic
> >bottleneck seen on insn-recog during bootstrap.
> >
> 30000 statements still seems rather large. How did you get to this
> number? Are you still always inlining really small functions even if
Just dreamed it up. It was first shot and I didn't tried to change it.
> the large function is larger than 2 times its original size?
>
> >I tested libstdc++ compilation with -O2 and get 10% savings of the .so
> >library
> >size (with debug info in) and about 5% compiler speedup. I also tried
> >PR/8362 testcase.
> >
> PR 8361 :-)
> >
> That looks interesting. "expand" goes down from 13 to 4 seconds, so
> you're producing fewer insns there already, very nice!
> Do you know where the difference comes from, ie. which functions don't
> get inlined with your code that used to be inlined before?
>From the dump, the heuristics preventing large function bodies does not
fire until I increase limit of small functions to 1000, so it is always
the case wher function body after inlining grows over the limit.
>
> >I also tested it on some of C SPEC2000 sources and it does not seem to
> >supress
> >any important inlining. I hope to do some behchmarking tomorrow as I need
> >some
> >sleep now.
> >
> >So :) any comments are welcome. Also I would like to know about other
> >testcases where inliner behaves in broken way.
> >
> PR10160 is a good one. You can of course try POOMA and Deal II, and
> maybe you can get someone to try Linux kernel without the always_inline
> hack they now need for GCC 3.3.
To make always_inline go away, we actually need to increase inlining
limit from current 300. As I was mentioning in the other post, I hope I
will be able to do that as new heuristics does not get crazy then.
> >+ #define INSNS_PER_STMT 10
> >+ #define INSNS_PER_CALL 10
> >
> IIRC there is some code in integrate.c that takes into account the
> number of arguments and how they are passed in estimating the size of a
> call. I have no idea if there's any benefit in a more accurate
> estimate, though, as long as we estimate 10 insns for all statements no
> matter what.
>
> Why do you think there should be a separate INSNS_PER_CALL anyway? You
> estimate 10 insns for calls in the function body you're expanding
> inline, so why have a different number (potentially) for the call you're
> inlining??
>
> Oh, and again, it would be nice to just have this in tree-inline.c
At the moment i just use the same code as tree-inline.c to make these
two comparable. The way of computing statements to estimate size is
just crazy. With gimple we should be able to do better, I can try to
put together something for current trees, but definitly I want to do
> >+ #define LARGE_FUNCTION_INSNS 30000
> >+ #define LARGE_FUNCTION_GROWTH 200
> >
> Maybe add comment here that these are multiples of 100. Actually, why
> is LARGE_FUNCTION_GROWTH not just 2? Every time you multiply something
It is in percents as I think something like 120% would make more sense.
These two should go into params.def but I didn't get time to do that
yet.
> >
> Are you ever supposed to get here if flag_inline_trees is not set?
> Languages that do not use the tree inliner can never do unit-at-a-time
> either, and unit-at-a-time implies flag_inline_functions, doesn't it?
Hmm, probably yes :)
> >!
> >! We do simple backgracing to get INLINED_TIMES right. This should not
> >be
> >
> What is backgracing?
backtracing.
> >! expensive as we limit amount of inlining. Alternatively we may first
> >
> _the_ amount of ...
>
> >! discover set of nodes, to topological sort on these and propagate
> >! information. */
> >
> discover the set of nodes, topologically sort these and propagate
> information.
>
> What information exactly BTW?
The amount of times function WHAT will be inlined into TO.
> >! struct cgraph_node *caller;
> >!
> >! /* Look at the edge on the top of the stack. */
> >! e = stack[sp - 1];
> >! caller = e->caller;
> >
> Purely cosmetic: If you always use (sp - 1) to get the top of the stack,
> you might as well start with sp=-1 and check for sp<0 to see if
> something is left on the stack....
I just cut&pasted this from same code in cfganal.c, perhaps I can
cleanup both :)
>
> >! for (e1 = e->next_caller; e1; e1 = e1->next_caller)
> >! if (e1->inline_call)
> >! break;
> >
> I see these loops all the time. Maybe this deserves a macro...
These are supposed to get different once I start to do something smart,
but perhaps I can do that...
>
> >! /* Estimate size of the function after inlining WHAT into TO. */
> >! static int
> >! cgraph_estimate_size_after_inlining (int times,
> >! struct cgraph_node *to,
> >! struct cgraph_node *what)
> >! {
> >! return to->global.insns + (what->global.insns - INSNS_PER_CALL *
> >times);
> >! }
> >
> Should be in tree-inline
It depends in cgraph datastructures that are not available in
tree-inline all the time :(
> >! static bool
> >! cgraph_default_inline_p (struct cgraph_node *n)
> >! {
> >! if (!DECL_INLINE (n->decl))
> >! return false;
> >! if (DID_INLINE_FUNC (n->decl))
> >! return n->global.insns < MAX_INLINE_INSNS_AUTO;
> >! else
> >! return n->global.insns < MAX_INLINE_INSNS_SINGLE;
> >! }
> >
> Here you don't seem to take care of really small functions, ie.
> one-liners. Is that deliberate? In tree-inline, we have
> MIN_INLINE_INSNS for that, and at its current value, inlining such
> functions is almost always beneficial (because 100 of the 130 insns for
> MIN_INLINE_INSNS are for statements that don't produce insns, ie.
> SCOPE_STMT, stuff like that).
I think I don't need MIN_INLINE_INSNS as I don't decrease the inline
limits with inlining depth as tree inlining does, so when
MAX_INLINE_INSNS_AUTO/SINGLE is more than 20, I will always inline one
statement functions.
> Not sure what you're doing here. First you set node->output for
> functions with a DECL_SAVED_TREE, ie. deferred functions. Isn't that
> true for all functions in unit-at-a-time? Then you reset node->output
No, callgraph also contains functions that are not in the current unit
(external functions). These don't have DECL_SAVED_TREE. I am not
interested in seeing these as I can't inline them anyway.
> for all functions, but cgraph_postorder already seems to reset it for
> functions which have it set... Maybe needs a comment?
I've already cleaned this up in another version of patch I sent this
morning.
> >! warning_with_decl (node->decl,
> >! "Always inline flag ignored in respect to
> >`%s' to avoid cycle");
> >
> Can't you instead do:
>
> warning ("%Halways_inline attribute ignored for `%D' to allow recursion",
> &DECL_SOURCE_LOCATION (node->decl), decl);
Perhaps, but it is not exactly the case - I don't ignore it in all
calees, just in the calees where I would get into infinite recursion
while inlining.
>
> >! /* In the first pass mark all always_inline edges. Do this with a
> >priority
> >! so no our decisions makes this impossible. */
> >
>
> This is the same comment as before. What's the idea here, is this the
> second pass?? It's hard to tell from the code without any comments below...
Cut&pasting :)
>
> Looks like you've worked really hard on this, and that it could be a
> real improvement. Do you have some memory statistics as well, because
> having all function bodies hanging around as trees may be expensive.
With the 8361 the memory peak is about 20% lower than without
unit-at-a-time. C++ in most cases keeps the bodies around anyway so
this should not affect that much, but I will try to test it more.
I don't know about the usual case as I am not quite sure how to measure
it....
Honza
>
> Gr.
> Steven
>
>