This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: Inlining heuristics for C++

To: Carlo Wood <carlo at alinoe dot com>
Subject: Re: Inlining heuristics for C++
From: Daniel Berlin <dan at cgsoftware dot com>
Date: Tue, 10 Jul 2001 00:25:58 -0400
Cc: Daniel Berlin <dan at cgsoftware dot com>, gcc at gcc dot gnu dot org
References: <87r8vpo8rw.fsf@cgsoftware.com> <20010710045607.B30081@alinoe.com><87d779o3y5.fsf@cgsoftware.com> <20010710060551.C30081@alinoe.com>

Carlo Wood <carlo@alinoe.com> writes:

> On Mon, Jul 09, 2001 at 11:31:14PM -0400, Daniel Berlin wrote:
>> > And suppose that f1() is *only* called from within f2().
>> > Then does the size of f1 matter?
>> 
>> You are missing something important.
>> 
>> Whatever calls f2 will recursively inline f1 back into f2, for that
>> particular inlining.
>> 
>> We recursively inline.
> 
> I understood that.  Thats the whole reason why I say that the size
> of f1 doesn't matter: all you lose is one call-overhead, without
> growing the size of the code at all.  Fergus also said:
> 
> "There is an exception to this.  Functions which are only called once,
> and which are defined in this translation unit and not exported, should
> almost always be inlined (after which the original copy of the function
> is dead code and can be eliminated)"
> 
> To which you also disagreed.  There must be a misunderstanding here :),
> I think Fergus is perfectly right and actually better worded what I wanted
> to say.
> 
>> > Despite the fact that we have to compare "one million times a call overhead"
>> > with the different object code growths, the sizes of f1, f2 and f3 relative
>> > to eachother are totally irrelevant.
>> 
>> No, they aren't.
>> 
>> Once again, we recursively inline starting at some base function.
>> 
>> We will inline f3 and into main.
>> f2 won't make the cut off for inlining into the f3 inline that goes
>> into main.
> 
> That makes, thus, no sense.  Because inlining f2 into the f3 inline gains
> *again* one million call-overheads.

However, since f1 was inlined into f2, the call overhead just became
less important.

>  It is true that it makes the code
> effectively grow 1000 lines, but that is an absolute number and not something
> that is relevant to compare relatively (with the size of f3 in this case,
> or with the size of main+f3 for that matter).

You can't do it perfectly no matter what you try, because we don't
have the architecture.

>  The only thing that matters
> is the actual size of f2 (1000 lines) and the number of times it is called
> (1,000,000 times). 
We don't know how many times it's called at this point. 

> The sizes of main and f3 don't come into the picture,
> and hence the proposed rule can't be correct.
Sure, it's not a *rule*, it's a heuristic.
When we have the architecture to determine the above, i'll change it
to take the call counts, loop depth, etc, into account.

> 
>> However, we'll inline f1 into f2 when we generate the non-inline code
>> for f2 (which we do for all non-statics).
>> 
>> f2 is a big enough function that inlining into f3 is going to increase
>> code size and processing time, but not buy you performance.
> 
> That depends on how often f3 is called.
And how much gets inlined into f2.
> 
>> 
>> People forget we recursively inline.
> 
> Please note that I did not forget that.  There must be some other misunderstanding
> by me or you (heh - is *this* the case were one puts 'me' upfront instead of
> second? ;).
:)

I just don't think the particular case above is as important as you
do.
And I think we get it sooooooo wrong soooooo often right now, that
the current heuristic is worse than my proposed one.

Given that we are spending tons of compilation time, and getting so
little out of it, i think whatever pessimization i perform in the
above particular case, is outweighed by the compile time factor.

These are parameters you can twiddle.
You can say you don't care, and watch your compile time shoot back up
to hell, along with your code size.
But i'm pretty sure it's not going to speed up more than 2% or so.

> 
>> So if they were in relative size compared to main, we'd now (with my
>> heuristic) inline main->f3->f2->f1 (IE main would call nothing
>> anymore).
> 
> This decision should be based on the fact that f3 is inside a loop, and not
> on the fact that main() has a considerable size, however.

We don't have this architecture.

> 
>> The problem is then that we also currently inline
>> 
>> f3->f2->f1
>> f2->f1
> 
> I understand.  In this case one can easily do the following reasoning:
> Knowing that f3 is _called_ (not inlined), it was apparently not important
> enough to inline it.  Hence we are not inside a loop, and thus it is not
> needed to inline f2 into f3.  Again however, I disagree that the size of
> f2 *relative* to f3 matters; the only thing that matters in this case is
> the absolute size of f2.  I'd change your heuristic rule to look at the
> size of function in absolute terms, not relative compared to the function
> that it is called from.
No, because then we'll do f2->f1 because the size doesn't relatively
matter, only the absolute size matters.
In fact, the current heuristic is that only absolute size of the top
level function matters.
We know this doesn't work well in most cases: good at some, and is
bad at a lot more.
I'm trying to having something that is good in most cases: bad at
some, and good at more than we are bad at.

> 
>> When generating code for those.
>> 
>> Think of all the hidden functions.
>> 
>> So if your constructor calls a function, which calls two functions, we
>> end up with
>> inlined:
>> main->your constructor->the function->two large functions->whatever they
>> main->call->till we hit 10000 insns or run out of stuff to inline
>> 
>> Then we inline, when outputting the constructor
>> 
>> constructor->the function->two functions->etc
>> 
>> then 
>> the function->two functions->etc
>> 
>> 
>> Consider if more than just the constructor calls the function.
>> 
>> Everything that calls it, as long as we don't hit 10000 insns for the
>> function at the top of the tree, will inline the the function->two functions->etc,
>> whether it makes sense or not.
>> 
>> 
>> What my heuristics would do is say
>> 
>> main->your constructor->... is fine
>> your constructor->two large functions->... by itself doesn't make much
>> sense.
> 
> I agree.  But based on the fact that it is unlikely that a constructor
> that calls two large functions is within a loop to begin with.  Fortunately
> here you talk about "a large function", instead of "a function large compared
> to the constructor size".  That is a difference that matter and that should
> be reflected in the patch.
I don't understand, please paint me a picture if i don't get it in the
morning. I'm slow at midnight.
:)

>> Whatever calls the constructor outside the compilation unit is
>> incurring enough call overhead anyway that inlining these two large
>> functions into the constructor just increases code size, not performance. 
>> 
>> each of the two large functions->... is fine
> 
> But... now I see something that I didn't see before :).
> Perhaps the call-overhead is much less important then additional
> optimizations that can be done AFTER inlining.
Yes.
>   By not inlining we
> might miss important optimizations.  This wouldn't speed up compile
> time, but what about to TRY inlining and then make the decision
> based on code size reduction that follows from it?

HP does this, and i have a paper on doing this.  You have a budget for
time to spend inlining a given function.  However, we don't have
anywhere near the architecture at the tree level to make use of it.

We can only go with simple heuristics right now, at least until
Diego's basic blocks on trees stuff.

I'm happy to build a great inliner that takes into account all kinds
of stuff, and performs well.

And when we have the architecture, i'll do it.
But i don't have time to build all that architecture all by myself,
when i know others are working on it.

> overhead is rather constant (a few assembly instructions), reductions
> as a result of actual inlining can be rather HUGE (also mentioned by
> Timothy J. Wood (Nice last name) in his example of calling large
> functions with constant parameters that cause large code blocks to
> drop out).

>   Ok, that is too much asked... Lets say, we do what your
> patch does but 1) look at the absolute size of the to-inline function,
> 2) inline like we inline now when any parameter passed is a
>    constant?

Sure, i'm happy to do this, once i understand #1.
#2 is easy.

We also inline when it says inline, so that's, once again, always a
fall back if the heuristic is too stupid.

Right now we inline *way* too much. I'd rather have people have to add
inline.

Theres no way to say "noinline"
:)


> 
> -- 
> Carlo Wood <carlo@alinoe.com>

-- 
"I have the world's largest collection of seashells.  I keep it
on all the beaches of the world...  Perhaps you've seen it.
"-Steven Wright

Follow-Ups:
- Re: Inlining heuristics for C++
  - From: Carlo Wood

References:
- Inlining heuristics for C++
  - From: Daniel Berlin
- Re: Inlining heuristics for C++
  - From: Carlo Wood
- Re: Inlining heuristics for C++
  - From: Daniel Berlin
- Re: Inlining heuristics for C++
  - From: Carlo Wood

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]