This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Inlining heuristics for C++


Carlo Wood <carlo@alinoe.com> writes:

> On Mon, Jul 09, 2001 at 09:46:59PM -0400, Daniel Berlin wrote:
>> This is effectively expressing the rule: "Small functions should be
>> inlined into larger ones.  Larger functions should not be inlined into
>> small ones".
> 
> This doesn't seem logical.  I think I can follow the reasoning behing it
> though:
> 
> - The overhead of a function call can be neglected for large functions,
>   therefore it only makes sense to worry about inlining and small functions.
>   As a result, only small functions are normally intented to be inlined
>   by the programmer.
> 
> This works perfectly in C, but for some reason C++ is different: there is
> being done a LOT more inlining.  Especially, many functions are inlined
> that are not even visible to the programmer.
> 
> Therefore I propose to let go the 'human' factor - and look at inlining
> from a mathematical point of view.  Suppose we have:
> 
> int f1(int i)
> {
>   // something
>   return result;
>}
> 
> int f2(int i)
> {
>   // uses f1() once
>   return result;
>}
> 
> And suppose that f1() is *only* called from within f2().
> Then does the size of f1 matter?

You are missing something important.

Whatever calls f2 will recursively inline f1 back into f2, for that
particular inlining.

We recursively inline.

So if you have

int f3(int i)
{
        ....
        f2(a);
}

f3 will inline f2, then expand f1 inline into f2.
The thing we *don't* want is to inline f1 into f2 for f2, if f1 was
much larger than f2.

> 
> So, back to the example:
> 
> int f1(int i)
> {
>   // 100 lines of code
>   return result;
>}
> 
> int f2(int i)
> {
>   // uses f1() once, 1000 lines of code
>   return result;
>}
> 
> int f3(int i)
> {
>   // uses f2 once, 10 lines of code
>   return result;
>}
> 
> int main(void)
> {
>   int s = 0;
>   for (int i = 0; i < 1000000; ++i)
>     s += f3(i); 
>   cout << s << '\n';
>   return 0;
>}
> 
> and,
> 
> by inlining f3 in main() we lose one million times a call overhead;

> by inlining recursively f2, we lose again one million times a call overhead;
> and by inlining recursively f1, we lose again one million times a call overhead.
> 
> Despite the fact that we have to compare "one million times a call overhead"
> with the different object code growths, the sizes of f1, f2 and f3 relative
> to eachother are totally irrelevant.

No, they aren't.

Once again, we recursively inline starting at some base function.

We will inline f3 and into main.
f2 won't make the cut off for inlining into the f3 inline that goes
into main.

However, we'll inline f1 into f2 when we generate the non-inline code
for f2 (which we do for all non-statics).

f2 is a big enough function that inlining into f3 is going to increase
code size and processing time, but not buy you performance.



People forget we recursively inline.

So if they were in relative size compared to main, we'd now (with my
heuristic) inline main->f3->f2->f1 (IE main would call nothing
anymore).

The problem is then that we also currently inline

f3->f2->f1
f2->f1


When generating code for those.

Think of all the hidden functions.

So if your constructor calls a function, which calls two functions, we
end up with
inlined:
main->your constructor->the function->two large functions->whatever they
main->call->till we hit 10000 insns or run out of stuff to inline

Then we inline, when outputting the constructor

constructor->the function->two functions->etc

then 
the function->two functions->etc


Consider if more than just the constructor calls the function.

Everything that calls it, as long as we don't hit 10000 insns for the
function at the top of the tree, will inline the the function->two functions->etc,
whether it makes sense or not.


What my heuristics would do is say

main->your constructor->... is fine
your constructor->two large functions->... by itself doesn't make much
sense. Whatever calls the constructor outside the compilation unit is
incurring enough call overhead anyway that inlining these two large
functions into the constructor just increases code size, not performance. 

each of the two large functions->... is fine



-- 
"I was going 70 miles an hour and got stopped by a cop who said,
"Do you know the speed limit is 55 miles per hour?"  "Yes,
officer, but I wasn't going to be out that long..."
"-Steven Wright


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]