This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Inlining fails on very simple code
- From: degger at fhm dot edu
- To: aoliva at redhat dot com
- Cc: gcc at gcc dot gnu dot org
- Date: Wed, 12 Dec 2001 00:48:07 +0100 (CET)
- Subject: Re: Inlining fails on very simple code
- Reply-to: degger at fhm dot edu
On 11 Dec, Alexandre Oliva wrote:
>> Would it be possible to use the bitfield variable behind the
>> TREE_USED macro to count the number of callers or would that already
>> be too late for the treeinliner to work with that data?
> Well, considering that it's a single bit, you wouldn't be able to take
> much information out of it.
I should have been a bit more verbose. Of course I was thinking of using
a complete type instead of the bit to do the counting. From what I've
seen it wouldn't even break existing code...
> Besides, it would only be sensible to use it after you get the whole
> translation unit processed, and GCC emits code for functions *before*
> it gets the whole translation unit.
That's what I feared, thanks for clarifying this.
> Also, sometimes optimizing a function may get you rid of references to
> other functions (think unreachable code), and then, you'd have to
> somehow go back and update the counter, or risk making imperfect
> choices.
So you don't think it's possible to reverse the growth of the counter?
> And what if you have n function calls that you could inline, do you
> try 2^n combinations of inlining, optimize them all the way through to
> assembly code, and then pick the best?
What about a depth limitation here to prevent exponential growth? If a
developer or user really wants superoptimized code she/he could simply
increase that limit, wait half a day and get the best distilled code
gcc can come up with.
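
To put rough numbers on that idea, here is a small sketch. It assumes one possible reading of "limit", namely that the search only tries combinations in which at most `limit` of the `n` candidate call sites are inlined at once; the helper name `bounded_inline_choices` is made up for illustration and nothing like it exists in gcc.

```c
#include <stdint.h>

/* Hypothetical helper: number of inlining combinations to try when at most
   `limit` of the `n` candidate call sites may be inlined at once.
   Computes C(n,0) + C(n,1) + ... + C(n,limit).
   Assumes n is small enough that the intermediate products fit in 64 bits. */
uint64_t bounded_inline_choices(unsigned n, unsigned limit)
{
    uint64_t total = 0;
    uint64_t binom = 1;   /* C(n, 0) */
    unsigned i;

    if (limit > n)
        limit = n;
    for (i = 0; i <= limit; i++) {
        total += binom;
        /* C(n, i+1) = C(n, i) * (n - i) / (i + 1); the division is exact. */
        binom = binom * (n - i) / (i + 1);
    }
    return total;
}
```

With 20 candidate call sites, the unbounded search has 2^20 = 1048576 combinations, while a limit of 3 leaves 1 + 20 + 190 + 1140 = 1351. Each combination would still have to be compiled all the way down to assembly, so even the bounded search is far from cheap.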
> This may indeed get you a compiler that generates fast code, but it's
> going to be unbearably slow.
After all, the current inlining code and its heuristic limit have some
severe drawbacks: There's probably no code on earth for which the limit
will be right; set it too low and the compiler will miss important
opportunities to speed up the code, set it too high and you'll have
bloated applications with suboptimal performance due to cache thrashing
and an increased number of memory accesses. And what's even worse: If you
have more than just a pathetic testcase there's no chance to get it right,
because modifying the limit will likely improve efficiency in one place
while degrading it in another.
Still, I have no clue why I cannot get gcc to inline the functions in
the code mentioned earlier, since
a) it is declared inline
b) I use -O3 (which enables -finline-functions)
c) I use -Winline and don't get a warning that they couldn't be inlined
d) gcc 2.95.3 does inline it when using -O3
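
The code in question is not quoted in this message, so the following is only a hypothetical stand-in of the same flavour: a tiny function declared inline and called from one place. The file name and function names are assumptions chosen for illustration.

```c
/* inline_demo.c: hypothetical stand-in, not the code from the original report. */
static inline int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    /* Two small calls that one would expect an optimizing compiler to inline. */
    return square(a) + square(b);
}
```

Compiling this with `gcc -O3 -Winline -S inline_demo.c` and checking whether inline_demo.s still contains a call to `square` is one way to see what the inliner actually did.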
GCC's documentation says:
"If all calls to a given function are integrated, and the function is
declared @code{static}, then the function is normally not output as
assembler code in its own right." for -finline-functions.
And it also says:
"Defining inline functions (as fast as macros)."
which is simply not true: I cannot force gcc to inline the function
without raising the general limit, which I cannot afford, whereas with
macros (although ugly) I can control exactly which code gets expanded
where. Maybe we can introduce some
__attribute__ ((i_really_want_this_inline)) to get the desired
fine-grained control if the compiler cannot be made clever enough
to catch even simple and obvious win cases.
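
For reference, later GCC releases do provide a function attribute along these lines, spelled `always_inline`. A minimal sketch of how per-function control might look with it is below; the `FORCE_INLINE` macro and the function names are made up for illustration, and the fallback branch for other compilers is only an assumption about what would be acceptable there.

```c
/* FORCE_INLINE is a hypothetical convenience macro, not part of GCC. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline   /* plain hint on other compilers */
#endif

/* Small helper we want expanded at every call site, independent of
   the global inlining limit. */
FORCE_INLINE int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

int average_clamped(int a, int b)
{
    return (clamp_to_byte(a) + clamp_to_byte(b)) / 2;
}
```

This keeps the decision next to the function it applies to, much like a macro does, without touching the limit used for the rest of the translation unit.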
--
Servus,
Daniel