This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: 3.0 -finline-limit and c++
- To: mike stump <mrs at windriver dot com>
- Subject: Re: 3.0 -finline-limit and c++
- From: Teemu Torma <teemu at torma dot org>
- Date: 03 Jun 2001 22:49:24 +0200
- Cc: gcc at gcc dot gnu dot org
On 01 Jun 2001 23:30:24 -0700, mike stump wrote:
> For a recap of the original problem, let's say that a user has
> available hundreds of thousands of really small building blocks, and
> uses them to build things. These building blocks use the same
> hundreds of thousands of building blocks to implement their
> functionality, at the core are really simple things int += int, and
> that's it. All the building blocks are exposed to the compiler as
> inline functions. The idea is that you want to inline the smallest
> core blocks to achieve a larger intermediate concrete block and have
> the code use that to run. This way, scheduling, register allocation
> and so on all have a chance to squeeze the most out of the CPU. The
> problem is that after a certain point, you kill the instruction cache,
> the memory bandwidth fetching instructions and so on beyond the point
> at which the efficiencies gained by the elimination of the call/return
> overhead, instruction scheduling, register allocation and so on can
> recover.
I agree. In many cases the inline function is so simple that inlining
it takes the same space as, or less than, the function call itself
would. That is exactly why I don't like the idea of stopping inlining
completely at some point when bigger functions were inlined earlier.
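
To illustrate the kind of building block under discussion, here is a
minimal sketch. The class and function names are hypothetical and do
not come from the original test case. The point is only that the body
of each member is no larger than the call sequence it would replace.

```cpp
// Hypothetical building block; all names are illustrative only.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}

    // Trivial get-method: once inlined, the body is a single load.
    int get() const { return value_; }

    // Core operation of the "int += int" kind.
    void add(int n) { value_ += n; }

private:
    int value_;
};

// A slightly larger block built from the small ones above. If the
// members are inlined, this reduces to plain integer arithmetic.
inline int add_and_get(int start, int n) {
    Counter c(start);
    c.add(n);
    return c.get();
}
```

If inlining stops before get() or add() is reached, each of these
one-instruction bodies turns into a full call and return.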
> The question is how to best achieve selection of which routines to
> inline, and where, to achieve as optimal performance as we can?
>
> The tentative answer we have today, is to artificially pick an n, and
> just use the heuristic that we inline as long as the body we are
> generating into is less than n instructions long. I know this seems
> backwards or wrong from the normal C perspective, but it isn't, it is
> merely orthogonal and complementary. When you run the algorithm in
> your head with the testcase above, you will notice a certain, oh, that
> kinda works just nicely and even rather optimally in some cases and
> covers the complete solution space rather nicely.
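
The heuristic quoted above can be modelled roughly as follows. This is
a toy sketch and not GCC's implementation: the function name, the use
of plain instruction counts, and the in-order walk over call sites are
all assumptions made for illustration.

```cpp
#include <vector>

// Toy model: a caller body of `caller_size` instructions contains call
// sites whose callees have the sizes listed in `callee_sizes`.
// A callee is inlined only while the body being generated into is
// still shorter than `n` instructions. Returns the number of call
// sites that were inlined.
int count_inlined(int caller_size, const std::vector<int>& callee_sizes, int n) {
    int inlined = 0;
    for (int callee : callee_sizes) {
        if (caller_size >= n) {
            break;  // body has reached the limit: nothing more is inlined
        }
        caller_size += callee;  // body grows by the size of the inlined callee
        ++inlined;
    }
    return inlined;
}
```

In this model the size of the callee never enters the decision, so once
the body reaches n even a one-instruction callee is left as a call.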
Perhaps the biggest problem at the moment is that _nothing_ is inlined.
May I suggest that we always inline functions that are truly trivial,
such as simple get-methods, where the inlined code would be even
smaller than the function call itself. I would also appreciate it if
the current -finline-limit did not drive both the size of the inlinable
function and the size of the context, but only the size of the
inlinable function, as in the C front end, with a separate option for
the context size. That way everyone could at least control the inlining
behavior if they run into code generation or compiler size/time
problems.
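
A minimal sketch of the decision this suggestion implies is given
below. The struct, its field names, and the size units are
hypothetical; none of them correspond to existing GCC options or
internals.

```cpp
// Hypothetical limits, all measured in instructions.
struct InlineLimits {
    int call_cost;         // approximate size of a call sequence
    int max_callee_size;   // limit on the function being inlined
    int max_context_size;  // separate limit on the body being generated into
};

// Decide whether to inline a callee of `callee_size` instructions into
// a body that currently holds `context_size` instructions.
bool should_inline(int callee_size, int context_size, const InlineLimits& lim) {
    // Trivial functions (no larger than the call itself) are always
    // inlined, since doing so cannot make the code bigger.
    if (callee_size <= lim.call_cost) {
        return true;
    }
    // The callee limit and the context limit are checked independently.
    if (callee_size > lim.max_callee_size) {
        return false;
    }
    return context_size < lim.max_context_size;
}
```

With the two limits separated, a user hitting compile-time or code-size
problems could lower one without disabling the inlining of trivial
accessors.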
> Hope this helps explain the issue. If someone wants to extend the
> document (or source code) out with the reason why... Better, if
> someone wants to think about the problem and submit a solution that is
> monotonically better than what we have and is at least as general...
I would think that bottom-up inlining, starting from the smallest
functions, would do a better job. Even better would be to optimize the
code somewhat before inlining it, to get a more accurate idea of its
size and to get dead code eliminated. I have no idea how feasible this
is, or whether it would make the compiler too slow.
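
As a rough sketch of what bottom-up, smallest-first inlining could look
like, consider the toy model below. The call-graph representation, the
size accounting, and the single limit are assumptions for illustration;
it ignores recursion and the pre-inlining optimization step mentioned
above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical call-graph node: size in instructions, indices of the
// functions it calls, and a flag marking it as already processed.
struct Function {
    int size;
    std::vector<int> callees;
    bool done;
};

// Process callees before callers (bottom-up). Within one function,
// inline the smallest callees first while the body stays within
// `limit`. Assumes the call graph contains no recursion.
void inline_bottom_up(std::vector<Function>& fns, int f, int limit) {
    Function& fn = fns[f];
    if (fn.done) {
        return;
    }
    for (int c : fn.callees) {
        inline_bottom_up(fns, c, limit);
    }
    // Callee sizes are final here, so the size estimates are accurate.
    std::vector<int> order = fn.callees;
    std::sort(order.begin(), order.end(),
              [&fns](int a, int b) { return fns[a].size < fns[b].size; });
    for (int c : order) {
        if (fn.size + fns[c].size > limit) {
            break;  // every remaining callee is at least this large
        }
        fn.size += fns[c].size;  // callee body is copied into the caller
    }
    fn.done = true;
}
```

Because the callees are sorted by size, the trivial ones are always
considered before any large one can use up the budget.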
Teemu