This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] New c++ inliner


Hi,
here's a patch which changes the C++ AST inliner. There are a number of
changes, and a number of pieces of future work indicated. However, the
highlights are
a) no horrible compile time & memory degradation.
b) produced object code is faster and smaller

This algorithm does a bottom up inlining. What I do is
1) do non-inline optimizations on the current function
2) gather a list of all the inlinable call sites in the current
function
3) fully optimize each of those callees. This includes the
step of inlining into the body of the callee. There is a new
flag for FUNCTION_DECLs indicating whether it has already
undergone this optimization step.
4) Once that is done, I perform the inlining of each callee
into the current function -- provided it still meets the inlinability
criteria (it might have got bigger due to inlining into its body).
5) If any inlining happened, repeat the non-inlining optimizations.
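The steps above can be sketched as a toy model. This is illustrative
only -- Fn, optimize, the map of functions and the size constant are
made up for this sketch, not the actual tree structures in the patch:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a function body: a size in AST nodes, the new
// "already optimized" flag, and its inlinable call sites.
struct Fn {
    int size;                        // size in AST nodes
    bool optimized;                  // has step 3 already run on it?
    std::vector<std::string> calls;  // inlinable call sites
};

std::map<std::string, Fn> program;

bool inlinable(const Fn &f) {
    const int max_inline_ast = 50;   // tuning parameter from the mail
    return f.size < max_inline_ast;
}

// Steps 3 and 4: fully optimize each callee first (bottom up), then
// inline it here if it still meets the size criteria afterwards.
void optimize(const std::string &name, std::set<std::string> &stack) {
    Fn &f = program[name];
    if (f.optimized || stack.count(name))  // recursion: break the loop
        return;
    stack.insert(name);
    for (const std::string &callee : f.calls) {
        optimize(callee, stack);             // callee first
        if (inlinable(program[callee]))
            f.size += program[callee].size;  // splice callee's body in
    }
    stack.erase(name);
    f.optimized = true;  // each body is only ever optimized once
}
```

Because each body is optimized exactly once and then reused, a callee's
fully inlined size is what step 4 tests against the criteria.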

There are a number of issues with this algorithm.

Q) doesn't this mean there are lots of fully expanded ASTs hanging
around for inline functions?
A) Yes, but this does not appear to be a problem. The memory usage is
still lower than it was before, and we have a compile time advantage in
that we only optimize the body of each function once, rather than
multiple times for each of its inlined copies.

Q) What about mutual & direct recursion?
A) Recursion is detected in step 3 by keeping a stack of the functions
being optimized. We check for a loop. If one is found, we have to decide
where to break the recursion, i.e. which inlinable function we do _not_
inline. I pick the largest function, as that has the least chance
of being inlinable. [Actually, I should tweak that to highest cost]
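A minimal sketch of that tie-break, assuming the loop is handed over as
(name, size) pairs off the optimization stack (the helper name is mine,
not the patch's):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: given the functions forming a recursion loop,
// choose the one NOT to inline.  The largest is picked, as it has the
// least chance of meeting the size criteria anyway.
std::string pick_recursion_break(
    const std::vector<std::pair<std::string, int>> &loop) {
    std::string chosen = loop.front().first;
    int chosen_size = loop.front().second;
    for (const auto &fn : loop)
        if (fn.second > chosen_size) {
            chosen = fn.first;
            chosen_size = fn.second;
        }
    return chosen;
}
```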

Q) what other non-inline optimizations do you do?
A) None, I've marked the place where they should go with XXX, and
commented on the sort of thing I'd expect to go there.

Q) How is the inlining cost measured?
A) The inlining cost is based on the size of the function in terms
of AST nodes. There are two parameters, max-inline-ast and
arg-inline-ast. If the size of the function is < max-inline-ast +
arg-inline-ast * num_args, we inline it.

Q) Why is there no threshold on the size of the function being inlined
into?
A) This is irrelevant. If a function is worth inlining, it's worth
inlining into a small function just as much as into a big function. N.B.
I tried with such a limit, and it had no significant effect. The code to
do this kind of stuff is still in the patch (the qsort), but maybe I
should remove it. Also I discovered something that was surprising, but in
hindsight should be expected. If the inlining parameters are correct the
object code gets faster AND smaller.

Q) What about profile information?
A) I had thought this would be important, but see the previous answer
as to why it isn't. The usage information machinery is still in the patch,
but I think perhaps I should remove it (it currently estimates '1' for
any node, anyway).

Q) What about inlining static functions with only one caller?
A) I still do inlining on demand - there is no postponement until the
end of the TU, and therefore no complete call graph information. We
do generate (implicitly) the call graph rooted at a particular function,
but not that for the whole TU. So I can answer the question 'what
functions does this function call?', but not 'what functions call this
function?'. So I don't have the information to make that decision.
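The distinction can be seen in a small sketch: forward reachability from
one root falls out of the traversal, while the reverse map would need
the whole TU. (The names here are hypothetical, not from the patch.)

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical forward call-site map, built as functions are optimized.
std::map<std::string, std::vector<std::string>> callees;

// 'What functions does this function call?' -- the call graph rooted
// at fn.  Answering 'what functions call fn?' would require visiting
// every function in the TU, which on-demand inlining never does.
void reachable(const std::string &fn, std::set<std::string> &seen) {
    if (!seen.insert(fn).second)  // already visited
        return;
    for (const std::string &c : callees[fn])
        reachable(c, seen);
}
```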

Q) Doesn't DECL_NUM_STMTS count the number of statements in a
function, not the number of tree nodes?
A) Yes it does, we need to change that. Also, we'll need to recalculate
it when other AST optimizations start mangling the tree.

I've set max-inline-ast to 50 and arg-inline-ast to 8, based on
experiment. I'll present that data next week, as I've not got time to do
it with pretty 3D graphs, before end of play today [this weekend is the
building's electricity supply maintenance, and all power is off].

These parameters will probably need retuning, once other AST
optimizations go in. I tried other cost functions; these were
1) allow inlining of a function if it is no bigger than X% of the
original caller's size.
2) allow inlining of a function if it will not increase the caller's
size to more than X% of its original size. Pick smaller functions first.

I also experimented with a zero cost threshold -- functions smaller than
a certain size were presumed to have zero cost. I rejected this as a
stopgap pending other AST optimizations.

Gerald's horrible test case previously required 270MB of core and took
1133 seconds on my machine (192MB memory, so went into swap). Now it
takes about 102MB and 80 seconds at -O3. At -O2 it took about
80MB and 75 seconds. [These numbers were gathered with untuned parameters
that gave more inlining than turned out to be desirable -- tuned
parameters should give better results.]

booted and tested on i686-pc-linux-gnu

nathan

-- 
Dr Nathan Sidwell   ::   http://www.codesourcery.com   ::   CodeSourcery LLC
         'But that's a lie.' - 'Yes it is. What's your point?'
nathan@codesourcery.com : http://www.cs.bris.ac.uk/~nathan/ : nathan@acm.org

optimize4.patch

