This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: mainline "exploding"

> So, the problem isn't unit-at-a-time -- it's our bad data structures,
> and our inlining heuristics.  That said, if our data structures or
> heuristics are truly horrible, we might have to hold off on
> unit-at-a-time until they are fixed.

The inlining heruistics has simple parameter limiting overall
compilation unit growth that should keep it under control for all cases.

Concerning our IL, about 20MB of our GCC backend sources parse into
700MB of memory....

note that I am not shooting for no way to avoid -funit-at-a-time for now
as I do believe we should give users some time to adopt it.  I just
would like to see it on by default as it helps us hide the fact that old
inlining herusitics is broken.  Users having gigantic noninlinable
function units or users using dirty assembly tricks will be able to use
-fno-unit-at-a-time for now and will have some motivation to fix their

The fact that the C++ testcases in bugzilla works for old inline setting
of 100 and does not work for 150 is simply because I lowered the limit
intentionally to make these testcases go.  There are other sane
testcases (POOMA) where we do explode even with 100, so trottling limit
is not sollution.  
> Also, the way that unit-at-a-time *should* work (although I don't know
> if this is how it *does* work) is:
> (1) Build trees for every function, doing no optimizations other than
> constant-folding, etc. as done in the front end.
> (2) Pick a function that needs to be output.  Inline into it, optimize
> it, output it, and throw the body away -- unless it will itself be
> inlined into something else.  (Ideally, the function picked would be one
> that is not going to be inlined into something else; that will throw
> away stuff as quickly as possible.)
> (3) Repeat (2) until done.

It is bit more complex than this.  It works in a way:

1) build trees with no optimization and enqueue functions that obviously
needs to be output
in finalize_compilation_unit:
2) pick function that needs to be output, walk and possibly lower it's
body, figure what ohter it needs and enqueue
3) repreat 2) until done.
4) release all unneeded functions
in cgraph_optimize:
5) figure out local functions (functions never called indirectly or
visible globally
6) make inlining decisions
7) sort functions topologically
8) compile all of them doing inlining and optimization

The idea is that in between 4) and 5) the function body can be stored to
disk allowing whole program optimization.
Once we are able to more sanely optimize the functions as trees, we can
first do optimization step at 8), re-doing reachability analysis at 9)
and output everything at 10) but there is no way to do so currently with
preserving the constraint that functions are topologically sorted that
is quite important to propagate various optimization information across
the callgraph.

> -- 
> Mark Mitchell <>
> CodeSourcery, LLC

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]