This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC/patch] Callgraph based inlining heuristics


> >very much from cse/gcse if you remove all calls inside a loop by
> >leafifying it for my scientific C++ application (POOMA based). Also
> >the loop optimizer could probably do better in this case (it doesn't, but
> >that's another problem). And I never want the compiler to do so much
> >inlining without telling it explicitly (of course some profile feedback
> >on the resulting asm code speed/size would really cut it).
> >
> That is what we should be aiming for instead: Find a way to include some 
> kind of profile information to guide inlining.  What I would really like 

I am working toward this with the current CFG code cleanups.  I hope to
preserve the CFG across the post-tree-SSA RTL expander and to read the
profile in much more easily than we do right now.  Then we can
propagate the frequencies into the callgraph.
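
To make "propagate the frequencies into the callgraph" concrete, here is
a minimal sketch of the idea, not GCC code.  It assumes the profile
gives an execution count per basic block and that we know which block
each call sits in; the names call_edge_counts, block_counts and
call_sites are hypothetical.

```python
from collections import defaultdict


def call_edge_counts(block_counts, call_sites):
    """Sum profiled block execution counts onto caller -> callee edges.

    block_counts: {(function, block_id): execution count from the profile}
    call_sites:   list of (caller, block_id, callee) triples
    """
    edges = defaultdict(int)
    for caller, block_id, callee in call_sites:
        # A call executes as often as the basic block that contains it.
        edges[(caller, callee)] += block_counts.get((caller, block_id), 0)
    return dict(edges)


# Hypothetical profile: the loop body of main (block 1) ran 1000 times.
block_counts = {("main", 0): 1, ("main", 1): 1000, ("solve", 0): 1000}
call_sites = [
    ("main", 0, "init"),
    ("main", 1, "solve"),
    ("solve", 0, "dot"),  # two calls to dot in the same block
    ("solve", 0, "dot"),
]
hot = call_edge_counts(block_counts, call_sites)
```

An inliner could then rank callgraph edges by these counts rather than
by static size estimates alone.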

How much inlining do you actually need in your application?  Perhaps
improving the heuristics further would get you there without
additional hints?

> to see is that in addition to callgraph based inlining (unit-at-a-time), 
> we could also decide to expand calls inline later on in the compilation 
> process when we discover that the call is in a hot zone of the code. The 
> problem with this is that it will require the availability of a tree CFG 
> (ie. tree-ssa) and of profile information in the tree CFG.  I don't know 
> if this is feasible at all because we still don't maintain the CFG 
> across all passes, let alone when expanding trees to RTL, and I don't 
> have a clue about how GCC collects and loads profile information.  But 
> IIRC Honza was doing some work in that area as well?
Yes :)
The RTL side of the barrier is nearly finished.  We still kill the CFG
in the loop optimizer (which Zdenek is working on), in the sibling call
pass (which will go away with tree-SSA), and in some of the exception
handling, which can be reorganized without too much trouble.

The question is whether the idea of preserving the CFG over RTL expansion
will work.  I hope it will if we do some kind of lowering at the tree
level, so that we end up with conditional jumps and table jumps like in
RTL.  Expanding these would then be trivial, and we already have code to
discover basic blocks newly introduced by expanding simple operations
that need branches on the target machine.
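
As an illustration of why lowered code makes this easy, the sketch below
finds basic blocks and CFG edges in a linear list of labels, conditional
jumps and table jumps.  It is the textbook leader-based algorithm, not
the code GCC uses, and the instruction encoding and the name
find_basic_blocks are made up for the example.

```python
BRANCHES = ("jump", "cjump", "tablejump", "return")


def find_basic_blocks(insns):
    """Split a linear instruction list into basic blocks and CFG edges.

    Hypothetical encoding, one tuple per instruction:
      ("label", name), ("op", text), ("jump", target),
      ("cjump", target)         taken -> target, otherwise fall through
      ("tablejump", [targets]), ("return",)
    """
    # Leaders: the first insn, every label, every insn after a branch.
    leaders = {0}
    for i, insn in enumerate(insns):
        if insn[0] == "label":
            leaders.add(i)
        elif insn[0] in BRANCHES and i + 1 < len(insns):
            leaders.add(i + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(insns)]
    blocks = [insns[s:e] for s, e in zip(starts, ends)]

    # Every label is a leader, so it is the first insn of its block.
    label_block = {b[0][1]: n for n, b in enumerate(blocks)
                   if b[0][0] == "label"}

    edges = set()
    for n, block in enumerate(blocks):
        last = block[-1]
        kind = last[0]
        has_next = n + 1 < len(blocks)
        if kind == "jump":
            edges.add((n, label_block[last[1]]))
        elif kind == "cjump":
            edges.add((n, label_block[last[1]]))
            if has_next:
                edges.add((n, n + 1))  # fall-through edge
        elif kind == "tablejump":
            for target in last[1]:
                edges.add((n, label_block[target]))
        elif kind != "return" and has_next:
            edges.add((n, n + 1))  # plain fall through
    return blocks, edges


# Lowered form of: if (x) a(); else b(); c();
insns = [
    ("cjump", "else"),  # if (!x) goto else
    ("op", "a()"),
    ("jump", "done"),
    ("label", "else"),
    ("op", "b()"),
    ("label", "done"),
    ("op", "c()"),
    ("return",),
]
blocks, edges = find_basic_blocks(insns)
```

With the branches explicit, the block boundaries fall out of a single
scan, which is what makes carrying the CFG across the expander
plausible.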

Honza
> 
> Gr.
> Steven
> 
> 

