This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC/patch] Callgraph based inlining heuristics
- From: Jan Hubicka <jh at suse dot cz>
- To: Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>
- Cc: Jan Hubicka <jh at suse dot cz>,Steven Bosscher <s dot bosscher at student dot tudelft dot nl>, gcc at gcc dot gnu dot org
- Date: Mon, 23 Jun 2003 17:55:50 +0200
- Subject: Re: [RFC/patch] Callgraph based inlining heuristics
- References: <20030623153413.GF2327@kam.mff.cuni.cz> <Pine.LNX.4.44.0306231743441.582-100000@goofy>
> On Mon, 23 Jun 2003, Jan Hubicka wrote:
>
> > > >very much from cse/gcse if you remove all calls inside a loop by
> > > >leafifying it for my scientific C++ application (POOMA based). Also
> > > >the loop optimizer could probably do better in this case (it doesn't, but
> > > >that's another problem). And I never want the compiler to do so much
> > > >inlining without telling it explicitly (of course some profile feedback
> > > >on the resulting asm code speed/size would really cut it).
> > > >
> > > That is what we should be aiming for instead: Find a way to include some
> > > kind of profile information to guide inlining. What I would really like
> >
> > I am aiming towards this with the current CFG code cleanups. I hope to
> > be able to preserve the CFG over the post-treeSSA-branch RTL expander and
> > be able to read the profile in much more easily than we do right now. Then
> > we can propagate the frequencies into the callgraph.
> >
> > How much inlining do you actually need in your application? Perhaps
> > improving the heuristics further will help you get it without
> > additional hints?
>
> The code I have is about
>
> void kernel(Array1, Array2, ...) {
>   ..setup..
>   for (int k=...)
>     for (int j=...)
>       for (int i=...)
>         very_large_function(Array1, Array2, Array3,... , i, j, k);
> }
>
> and very_large_function does a _lot_ of function calls that need to be
> inlined due to C++ abstraction (these are small functions that sane
> heuristics should be able to inline -- I currently hit the artificial
> total inlining limit here (see PR10679)!!). For the loop optimizer (and
> cse/gcse!) to do its work, we need to inline very_large_function. As you
> may expect, there is a lot of index arithmetic to be optimized (and even
> with inlining it doesn't look as good as it could).
Hmm, can you please send me a preprocessed file of this thing? I would be
interested in how my current code behaves on it.
I've killed the current total limit (and introduced a different one :) so
perhaps it works now.
Honza
>
> So here, specializing the inliner for inlining a call in a loop consisting
> only of a call would cut it. But this is another speciality, just as
> __attribute__((leafify)).
>
> Richard.
>
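
For readers who want something compilable to experiment with, the pattern discussed in this thread can be sketched roughly as follows. Everything here is an assumption made for illustration: `Array3D`, `point_update`, `checksum` and the `KERNEL_FLATTEN` macro are hypothetical names, not POOMA or GCC code, and `point_update` is a deliberately tiny stand-in for very_large_function. The sketch uses GCC's documented `flatten` function attribute in place of the `leafify` attribute proposed above; that the two behave alike is assumed, not something this message confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a POOMA-style array; not the real POOMA API.
struct Array3D {
    std::size_t nx, ny, nz;
    std::vector<double> data;

    Array3D(std::size_t x, std::size_t y, std::size_t z, double fill)
        : nx(x), ny(y), nz(z), data(x * y * z, fill) {}

    // Small accessors like these hide the index arithmetic. The optimizer
    // only sees that arithmetic inside the loop nest once they are inlined.
    std::size_t index(std::size_t i, std::size_t j, std::size_t k) const {
        return (k * ny + j) * nx + i;
    }
    double& at(std::size_t i, std::size_t j, std::size_t k) {
        return data[index(i, j, k)];
    }
    double at(std::size_t i, std::size_t j, std::size_t k) const {
        return data[index(i, j, k)];
    }
};

// Plays the role of very_large_function: per-point work built from small calls.
void point_update(Array3D& out, const Array3D& a, const Array3D& b,
                  std::size_t i, std::size_t j, std::size_t k) {
    out.at(i, j, k) = a.at(i, j, k) + 2.0 * b.at(i, j, k);
}

// Assumption: request inlining of everything called from the kernel on
// GCC-compatible compilers; expands to nothing elsewhere.
#if defined(__GNUC__)
#define KERNEL_FLATTEN __attribute__((flatten))
#else
#define KERNEL_FLATTEN
#endif

// The loop nest from the message: the body is a single call.
KERNEL_FLATTEN void kernel(Array3D& out, const Array3D& a, const Array3D& b) {
    assert(out.data.size() == a.data.size() && a.data.size() == b.data.size());
    for (std::size_t k = 0; k < out.nz; ++k)
        for (std::size_t j = 0; j < out.ny; ++j)
            for (std::size_t i = 0; i < out.nx; ++i)
                point_update(out, a, b, i, j, k);
}

// Sums all elements so the result of a kernel run can be checked.
double checksum(const Array3D& a) {
    double sum = 0.0;
    for (double v : a.data) sum += v;
    return sum;
}
```

Comparing the assembly for `kernel` with and without the attribute (for example with `-O2 -S`) is one way to see whether the accessor calls survive inside the innermost loop.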