This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: [RFC/patch] Callgraph based inlining heuristics
- From: Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>
- To: Jan Hubicka <jh at suse dot cz>
- Cc: Steven Bosscher <s dot bosscher at student dot tudelft dot nl>, Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>, <gcc at gcc dot gnu dot org>
- Date: Mon, 23 Jun 2003 17:51:01 +0200 (CEST)
- Subject: Re: [RFC/patch] Callgraph based inlining heuristics
On Mon, 23 Jun 2003, Jan Hubicka wrote:
> > >very much from cse/gcse if you remove all calls inside a loop by
> > >leafifying it for my scientific C++ application (POOMA based). Also
> > >the loop optimizer could probably do better in this case (it doesn't, but
> > >that's another problem). And I never want the compiler to do so much
> > >inlining without my telling it explicitly (of course some profile feedback
> > >on the resulting asm code speed/size would really cut it).
> > >
> > That is what we should be aiming for instead: find a way to use some
> > kind of profile information to guide inlining. What I would really like
>
> I am working towards this with the current CFG code cleanups. I hope to
> be able to preserve the CFG across the RTL expander on the tree-SSA branch
> and to read in profile data much more easily than we can right now. Then we
> can propagate the frequencies into the callgraph.
>
> How much inlining do you actually need in your application? Perhaps
> improving the heuristics further would let it get there without
> additional hints?
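For illustration only: a minimal C++ sketch of how profile counts on callgraph edges, once they are available, could drive inlining decisions. The `CallEdge` struct, the count-per-insn priority and the growth budget are all hypothetical and are not how GCC's inliner is implemented; the idea is simply that hot, small callees get inlined first and cold calls never consume the budget.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical callgraph edge annotated with a profile count.
struct CallEdge {
    std::string caller;
    std::string callee;
    long long count;  // how often this call site executed (from profile data)
    int callee_size;  // estimated size of the callee body, in insns
};

// Hypothetical greedy heuristic: pick edges to inline, highest count per
// insn first, until the allowed unit growth is used up.
std::vector<CallEdge> pick_inline_candidates(std::vector<CallEdge> edges,
                                             int growth_budget) {
    assert(growth_budget >= 0);
    std::sort(edges.begin(), edges.end(),
              [](const CallEdge& a, const CallEdge& b) {
                  // a.count / a.size > b.count / b.size, without division
                  return a.count * b.callee_size > b.count * a.callee_size;
              });
    std::vector<CallEdge> chosen;
    for (const CallEdge& e : edges) {
        if (e.count == 0) continue;                   // never executed: skip
        if (e.callee_size > growth_budget) continue;  // would exceed the cap
        growth_budget -= e.callee_size;
        chosen.push_back(e);
    }
    return chosen;
}
```

For example, given a hot three-insn accessor called from `very_large_function`, a 400-insn `very_large_function` called from `kernel`, and a rarely executed 50-insn `init`, a budget of 420 insns selects the accessor first, then `very_large_function`, and no longer has room for `init`. Ranking by count per insn rather than by raw count is one possible choice; it favours small hot helpers, which is the C++ abstraction case.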
The code I have looks roughly like this:
  void kernel(Array1, Array2, ...) {
    ..setup..
    for (int k=...)
      for (int j=...)
        for (int i=...)
          very_large_function(Array1, Array2, Array3,... , i, j, k);
  }
and very_large_function makes a _lot_ of function calls that need to be
inlined to strip away the C++ abstraction. These are small functions that
sane heuristics should be able to inline, yet I currently hit the
artificial total inlining limit here (see PR10679)! For the loop
optimizer (and cse/gcse!) to do their work, we also need to inline
very_large_function itself. As you may expect, there is a lot of index
arithmetic to be optimized (and even with inlining, the result doesn't
look as good as it could).
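A self-contained C++ sketch of the pattern described above, assuming a POOMA-like array whose element access goes through tiny member functions. `Array3`, the stencil body and the sizes are hypothetical stand-ins, not the real application. The kernel is marked with `__attribute__((flatten))`, which GCC (in releases later than this mail) and Clang accept as a request to inline every call in the function body where possible; it plays roughly the role of the leafify attribute discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a POOMA-style array: every element access
// goes through small member functions that do the index arithmetic.
class Array3 {
public:
    Array3(int nx, int ny, int nz)
        : nx_(nx), ny_(ny), nz_(nz),
          data_(static_cast<std::size_t>(nx) * ny * nz, 0.0) {}

    double& operator()(int i, int j, int k) { return data_[index(i, j, k)]; }
    double operator()(int i, int j, int k) const { return data_[index(i, j, k)]; }

private:
    // Bounds check plus linearization: cheap only once inlined.
    std::size_t index(int i, int j, int k) const {
        assert(i >= 0 && i < nx_ && j >= 0 && j < ny_ && k >= 0 && k < nz_);
        return (static_cast<std::size_t>(k) * ny_ + j) * nx_ + i;
    }
    int nx_, ny_, nz_;
    std::vector<double> data_;
};

// Stand-in for very_large_function: many small calls per grid point.
// Only after they are inlined can cse/gcse share the index arithmetic.
void very_large_function(Array3& out, const Array3& in, int i, int j, int k) {
    out(i, j, k) = in(i - 1, j, k) + in(i + 1, j, k)
                 + in(i, j - 1, k) + in(i, j + 1, k)
                 + in(i, j, k - 1) + in(i, j, k + 1)
                 - 6.0 * in(i, j, k);
}

// Ask the compiler to inline every call in this body, recursively.
__attribute__((flatten))
void kernel(Array3& out, const Array3& in, int nx, int ny, int nz) {
    for (int k = 1; k < nz - 1; ++k)
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i)
                very_large_function(out, in, i, j, k);
}
```

With `in(i, j, k) = i * i` the stencil evaluates to 2.0 at every interior point, which makes a quick sanity check. Whether the index arithmetic is actually shared afterwards depends on the compiler and options; the generated assembly is the only reliable way to tell.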
So here, specializing the inliner to inline a call inside a loop whose body
consists only of that call would do the job. But this is another special
case, just like __attribute__((leafify)).
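A toy C++ model of the special case suggested above. `CallSite`, `kMaxInlineSize` and the growth budget are hypothetical and only loosely mirror the per-function size limit and the total inlining limit mentioned in this thread; they are not GCC's actual parameters.

```cpp
#include <cassert>

// Hypothetical summary of one call site, as an inliner might see it.
struct CallSite {
    int loop_depth;          // 0 = not inside any loop
    bool loop_body_is_call;  // innermost loop body consists only of this call
    int callee_size;         // estimated callee size in insns
};

// Hypothetical per-callee size limit for ordinary inlining.
constexpr int kMaxInlineSize = 100;

// Returns true if the call should be inlined. remaining_growth is the
// budget left under the total inlining limit; it shrinks on success.
bool should_inline(const CallSite& cs, int& remaining_growth) {
    assert(cs.callee_size >= 0);
    bool small = cs.callee_size <= kMaxInlineSize;
    // Special case: a loop whose body is only this call gives the loop
    // optimizer nothing to work on unless the call is inlined, so accept
    // it even when the callee is large.
    bool loop_wrapper = cs.loop_depth > 0 && cs.loop_body_is_call;
    if (!small && !loop_wrapper) return false;
    if (cs.callee_size > remaining_growth) return false;  // total limit hit
    remaining_growth -= cs.callee_size;
    return true;
}
```

For instance, with a budget of 1000 insns an 800-insn callee that forms the whole body of a triply nested loop is accepted (leaving 200), while the same callee outside a loop is rejected. The total-growth check still applies to the loop case, so the special case cannot grow code size without bound.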
Richard.