[PATCH] Drop callee function size limits for IPA inlining

Fri Feb 18 21:14:00 GMT 2011

> Hi,
> 
> On Fri, 18 Feb 2011, Jan Hubicka wrote:
> 
> > We have c-ray with relatively large raysphere (polyhedron's fatigue is 
> > the same case as analyzed in one of the PRs).  We declare decision on 
> > whether to inline it to be global property. We set growth to 30% that is 
> > safely big enough to make raysphere inlined on c-ray.
> > 
> > However now if you make c-ray not a stupid benchmark, but part of big 
> > program, then raysphere won't be inlined with LTO since the other useful 
> > calls from big program leading to smaller callees will win.
> 
> Um, yes of course.  But what you're basically saying is: "with 
> the current heuristics a different program than what the experiment is 
> about will not work as designed".  Well, that's to be expected.

Same program (locally), just padded with extra code around it.  Think of linking
c-ray into GCC and replacing main() with a conditional dispatch to c-ray or gcc.

This is a bit funny, but not really detached from reality.  When you have a huge
app, there is usually a small kernel you worry about, padded by tons of
uninteresting code.  So I think you solved the benchmark case, but not the
real-world scenario, really.

A movie codec linked as part of Mozilla is a good example of this case.
> 
> _Of course_ the definition of badness and usefulness must be changed too 
> for the removal of pruning to not have undesired effects.  I'm really not 
> sure what you're getting at, you're pointing at a clearly unfinished 
> experiment and argue that this approach can't work for reasons in the 
> current implementation of heuristics.  /me confused
> 
> > Badness is more or less function of the callee size with few extra local 
> > hints based on caller side so it is quite safe to assume that the 
> > fibheap queue is ordered by size of the function and we cut off much 
> > earlier than the current default of 40 instructions in size.
> 
> Yes, that is the case currently.  But you're arguing as if this has to be 
> the case forever.  The whole point is to also introduce different 
> definitions of badness and usefulness.  For instance taking into account 
> the actual parameters and their influence on the (then inlined) body.

Well, we agree that we need different notions of usefulness.  I am just trying
to argue that having a local way of saying "we do not want to inline this call
because the expected runtime benefits are very small" is very desirable, and the
inliner should be oriented towards it.

In addition to the fact that we want to prevent code growth at cold calls (I
think we all agree on this one), I still think that estimated function size
(after it is inlined into a particular context, not before inlining as we do
now) is a good notion unless we have some other hint overriding it (like knowing
that the function will combine particularly well with surrounding code: Open64
has the LNO logic, and most compilers strongly prefer single-BB functions as
they expect scheduling to win).  Perhaps your idea of badness+usefulness
refinement is pretty much the same after all?  Or what actual plans do you have
here?

If we relate all local properties to global program growth, we will run into
the problem that the same code, when part of a larger program, will become
slower (or faster, but the general observation seems to be that the bigger the
program is, the more inlining possibilities there are with LTO).
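
A toy sketch of this effect, not GCC's actual inliner: the function name, the
sizes, and the padding calls below are all made up for illustration.  It assumes
badness is just callee size (smallest first) and that inlining stops once a 30%
global growth budget is used up.

```python
import heapq

def greedy_inline(calls, program_size, growth_limit_percent=30):
    """Toy greedy inliner (hypothetical, for illustration only).

    calls: list of (name, callee_size) pairs.
    Badness is simply callee size, so smaller callees are inlined first,
    and we stop as soon as the global growth budget would be exceeded.
    """
    budget = program_size * growth_limit_percent // 100
    heap = [(size, name) for name, size in calls]
    heapq.heapify(heap)
    inlined, growth = set(), 0
    while heap:
        size, name = heapq.heappop(heap)
        if growth + size > budget:
            break  # global growth limit reached
        growth += size
        inlined.add(name)
    return inlined

# The kernel on its own: one large callee in a small program.
kernel = [("raysphere", 100)]
# The same kernel linked into a big program with many small callees.
padding = [("pad%d" % i, 10) for i in range(500)]

alone = greedy_inline(kernel, program_size=400)
padded = greedy_inline(kernel + padding, program_size=10400)
print("raysphere" in alone, "raysphere" in padded)  # True False
```

The kernel code is identical in both runs; only the surrounding program
changed, and the small padding callees used up the growth budget first.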

This is IMO very non-intuitive to users: you will more often see LTOed programs
getting slower than non-LTOed ones, users will be surprised that adding more
uninteresting code to a completely different part of the application makes their
computation kernel slower, etc.  It will also make testcases harder to reduce
and turn into standalone benchmarks.

I am aware that this happens to a large degree with the current inliner and LTO
on very large apps (where we really are bound by program growth alone), but I
see this as a problem rather than a feature.

Some inliners (like LLVM's, I believe) are purely local; our inliner is a
mixture of local and global properties.  I will look more into Open64 and HP,
but I think they are sort of similar: they try to be local but have a global
growth boundary.
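
A "local first, global boundary second" scheme could look roughly like the
sketch below.  This is an assumption-laden toy, not how GCC, Open64, or HP
actually do it: the per-call benefit estimate, the min_benefit threshold, and
the size/benefit ordering are all hypothetical.

```python
def hybrid_inline(calls, program_size, min_benefit=1.0,
                  growth_limit_percent=30):
    """Toy hybrid inliner (hypothetical, for illustration only).

    calls: list of (name, callee_size, est_benefit) tuples.
    """
    budget = program_size * growth_limit_percent // 100
    # Local step: reject calls whose expected benefit is too small.
    # This decision does not depend on the rest of the program.
    candidates = [c for c in calls if c[2] >= min_benefit]
    # Global step: take the cheapest size per unit of benefit first,
    # until the overall growth budget is used up.
    candidates.sort(key=lambda c: c[1] / c[2])
    inlined, growth = [], 0
    for name, size, _benefit in candidates:
        if growth + size > budget:
            break
        growth += size
        inlined.append(name)
    return inlined

kernel = [("raysphere", 100, 50.0)]
# Many small calls with negligible expected benefit (made-up numbers).
padding = [("pad%d" % i, 10, 0.1) for i in range(500)]

standalone = hybrid_inline(kernel, program_size=400)
padded = hybrid_inline(kernel + padding, program_size=10400)
print(standalone == padded)  # True
```

Because the low-benefit calls are filtered out locally, they never compete for
the global budget, so the kernel gets the same treatment with or without the
extra code around it.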

Honza