[PATCH] Drop callee function size limits for IPA inlining

Richard Guenther rguenther@suse.de
Thu Feb 17 11:37:00 GMT 2011


On Thu, 17 Feb 2011, Jan Hubicka wrote:

> > On Wed, 16 Feb 2011, Jack Howarth wrote:
> > 
> > > On Wed, Feb 16, 2011 at 08:39:53PM +0100, Richard Guenther wrote:
> > > > On Wed, 16 Feb 2011, Dominique Dhumieres wrote:
> > > > 
> > > > > Richard,
> > > > > 
> > > > > The patch seems to fix pr45810 (at least most of it) without visible
> > > > > degradation of the other polyhedron tests. Is it really too late to
> > > > > apply it for 4.6? Would it be possible to test it on SPEC?
> > > > 
> > > > Heh, definitely way too late for 4.6!  I'm testing it tonight on
> > > > our usual benchmarks (including SPEC).  We also have not reached a
> > > > conclusion on whether the patch is a good idea.
> > > 
> > > Richard,
> > >    If it is eventually found to be the correct fix, might this be backported for
> > > gcc 4.6.1?
> > 
> > It isn't a "fix"; it is a pretty substantial rework of how our IPA
> > inlining heuristics work.
> 
> I just checked tonight's results.  The patch seems to cause 10% SPECfp code size
> growth and 20% for SPECint, with relatively little benefit (about 0.3% at most).  This
> is a more aggressive code size/speed tradeoff than we have ever made before, more
> than doubling the -O2 to -O3 code size gap.
> 
> We have improvements in polyhedron, applu, gzip (here I am convinced it is a side
> effect of code layout, not a benefit of inlining - I analyzed that problem
> previously), and vpr.  For some reason c-ray did not improve, though I would expect
> it to.  Perhaps the wrong flags are used?

The improvement is from 20s runtime to 9s runtime; probably the
result wasn't ready yet when you looked.

> The wave benchmark gets smaller and shows no slowdown. There is a performance
> regression in botan, but it also gets smaller.
> 
> We are still waiting for the LTO spec2k6 results, which will IMO be interesting
> based on the discussion below.

Indeed.

> My overall opinion on this is that the inliner should not blindly inline, up to the
> overall program growth limit, when it has no indication that inlining is profitable,
> even at -O3. Doing this would make -O3 even more benchmark-centric.
> 
> What I see as the main problem with this approach is that it will cause trouble in
> tuning the overall unit growth.  This parameter is problematic in the following ways:

Heh, it was a radical approach to try to solve the C-Ray issue, for
which I don't see how we can do anything reasonable with more analysis
(we'd really need to see that we expose CSE opportunities, and even
then we're not going to get down to the 40 instruction limit, even when
considering savings).

>   1) Programs written in "kernel" style push it up. If a program is split into
>      many tiny units, some units have to expand a lot to get good code quality,
>      while other units do not expand at all.

Do the units really have to expand more than 30%?  I really doubt it.
I think we also really need FRE in early opts.
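
A hypothetical illustration (not from the thread) of why early FRE
matters here: redundancies exposed by early inlining currently survive
into the size estimates the IPA inliner works with, making functions
look bigger than they end up being.

  /* Hypothetical example.  After early inlining of get(), the two
     loads of p->x are fully redundant; FRE in early optimizations
     would fold them into one, so f() looks as cheap to the IPA size
     estimates as it really is.  */
  struct s { int x; };

  static int get (const struct s *p) { return p->x; }

  int f (const struct s *p)
  {
    return get (p) + get (p);  /* After inlining + FRE: 2 * p->x.  */
  }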

>   2) Programs written with heavy C++ abstraction push it up, as our size-after-inlining
>      estimates are unrealistic.  The estimated resulting program size is much bigger
>      than what we get in the final binary, because inlining enables a lot of additional
>      optimizations.
>   3) LTO pushes the limit down.  When you see the whole program, you do not have the
>      problem described in 1).  On the other hand, the number of inline candidates
>      explodes, as you can do cross-module inlining, and inlining them all leads to
>      excessive code size growth.
> 
>      Moreover, LTO behaves differently with and without -fwhole-program.
>      -fwhole-program allows much smaller unit growth, as many of the offline copies
>      of functions can be optimized out.

Well, with -fuse-linker-plugin being the sort-of default now,
-fwhole-program is in effect by default.
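
Concretely (a usage sketch, assuming a plugin-capable linker such as
gold; the file names are made up):

  # The linker plugin tells GCC which symbols are referenced from
  # outside, so whole-program assumptions apply without explicitly
  # passing -fwhole-program:
  gcc -O2 -flto -fuse-linker-plugin main.c util.c -o app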

> I think there is no way to solve 1) with this approach. The proposed patch bumps down
> the large unit size, which will result in problems with kernel-style codebases.

The large unit size basically allows "small" units to grow unbounded if
there are enough candidates within the given bounds (40 insns for functions
not declared inline and 400(!) for functions declared inline).  I can only
guess that this allows extra growth for small units that use a lot of
inline-declared functions.  But even then I'd be curious to see the unit
growth we allow for kernel units at -O2; can we have a set of worst-offender
files for the random tester?
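
For reference, the cap works roughly like this (a simplified,
self-contained sketch of compute_max_insns() from ipa-inline.c; the
--param defaults quoted here are from memory and approximate):

  #define LARGE_UNIT_INSNS   10000  /* --param large-unit-insns   */
  #define INLINE_UNIT_GROWTH 30     /* --param inline-unit-growth */

  static int
  compute_max_insns (int insns)
  {
    int max_insns = insns;
    /* Units below the "large unit" threshold are treated as if they
       already had that size, so tiny units may grow unbounded
       relative to their real size before the growth cap bites.  */
    if (max_insns < LARGE_UNIT_INSNS)
      max_insns = LARGE_UNIT_INSNS;

    return max_insns * (100 + INLINE_UNIT_GROWTH) / 100;
  }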

> It seems to me that the inliner should inline when it has reason to believe that the
> code will improve noticeably. At present we are very simplistic in estimating this
> improvement, and our only guide is that if a function is small, inlining it is
> probably a good idea.
> 
> To handle cases like c-ray or polyhedron, we really need other kinds of analysis
> to contribute to the selection of profitable inlining.  One of the easiest bits is
> to analyze how much a function body simplifies when its operands are known. Martin
> has code for that, and it will help with some of the testcases above.

I don't think going down that route will help.  At least looking at
C-Ray it won't (OK, it's a stupidly written benchmark, with the critical
function both exported and not declared inline).
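
To make the "simplify when operands are known" idea concrete (a
hypothetical example, not from one of the benchmarks):

  /* Hypothetical example of a body that collapses once an operand is
     known.  Before inlining, scale() does not look small; inlined at
     a call site where mode == 1, almost all of it folds away.  */
  static int
  scale (int x, int mode)
  {
    switch (mode)
      {
      case 0:  return x;
      case 1:  return 2 * x;
      default: return x * x + x / (mode + 1);
      }
  }

  int
  caller (int x)
  {
    return scale (x, 1);  /* Folds to just 2 * x after inlining.  */
  }

An analysis like Martin's would credit such an edge with the estimated
post-folding size rather than the raw body size.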

> There are a number of other indicators of inlining profitability.  I am a bit
> hesitant to add too many of them, as we would end up with a difficult-to-maintain
> inliner, but we probably can't avoid implementing some of the important ones.

I'd like us to simplify the current design first; it is basically
impossible to do reasonable analysis with the current heuristics and
hard limits.  We seem to have many limits that are closely related
but interact in very weird ways - not to even start thinking about
users who want to get more inlining ...

> I know that to solve polyhedron-like problems, Open64 uses loop nest analysis
> info, and I think ICC does that too. Obviously ICC is a lot less aggressive at,
> e.g., inlining functions called once, but more aggressive at inlining within
> loopy regions.

You already do that, sort of, inside the edge badness computation.  But
of course the hard limits completely ignore the badness ;)
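
For comparison, loop-nest-aware weighting could look something like
this (a hedged sketch, not GCC's actual badness formula):

  /* Sketch: weight the size cost of inlining an edge by the
     estimated execution frequency of the call site, which grows
     with loop nest depth.  Lower badness means the edge is
     considered earlier under a global size budget.  */
  static int
  edge_badness (int size_growth, int loop_depth)
  {
    int freq = 1;
    int d;
    for (d = 0; d < loop_depth && freq < (1 << 16); d++)
      freq *= 8;  /* Assume ~8 iterations per loop level.  */
    return size_growth * 256 / freq;
  }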

> With more analysis, we will need to refine what an inline candidate is.  At first,
> an inline candidate should be a function that is small after inlining (not small
> before inlining), but we should also consider inlining regardless of function size
> when we know it will help, e.g. from LNO analysis.

Well, all functions that can be inlined should be candidates.  What we
end up inlining might not be all of them, of course.  Currently we
don't even consider most functions, regardless of how profitable inlining
them would be.

Richard.


