[RFA] Kill artificial inlining limit

Steven Bosscher s.bosscher@student.tudelft.nl
Tue May 13 14:51:00 GMT 2003


Op di 13-05-2003, om 12:06 schreef Richard Guenther:
> In optimization/10679 I report that --param min-inline-insns is not
> honoured all the time due to some strange code in tree-inline.c, so
> I propose to kill the "doesnt happen in real code" artificial inlining
> limit to expose another 30% performance increase in POOMA based scientific
> applications.
> 
> See also thread@http://gcc.gnu.org/ml/gcc/2003-05/msg00654.html
> 
> Also Gerald Pfeifer confirmed that the removed codepath does not trigger
> in his codebase and such the patch does not make any difference for him.
> 
> Patch is against HEAD, but applies to 3.3, too.
> 
> Thoughts? I'd apply it in HEAD and see, if people start complaining.

Please, no.

GCC inlining substitution is not good enough, I think everybody agrees
with that by now.  But the solution is _not_ to force more inlining. 
128*MAX_INLINE_INSNS with the current defaults is 76800 insns.  That
really should be enough for most functions even after inlining.  Instead
of pushing this limit up, we should improve our inlining heuristics.

I have said this before, but apparently you missed it:


The _front_ends_ need fixing before we can start thinking about fixing
the middle end (and of course in particular tree-inline.c).


In the mailing list thread you mention, I showed you (per your request)
that the obviously simple method foo in this template,

template <bool f>
struct blah {
	void foo() { if (f) { return; } }
};

represents 130 insns according to our current heuristics.  As a
consequence, this function is always inlined, but only just, because 130
happens to be MIN_INLINE_INSNS.  Add another statement:

template <bool f>
struct blah {
	void foo() { if (f) { return; } else ; }
};

et voila! 140 insns according to tree-inline.

So after, inlining it five times, you have 5*140>600==MAX_INLINE_INSNS,
and we start throttling; then you can easily compute that you can inline
this insignificant little function 35 times at most before the throttle
stops.  Only functions <= MIN_INLINE_INSNS will be inlined after that,
and 140>MIN_INLINE_INSNS.  In other words, this almost trivial function
will be inlined 40 times at most!!! (assuming nothing else is inlined).

Now say that g++ would (less incorrectly) think there are 20 insns (that
is, two statements, ignoring the empty else) in this function.  Then all
of a sudden, you can inline this very same function _7_ times at the
same cost of 140 insns!  Better yet, it's smaller than MIN_INLINE_INSNS
so it will always be inlined, even after throttling.

And _that_ (I think) is the source of this problem you have.
Your real problem is that C++ just overestimates the predicted number of
insns for this function.  I don't know what it looks like for "normal"
classes, but obviously, for this template it is not just wrong, it's way
off.


Another reason why I oppose this patch is that without this hard limit,
compile time for some targets _will_ blow up.  It's good for you and
Gerald that the code for PR8361 is not affected by your patch.  But what
about PR10160 on HP-PA, or any other target where the compile time with
lots of inlining is apparently dominated by the scheduler's quadratic
behavior?  


So once again, please see how the front end can be fixed _before_ we
start hacking tree-inline with patches like this.

One idea I had in mind for the inliner is for tree-ssa.  Because GIMPLE
is almost flat, we can predict the insns for a function body with far
greater accuracy.  A simple walk_tree over a function body to
guesstimate the number of insns per _EXPR node would do, and it could be
completely independent of the front ends (such that even front ends that
produce GENERIC with lots of optimization opportunities (like g95) would
benefit from the tree inliner).  The only trick is to get reasonable
estimates for the cost in insns for each _EXPR node.  On GIMPLE,
walk_tree is fast enough, and if the number of estimated insns grows
unreasonably large, you can cut off the tree walk.

I haven't looked at tree-ssa for some time now ("cvs up" always times
out...) but I suppose we could rearrange things such that the count is
done after the tree optimizers are finished, with obvious benefits. 
Diego?

(In theory we could even look for function parameters that appear in
COND_EXPRs, compute how many insns depend on that conditional, and
evaluate the condition with the non-destructive fold-const so that we
wouldn't take into account code that is obviously dead after inlining. 
And taking profile information into account with unit-at-a-time... Oh so
bright is the future... :-)

Greetz
Steven


P.S.
Another problem is of course that you want way too much inlining :-P 
Seriously though, the SPEC tests that Andreas and I have done clearly
show that the extreme inline limits your application needs are unusual.




More information about the Gcc-patches mailing list