This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Drop callee function size limits for IPA inlining


> 
> Hm, ok - so you tuned it up to the point where -Winline doesn't show
> any warnings?  Kernel people have started to drop inline decls for some

No, but until the really stupid appearences went away.
There was limited amount of warnings still.  I don't recall how many, but we inlined
stuff like memcpy/memset consistently at least.

> time now.  For the builtin-constant-p thing we can really do better
> by simply looking at call-sites and adjusting our estimates properly.

Yep, it solve one problem.

Consider however small unit containing an application kernel.
I.e.:

function_a() {...}

preformance_critical {... function_a (); ... function_a (); ... function_a (); ...}

function_a might be say 30 instructions, 5 insns inlining benefit.
performance_critical 20 instructions.
It might be absolute a must to inline all apperances of function_a.

After inlining we get something like 20+30*3-5*3 instructions, that is 95
instructions. A 3 fold growth on quite resonable implementation of internal
loop of some algorithm.   I would perfectly expect any inliner to do the
inlining at -O3 ;)

Fractal zomming application XaoS I wrote many years ago is organized
precisely this way.  function_a is computation of particular fractal formula
and then I have several wrappers that execute computation loops where I absolutely
rely on formula being inlined.

Large unit growth limit really plays on non-homogeneity of the application.
Applications almost never consist of huge majority of very many of such a tiny
kernels in a row (well, botan, maybe? :)) So when it is ignored for tiny units
we are safe as large units will get padded by non critical code that won't need
the growth that much.  It is then role of badness computation to try to make
inlining within the kernel to happen.

> > Yes, but it does not apply for poor-man's LTO a-la sqlite for example.
> > Again, it is coding style sensitivity.
> 
> poor-man's LTO?

Concatenating the soruces into huge .c file.

> > Well, lets see how much we can simplify things.  I was always trying to void too much
> > of complexity here, but after more than 8 years for sure quite some has cummulated ;)
> 
> Ideally I'd have a -finline-limit that would work in a way only inlining
> incrementally more functions when you raise it (as in, increase the unit
> growth limit), not cause random different inlining to happen when you
> change it as it does right now because the list of candidates and the
> list of good candidate isn't related ;)

-finline-limit was supposed to be obsoletted and go away in favour of --param
stuff ;) It is lame since it does things like forcing max-inline-insns-single =
max-inline-insns-auto.  It was kept to not break too many makefiles as
-finline-limit was quite common to tune in 2.x series of GCC.

I see point of giving user knob to monotonously increase amount of functions inlined.
I would be happy if we find way to accomplish this but I am willing to trade it
for some other stuff ;)
> 
> Right.  1) is the main problem, and my patch only addresses 1) in a very
> effective way.  It doesn't even try to address the fallout - that we now
> always inline up to 30% unit growth in case the function passes the
> tests in 2).  One could argue though that the tests in 2) need to be
> better - 2) is really about the "usefulness" of inlining.

Well, the problem is that I do not see how one can deal with the fallout without
still having some sort of size limit on inlining. Se we can just keep the existing
params and refine them for sanier behaviour.
Consider i.e.

gigantic_function1() {....}
gigantic_function2() {....}
gigantic_function3() {....}
simple_wrapper () {gigantic_function1();}

If gigantic function is, lets say, 200kb of code, I would argue that we do not
want to regress 30% compile time and code size on this.  I guess this appears
i.e. in our insn-* stuff at -O3.
>  
> > I am also not claiming that we need to apply hard limit on (inlined) function size
> > in all cases if we develop other ways to conclude that the inlining seems good
> > in this particular case.
> 
> We effectively already have a hard limit on (inlined) function size, the
> large function (growth) limits that are useful anyway, if only to limit
> exposal to non-linear algorithms in GCC.  I doubt we need three limits

It won't hit on the above testcase for any size if gigantic function.

> (large-function-insns, max-inline-insns-auto and max-inline-insns-single)
> though - I'd be also interested to see testcases where it is important
> that large-function-insns is 2700, it causes the function growth limit
> to apply very late - we can for example try to merge
> large-function-insns and max-inline-insns-auto into a single hard
> limit and drop max-inline-insns-single.

large-function-insns is not terribly tuned, and 2700 was result of some Steven's
math of my original random value of 1000 from tree-SSA merge. (i.e. computing that
average original tree is 2.7 gimple statements or so).
At time I dropped in 1000 every tree node had constant size of 10.
> 
> Yes, I have patches to separate want-* predicates for early and IPA
> inliner.

OK.
I guess the pracitcal approach should be:
 - reopen pretty-ipa (I welcome hints hot to make pretty-ipa to be current trunk now
   in merge friendly way.  svn cp, or some alternative?)
 - merge cleanups
 - merge improvements that are safe and we agree on being at least in good direction
   - early inliner should not inline functions called once
   - large unit insns might need reducing
   - better benefit analysis driven by knowledge of arguments
   - introduce the edge want predicate

I would say it is sane to aim for more aggressive inline heuristics revamp
(like killing the size limits or, for example, boosting sizes of single basic
block functions for inlining) for about half of stage1.  That time we should
have inliner cleaned up resonable and still have enough time to deal with
fallout to come.

Inliner heuristics issues usually takes a while to appear in bugzilla as people
are notorously lazy to dig into them.

In month or two from now I hope we will also have Mozilla testing (as representative
testcase of huge performance critical C++ blob) and we could probably extend our C++
testsuite to cover LTO and some non-template-heavy testcases.

I also wonder given the polyhedron testcases and dominant fortran coding style
where code size is not an important factor.  Perhaps we could make fortran frontend
to make all functions inline by default?  This might even fly for 4.6 series ;)

Honza

> 
> Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]