This is the mail archive of the mailing list for the GCC project.


Re: [Patch] C++ inlining heuristics changed (3.0.1)

On Wed, Aug 22, 2001 at 08:48:55AM -0400, Daniel Berlin wrote:
> Kurt Garloff <> writes:
> > Hi,
> >
> > thanks for confirming my suspicion, Daniel, that the C++ tree recursive
> > inlining limit cuts the leaves off the inlining tree instead of the
> > root. How stupid! The very least you want is to inline the iterators
> > at the end of your call chain!
> Right. This is the problem with inlining top-down, as the current tree
> inliner does.


> You have to be very good at picking which functions to inline, or else
> you'll hit your limit too early.

I thought that I had used the inline keyword carefully enough.
At any rate, -finline-functions actually decreases performance.
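To make the leaf problem concrete, here is a hypothetical call chain (the names are mine, purely illustrative, not from GCC or my test program): the function you most want inlined is the trivial accessor at the bottom, but a top-down inliner that burns its budget on the outer functions may never reach it.

```cpp
#include <cstddef>
#include <vector>

// Illustrative chain only; nothing here is GCC code.
static inline int deref(const int *p) { return *p; }           // leaf: the cheap accessor
static inline int next_val(const int *p) { return deref(p); }  // middle of the chain
static inline int sum(const std::vector<int> &v) {             // root: considered first top-down
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += next_val(&v[i]);
    return s;
}
```

With a top-down walk, sum is considered first; if inlining it (and next_val) already exhausts the limit, deref stays an out-of-line call, which is exactly the per-element overhead you wanted to remove.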

> > Things for discussion:
> > * One could think of raising the limit (c) when we're far beyond (a) on
> >   recursive inlining.
> > * The exact numbers (a), (b), (c) are open for discussion.  
> c is usually set to be the cost of the save/restore of the arguments, but that's
> hard to calculate since we are dealing with trees here (it turns
> out 10 insns per statement is a little low).  Even then, they only
> take into account the fact that arguments need code to save/restore,
> *not* whether the cost to save/restore is more than the cost to inline
> the function, removing the stores/loads.  
> After all, memory latency is what kills us a lot of the time (missing
> a store in a loop or something).
> So we are probably better off with something more like what you've done
> than trying to artificially calculate the argument setup code *length*.

Then I'd suggest doing something smoother:
Start with some limit (e.g. half of inline-limit, i.e. (b)).
As soon as we reach the recursive limit (== -param inline-limit, i.e. (a)), we
start decreasing the size of acceptable functions to be inlined until we reach
the setup code length (c) (maybe increased slightly to account for the
save/restore).

 ^ (Max size of inlineable fn)
 |----------------.                             <-- (b)
 |                 .
 |                  .
 |                   .  slope =: (d)
 |                    .
 |                     .
 |                      .
 |                       ---------------------  <-- (c)
                  ^                 (Size of inlined code:
                 (a)                 Already inlined + current)

This way, we would avoid theoretically infinite inlining.
I'll start to play with this a bit.
I guess a good starting value for (a) is 600 and for (b) 300, and I'd choose
the slope (d) to be 0.1, i.e. we would reach the inlining limit just before
3300 and only inline extremely small functions afterwards.
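A minimal sketch of that piecewise-linear cap, using the parameter names from the diagram; the concrete floor value c = 30 below is only my guess, chosen so the "just before 3300" figure comes out (600 + (300 - 30)/0.1 = 3300):

```cpp
// Sketch of the proposed heuristic, not actual GCC code.
// (a) recursive limit (== -param inline-limit), (b) initial cap,
// (c) floor (setup code length), (d) slope of the decrease.
static double max_inlinable_size(double inlined_so_far,
                                 double a, double b,
                                 double c, double d)
{
    if (inlined_so_far <= a)
        return b;                               // flat part: cap stays at (b)
    double cap = b - d * (inlined_so_far - a);  // sloped part of the curve
    return cap > c ? cap : c;                   // never drop below (c)
}
```

E.g. with a=600, b=300, c=30, d=0.1 the cap stays at 300 until 600 units have been inlined, then falls linearly, bottoming out at 30 once the total reaches 3300; past that point only extremely small functions are still inlined, so the total can never grow without bound.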

Does this make sense?
Would a patch like this be acceptable for 3.0.2?

> Nathan already did bottom up inlining, it's on the ast optimizer
> branch.

Now, that would be the real solution.
As everyone knows, heuristics break from time to time.

> > It works amazingly well on my test program: With -finline-limit=600 I get
> > more than 90% of the peak performance.
> Of course.

I get 99% actually.

> You don't need to inline a large amount of functions, or
> large functions,  you just need to inline the right ones.
> With 10000, we are just generating tons of excess trees, that
> show off how inefficient some other parts of the compiler
> (particularly, garbage collection) are in terms of
> compile time, and bloat the heck out of the code, destroying its
> performance (due to cache effects).


> > Compile time and result statistics to follow in a subsequent mail.

OK, here are the statistics for the patch I already sent
(not the one outlined above).

Please compare them with the numbers I reported in my earlier mail.
inline   compile     size      size       spec      spec
limit    time(u)   double      cplx     double      cplx

  200     1:38      71764     80365      0.814     0.517
  400     2:07      82612     89308      0.776     0.562
  600     2:41      88468    101444      0.870     0.846
 1000     3:18      87788    103412      0.869     0.844
 2000     3:49      89148    110732      0.873     0.847
 5000     3:58      89148    109852      0.872     0.845
10000     3:55      89148    109852      0.872     0.846

Just for comparison:
2.95.3    3:35      88364    123636      0.912     0.846
3.0.0     3:48      89488    109872      0.873     0.800
3.0.1     2:50      95620        -       0.320         -

The results look very nice, IMHO.

If you look at the mail cited above: the 2.95.3 double result seems to vary
significantly ... (I got 0.869 last time.)
The other runs, however, do not. I did not test plain 3.0.1 again, because I
don't have an unpatched copy of 3.0.1 installed right now.

Kurt Garloff                   <>         [Eindhoven, NL]
Physics: Plasma simulations  <K.Garloff@Phys.TUE.NL>  [TU Eindhoven, NL]
Linux: SCSI, Security          <>    [SuSE Nuernberg, DE]
 (See mail header or public key servers for PGP2 and GPG public keys.)

PGP signature
