This is the mail archive of the
mailing list for the GCC project.
Re: [Patch] C++ inlining heuristics changed (3.0.1)
- To: Kurt Garloff <kurt at garloff dot de>
- Subject: Re: [Patch] C++ inlining heuristics changed (3.0.1)
- From: Daniel Berlin <dan at cgsoftware dot com>
- Date: Wed, 22 Aug 2001 08:48:55 -0400
- Cc: gcc at gcc dot gnu dot org, Gerald Pfeifer <pfeifer at dbai dot tuwien dot ac dot at>
- References: <20010822141829.C8602@gum01m.etpnet.phys.tue.nl>
Kurt Garloff <email@example.com> writes:
> thanks for confiming my suspicion, Daniel, that the leaves of the inlining
> tree will get cut off inlining instead of the root by the C++ tree recursive
> inlining limit. How stupid! The least you want is to not inline the
> iterators at the end of your call chain!
Right. This is the problem with inlining top-down, as the current tree
You have to be very good at picking which functions to inline, or else
you'll hit your limit too early.
> Here's a band-aid:
> Use 3 limits:
> (a) The limit given by max-insns
> (b) Half of it for single-fn inlining
> (c) A quarter of it for recursive stuff above limit
> Now, here's what g++ does after applying the patch:
> * Single functions larger than (b) do not get inlined
> * Once we exceed (a) through recursive inlining, only
> single functions smaller the (c) will get inlined.
> This will prevent cutting the small functions at the end (the leaves) from
> being inlined if they are small enough.
> Patch against 3.0.1 for cp/optimize.c attached.
> Things for discussion:
> * One could think of aggravating the limit (c) when we're far beyond (a) on
> recursive inlining.
> * The exact numbers (a), (b), (c) are open for discussion.
c is usually set to be the cost of the save/restore of the arguments, but that's
hard to calculate since we are dealing with trees here (it turns
out 10 insns per statement is a little low). Even then, they only
take into account the fact that arguments need code to save/restore,
*not* whether the cost to save/restore is more than the cost to inline
the function, removing the stores/loads.
After all, memory latency is what kills us a lot of the time (missing
a store in a loop or something).
So we are probably better off with something more like what you've done than trying to
artificially calculate the argument setup code *length*.
> * The reason why I want (b) smaller than (a) is that I want to improve the
> chance to cut early. (First tests indicate that this could be changed to
> 3/4 or just left out with not much effect on runtime performance.)
> * Who will implement inlining that does cut the trunk instead of the leaves?
> (which is the real solution IMHO)
Nathan already did bottom up inlining, it's on the ast optimizer
> It works amazingly well on my test program: With -finline-limit=600 I get
> more than 90% of the peak performance.
You don't need to inline a large amount of functions, or
large functions, you just need to inline the right ones.
With 10000, we are just generating tons of excess trees, that
show off how inefficient some other parts of the compiler
(particularly, garbage collection) are in terms of
compile time, and bloat the heck out of the code, destroying it's
performance (due to cache effects).
> Compile time and result statistics to follow in a subsequent mail.
> Gerald, could you test as well?
> Kurt Garloff <firstname.lastname@example.org> [Eindhoven, NL]
> Physics: Plasma simulations <K.Garloff@Phys.TUE.NL> [TU Eindhoven, NL]
> Linux: SCSI, Security <email@example.com> [SuSE Nuernberg, DE]
> (See mail header or public key servers for PGP2 and GPG public keys.)
"When I was eight, I played Little League. I was on first; I
stole third; I went straight across. Earlier that week, I
learned that the shortest distance between two points was a
direct line. I took advantage of that knowledge.