This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Memory footprint/compile time explosion caused by the tree inliner
- From: Mike Stump <mrs at apple dot com>
- To: Eric Botcazou <ebotcazou at libertysurf dot fr>
- Cc: gcc at gcc dot gnu dot org
- Date: Thu, 17 Apr 2003 16:38:54 -0700
- Subject: Re: Memory footprint/compile time explosion caused by the tree inliner
On Thursday, April 17, 2003, at 03:15 PM, Eric Botcazou wrote:
PR optimization/10160 reports a huge compile time regression (4h30min
vs a
few minutes) on the 3.3 branch wrt the 3.2 branch for LyX/QT on Sparc.
We fix this, then break it, then fix it, then break it. :-( We are
doing something wrong.
Let me explain, in C++, there are no non-inlined functions. In C++,
there are no function bigger than 5 lines. Now, choose your inlining
strategy.
Hint, inlining all function under 6 lines doesn't work. Hint, inlining
all functions that are inline doesn't work.
Possible solutions, don't allow more than 15x growth in caller, don't
allow caller to grow past 1,000 stmts.
The
time is mostly (85%) spent in the scheduler but the regression is
caused by
the tree inliner.
It would be nice if the scheduler works with very large functions. It
is ok to miss some of the edges, as long as the creamy center is nice.
For example, one constructor grows from 128 stmts to 15437 stmts
(154370
insns according to the fixed conversion factor), that is more than 100
times. Given that the scheduler is at least O(n^2), the game is over.
The problem is the new heuristics of the tree inliner:
int sum_insns = (id ? id->inlined_stmts : 0) * INSNS_PER_STMT
+ currfn_insns;
/* In the extreme case that we have exceeded the recursive
inlining
limit by a huge factor (128), we just say no. Should not
happen
in real life. */
This comment it wrong. The comment should read, this happens all the
time in C++.
if (sum_insns > MAX_INLINE_INSNS * 128)
Suggestion:
if (sum_insns > MAX_INLINE_INSNS * 128
|| sum_insns > 15 * (currfn_insn - (id ? id->inlined_stmts : 0)))
Better, make the 15 a parameter. Try values 2-20 on real C++ code,
benchmark the benefit v compile time, set value near knee. I think the
relative cap is better than an absolute cap, as large bodied functions
(aka C), the limit will be higher, and in small bodied functions, the
limit will be lower.
:-(
Thanks for tacking this down, wanna try testing out the code above and
find a reasonable constant and submit it?