This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: inliner in gcc-3.1
- From: Gerald Pfeifer <pfeifer at dbai dot tuwien dot ac dot at>
- To: Kurt Garloff <garloff at suse dot de>
- Cc: gcc at gcc dot gnu dot org
- Date: Thu, 25 Apr 2002 09:18:24 +0200 (CEST)
- Subject: Re: inliner in gcc-3.1
On Wed, 24 Apr 2002, Kurt Garloff wrote:
> I was browsing the gcc ML archives (I'm not subscribed) and found
> that the inliner may still not be tuned optimally in gcc-3.1.
> [...]
> * I have adapted my inliner patch (v3) to 3.1 (CVS 2002-04-23)
> and it still works ...
> I do believe it's somewhat saner than v1, but the benefits are
> actually small. (up to 3% in some benchmarks, 0 in others, no
> pessimizations found).
Bad news: This patch increases compilation time (for DLV, the package
I've been using to test performance) quite a bit:
2.95.3                     4:01   4430752
3.0                       23:54   6295044
3.0.3                      3:58   3948444
3.1-20020422               4:38   3996096
3.1-20020424+kurtpatch     5:35   4102432
3.1-20020422+limit=800     6:37   4177344
3.1-20020422+limit=1200   16:50   4597888
3.2-20020422               5:15   4003276
And excellent news: This patch really improves the quality of the
generated code, and quite significantly so in several cases (much
more than those 3% you claimed)!
Times in [s] | 2.95.3| 3.1-20020422 |3.1-.-kurtpatch|
--------------------+-------+---------------+---------------+
STRATCOMP1-ALL| 3.57 | 96.67 (0.10) | 24.93 (0.01) |
STRATCOMP-770.2-Q| 0.73 | 0.94 (0.01) | 0.79 (0.00) |
2QBF1| 19.08 | 22.26 (0.01) | 20.94 (0.01) |
PRIMEIMPL2| 10.74 | 12.91 (0.01) | 10.59 (0.01) |
ANCESTOR| 8.88 | 9.53 (0.01) | 9.45 (0.01) |
3COL-SIMPLEX1| 6.30 | 7.16 (0.00) | 6.86 (0.00) |
3COL-LADDER1| 36.24 | 42.33 (0.04) | 40.05 (0.02) |
3COL-N-LADDER1| 19.81 | 22.47 (0.16) | 20.44 (0.04) |
3COL-RANDOM1| 10.69 | 12.23 (0.01) | 11.17 (0.01) |
HP-RANDOM1| 13.16 | 14.82 (0.03) | 14.43 (0.03) |
HAMCYCLE-FREE| 1.18 | 1.71 (0.00) | 1.64 (0.00) |
DECOMP2| 21.91 | 24.02 (0.01) | 25.03 (0.02) |
BW-P4-Esra-a| 91.71 | 99.28 (0.01) | 95.02 (0.04) |
BW-P5-nopush| 6.96 | 7.43 (0.00) | 7.11 (0.00) |
BW-P5-pushbin| 6.20 | 6.50 (0.00) | 6.16 (0.00) |
BW-P5-nopushbin| 1.94 | 2.07 (0.01) | 1.98 (0.00) |
3SAT-1| 32.92 | 38.48 (0.02) | 32.80 (0.01) |
3SAT-1-CONSTRAINT| 17.46 | 20.64 (0.00) | 18.67 (0.00) |
HANOI-Towers| 4.73 | 4.95 (0.00) | 4.93 (0.01) |
RAMSEY| 8.00 | 8.66 (0.00) | 8.42 (0.01) |
CRISTAL| 11.07 | 13.38 (0.01) | 11.74 (0.01) |
HANOI-K| 33.41 | 38.76 (0.02) | 34.86 (0.01) |
21-QUEENS| 9.66 | 10.41 (0.01) | 9.58 (0.00) |
MSTDir[V=13,A=40]| 25.71 | 21.09 (0.00) | 19.93 (0.00) |
MSTDir[V=15,A=40]| 25.81 | 21.14 (0.00) | 19.97 (0.00) |
MSTUndir[V=13,A=40]| 12.86 | 11.46 (0.01) | 10.74 (0.00) |
MSTUndir[V=15,A=40]|214.87 | 188.57 (0.00) | 177.02 (0.03) |
TIMETABLING| 9.63 | 10.63 (0.01) | 10.21 (0.02) |
--------------------+-------+---------------+---------------+
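To put the table in relative terms, the per-benchmark change between the unpatched and patched 3.1 columns can be computed as below. This is only a small sketch; the helper name `percent_change` is made up for illustration, and the inputs are meant to be read straight off the table above.

```c
#include <assert.h>

/* Relative change in run time, in percent, when going from `before`
   (3.1-20020422 column) to `after` (kurtpatch column).
   A negative result means the patched compiler produced faster code. */
double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

Fed with the table's numbers, STRATCOMP1-ALL (96.67 to 24.93) comes out at roughly -74%, 3SAT-1 (38.48 to 32.80) at roughly -15%, and DECOMP2 (24.02 to 25.03) at roughly +4%, which makes DECOMP2 the one row in the table that gets slower with the patch.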
This would be very nice to have in GCC 3.1, if it were not for the longer
compile time.
However, I'd really like to apply it to mainline as soon as possible,
because it (finally) compensates for most of the code quality regressions
we have been seeing since GCC 2.95.
Gerald
2002-04-23 Kurt Garloff <garloff@suse.de>
* tree-inline.c: Improve heuristics by using a smoother
function to cut down allowable inlinable size.
--- gcc/tree-inline.c.orig Tue Apr 23 22:54:17 2002
+++ gcc/tree-inline.c Tue Apr 23 23:24:57 2002
@@ -706,14 +706,32 @@
/* Even if this function is not itself too big to inline, it might
be that we've done so much inlining already that we don't want to
- risk too much inlining any more and thus halve the acceptable
- size. */
+ risk too much inlining any more */
if (! (*lang_hooks.tree_inlining.disregard_inline_limits) (fn)
&& ((DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0)) * INSNS_PER_STMT
- > MAX_INLINE_INSNS)
- && DECL_NUM_STMTS (fn) * INSNS_PER_STMT > MAX_INLINE_INSNS / 4)
- inlinable = 0;
-
+ > MAX_INLINE_INSNS * 128))
+ inlinable = 0;
+ /* If we did not hit the extreme limit 128*MAX_INLINE_INSNS by recursion,
+ and we did not hit the limit for a single function (MAX_INLINE_INSNS/2)
+ but we are above the recursive throttling threshold (MAX_INLINE_INSNS),
+ we use a limit that decreases linearly with the already inlined
+ code. We always allow very small functions (13 statements) to be inlined.
+ Value (13*INSNS_PER_STMT) found by numerous experiments in 3.0.x with
+ C++ code */
+ else if (! (*lang_hooks.tree_inlining.disregard_inline_limits) (fn)
+ && ((DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0))
+ * INSNS_PER_STMT > MAX_INLINE_INSNS)
+ && DECL_NUM_STMTS (fn) > 13) {
+ /* Use a linear function with a slope of -0.03125
+ we could also use an int approx. of sqrt or similar things */
+ signed int max_curr = MAX_INLINE_INSNS/2
+ - (( DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0))
+ * INSNS_PER_STMT - MAX_INLINE_INSNS) / 32;
+
+ if ((signed int)(DECL_NUM_STMTS (fn) * INSNS_PER_STMT) > max_curr)
+ inlinable = 0;
+ }
+
if (inlinable && (*lang_hooks.tree_inlining.cannot_inline_tree_fn) (&fn))
inlinable = 0;
@@ -968,7 +986,8 @@
/* Our function now has more statements than it did before. */
DECL_NUM_STMTS (VARRAY_TREE (id->fns, 0)) += DECL_NUM_STMTS (fn);
- id->inlined_stmts += DECL_NUM_STMTS (fn);
+ /* For accounting, subtract one for the saved call/ret */
+ id->inlined_stmts += DECL_NUM_STMTS (fn) - 1;
/* Recurse into the body of the just inlined function. */
expand_calls_inline (inlined_body, id);
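
For readers who want to experiment with the throttling rule outside the compiler, the size check from the first hunk can be restated as a standalone function. This is a sketch under assumptions, not the code in tree-inline.c: the name `size_allows_inlining` is hypothetical, `MAX_INLINE_INSNS` and `INSNS_PER_STMT` are passed in as plain parameters, and the `disregard_inline_limits` and `cannot_inline_tree_fn` hooks as well as any checks outside the hunk are left out.

```c
#include <assert.h>

/* Hypothetical standalone restatement of the size check in the patch.
   fn_stmts      stands in for DECL_NUM_STMTS (fn)
   inlined_stmts stands in for id->inlined_stmts (0 when there is no id)
   Returns 1 if this check would still allow inlining, 0 otherwise. */
int size_allows_inlining(int fn_stmts, int inlined_stmts,
                         int insns_per_stmt, int max_inline_insns)
{
    assert(fn_stmts >= 0 && inlined_stmts >= 0);
    assert(insns_per_stmt > 0 && max_inline_insns > 0);

    int total = (fn_stmts + inlined_stmts) * insns_per_stmt;

    /* Extreme limit: never go beyond 128 * MAX_INLINE_INSNS. */
    if (total > max_inline_insns * 128)
        return 0;

    /* Above the throttling threshold, functions with more than 13
       statements face a limit that shrinks by 1/32 insn for every
       insn already accumulated past MAX_INLINE_INSNS. */
    if (total > max_inline_insns && fn_stmts > 13) {
        int max_curr = max_inline_insns / 2
                       - (total - max_inline_insns) / 32;
        if (fn_stmts * insns_per_stmt > max_curr)
            return 0;
    }
    return 1;
}
```

As an illustration with assumed values of 10 insns per statement and a limit of 600 insns: a 20-statement function is still accepted after 100 statements have been inlined (its 200 insns are below the current limit of 282), a 30-statement function is rejected after 500 inlined statements (300 insns against a limit of 154), and a 10-statement function passes at any point below the 128x cap because of the 13-statement exemption.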