This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Experimenting with tree inliner parameter settings


Hello,

Andreas Jaeger and I spent a few cycles this weekend looking
at how different tree inliner parameters affect the compile
time of GCC and the runtime performance of the binaries it
produces.  This wasn't exactly a scientifically responsible
investigation; it is just an attempt to find appropriate
default values for the inliner parameters.

Andreas has tested 20 different sets of parameters with
SPECint2000; the compiler was GCC 3.4 20030502 CVS.  All tests
were done with the same compiler, i.e. the compiler was not
bootstrapped with these param settings.  Here are the results
(non-reportable; built once; two runs per test;
flags: -O2 -march=athlon):

1 = max-inline-insns-single == max-inline-insns-auto
2 = max-inline-insns
3 = min-inline-insns
4 = max-inline-slope

1	2	3	4	Build	Total bin size	Score
150	225	40	16	537s	4685997		397
150	225	70	16	539s	4685864		397

150	225	40	32	534s	4685626		397
150	225	70	32	533s	4685626		398

150	300	40	16	537s	4685931		398
150	300	70	16	539s	4684165		396

150	300	40	32	534s	4685558		398
150	300	70	32	535s	4685558		398

250	375	40	16	544s	4719601		398
250	375	70	16	544s	4719601		399
250	375	100	16	544s	4719601		398

250	375	40	32	542s	4723490		398
250	375	70	32	541s	4723490		398
250	375	100	32	538s	4723490		398

250	500	40	16	542s	4719466		399
250	500	70	16	548s	4719466		398
250	500	100	16	545s	4719466		398

250	500	40	32	542s	4723393		398
250	500	70	32	543s	4723393		398
250	500	100	32	538s	4723393		397

I provided Andreas with the param settings, and I chose
max-inline-insns-single to always be equal to
max-inline-insns-auto because the setting of the latter
does not matter at -O2.

The scores with the default settings (from mainline) are
available from Andreas' web site:
http://www.suse.de/~aj/SPEC/CINT/d-permanent/mean-int_big.png
You can find the full results including individual benchmark
results of the SPECint runs in the attached file.

As you can see there really is no significant difference in
compile time, total binary size, and score between any of
these settings, compared to each other and to the defaults.
This is a bit of a shame because it means that SPEC is not
a good test to find good default values for the tree inliner
parameters.
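
To put a number on "no significant difference", here is a small
sketch that computes the spread of each column in the table
above.  The values are copied from the table; the helper name
spread_percent is only illustrative.

```python
# Columns copied from the SPECint2000 table above (20 parameter sets).
build_s = [537, 539, 534, 533, 537, 539, 534, 535, 544, 544,
           544, 542, 541, 538, 542, 548, 545, 542, 543, 538]
size_bytes = [4685997, 4685864, 4685626, 4685626, 4685931,
              4684165, 4685558, 4685558, 4719601, 4719601,
              4719601, 4723490, 4723490, 4723490, 4719466,
              4719466, 4719466, 4723393, 4723393, 4723393]
score = [397, 397, 397, 398, 398, 396, 398, 398, 398, 399,
         398, 398, 398, 398, 399, 398, 398, 398, 398, 397]


def spread_percent(values):
    """Return (max - min) as a percentage of the minimum value."""
    lo, hi = min(values), max(values)
    return 100.0 * (hi - lo) / lo


for name, column in (("build time", build_s),
                     ("binary size", size_bytes),
                     ("score", score)):
    print(f"{name}: min={min(column)} max={max(column)} "
          f"spread={spread_percent(column):.2f}%")
```

All three spreads stay below 3%, and with only two runs per test
that is probably within the noise.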

Another way to interpret these tests is to say that, at
least for SPECint, it is safe to lower the parameter settings
without serious performance degradation.  Let's be optimistic,
then, and try some lower parameter settings.



I did timings for PR 8361 ("-quiet -O3 -fno-unit-at-a-time"
plus inline params; average of three runs each):

1	2	3	4	Build (user+sys) 
300	600	130	32	3m15 
250 	500	40	16	2m46 
250	500	40	32	2m46 
250	500	70	16	2m46 
250	500	70	32	2m46 
250	500	100	16	2m46 
250	500	100	32	2m46 
200	500	70	16	2m34 
200	500	70	32	2m34

This clearly shows that max-inline-insns-single is the
dominating parameter to attack if we want to get rid of the
slowdown tracked in this PR.
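
For reference, the timings above can be converted to seconds to
compare the two steps directly.  This is only arithmetic on the
table; note that the 300 row also differs from the others in
parameters 2 and 3, so the first step is not a pure change of
max-inline-insns-single.  The helper name to_seconds is
illustrative.

```python
def to_seconds(stamp):
    """Convert a time written like '3m15' into seconds."""
    minutes, seconds = stamp.split("m")
    return int(minutes) * 60 + int(seconds)


# Build times from the PR 8361 table, keyed by max-inline-insns-single.
timings = {300: "3m15", 250: "2m46", 200: "2m34"}
seconds = {limit: to_seconds(t) for limit, t in timings.items()}

saved_300_to_250 = seconds[300] - seconds[250]
saved_250_to_200 = seconds[250] - seconds[200]

print(f"300 -> 250: {saved_300_to_250}s saved "
      f"({100.0 * saved_300_to_250 / seconds[300]:.1f}%)")
print(f"250 -> 200: {saved_250_to_200}s saved "
      f"({100.0 * saved_250_to_200 / seconds[250]:.1f}%)")
```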

The drop from 300 to 250 is far more significant than the
drop from 250 to 200, and I know from private emails between
Richard Guenther and myself that for a test case he was
playing with for PR 10196 (A POOMA test case IIRC???), he
still had good runtime performance at
max-inline-insns-single=250, while there still was a nice
compiler speedup as well (though that was before Mark's fixes
for the slowdown with exceptions).
I'd like to see if max-inline-insns-auto=230 works for most
code.  That number corresponds to 20 statements in the
function body, which is 7 statements fewer than we allow now.
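
The statement counts quoted in this mail (230 for 20 statements,
and further down 80 for 5 statements) fit a simple linear
relation with 10 insns per statement plus a fixed overhead of
30.  The overhead value is an inference from those two data
points, not something taken from the inliner source, so treat
the sketch below as an assumption.

```python
INSNS_PER_STMT = 10  # insns counted per statement, as stated in this mail
BASE_INSNS = 30      # ASSUMPTION: inferred from 230 <-> 20 and 80 <-> 5


def insns_for_statements(statements):
    """Estimated insn count for a function body with this many statements."""
    return BASE_INSNS + INSNS_PER_STMT * statements


def statements_for_insns(insns):
    """Largest statement count whose estimate fits within the insn limit."""
    return (insns - BASE_INSNS) // INSNS_PER_STMT


for limit in (300, 230, 80):
    print(f"limit {limit}: about {statements_for_insns(limit)} statements")
```

Under this assumption the limit of 300 works out to 27
statements, which matches the "7 statements fewer" above.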



I have also looked again at Richard Guenther's emails where he
tested different values of min-inline-insns (X in those emails:
http://gcc.gnu.org/ml/gcc/2003-04/msg00817.html, and
the follow up: http://gcc.gnu.org/ml/gcc/2003-04/msg01136.html).
I'll quote here:

/QUOTE/ [ first mail -- X == min-inline-insns ]
Lower numbers for the perf. indicator are better.
  X      compile-time    performance indicator
default     49.50           1.99804e-06
 50         50.25           2.26817e-06
100         50.00           1.96918e-06
150         51.00           1.90269e-06
200         58.25           1.83045e-06
250         61.25           1.28309e-06
300         62.75           1.29364e-06
default + -Dinline="__inline__ __attribute__((always_inline))"
            50.50           1.31171e-06
(while the source is not optimized for inline->always_inline
transformation)
/QUOTE/

/QUOTE/ [ follow-up -- after Mark's fixes for exceptions ]
With g++-3.3 (GCC) 3.3 20030423 (prerelease) I now get

250     79.75 [154MB]     1.2667e-06

(...)

Just for the curious, here are the g++-3.2 (GCC) 3.2.3
20030414 (prerelease) numbers: (...)

default  63.00 [170MB]   1.96574e-06
/QUOTE/


This shows that for performance equal to 3.2.3 we should not
choose min-inline-insns too small, but somewhere between 50
and 100 would probably be acceptable.  Smaller is better for
this parameter because it is the one parameter that can be
responsible for blowing up the inliner completely: Any
function with fewer than min-inline-insns is inlined no
matter how large the caller's function body has grown due to
other functions being inlined directly or via recursive
inlining.
So we have to take this number as small as possible; let's
try 80, which is equivalent to 5 meaningful statements in
the function body.
Choosing it as high as max-inline-insns-single, like Richard
did, is not a reasonable default.  It shows that with more
inlining we can produce faster code, but at the expense of a
potentially huge compile time increase like the one reported
in PR 10160.  Of course the whole point of this exercise is
to avoid that ;-)  I think that being as good as 3.2.3 is
"good enough".

Based on Richard's results I would also speculate that for C++
it _is_ important to have a threshold such that functions are
always inlined if they are smaller than this threshold, and
that this number is probably more important than the value of
the throttle.  So we could choose to throttle down faster and
inline fewer relatively large functions.
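
To make the interplay of threshold and throttle concrete, here
is a simplified model, assuming the behaviour described in the
--param documentation: once the accumulated insn count exceeds
max-inline-insns, the allowed size of a single candidate drops
linearly with slope -1/max-inline-slope, but never below
min-inline-insns.  The function name and the exact integer
rounding are assumptions; this is not the code from
tree-inline.c.

```python
def allowed_candidate_size(sum_insns, params):
    """Model of the largest candidate (in insns) that may still be inlined.

    sum_insns is the insn count accumulated so far in the caller,
    including previously inlined bodies.
    """
    single = params["max-inline-insns-single"]
    if sum_insns <= params["max-inline-insns"]:
        return single  # no throttling yet
    overshoot = sum_insns - params["max-inline-insns"]
    throttled = single - overshoot // params["max-inline-slope"]
    # Small functions stay inlinable no matter how large the caller grew.
    return max(throttled, params["min-inline-insns"])


# Example values only; any consistent set of the four params works.
example = {
    "max-inline-insns-single": 230,
    "max-inline-insns": 500,
    "min-inline-insns": 80,
    "max-inline-slope": 20,
}

for total in (400, 500, 1500, 3500, 10000):
    print(total, allowed_candidate_size(total, example))
```

In this model a smaller slope value makes the allowed size fall
faster, and min-inline-insns is the floor that keeps very small
functions inlinable regardless of caller size.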

I picked max-inline-slope=20 because all insns counts
in the tree inliner are multiples of 10 (==INSNS_PER_STMT), so
that choosing this parameter at 20 effectively means that
after reaching max-inline-insns, every statement in an inline
candidate counts for two.



Of course, choosing param values based on the results
reported above is not possible; they just show that we can
lower them.

So the question is: What do these numbers do for your code,
both in terms of compile time and in runtime performance?  
Richard, Dave, Gerald, can you please test the settings 
below on your favorite code and on the test case for
PR 10160 and PR 8361 (and maybe PR 10316)?

max-inline-insns-single=230
max-inline-insns-auto=230
max-inline-insns=500
min-inline-insns=80
max-inline-slope=20
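
For anyone who wants to script the test runs, a minimal sketch
that turns the settings above into GCC --param arguments.  The
compiler name, the -O2 flag and the file name test.cc are
placeholders; substitute your own command line.

```python
# Proposed settings from this mail.
proposed = {
    "max-inline-insns-single": 230,
    "max-inline-insns-auto": 230,
    "max-inline-insns": 500,
    "min-inline-insns": 80,
    "max-inline-slope": 20,
}


def param_flags(params):
    """Return the '--param name=value' argument list for the given settings."""
    flags = []
    for name, value in params.items():
        flags += ["--param", f"{name}={value}"]
    return flags


# Placeholder command line: compiler, optimization level and file are examples.
command = ["g++", "-O2"] + param_flags(proposed) + ["-c", "test.cc"]
print(" ".join(command))
```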


If these settings produce improved compile times with no
or a small runtime performance degradation, then I would
like to propose these settings as the defaults for 3.3.1
and for mainline.  If they don't work, then we can still
try to take max-inline-insns-single and
max-inline-insns-auto further down and see what happens.
If _that_ doesn't work, I suppose it would be a sign for
us that we should just give up ;-)

HTH, I'm looking forward to seeing your results.

Thanks, regards
Steven

Attachment: mbox.inline_params.gz
Description: GNU Zip compressed data

