This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Hello,

Andreas Jaeger and I spent a few cycles this weekend looking at the effect of different tree inliner parameters on the compile time of GCC and on the runtime performance of the binaries it produces. This wasn't exactly a scientifically rigorous investigation; it is just an attempt to find appropriate default values for the inliner parameters.

Andreas has tested 20 different sets of parameters with SPECint2000. The compiler was GCC 3.4 20030502 CVS. All tests were done with the same compiler, i.e. the compiler was not bootstrapped with these param settings.

Here are the results (non-reportable; built once; two runs per test; flags: -O2 -march=athlon):

1 = max-inline-insns-single == max-inline-insns-auto
2 = max-inline-insns
3 = min-inline-insns
4 = max-inline-slope

  1    2    3    4    Build   Total bin size   Score
150  225   40   16    537s    4685997          397
150  225   70   16    539s    4685864          397
150  225   40   32    534s    4685626          397
150  225   70   32    533s    4685626          398
150  300   40   16    537s    4685931          398
150  300   70   16    539s    4684165          396
150  300   40   32    534s    4685558          398
150  300   70   32    535s    4685558          398
250  375   40   16    544s    4719601          398
250  375   70   16    544s    4719601          399
250  375  100   16    544s    4719601          398
250  375   40   32    542s    4723490          398
250  375   70   32    541s    4723490          398
250  375  100   32    538s    4723490          398
250  500   40   16    542s    4719466          399
250  500   70   16    548s    4719466          398
250  500  100   16    545s    4719466          398
250  500   40   32    542s    4723393          398
250  500   70   32    543s    4723393          398
250  500  100   32    538s    4723393          397

I provided Andreas with the param settings, and I chose max-inline-insns-single to be always equal to max-inline-insns-auto because the setting for that parameter does not matter at -O2.

The scores with the default settings (from mainline) are available from Andreas' web site:
http://www.suse.de/~aj/SPEC/CINT/d-permanent/mean-int_big.png

You can find the full results, including individual benchmark results of the SPECint runs, in the attached file.
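For anyone who wants to rebuild with one row of the table above, the parameters are passed to the compiler as --param name=value options. The following is only a minimal sketch of how such a command line could be assembled; the helper names param_flags and command_line, the "gcc" driver name and the file name test.c are placeholders and not part of the test setup described here.

```python
def param_flags(single_auto, max_insns, min_insns, slope):
    """Return the --param options for one row of the table.

    Column 1 is applied to both max-inline-insns-single and
    max-inline-insns-auto, as in the test runs.
    """
    settings = [
        ("max-inline-insns-single", single_auto),
        ("max-inline-insns-auto", single_auto),
        ("max-inline-insns", max_insns),
        ("min-inline-insns", min_insns),
        ("max-inline-slope", slope),
    ]
    flags = []
    for name, value in settings:
        flags += ["--param", f"{name}={value}"]
    return flags


def command_line(row, source="test.c"):
    # "gcc", "-c" and the source file name are placeholders for illustration.
    return ["gcc", "-O2", "-march=athlon", *param_flags(*row), "-c", source]


if __name__ == "__main__":
    # First row of the table: 150 225 40 16
    print(" ".join(command_line((150, 225, 40, 16))))
```

The parameter names are the ones used in this message; other GCC releases may not accept all of them.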
As you can see, there really is no significant difference in compile time, total binary size, or score between any of these settings, compared to each other and to the defaults. This is a bit of a shame because it means that SPEC is not a good test for finding good default values for the tree inliner parameters.

Another way to interpret these tests is to say that, at least for SPECint, it is safe to lower the parameter settings without serious performance degradation. Let's be optimistic, then, and try some lower parameter settings.

I did timings for PR 8361 ("-quiet -O3 -fno-unit-at-a-time" plus inline params; average of three runs each):

  1    2    3    4    Build (user+sys)
300  600  130   32    3m15
250  500   40   16    2m46
250  500   40   32    2m46
250  500   70   16    2m46
250  500   70   32    2m46
250  500  100   16    2m46
250  500  100   32    2m46
200  500   70   16    2m34
200  500   70   32    2m34

This clearly shows that max-inline-insns-single is the dominating parameter to attack if we want to get rid of the slowdown tracked in this PR. The drop from 300 to 250 is far more significant than the drop from 250 to 200, and I know from private emails between Richard Guenther and myself that for a test case he was playing with for PR 10196 (a POOMA test case, IIRC), he still had good runtime performance at max-inline-insns-single=250, while there still was a nice compiler speedup as well (though that was before Mark's fixes for the slowdown with exceptions).

I'd like to see if max-inline-insns-auto=230 works for most code. That number corresponds to 20 statements in the function body, which is 7 statements fewer than we allow now.

I have also looked again at Richard Guenther's emails where he tested different values of min-inline-insns (X in those emails: http://gcc.gnu.org/ml/gcc/2003-04/msg00817.html, and the follow-up: http://gcc.gnu.org/ml/gcc/2003-04/msg01136.html). I'll quote here:

/QUOTE/
[ first mail -- X == min-inline-insns ]

Lower numbers for the perf. indicator are better.
X         compile-time   performance indicator
default   49.50          1.99804e-06
50        50.25          2.26817e-06
100       50.00          1.96918e-06
150       51.00          1.90269e-06
200       58.25          1.83045e-06
250       61.25          1.28309e-06
300       62.75          1.29364e-06

default + -Dinline="__inline__ __attribute__((always_inline))"
          50.50          1.31171e-06
(while the source is not optimized for inline->always_inline transformation)
/QUOTE/

/QUOTE/
[ follow-up -- after Mark's fixes for exceptions ]

With g++-3.3 (GCC) 3.3 20030423 (prerelease) I now get

250       79.75 [154MB]  1.2667e-06
(...)

Just for the curious, here are the g++-3.2 (GCC) 3.2.3 20030414 (prerelease) numbers:
(...)
default   63.00 [170MB]  1.96574e-06
/QUOTE/

This shows that for performance equal to 3.2.3 we should not choose min-inline-insns too small, but somewhere between 50 and 100 would probably be acceptable.

Smaller is better for this parameter because it is the one parameter that can be responsible for blowing up the inliner completely: any function with fewer than min-inline-insns is inlined no matter how large the caller's function body has grown due to other functions being inlined directly or via recursive inlining. So we have to take this number as small as possible; let's try 80, which is equivalent to 5 meaningful statements in the function body.

Choosing it as high as max-inline-insns-single, like Richard did, is not a reasonable default. It shows that with more inlining we can produce faster code, but at the expense of a potentially huge compile time increase like the one reported in PR 10160. Of course the whole point of this exercise is to avoid that ;-) I think that being as good as 3.2.3 is "good enough".

Based on Richard's results I would also speculate that for C++ it _is_ important to have a threshold below which functions are always inlined, and that this number is probably more important than the value of the throttle. So we could choose to throttle down faster and inline fewer relatively large functions.
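The interplay between the always-inline threshold and the throttle discussed above can be illustrated with a small model. This is only a sketch and not the code in the tree inliner: the linear form of the throttle (the allowed callee size shrinking by one insn for every max-inline-slope insns beyond max-inline-insns) is an assumption modeled on how GCC documents max-inline-slope, and the function name may_inline, its arguments and the boundary handling are hypothetical.

```python
INSNS_PER_STMT = 10  # the tree inliner counts every statement as 10 insns


def may_inline(callee_stmts, inlined_stmts_so_far,
               max_single=230, max_insns=500, min_insns=80, slope=20):
    """Toy model of the inlining decision for a single call site.

    callee_stmts:         statements in the inline candidate
    inlined_stmts_so_far: statements already inlined into the caller
    """
    callee_insns = callee_stmts * INSNS_PER_STMT
    total = inlined_stmts_so_far * INSNS_PER_STMT + callee_insns

    # Candidates above the single-function limit are never inlined.
    if callee_insns > max_single:
        return False

    # Very small candidates escape the throttle entirely, however large
    # the caller has grown (strict "<" here is an assumption).
    if callee_insns < min_insns:
        return True

    # Below the repeated-inlining limit there is no throttling yet.
    if total <= max_insns:
        return True

    # Assumed linear throttle: the allowed size drops by one insn for
    # every `slope` insns by which the total exceeds max_insns.
    allowed = max_single - (total - max_insns) // slope
    return callee_insns <= allowed


if __name__ == "__main__":
    for stmts, so_far in [(5, 1000), (20, 0), (20, 100), (15, 100), (30, 0)]:
        print(stmts, so_far, may_inline(stmts, so_far))
```

In this model a 5-statement function is accepted even after 1000 statements have been inlined, which is exactly why min-inline-insns has to stay small, while a lower slope value makes the allowed size for larger candidates shrink faster.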
I picked max-inline-slope=20 because all insn counts in the tree inliner are multiples of 10 (==INSNS_PER_STMT), so that choosing this parameter at 20 effectively means that after reaching max-inline-insns, every statement in an inline candidate counts for two.

Of course, choosing param values based on the results reported above is not possible; they just show that we can lower them. So the question is: what do these numbers do for your code, both in terms of compile time and in runtime performance?

Richard, Dave, Gerald, can you please test the settings below on your favorite code and on the test cases for PR 10160 and PR 8361 (and maybe PR 10316)?

max-inline-insns-single=230
max-inline-insns-auto=230
max-inline-insns=500
min-inline-insns=80
max-inline-slope=20

If these settings produce improved compile times with no or only a small runtime performance degradation, then I would like to propose these settings as the defaults for 3.3.1 and for mainline. If they don't work, then we can still try to take max-inline-insns-single and max-inline-insns-auto further down and see what happens. If _that_ doesn't work, I suppose it would be a sign for us that we should just give up ;-)

HTH, I'm looking forward to seeing your results.

Thanks, regards
Steven
Attachment:
mbox.inline_params.gz
Description: GNU Zip compressed data