This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: -fobey-inline (was Re: gcc and inlining)
- From: Richard Guenther <rguenth at tat dot physik dot uni-tuebingen dot de>
- To: Mike Stump <mrs at apple dot com>
- Cc: Stuart Hastings <stuart at apple dot com>, Matt Austern <austern at apple dot com>, Ron Price <ronp at apple dot com>, Mark Mitchell <mark at codesourcery dot com>, <gcc at gcc dot gnu dot org>
- Date: Thu, 13 Mar 2003 10:29:22 +0100 (CET)
- Subject: Re: -fobey-inline (was Re: gcc and inlining)
On Wed, 12 Mar 2003, Mike Stump wrote:
> On Wednesday, March 12, 2003, at 01:07 PM, Richard Guenther wrote:
> > I finally got the patch to work for C++ (see attached patch - maybe
> > completely bogus, though...). And I have some numbers for you:
>
> If you could, find the various flags that control inlining, and bump
> the numbers up until you get similar numbers to (or better than) this
> flag. Then tell us what those numbers were, then we can consider
> upping those numbers. Also, tell us the language, I assume it was C++.
OK, the solution for me is simple: just disable the decay of
max-inline-insns-single, for instance by setting max-inline-insns to 1000000
or max-inline-slope to 1000000 (both just artificially high numbers). For
some reason the latter produces better results; I don't know why.
One could find smaller values for my particular testcase, but I don't think
this would cure the problem in general, so maybe an extra switch to disable
the decay of the inlining limits would be useful?
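To make the decay concrete, here is a rough C++ model of how these parameters interact. It is based on a reading of the GCC 3.3 --param documentation, not on the compiler source. The struct, the function name, and the default values (300, 600, 32, 130) are assumptions for illustration only and should be checked against the GCC version in use.

```cpp
#include <algorithm>

// Hypothetical model of the inline-limit decay. Field names mirror the
// --param names; the default values are assumptions, not verified constants.
struct InlineParams {
    int max_inline_insns_single = 300;  // limit for a single callee
    int max_inline_insns = 600;         // decay starts beyond this total
    int max_inline_slope = 32;          // decay slope is -1/max_inline_slope
    int min_inline_insns = 130;         // callees this small are always allowed
};

// Largest callee size (in estimated insns) still accepted once
// `inlined_so_far` insns have already been inlined into the caller.
int allowed_single_size(const InlineParams& p, long inlined_so_far) {
    if (inlined_so_far <= p.max_inline_insns)
        return p.max_inline_insns_single;  // no decay yet
    long decayed = p.max_inline_insns_single
                 - (inlined_so_far - p.max_inline_insns) / p.max_inline_slope;
    return static_cast<int>(std::max<long>(decayed, p.min_inline_insns));
}
```

Under these assumed defaults, allowed_single_size(p, 3800) drops to 200, whereas with max_inline_slope or max_inline_insns set to 1000000 it stays at 300 for any realistic amount of inlining, which is the effect described above.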
One more point: raising this limit does have a noticeable negative impact on
compile time (first without, then with --param max-inline-slope=1000000):
bellatrix:~/src/pooma-bib/r2/benchmarks/test$ g++-3.3 \
    /net/bellatrix/home/rguenth/src/pooma-bib/r2/benchmarks/test/Bench.cpp \
    -o /net/bellatrix/home/rguenth/src/pooma-bib/r2/benchmarks/test/LINUXgcc/Bench \
    -ftemplate-depth-60 -fno-exceptions -Drestrict=__restrict__ -DNOPAssert \
    -DNOCTAssert -I/home/rguenth/src/pooma-bib/r2/src \
    -I/home/rguenth/src/pooma-bib/r2/lib/LINUXgcc \
    -L/home/rguenth/src/pooma-bib/r2/lib/LINUXgcc -lpooma -lm -O2 \
    -march=athlon -fomit-frame-pointer -funroll-loops -ftime-report
Execution times (seconds)
garbage collection : 1.88 ( 8%) usr 0.00 ( 0%) sys 2.25 ( 7%)
cfg construction : 0.11 ( 0%) usr 0.01 ( 1%) sys 0.12 ( 0%)
cfg cleanup : 0.28 ( 1%) usr 0.01 ( 1%) sys 0.32 ( 1%)
trivially dead code : 0.34 ( 1%) usr 0.00 ( 0%) sys 0.38 ( 1%)
life analysis : 0.49 ( 2%) usr 0.00 ( 0%) sys 0.56 ( 2%)
life info update : 0.13 ( 1%) usr 0.00 ( 0%) sys 0.13 ( 0%)
preprocessing : 0.49 ( 2%) usr 0.20 (10%) sys 0.71 ( 2%)
lexical analysis : 0.41 ( 2%) usr 0.17 ( 9%) sys 0.72 ( 2%)
parser : 5.27 (21%) usr 0.53 (27%) sys 6.28 (21%)
name lookup : 2.98 (12%) usr 0.82 (41%) sys 4.12 (14%)
expand : 2.74 (11%) usr 0.01 ( 1%) sys 3.49 (12%)
varconst : 0.10 ( 0%) usr 0.01 ( 1%) sys 0.12 ( 0%)
integration : 1.24 ( 5%) usr 0.03 ( 2%) sys 1.41 ( 5%)
jump : 0.23 ( 1%) usr 0.01 ( 1%) sys 0.27 ( 1%)
CSE : 2.19 ( 9%) usr 0.03 ( 2%) sys 2.46 ( 8%)
global CSE : 0.72 ( 3%) usr 0.02 ( 1%) sys 0.80 ( 3%)
loop analysis : 0.54 ( 2%) usr 0.00 ( 0%) sys 0.59 ( 2%)
CSE 2 : 0.85 ( 3%) usr 0.01 ( 1%) sys 0.95 ( 3%)
branch prediction : 0.28 ( 1%) usr 0.00 ( 0%) sys 0.38 ( 1%)
flow analysis : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%)
combiner : 0.29 ( 1%) usr 0.01 ( 1%) sys 0.40 ( 1%)
if-conversion : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%)
regmove : 0.10 ( 0%) usr 0.01 ( 1%) sys 0.15 ( 0%)
mode switching : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%)
local alloc : 0.29 ( 1%) usr 0.02 ( 1%) sys 0.32 ( 1%)
global alloc : 0.59 ( 2%) usr 0.00 ( 0%) sys 0.62 ( 2%)
reload CSE regs : 0.48 ( 2%) usr 0.00 ( 0%) sys 0.49 ( 2%)
flow 2 : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%)
if-conversion 2 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%)
peephole 2 : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%)
rename registers : 0.21 ( 1%) usr 0.00 ( 0%) sys 0.23 ( 1%)
scheduling 2 : 0.54 ( 2%) usr 0.05 ( 3%) sys 0.65 ( 2%)
reorder blocks : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%)
shorten branches : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%)
reg stack : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%)
final : 0.12 ( 0%) usr 0.02 ( 1%) sys 0.14 ( 0%)
rest of compilation : 0.38 ( 2%) usr 0.02 ( 1%) sys 0.45 ( 2%)
TOTAL : 24.77 1.99 30.15
bellatrix:~/src/pooma-bib/r2/benchmarks/test$ g++-3.3 \
    /net/bellatrix/home/rguenth/src/pooma-bib/r2/benchmarks/test/Bench.cpp \
    -o /net/bellatrix/home/rguenth/src/pooma-bib/r2/benchmarks/test/LINUXgcc/Bench \
    -ftemplate-depth-60 -fno-exceptions -Drestrict=__restrict__ -DNOPAssert \
    -DNOCTAssert -I/home/rguenth/src/pooma-bib/r2/src \
    -I/home/rguenth/src/pooma-bib/r2/lib/LINUXgcc \
    -L/home/rguenth/src/pooma-bib/r2/lib/LINUXgcc -lpooma -lm -O2 \
    -march=athlon -fomit-frame-pointer -funroll-loops \
    --param max-inline-slope=1000000 -ftime-report
Execution times (seconds)
garbage collection : 2.58 ( 8%) usr 0.01 ( 0%) sys 2.62 ( 8%)
cfg construction : 0.15 ( 0%) usr 0.02 ( 1%) sys 0.16 ( 0%)
cfg cleanup : 0.40 ( 1%) usr 0.03 ( 1%) sys 0.45 ( 1%)
trivially dead code : 0.53 ( 2%) usr 0.00 ( 0%) sys 0.57 ( 2%)
life analysis : 0.66 ( 2%) usr 0.00 ( 0%) sys 0.70 ( 2%)
life info update : 0.20 ( 1%) usr 0.00 ( 0%) sys 0.21 ( 1%)
preprocessing : 0.44 ( 1%) usr 0.20 ( 7%) sys 0.73 ( 2%)
lexical analysis : 0.46 ( 1%) usr 0.22 ( 8%) sys 0.65 ( 2%)
parser : 5.41 (17%) usr 0.82 (30%) sys 6.31 (18%)
name lookup : 2.84 ( 9%) usr 0.63 (23%) sys 3.56 (10%)
expand : 4.43 (14%) usr 0.22 ( 8%) sys 4.69 (14%)
varconst : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%)
integration : 2.01 ( 6%) usr 0.17 ( 6%) sys 2.18 ( 6%)
jump : 0.39 ( 1%) usr 0.01 ( 0%) sys 0.40 ( 1%)
CSE : 2.95 ( 9%) usr 0.03 ( 1%) sys 3.02 ( 9%)
global CSE : 0.95 ( 3%) usr 0.03 ( 1%) sys 1.00 ( 3%)
loop analysis : 0.86 ( 3%) usr 0.08 ( 3%) sys 0.96 ( 3%)
CSE 2 : 1.35 ( 4%) usr 0.00 ( 0%) sys 1.36 ( 4%)
branch prediction : 0.37 ( 1%) usr 0.01 ( 0%) sys 0.38 ( 1%)
flow analysis : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%)
combiner : 0.40 ( 1%) usr 0.00 ( 0%) sys 0.44 ( 1%)
if-conversion : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%)
regmove : 0.16 ( 1%) usr 0.00 ( 0%) sys 0.14 ( 0%)
mode switching : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%)
local alloc : 0.48 ( 2%) usr 0.00 ( 0%) sys 0.48 ( 1%)
global alloc : 0.65 ( 2%) usr 0.01 ( 0%) sys 0.67 ( 2%)
reload CSE regs : 0.56 ( 2%) usr 0.03 ( 1%) sys 0.61 ( 2%)
flow 2 : 0.08 ( 0%) usr 0.03 ( 1%) sys 0.10 ( 0%)
if-conversion 2 : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%)
peephole 2 : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%)
rename registers : 0.20 ( 1%) usr 0.00 ( 0%) sys 0.21 ( 1%)
scheduling 2 : 0.61 ( 2%) usr 0.07 ( 3%) sys 0.65 ( 2%)
machine dep reorg : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%)
shorten branches : 0.08 ( 0%) usr 0.01 ( 0%) sys 0.09 ( 0%)
final : 0.09 ( 0%) usr 0.02 ( 1%) sys 0.13 ( 0%)
symout : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%)
rest of compilation : 0.57 ( 2%) usr 0.03 ( 1%) sys 0.59 ( 2%)
TOTAL : 31.21 2.69 34.45
Hope this helps the decision.
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/