This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: -fobey-inline (was Re: gcc and inlining)


On Wed, 12 Mar 2003, Mike Stump wrote:

> On Wednesday, March 12, 2003, at 01:07 PM, Richard Guenther wrote:
> > I finally got the patch to work for C++ (see attached patch - maybe
> > completely bogus, though...). And I have some numbers for you:
>
> If you could, find the various flags that control inlining, and bump
> the numbers up until you get similar number to (or better than) this
> flag.  Then tell us what those numbers were, then we can consider
> upping those numbers.  Also, tell us the language, I assume it was C++.

Ok, the solution for me is simple - just disable the decay of
max-inline-insns-single, for instance by setting max-inline-insns to 1000000
or max-inline-slope to 1000000 (both just artificially high numbers). For
some reason the latter produces better results; I don't know why.

One could find smaller values for my particular testcase, but this wouldn't
cure the problem in general, I think, so maybe an extra switch to disable
the decay of the inlining limits would be useful?
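
To illustrate the decay discussed above, here is a minimal sketch of the
linear throttling as the GCC 3.3 manual describes it for these --param
values. It is a toy model, not the code in GCC's tree inliner: the function
name single_function_limit is hypothetical, the default values (600, 300,
32, 130) are assumed from the manual's description, and the real inliner
counts statements and estimated insns rather than the plain integers used
here.

```python
def single_function_limit(inlined_insns,
                          max_inline_insns=600,         # assumed default
                          max_inline_insns_single=300,  # assumed default
                          max_inline_slope=32,          # assumed default
                          min_inline_insns=130):        # assumed default
    """Toy model: largest function (in insns) still eligible for inlining
    after `inlined_insns` have already been inlined by repeated inlining."""
    # Below the overall budget, the single-function limit applies unchanged.
    if inlined_insns <= max_inline_insns:
        return max_inline_insns_single
    # Past the budget, the limit falls linearly with slope -1/max_inline_slope.
    excess = inlined_insns - max_inline_insns
    decayed = max_inline_insns_single - excess // max_inline_slope
    # Very small functions stay inlinable no matter how far the limit decays.
    return max(decayed, min_inline_insns)


if __name__ == "__main__":
    for already_inlined in (0, 600, 3800, 100000):
        print(already_inlined,
              single_function_limit(already_inlined),
              single_function_limit(already_inlined, max_inline_slope=1000000))
```

In this model, raising either max-inline-insns or max-inline-slope to
1000000 keeps the limit at max-inline-insns-single for any realistic amount
of inlining, which is the "disable the decay" effect described above. The
model does not explain why the two settings give different results in
practice.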

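Just one more point: upping this limit does have some negative impact on
compile time. In the two -ftime-report runs below, total user time goes from
24.77s to 31.21s (about 26% more) and wall-clock time from 30.15s to 34.45s
(about 14% more):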

bellatrix:~/src/pooma-bib/r2/benchmarks/test$ g++-3.3
/net/bellatrix/home/rguenth/src/pooma-bib/r2/benchmarks/test/Bench.cpp -o
/net/bellatrix/home/rguenth/src/pooma-bib/r2/benchmarks/test/LINUXgcc/Bench
-ftemplate-depth-60 -fno-exceptions  -Drestrict=__restrict__ -DNOPAssert
-DNOCTAssert -I/home/rguenth/src/pooma-bib/r2/src
-I/home/rguenth/src/pooma-bib/r2/lib/LINUXgcc
-L/home/rguenth/src/pooma-bib/r2/lib/LINUXgcc -lpooma -lm -O2
-march=athlon -fomit-frame-pointer -funroll-loops -ftime-report

Execution times (seconds)
 garbage collection    :   1.88 ( 8%) usr   0.00 ( 0%) sys   2.25 ( 7%)
 cfg construction      :   0.11 ( 0%) usr   0.01 ( 1%) sys   0.12 ( 0%)
 cfg cleanup           :   0.28 ( 1%) usr   0.01 ( 1%) sys   0.32 ( 1%)
 trivially dead code   :   0.34 ( 1%) usr   0.00 ( 0%) sys   0.38 ( 1%)
 life analysis         :   0.49 ( 2%) usr   0.00 ( 0%) sys   0.56 ( 2%)
 life info update      :   0.13 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 0%)
 preprocessing         :   0.49 ( 2%) usr   0.20 (10%) sys   0.71 ( 2%)
 lexical analysis      :   0.41 ( 2%) usr   0.17 ( 9%) sys   0.72 ( 2%)
 parser                :   5.27 (21%) usr   0.53 (27%) sys   6.28 (21%)
 name lookup           :   2.98 (12%) usr   0.82 (41%) sys   4.12 (14%)
 expand                :   2.74 (11%) usr   0.01 ( 1%) sys   3.49 (12%)
 varconst              :   0.10 ( 0%) usr   0.01 ( 1%) sys   0.12 ( 0%)
 integration           :   1.24 ( 5%) usr   0.03 ( 2%) sys   1.41 ( 5%)
 jump                  :   0.23 ( 1%) usr   0.01 ( 1%) sys   0.27 ( 1%)
 CSE                   :   2.19 ( 9%) usr   0.03 ( 2%) sys   2.46 ( 8%)
 global CSE            :   0.72 ( 3%) usr   0.02 ( 1%) sys   0.80 ( 3%)
 loop analysis         :   0.54 ( 2%) usr   0.00 ( 0%) sys   0.59 ( 2%)
 CSE 2                 :   0.85 ( 3%) usr   0.01 ( 1%) sys   0.95 ( 3%)
 branch prediction     :   0.28 ( 1%) usr   0.00 ( 0%) sys   0.38 ( 1%)
 flow analysis         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%)
 combiner              :   0.29 ( 1%) usr   0.01 ( 1%) sys   0.40 ( 1%)
 if-conversion         :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%)
 regmove               :   0.10 ( 0%) usr   0.01 ( 1%) sys   0.15 ( 0%)
 mode switching        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%)
 local alloc           :   0.29 ( 1%) usr   0.02 ( 1%) sys   0.32 ( 1%)
 global alloc          :   0.59 ( 2%) usr   0.00 ( 0%) sys   0.62 ( 2%)
 reload CSE regs       :   0.48 ( 2%) usr   0.00 ( 0%) sys   0.49 ( 2%)
 flow 2                :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%)
 if-conversion 2       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%)
 peephole 2            :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%)
 rename registers      :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.23 ( 1%)
 scheduling 2          :   0.54 ( 2%) usr   0.05 ( 3%) sys   0.65 ( 2%)
 reorder blocks        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%)
 shorten branches      :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%)
 reg stack             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%)
 final                 :   0.12 ( 0%) usr   0.02 ( 1%) sys   0.14 ( 0%)
 rest of compilation   :   0.38 ( 2%) usr   0.02 ( 1%) sys   0.45 ( 2%)
 TOTAL                 :  24.77             1.99            30.15

bellatrix:~/src/pooma-bib/r2/benchmarks/test$ g++-3.3
/net/bellatrix/home/rguenth/src/pooma-bib/r2/benchmarks/test/Bench.cpp -o
/net/bellatrix/home/rguenth/src/pooma-bib/r2/benchmarks/test/LINUXgcc/Bench
-ftemplate-depth-60 -fno-exceptions  -Drestrict=__restrict__ -DNOPAssert
-DNOCTAssert -I/home/rguenth/src/pooma-bib/r2/src
-I/home/rguenth/src/pooma-bib/r2/lib/LINUXgcc
-L/home/rguenth/src/pooma-bib/r2/lib/LINUXgcc -lpooma -lm -O2
-march=athlon -fomit-frame-pointer -funroll-loops --param
max-inline-slope=1000000 -ftime-report

Execution times (seconds)
 garbage collection    :   2.58 ( 8%) usr   0.01 ( 0%) sys   2.62 ( 8%)
 cfg construction      :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.16 ( 0%)
 cfg cleanup           :   0.40 ( 1%) usr   0.03 ( 1%) sys   0.45 ( 1%)
 trivially dead code   :   0.53 ( 2%) usr   0.00 ( 0%) sys   0.57 ( 2%)
 life analysis         :   0.66 ( 2%) usr   0.00 ( 0%) sys   0.70 ( 2%)
 life info update      :   0.20 ( 1%) usr   0.00 ( 0%) sys   0.21 ( 1%)
 preprocessing         :   0.44 ( 1%) usr   0.20 ( 7%) sys   0.73 ( 2%)
 lexical analysis      :   0.46 ( 1%) usr   0.22 ( 8%) sys   0.65 ( 2%)
 parser                :   5.41 (17%) usr   0.82 (30%) sys   6.31 (18%)
 name lookup           :   2.84 ( 9%) usr   0.63 (23%) sys   3.56 (10%)
 expand                :   4.43 (14%) usr   0.22 ( 8%) sys   4.69 (14%)
 varconst              :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%)
 integration           :   2.01 ( 6%) usr   0.17 ( 6%) sys   2.18 ( 6%)
 jump                  :   0.39 ( 1%) usr   0.01 ( 0%) sys   0.40 ( 1%)
 CSE                   :   2.95 ( 9%) usr   0.03 ( 1%) sys   3.02 ( 9%)
 global CSE            :   0.95 ( 3%) usr   0.03 ( 1%) sys   1.00 ( 3%)
 loop analysis         :   0.86 ( 3%) usr   0.08 ( 3%) sys   0.96 ( 3%)
 CSE 2                 :   1.35 ( 4%) usr   0.00 ( 0%) sys   1.36 ( 4%)
 branch prediction     :   0.37 ( 1%) usr   0.01 ( 0%) sys   0.38 ( 1%)
 flow analysis         :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%)
 combiner              :   0.40 ( 1%) usr   0.00 ( 0%) sys   0.44 ( 1%)
 if-conversion         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%)
 regmove               :   0.16 ( 1%) usr   0.00 ( 0%) sys   0.14 ( 0%)
 mode switching        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%)
 local alloc           :   0.48 ( 2%) usr   0.00 ( 0%) sys   0.48 ( 1%)
 global alloc          :   0.65 ( 2%) usr   0.01 ( 0%) sys   0.67 ( 2%)
 reload CSE regs       :   0.56 ( 2%) usr   0.03 ( 1%) sys   0.61 ( 2%)
 flow 2                :   0.08 ( 0%) usr   0.03 ( 1%) sys   0.10 ( 0%)
 if-conversion 2       :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%)
 peephole 2            :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%)
 rename registers      :   0.20 ( 1%) usr   0.00 ( 0%) sys   0.21 ( 1%)
 scheduling 2          :   0.61 ( 2%) usr   0.07 ( 3%) sys   0.65 ( 2%)
 machine dep reorg     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%)
 shorten branches      :   0.08 ( 0%) usr   0.01 ( 0%) sys   0.09 ( 0%)
 final                 :   0.09 ( 0%) usr   0.02 ( 1%) sys   0.13 ( 0%)
 symout                :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%)
 rest of compilation   :   0.57 ( 2%) usr   0.03 ( 1%) sys   0.59 ( 2%)
 TOTAL                 :  31.21             2.69            34.45


Hope this helps with the decision.

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

