Bug 42175 - Slow compile and much memory use at -O1
Summary: Slow compile and much memory use at -O1
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 4.5.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog, memory-hog
Depends on:
Blocks:
 
Reported: 2009-11-25 16:28 UTC by Richard Biener
Modified: 2014-01-15 15:21 UTC
CC List: 3 users

See Also:
Host:
Target: x86-64-linux
Build:
Known to work: 4.9.0
Known to fail:
Last reconfirmed: 2009-11-27 11:58:38


Attachments
testcase (415.15 KB, application/octet-stream)
2009-11-25 16:29 UTC, Richard Biener

Description Richard Biener 2009-11-25 16:28:19 UTC
 
Comment 1 Richard Biener 2009-11-25 16:29:03 UTC
Created attachment 19149
testcase

Testcase from gmic.
Comment 2 Richard Biener 2009-11-25 16:41:27 UTC
With the current 4.4 branch I see

 loop analysis         : 116.95 (44%) usr   0.02 ( 0%) sys 117.11 (42%) wall   11269 kB ( 1%) ggc
 TOTAL                 : 266.16             8.09           277.29            1988801 kB

(we seem to bin all rtl loop opt passes there, ugh)

4.5 currently runs out of memory for me.
Comment 3 Richard Biener 2009-11-27 11:58:37 UTC
4.5 shows at -O1:

Execution times (seconds)
 garbage collection    :   1.66 ( 1%) usr   0.05 ( 0%) sys   1.73 ( 1%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.11 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall   12135 kB ( 1%) ggc
 callgraph optimization:   0.77 ( 1%) usr   0.03 ( 0%) sys   0.83 ( 1%) wall    2655 kB ( 0%) ggc
 ipa reference         :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall     338 kB ( 0%) ggc
 ipa free lang data    :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall     679 kB ( 0%) ggc
 cfg cleanup           :   0.95 ( 1%) usr   0.00 ( 0%) sys   0.99 ( 1%) wall     308 kB ( 0%) ggc
 trivially dead code   :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
 df multiple defs      :   9.71 ( 7%) usr   0.34 ( 2%) sys  10.07 ( 7%) wall       0 kB ( 0%) ggc
 df reaching defs      :   6.12 ( 5%) usr   0.01 ( 0%) sys   5.99 ( 4%) wall       0 kB ( 0%) ggc
 df live regs          :   6.30 ( 5%) usr   0.00 ( 0%) sys   6.32 ( 4%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   8.70 ( 7%) usr   0.00 ( 0%) sys   8.73 ( 6%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.11 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   1.56 ( 1%) usr   0.00 ( 0%) sys   1.56 ( 1%) wall   19924 kB ( 1%) ggc
 register information  :   0.61 ( 0%) usr   0.00 ( 0%) sys   0.58 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.69 ( 1%) usr   0.00 ( 0%) sys   0.68 ( 0%) wall   36096 kB ( 2%) ggc
 alias stmt walking    :   0.23 ( 0%) usr   0.11 ( 1%) sys   0.32 ( 0%) wall       0 kB ( 0%) ggc
 register scan         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     117 kB ( 0%) ggc
 rebuild jump labels   :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing         :   0.12 ( 0%) usr   0.12 ( 1%) sys   0.25 ( 0%) wall    2516 kB ( 0%) ggc
 parser                :   2.13 ( 2%) usr   0.30 ( 2%) sys   2.57 ( 2%) wall  284375 kB (17%) ggc
 name lookup           :   0.43 ( 0%) usr   0.27 ( 2%) sys   0.54 ( 0%) wall   26614 kB ( 2%) ggc
 inline heuristics     :   0.65 ( 0%) usr   0.02 ( 0%) sys   0.67 ( 0%) wall   10162 kB ( 1%) ggc
 integration           :   1.16 ( 1%) usr   0.19 ( 1%) sys   1.21 ( 1%) wall  250954 kB (15%) ggc
 tree gimplify         :   0.58 ( 0%) usr   0.02 ( 0%) sys   0.58 ( 0%) wall  101453 kB ( 6%) ggc
 tree eh               :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall    7971 kB ( 0%) ggc
 tree CFG construction :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall   29744 kB ( 2%) ggc
 tree CFG cleanup      :   1.40 ( 1%) usr   0.01 ( 0%) sys   1.22 ( 1%) wall    6447 kB ( 0%) ggc
 tree copy propagation :   0.68 ( 1%) usr   0.00 ( 0%) sys   0.75 ( 1%) wall    2410 kB ( 0%) ggc
 tree find ref. vars   :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    5799 kB ( 0%) ggc
 tree PTA              :   0.51 ( 0%) usr   0.02 ( 0%) sys   0.58 ( 0%) wall    7695 kB ( 0%) ggc
 tree PHI insertion    :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall    8341 kB ( 0%) ggc
 tree SSA rewrite      :   0.52 ( 0%) usr   0.01 ( 0%) sys   0.58 ( 0%) wall   36397 kB ( 2%) ggc
 tree SSA other        :   0.09 ( 0%) usr   0.03 ( 0%) sys   0.13 ( 0%) wall     352 kB ( 0%) ggc
 tree SSA incremental  :   2.69 ( 2%) usr   0.04 ( 0%) sys   2.87 ( 2%) wall   14341 kB ( 1%) ggc
 tree operand scan     :   0.54 ( 0%) usr   0.17 ( 1%) sys   0.85 ( 1%) wall  100704 kB ( 6%) ggc
 dominator optimization:   0.90 ( 1%) usr   0.00 ( 0%) sys   0.97 ( 1%) wall   23583 kB ( 1%) ggc
 tree SRA              :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     417 kB ( 0%) ggc
 tree CCP              :   0.82 ( 1%) usr   0.01 ( 0%) sys   0.94 ( 1%) wall    8250 kB ( 0%) ggc
 tree PHI const/copy prop:   0.08 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     492 kB ( 0%) ggc
 tree split crit edges :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall   19160 kB ( 1%) ggc
 tree reassociation    :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall    7811 kB ( 0%) ggc
 tree FRE              :   1.22 ( 1%) usr   0.01 ( 0%) sys   1.29 ( 1%) wall    9250 kB ( 1%) ggc
 tree code sinking     :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall    2506 kB ( 0%) ggc
 tree linearize phis   :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     801 kB ( 0%) ggc
 tree forward propagate:   0.23 ( 0%) usr   0.01 ( 0%) sys   0.17 ( 0%) wall    7457 kB ( 0%) ggc
 tree phiprop          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      13 kB ( 0%) ggc
 tree conservative DCE :   0.32 ( 0%) usr   0.02 ( 0%) sys   0.38 ( 0%) wall      26 kB ( 0%) ggc
 tree aggressive DCE   :   0.22 ( 0%) usr   0.01 ( 0%) sys   0.24 ( 0%) wall      35 kB ( 0%) ggc
 tree DSE              :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall     440 kB ( 0%) ggc
 PHI merge             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    1351 kB ( 0%) ggc
 tree loop bounds      :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall    4813 kB ( 0%) ggc
 loop invariant motion :   0.45 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall      57 kB ( 0%) ggc
 tree canonical iv     :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    5261 kB ( 0%) ggc
 scev constant prop    :   0.15 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall    9369 kB ( 1%) ggc
 complete unrolling    :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    3410 kB ( 0%) ggc
 tree iv optimization  :   1.86 ( 1%) usr   0.01 ( 0%) sys   1.73 ( 1%) wall  145322 kB ( 9%) ggc
 tree loop init        :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall    4518 kB ( 0%) ggc
 tree copy headers     :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall   12790 kB ( 1%) ggc
 tree SSA uncprop      :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 tree NRV optimization :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      21 kB ( 0%) ggc
 tree rename SSA copies:   0.24 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall       0 kB ( 0%) ggc
 tree switch initialization conversion:   0.00 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall       3 kB ( 0%) ggc
 dominance frontiers   :   0.58 ( 0%) usr   0.00 ( 0%) sys   0.56 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   1.13 ( 1%) usr   0.00 ( 0%) sys   1.06 ( 1%) wall       0 kB ( 0%) ggc
 expand                :   6.74 ( 5%) usr   0.04 ( 0%) sys   7.00 ( 5%) wall  197896 kB (12%) ggc
 varconst              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     340 kB ( 0%) ggc
 lower subreg          :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 jump                  :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall    4317 kB ( 0%) ggc
 forward prop          :   1.25 ( 1%) usr   0.02 ( 0%) sys   1.20 ( 1%) wall   16469 kB ( 1%) ggc
 CSE                   :   0.82 ( 1%) usr   0.00 ( 0%) sys   0.74 ( 0%) wall    1324 kB ( 0%) ggc
 dead code elimination :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1      :   0.83 ( 1%) usr   0.01 ( 0%) sys   0.80 ( 1%) wall   10643 kB ( 1%) ggc
 dead store elim2      :   0.63 ( 0%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall   15696 kB ( 1%) ggc
 loop analysis         :  39.45 (30%) usr   0.00 ( 0%) sys  39.59 (27%) wall    8374 kB ( 1%) ggc
 branch prediction     :   0.44 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall   14907 kB ( 1%) ggc
 combiner              :   2.66 ( 2%) usr   0.02 ( 0%) sys   2.90 ( 2%) wall   57748 kB ( 3%) ggc
 if-conversion         :   1.04 ( 1%) usr   0.02 ( 0%) sys   1.05 ( 1%) wall    7066 kB ( 0%) ggc
 integrated RA         :   7.11 ( 5%) usr  11.66 (85%) sys  20.52 (14%) wall   34152 kB ( 2%) ggc
 reload                :   5.83 ( 4%) usr   0.02 ( 0%) sys   5.80 ( 4%) wall   34513 kB ( 2%) ggc
 reload CSE regs       :   3.73 ( 3%) usr   0.01 ( 0%) sys   3.73 ( 3%) wall   16022 kB ( 1%) ggc
 thread pro- & epilogue:   0.21 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall    3221 kB ( 0%) ggc
 if-conversion 2       :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    2270 kB ( 0%) ggc
 combine stack adjustments:   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 hard reg cprop        :   0.51 ( 0%) usr   0.00 ( 0%) sys   0.54 ( 0%) wall     184 kB ( 0%) ggc
 machine dep reorg     :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall     179 kB ( 0%) ggc
 reorder blocks        :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    5860 kB ( 0%) ggc
 final                 :   1.05 ( 1%) usr   0.00 ( 0%) sys   0.94 ( 1%) wall    1285 kB ( 0%) ggc
 tree if-combine       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      25 kB ( 0%) ggc
 TOTAL                 : 133.52            13.64           149.03            1669511 kB

so we spend 30% in "loop analysis" (all RTL loop opt passes) which is
a lot better than 4.4 but still too much for -O1.
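
A quick way to rank the passes in a report like the one above is a small
script. This is only a sketch and not part of GCC: the names
parse_time_report and top_passes are made up for illustration, and the
regular expression assumes the row layout shown in this report (other GCC
versions may print different columns).

```python
import re

# One row of the report, e.g.
#  " loop analysis         :  39.45 (30%) usr   0.00 ( 0%) sys  39.59 (27%) wall    8374 kB ( 1%) ggc"
# The TOTAL row carries no percentages and is deliberately not matched.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s+"
    r"(?P<ggc_kb>\d+)\s*kB\s*\(\s*\d+%\)\s*ggc\s*$"
)


def parse_time_report(text):
    """Return one dict per pass row found in the report text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "name": m.group("name"),
                "usr": float(m.group("usr")),
                "sys": float(m.group("sys")),
                "wall": float(m.group("wall")),
                "ggc_kb": int(m.group("ggc_kb")),
            })
    return rows


def top_passes(rows, key="usr", n=5):
    """Return the n rows with the largest value in the given column."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]
```

-ftime-report prints to stderr, so the report can be captured by
redirecting stderr to a file and passing the file contents to
parse_time_report.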
Comment 4 Richard Biener 2009-11-27 12:13:09 UTC
At -O2, GCC 4.4 uses about 2.7 GB of RAM and shows:

Execution times (seconds)
 df reaching defs      :  30.13 ( 6%) usr   7.70 (73%) sys  39.01 ( 8%) wall       0 kB ( 0%) ggc
 df live regs          :  14.44 ( 3%) usr   0.01 ( 0%) sys  14.51 ( 3%) wall       0 kB ( 0%) ggc
 df live&initialized regs:  22.65 ( 5%) usr   0.00 ( 0%) sys  22.45 ( 5%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   4.86 ( 1%) usr   0.02 ( 0%) sys   4.67 ( 1%) wall       0 kB ( 0%) ggc
 tree SSA incremental  :  23.30 ( 5%) usr   0.01 ( 0%) sys  23.00 ( 5%) wall   15627 kB ( 1%) ggc
 tree operand scan     :  17.51 ( 4%) usr   0.31 ( 3%) sys  17.91 ( 4%) wall  126792 kB ( 5%) ggc
 tree PRE              : 107.30 (23%) usr   0.43 ( 4%) sys 107.77 (22%) wall   53805 kB ( 2%) ggc
 loop analysis         : 130.15 (28%) usr   0.01 ( 0%) sys 130.09 (27%) wall   10542 kB ( 0%) ggc
 TOTAL                 : 469.03            10.51           480.58            2378268 kB

There is a bug about the PRE slowness already, PR36439.

Trunk uses about 3 GB of RAM and shows:

 ipa SRA               :   6.41 ( 4%) usr   0.00 ( 0%) sys   6.44 ( 4%) wall   11215 kB ( 1%) ggc
 df multiple defs      :   8.56 ( 5%) usr   0.33 (13%) sys   8.84 ( 5%) wall       0 kB ( 0%) ggc
 df reaching defs      :   7.07 ( 4%) usr   0.16 ( 6%) sys   7.13 ( 4%) wall       0 kB ( 0%) ggc
 df live regs          :   9.98 ( 6%) usr   0.00 ( 0%) sys   9.77 ( 6%) wall       0 kB ( 0%) ggc
 df live&initialized regs:  14.85 ( 9%) usr   0.01 ( 0%) sys  15.15 ( 9%) wall       0 kB ( 0%) ggc
 tree VRP              :   3.04 ( 2%) usr   0.09 ( 4%) sys   3.31 ( 2%) wall  123387 kB ( 6%) ggc
 tree SSA incremental  :   3.18 ( 2%) usr   0.02 ( 1%) sys   3.08 ( 2%) wall   12660 kB ( 1%) ggc
 tree operand scan     :   1.08 ( 1%) usr   0.25 (10%) sys   1.41 ( 1%) wall  113932 kB ( 6%) ggc
 tree PRE              :   4.12 ( 2%) usr   0.02 ( 1%) sys   4.32 ( 3%) wall   36286 kB ( 2%) ggc
 expand                :   7.74 ( 5%) usr   0.04 ( 2%) sys   7.26 ( 4%) wall  194890 kB (10%) ggc
 loop analysis         :  35.87 (21%) usr   0.02 ( 1%) sys  35.97 (21%) wall    7798 kB ( 0%) ggc
 integrated RA         :   7.26 ( 4%) usr   0.28 (11%) sys   7.81 ( 5%) wall   33879 kB ( 2%) ggc
 reload                :   6.23 ( 4%) usr   0.01 ( 0%) sys   6.24 ( 4%) wall   35384 kB ( 2%) ggc
 reload CSE regs       :   3.41 ( 2%) usr   0.00 ( 0%) sys   3.65 ( 2%) wall   30591 kB ( 2%) ggc
 TOTAL                 : 169.84             2.56           172.43            1990005 kB

which is all reasonable again apart from "loop analysis".
Comment 5 Richard Biener 2009-11-27 12:38:35 UTC
With -O1 -fno-move-loop-invariants we get

 TOTAL                 :  83.46             2.17            85.63            1650809 kB
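
For scale, this total can be set against the plain -O1 run from comment 3.
The arithmetic below uses only usr figures quoted in this report; the
variable names are just for illustration.

```python
# usr totals quoted in this report (seconds)
total_o1 = 133.52           # -O1, comment 3
total_o1_no_lim = 83.46     # -O1 -fno-move-loop-invariants, comment 5
loop_analysis_o1 = 39.45    # "loop analysis" row at -O1, comment 3

saved = round(total_o1 - total_o1_no_lim, 2)
saved_pct = round(100 * saved / total_o1, 1)

print(f"saved {saved} s usr ({saved_pct}% of the -O1 total)")
print(f"'loop analysis' alone accounted for {loop_analysis_o1} s")
```

The saving comes to 50.06 s, about 37.5% of the -O1 usr total, which is
more than the 39.45 s attributed to "loop analysis". Only the TOTAL row is
quoted for this run, so the report does not show which other rows shrank.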
Comment 6 Richard Biener 2010-01-03 16:52:03 UTC
Maybe Zdenek has an idea why RTL LIM is so slow.
Comment 7 Richard Biener 2010-01-03 17:03:07 UTC
Well, obviously it is because

template<typename T>
gmic& gmic::parse(...)
{
...
    while (position<command_line.size() && !is_quit) {

loop body with 4000 lines of code (well, including lots of lines
with a few thousand chars, control flow, loops and function calls)

    }
...
}

and DF has never scaled well to this kind of code.  Not considering this
outermost loop in LIM might fix the slowness...  the limit is currently
10000 basic blocks, which may be a little large for non-innermost
loops.
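
The idea of a tighter cap for non-innermost loops can be sketched as a
simple filter. This is a toy model, not GCC code: the Loop record, the
function loops_to_consider and the outer_cap value of 1000 are all
hypothetical, and only the 10000-block figure comes from the comment above.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    name: str
    num_blocks: int     # basic blocks in the loop, nested loops included
    is_innermost: bool


def loops_to_consider(loops, innermost_cap=10000, outer_cap=1000):
    """Keep only loops small enough to be worth analysing.

    innermost_cap mirrors the 10000-block limit mentioned above;
    outer_cap is an arbitrary placeholder, not a value taken from GCC.
    """
    kept = []
    for loop in loops:
        cap = innermost_cap if loop.is_innermost else outer_cap
        if loop.num_blocks <= cap:
            kept.append(loop)
    return kept


# Made-up block counts: one huge outer loop and a small inner one.
loops = [
    Loop("parse_outer_while", num_blocks=8000, is_innermost=False),
    Loop("small_inner_for", num_blocks=12, is_innermost=True),
]
print([l.name for l in loops_to_consider(loops)])
```

With these made-up numbers the outer loop passes a flat 10000-block limit
but is skipped once non-innermost loops get their own, smaller cap.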
Comment 8 rakdver@kam.mff.cuni.cz 2010-01-03 19:37:01 UTC
10000 bbs seems way too large -- even for innermost loops.
Comment 9 Richard Biener 2014-01-15 15:21:57 UTC
GCC 4.9 will use about 800MB at -O1 and

 combiner                :   2.92 ( 4%) usr   0.02 ( 1%) sys   2.77 ( 4%) wall   56720 kB ( 4%) ggc
 integrated RA           :   4.87 ( 7%) usr   0.06 ( 3%) sys   4.87 ( 7%) wall  138947 kB ( 9%) ggc
 LRA non-specific        :   2.45 ( 4%) usr   0.03 ( 1%) sys   2.49 ( 4%) wall   22036 kB ( 1%) ggc
 TOTAL                 :  66.46 

at -O2 memory usage stays the same but


 df reaching defs        :   7.16 ( 7%) usr   0.01 ( 0%) sys   6.93 ( 7%) wall       0 kB ( 0%) ggc
 df live regs            :   6.87 ( 7%) usr   0.04 ( 1%) sys   6.86 ( 7%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   4.05 ( 4%) usr   0.00 ( 0%) sys   4.19 ( 4%) wall       0 kB ( 0%) ggc
 combiner                :   3.17 ( 3%) usr   0.04 ( 1%) sys   3.10 ( 3%) wall   61821 kB ( 3%) ggc
 integrated RA           :   5.77 ( 6%) usr   0.06 ( 2%) sys   5.95 ( 6%) wall  139550 kB ( 7%) ggc
 LRA non-specific        :   2.49 ( 3%) usr   0.02 ( 1%) sys   2.48 ( 2%) wall   21955 kB ( 1%) ggc
 TOTAL                 :  97.65             2.67           100.30            1881806 kB

I'd say FIXED.  Yay.