Created attachment 19149 (testcase)
Testcase from gmic.
With the current 4.4 branch I see:
 loop analysis : 116.95 (44%) usr 0.02 ( 0%) sys 117.11 (42%) wall 11269 kB ( 1%) ggc
 TOTAL : 266.16 8.09 277.29 1988801 kB
(we seem to lump all RTL loop optimization passes into that timer, ugh).
4.5 currently runs out of memory for me.
4.5 shows at -O1:
Execution times (seconds)
 garbage collection : 1.66 ( 1%) usr 0.05 ( 0%) sys 1.73 ( 1%) wall 0 kB ( 0%) ggc
 callgraph construction: 0.11 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 12135 kB ( 1%) ggc
 callgraph optimization: 0.77 ( 1%) usr 0.03 ( 0%) sys 0.83 ( 1%) wall 2655 kB ( 0%) ggc
 ipa reference : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
 ipa pure const : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 338 kB ( 0%) ggc
 ipa free lang data : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall 679 kB ( 0%) ggc
 cfg cleanup : 0.95 ( 1%) usr 0.00 ( 0%) sys 0.99 ( 1%) wall 308 kB ( 0%) ggc
 trivially dead code : 0.33 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall 0 kB ( 0%) ggc
 df multiple defs : 9.71 ( 7%) usr 0.34 ( 2%) sys 10.07 ( 7%) wall 0 kB ( 0%) ggc
 df reaching defs : 6.12 ( 5%) usr 0.01 ( 0%) sys 5.99 ( 4%) wall 0 kB ( 0%) ggc
 df live regs : 6.30 ( 5%) usr 0.00 ( 0%) sys 6.32 ( 4%) wall 0 kB ( 0%) ggc
 df live&initialized regs: 8.70 ( 7%) usr 0.00 ( 0%) sys 8.73 ( 6%) wall 0 kB ( 0%) ggc
 df use-def / def-use chains: 0.11 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
 df reg dead/unused notes: 1.56 ( 1%) usr 0.00 ( 0%) sys 1.56 ( 1%) wall 19924 kB ( 1%) ggc
 register information : 0.61 ( 0%) usr 0.00 ( 0%) sys 0.58 ( 0%) wall 0 kB ( 0%) ggc
 alias analysis : 0.69 ( 1%) usr 0.00 ( 0%) sys 0.68 ( 0%) wall 36096 kB ( 2%) ggc
 alias stmt walking : 0.23 ( 0%) usr 0.11 ( 1%) sys 0.32 ( 0%) wall 0 kB ( 0%) ggc
 register scan : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 117 kB ( 0%) ggc
 rebuild jump labels : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 0 kB ( 0%) ggc
 preprocessing : 0.12 ( 0%) usr 0.12 ( 1%) sys 0.25 ( 0%) wall 2516 kB ( 0%) ggc
 parser : 2.13 ( 2%) usr 0.30 ( 2%) sys 2.57 ( 2%) wall 284375 kB (17%) ggc
 name lookup : 0.43 ( 0%) usr 0.27 ( 2%) sys 0.54 ( 0%) wall 26614 kB ( 2%) ggc
 inline heuristics : 0.65 ( 0%) usr 0.02 ( 0%) sys 0.67 ( 0%) wall 10162 kB ( 1%) ggc
 integration : 1.16 ( 1%) usr 0.19 ( 1%) sys 1.21 ( 1%) wall 250954 kB (15%) ggc
 tree gimplify : 0.58 ( 0%) usr 0.02 ( 0%) sys 0.58 ( 0%) wall 101453 kB ( 6%) ggc
 tree eh : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 7971 kB ( 0%) ggc
 tree CFG construction : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall 29744 kB ( 2%) ggc
 tree CFG cleanup : 1.40 ( 1%) usr 0.01 ( 0%) sys 1.22 ( 1%) wall 6447 kB ( 0%) ggc
 tree copy propagation : 0.68 ( 1%) usr 0.00 ( 0%) sys 0.75 ( 1%) wall 2410 kB ( 0%) ggc
 tree find ref. vars : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 5799 kB ( 0%) ggc
 tree PTA : 0.51 ( 0%) usr 0.02 ( 0%) sys 0.58 ( 0%) wall 7695 kB ( 0%) ggc
 tree PHI insertion : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 8341 kB ( 0%) ggc
 tree SSA rewrite : 0.52 ( 0%) usr 0.01 ( 0%) sys 0.58 ( 0%) wall 36397 kB ( 2%) ggc
 tree SSA other : 0.09 ( 0%) usr 0.03 ( 0%) sys 0.13 ( 0%) wall 352 kB ( 0%) ggc
 tree SSA incremental : 2.69 ( 2%) usr 0.04 ( 0%) sys 2.87 ( 2%) wall 14341 kB ( 1%) ggc
 tree operand scan : 0.54 ( 0%) usr 0.17 ( 1%) sys 0.85 ( 1%) wall 100704 kB ( 6%) ggc
 dominator optimization: 0.90 ( 1%) usr 0.00 ( 0%) sys 0.97 ( 1%) wall 23583 kB ( 1%) ggc
 tree SRA : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 417 kB ( 0%) ggc
 tree CCP : 0.82 ( 1%) usr 0.01 ( 0%) sys 0.94 ( 1%) wall 8250 kB ( 0%) ggc
 tree PHI const/copy prop: 0.08 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 492 kB ( 0%) ggc
 tree split crit edges : 0.04 ( 0%) usr 0.01 ( 0%) sys 0.05 ( 0%) wall 19160 kB ( 1%) ggc
 tree reassociation : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall 7811 kB ( 0%) ggc
 tree FRE : 1.22 ( 1%) usr 0.01 ( 0%) sys 1.29 ( 1%) wall 9250 kB ( 1%) ggc
 tree code sinking : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall 2506 kB ( 0%) ggc
 tree linearize phis : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 801 kB ( 0%) ggc
 tree forward propagate: 0.23 ( 0%) usr 0.01 ( 0%) sys 0.17 ( 0%) wall 7457 kB ( 0%) ggc
 tree phiprop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 13 kB ( 0%) ggc
 tree conservative DCE : 0.32 ( 0%) usr 0.02 ( 0%) sys 0.38 ( 0%) wall 26 kB ( 0%) ggc
 tree aggressive DCE : 0.22 ( 0%) usr 0.01 ( 0%) sys 0.24 ( 0%) wall 35 kB ( 0%) ggc
 tree DSE : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 440 kB ( 0%) ggc
 PHI merge : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 1351 kB ( 0%) ggc
 tree loop bounds : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 4813 kB ( 0%) ggc
 loop invariant motion : 0.45 ( 0%) usr 0.00 ( 0%) sys 0.36 ( 0%) wall 57 kB ( 0%) ggc
 tree canonical iv : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 5261 kB ( 0%) ggc
 scev constant prop : 0.15 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall 9369 kB ( 1%) ggc
 complete unrolling : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 3410 kB ( 0%) ggc
 tree iv optimization : 1.86 ( 1%) usr 0.01 ( 0%) sys 1.73 ( 1%) wall 145322 kB ( 9%) ggc
 tree loop init : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 4518 kB ( 0%) ggc
 tree copy headers : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 12790 kB ( 1%) ggc
 tree SSA uncprop : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc
 tree NRV optimization : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 21 kB ( 0%) ggc
 tree rename SSA copies: 0.24 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall 0 kB ( 0%) ggc
 tree switch initialization conversion: 0.00 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%) wall 3 kB ( 0%) ggc
 dominance frontiers : 0.58 ( 0%) usr 0.00 ( 0%) sys 0.56 ( 0%) wall 0 kB ( 0%) ggc
 dominance computation : 1.13 ( 1%) usr 0.00 ( 0%) sys 1.06 ( 1%) wall 0 kB ( 0%) ggc
 expand : 6.74 ( 5%) usr 0.04 ( 0%) sys 7.00 ( 5%) wall 197896 kB (12%) ggc
 varconst : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 340 kB ( 0%) ggc
 lower subreg : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
 jump : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 4317 kB ( 0%) ggc
 forward prop : 1.25 ( 1%) usr 0.02 ( 0%) sys 1.20 ( 1%) wall 16469 kB ( 1%) ggc
 CSE : 0.82 ( 1%) usr 0.00 ( 0%) sys 0.74 ( 0%) wall 1324 kB ( 0%) ggc
 dead code elimination : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall 0 kB ( 0%) ggc
 dead store elim1 : 0.83 ( 1%) usr 0.01 ( 0%) sys 0.80 ( 1%) wall 10643 kB ( 1%) ggc
 dead store elim2 : 0.63 ( 0%) usr 0.00 ( 0%) sys 0.69 ( 0%) wall 15696 kB ( 1%) ggc
 loop analysis : 39.45 (30%) usr 0.00 ( 0%) sys 39.59 (27%) wall 8374 kB ( 1%) ggc
 branch prediction : 0.44 ( 0%) usr 0.00 ( 0%) sys 0.39 ( 0%) wall 14907 kB ( 1%) ggc
 combiner : 2.66 ( 2%) usr 0.02 ( 0%) sys 2.90 ( 2%) wall 57748 kB ( 3%) ggc
 if-conversion : 1.04 ( 1%) usr 0.02 ( 0%) sys 1.05 ( 1%) wall 7066 kB ( 0%) ggc
 integrated RA : 7.11 ( 5%) usr 11.66 (85%) sys 20.52 (14%) wall 34152 kB ( 2%) ggc
 reload : 5.83 ( 4%) usr 0.02 ( 0%) sys 5.80 ( 4%) wall 34513 kB ( 2%) ggc
 reload CSE regs : 3.73 ( 3%) usr 0.01 ( 0%) sys 3.73 ( 3%) wall 16022 kB ( 1%) ggc
 thread pro- & epilogue: 0.21 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 3221 kB ( 0%) ggc
 if-conversion 2 : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 2270 kB ( 0%) ggc
 combine stack adjustments: 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
 hard reg cprop : 0.51 ( 0%) usr 0.00 ( 0%) sys 0.54 ( 0%) wall 184 kB ( 0%) ggc
 machine dep reorg : 0.46 ( 0%) usr 0.00 ( 0%) sys 0.39 ( 0%) wall 179 kB ( 0%) ggc
 reorder blocks : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 5860 kB ( 0%) ggc
 final : 1.05 ( 1%) usr 0.00 ( 0%) sys 0.94 ( 1%) wall 1285 kB ( 0%) ggc
 tree if-combine : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 25 kB ( 0%) ggc
 TOTAL : 133.52 13.64 149.03 1669511 kB

So we spend 30% of the compile time in "loop analysis" (which covers all RTL loop optimization passes). That is a lot better than 4.4, but still too much for -O1.
At -O2, GCC 4.4 uses about 2.7 GB of RAM and shows:
Execution times (seconds)
 df reaching defs : 30.13 ( 6%) usr 7.70 (73%) sys 39.01 ( 8%) wall 0 kB ( 0%) ggc
 df live regs : 14.44 ( 3%) usr 0.01 ( 0%) sys 14.51 ( 3%) wall 0 kB ( 0%) ggc
 df live&initialized regs: 22.65 ( 5%) usr 0.00 ( 0%) sys 22.45 ( 5%) wall 0 kB ( 0%) ggc
 df use-def / def-use chains: 4.86 ( 1%) usr 0.02 ( 0%) sys 4.67 ( 1%) wall 0 kB ( 0%) ggc
 tree SSA incremental : 23.30 ( 5%) usr 0.01 ( 0%) sys 23.00 ( 5%) wall 15627 kB ( 1%) ggc
 tree operand scan : 17.51 ( 4%) usr 0.31 ( 3%) sys 17.91 ( 4%) wall 126792 kB ( 5%) ggc
 tree PRE : 107.30 (23%) usr 0.43 ( 4%) sys 107.77 (22%) wall 53805 kB ( 2%) ggc
 loop analysis : 130.15 (28%) usr 0.01 ( 0%) sys 130.09 (27%) wall 10542 kB ( 0%) ggc
 TOTAL : 469.03 10.51 480.58 2378268 kB

There is already a bug about the PRE slowness, PR36439.

Trunk uses about 3 GB of RAM and shows:
 ipa SRA : 6.41 ( 4%) usr 0.00 ( 0%) sys 6.44 ( 4%) wall 11215 kB ( 1%) ggc
 df multiple defs : 8.56 ( 5%) usr 0.33 (13%) sys 8.84 ( 5%) wall 0 kB ( 0%) ggc
 df reaching defs : 7.07 ( 4%) usr 0.16 ( 6%) sys 7.13 ( 4%) wall 0 kB ( 0%) ggc
 df live regs : 9.98 ( 6%) usr 0.00 ( 0%) sys 9.77 ( 6%) wall 0 kB ( 0%) ggc
 df live&initialized regs: 14.85 ( 9%) usr 0.01 ( 0%) sys 15.15 ( 9%) wall 0 kB ( 0%) ggc
 tree VRP : 3.04 ( 2%) usr 0.09 ( 4%) sys 3.31 ( 2%) wall 123387 kB ( 6%) ggc
 tree SSA incremental : 3.18 ( 2%) usr 0.02 ( 1%) sys 3.08 ( 2%) wall 12660 kB ( 1%) ggc
 tree operand scan : 1.08 ( 1%) usr 0.25 (10%) sys 1.41 ( 1%) wall 113932 kB ( 6%) ggc
 tree PRE : 4.12 ( 2%) usr 0.02 ( 1%) sys 4.32 ( 3%) wall 36286 kB ( 2%) ggc
 expand : 7.74 ( 5%) usr 0.04 ( 2%) sys 7.26 ( 4%) wall 194890 kB (10%) ggc
 loop analysis : 35.87 (21%) usr 0.02 ( 1%) sys 35.97 (21%) wall 7798 kB ( 0%) ggc
 integrated RA : 7.26 ( 4%) usr 0.28 (11%) sys 7.81 ( 5%) wall 33879 kB ( 2%) ggc
 reload : 6.23 ( 4%) usr 0.01 ( 0%) sys 6.24 ( 4%) wall 35384 kB ( 2%) ggc
 reload CSE regs : 3.41 ( 2%) usr 0.00 ( 0%) sys 3.65 ( 2%) wall 30591 kB ( 2%) ggc
 TOTAL : 169.84 2.56 172.43 1990005 kB

This is all reasonable again, apart from "loop analysis".
With -O1 -fno-move-loop-invariants we get:
 TOTAL : 83.46 2.17 85.63 1650809 kB
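Assuming the -fno-move-loop-invariants run uses the same 4.5 compiler and testcase as the earlier -O1 report (TOTAL 133.52 s usr, 149.03 s wall), the saving from disabling RTL loop-invariant motion works out as:

```latex
\begin{aligned}
\Delta t_{\mathrm{usr}}  &= 133.52 - 83.46 = 50.06\,\mathrm{s}, &\quad \frac{50.06}{133.52} &\approx 0.375 \;(37.5\%),\\
\Delta t_{\mathrm{wall}} &= 149.03 - 85.63 = 63.40\,\mathrm{s}, &\quad \frac{63.40}{149.03} &\approx 0.425 \;(42.5\%).
\end{aligned}
```

Since "loop analysis" accounted for only 39.45 s of usr time in that report, at least 50.06 - 39.45 = 10.61 s of the saving must show up under other timers.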
Maybe Zdenek has an idea why RTL LIM is so slow.
Well, obviously it is because of this function:

  template<typename T>
  gmic& gmic::parse(...)
  {
    ...
    while (position<command_line.size() && !is_quit) {

      loop body with 4000 lines of code (well, including lots of lines
      with a few thousand chars, control flow, loops and function calls)

    }
    ...
  }

and because DF has never scaled well to this kind of code. Maybe not considering this outermost loop in LIM will fix the slowness... The limit is currently 10000 basic blocks, which may be a little large when considering non-innermost loops.
Subject: Re: Slow compile and much memory use at -O1

> ------- Comment #7 from rguenth at gcc dot gnu dot org 2010-01-03 17:03 -------
> [...] the limit is currently
> 10000 basic-blocks, maybe a little large when considering non-innermost
> loops.

10000 bbs seems way too large -- even for innermost loops.
GCC 4.9 uses about 800 MB at -O1 and shows:
 combiner : 2.92 ( 4%) usr 0.02 ( 1%) sys 2.77 ( 4%) wall 56720 kB ( 4%) ggc
 integrated RA : 4.87 ( 7%) usr 0.06 ( 3%) sys 4.87 ( 7%) wall 138947 kB ( 9%) ggc
 LRA non-specific : 2.45 ( 4%) usr 0.03 ( 1%) sys 2.49 ( 4%) wall 22036 kB ( 1%) ggc
 TOTAL : 66.46

At -O2 memory usage stays the same, and it shows:
 df reaching defs : 7.16 ( 7%) usr 0.01 ( 0%) sys 6.93 ( 7%) wall 0 kB ( 0%) ggc
 df live regs : 6.87 ( 7%) usr 0.04 ( 1%) sys 6.86 ( 7%) wall 0 kB ( 0%) ggc
 df live&initialized regs: 4.05 ( 4%) usr 0.00 ( 0%) sys 4.19 ( 4%) wall 0 kB ( 0%) ggc
 combiner : 3.17 ( 3%) usr 0.04 ( 1%) sys 3.10 ( 3%) wall 61821 kB ( 3%) ggc
 integrated RA : 5.77 ( 6%) usr 0.06 ( 2%) sys 5.95 ( 6%) wall 139550 kB ( 7%) ggc
 LRA non-specific : 2.49 ( 3%) usr 0.02 ( 1%) sys 2.48 ( 2%) wall 21955 kB ( 1%) ggc
 TOTAL : 97.65 2.67 100.30 1881806 kB

I'd say FIXED. Yay.