This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
- From: "hubicka at ucw dot cz" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 22 Jul 2006 19:30:44 -0000
- Subject: [Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
- References: <bug-28071-12846@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #14 from hubicka at ucw dot cz 2006-07-22 19:30 -------
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
Hi,
with the attached patch I can cure the quadratic behaviour in regmove, and
the time report is no longer so unreasonable:
gnu_dev_major gnu_dev_minor gnu_dev_makedev max min f fx fy fz add addl addr sub subl subr mul mull mulr divl ipow fi
Analyzing compilation unitPerforming intraprocedural optimizations
Assembling functions:
max min add addl addr sub subl subr mul mull mulr divl ipow fz fy fx f fi {GC 126177k -> 85112k} {GC 327625k -> 39474k}
Execution times (seconds)
garbage collection : 0.83 ( 0%) usr 0.00 ( 0%) sys 0.82 ( 0%) wall 0 kB ( 0%) ggc
callgraph construction: 0.16 ( 0%) usr 0.02 ( 1%) sys 0.16 ( 0%) wall 1147 kB ( 0%) ggc
callgraph optimization: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 533 kB ( 0%) ggc
ipa reference : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
ipa pure const : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
ipa type escape : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
trivially dead code : 0.45 ( 0%) usr 0.00 ( 0%) sys 0.42 ( 0%) wall 0 kB ( 0%) ggc
life analysis : 21.38 ( 3%) usr 0.02 ( 1%) sys 21.39 ( 3%) wall 1120 kB ( 0%) ggc
life info update : 0.54 ( 0%) usr 0.00 ( 0%) sys 0.61 ( 0%) wall 0 kB ( 0%) ggc
alias analysis : 0.87 ( 0%) usr 0.00 ( 0%) sys 0.89 ( 0%) wall 4266 kB ( 1%) ggc
register scan : 0.42 ( 0%) usr 0.00 ( 0%) sys 0.40 ( 0%) wall 150 kB ( 0%) ggc
rebuild jump labels : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
preprocessing : 0.27 ( 0%) usr 0.06 ( 2%) sys 0.36 ( 0%) wall 471 kB ( 0%) ggc
lexical analysis : 0.04 ( 0%) usr 0.05 ( 2%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
parser : 0.12 ( 0%) usr 0.03 ( 1%) sys 0.17 ( 0%) wall 3207 kB ( 1%) ggc
inline heuristics : 15.14 ( 2%) usr 0.01 ( 0%) sys 15.26 ( 2%) wall 1486 kB ( 0%) ggc
integration : 21.35 ( 3%) usr 0.12 ( 4%) sys 21.71 ( 3%) wall 33445 kB ( 8%) ggc
tree gimplify : 0.18 ( 0%) usr 0.01 ( 0%) sys 0.19 ( 0%) wall 3341 kB ( 1%) ggc
tree eh : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree CFG construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1338 kB ( 0%) ggc
tree CFG cleanup : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 20 kB ( 0%) ggc
tree VRP : 0.38 ( 0%) usr 0.01 ( 0%) sys 0.42 ( 0%) wall 11 kB ( 0%) ggc
tree copy propagation : 0.23 ( 0%) usr 0.01 ( 0%) sys 0.28 ( 0%) wall 222 kB ( 0%) ggc
tree store copy prop : 0.11 ( 0%) usr 0.01 ( 0%) sys 0.14 ( 0%) wall 4 kB ( 0%) ggc
tree find ref. vars : 0.10 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall 8137 kB ( 2%) ggc
tree PTA : 1.29 ( 0%) usr 0.04 ( 1%) sys 1.36 ( 0%) wall 57 kB ( 0%) ggc
tree alias analysis : 1.89 ( 0%) usr 0.20 ( 7%) sys 2.10 ( 0%) wall 0 kB ( 0%) ggc
tree PHI insertion : 1.68 ( 0%) usr 0.01 ( 0%) sys 1.70 ( 0%) wall 18 kB ( 0%) ggc
tree SSA rewrite : 0.62 ( 0%) usr 0.04 ( 1%) sys 0.65 ( 0%) wall 17084 kB ( 4%) ggc
tree SSA other : 0.48 ( 0%) usr 0.08 ( 3%) sys 0.56 ( 0%) wall 0 kB ( 0%) ggc
tree SSA incremental : 1.20 ( 0%) usr 0.00 ( 0%) sys 1.24 ( 0%) wall 0 kB ( 0%) ggc
tree operand scan : 1.48 ( 0%) usr 0.34 (11%) sys 1.93 ( 0%) wall 15634 kB ( 4%) ggc
dominator optimization: 1.05 ( 0%) usr 0.05 ( 2%) sys 1.05 ( 0%) wall 2698 kB ( 1%) ggc
tree SRA : 1.05 ( 0%) usr 0.09 ( 3%) sys 1.15 ( 0%) wall 24835 kB ( 6%) ggc
tree STORE-CCP : 0.09 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall 4 kB ( 0%) ggc
tree CCP : 0.51 ( 0%) usr 0.02 ( 1%) sys 0.56 ( 0%) wall 154 kB ( 0%) ggc
tree reassociation : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc
tree PRE : 296.46 (45%) usr 0.49 (16%) sys 298.81 (45%) wall 19481 kB ( 5%) ggc
tree FRE : 0.96 ( 0%) usr 0.05 ( 2%) sys 1.00 ( 0%) wall 7991 kB ( 2%) ggc
tree forward propagate: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree conservative DCE : 0.54 ( 0%) usr 0.00 ( 0%) sys 0.54 ( 0%) wall 0 kB ( 0%) ggc
tree aggressive DCE : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc
tree DSE : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 8 kB ( 0%) ggc
tree SSA uncprop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
tree SSA to normal : 27.19 ( 4%) usr 0.01 ( 0%) sys 27.33 ( 4%) wall 22 kB ( 0%) ggc
tree rename SSA copies: 0.15 ( 0%) usr 0.01 ( 0%) sys 0.16 ( 0%) wall 0 kB ( 0%) ggc
dominance frontiers : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
expand : 2.96 ( 0%) usr 0.09 ( 3%) sys 3.05 ( 0%) wall 24095 kB ( 6%) ggc
jump : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
CSE : 1.87 ( 0%) usr 0.00 ( 0%) sys 1.88 ( 0%) wall 118 kB ( 0%) ggc
global CSE : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
CPROP 1 : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall 1620 kB ( 0%) ggc
PRE : 21.36 ( 3%) usr 0.01 ( 0%) sys 21.41 ( 3%) wall 200 kB ( 0%) ggc
CPROP 2 : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall 390 kB ( 0%) ggc
bypass jumps : 0.36 ( 0%) usr 0.00 ( 0%) sys 0.37 ( 0%) wall 389 kB ( 0%) ggc
CSE 2 : 1.05 ( 0%) usr 0.00 ( 0%) sys 1.07 ( 0%) wall 72 kB ( 0%) ggc
branch prediction : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 1 kB ( 0%) ggc
flow analysis : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
combiner : 0.87 ( 0%) usr 0.01 ( 0%) sys 0.88 ( 0%) wall 1745 kB ( 0%) ggc
if-conversion : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 3 kB ( 0%) ggc
regmove : 21.69 ( 3%) usr 0.02 ( 1%) sys 21.78 ( 3%) wall 2 kB ( 0%) ggc
local alloc : 7.60 ( 1%) usr 0.00 ( 0%) sys 7.62 ( 1%) wall 1480 kB ( 0%) ggc
global alloc : 16.47 ( 2%) usr 0.35 (12%) sys 16.91 ( 3%) wall 16915 kB ( 4%) ggc
reload CSE regs : 107.52 (16%) usr 0.15 ( 5%) sys 108.55 (16%) wall 4783 kB ( 1%) ggc
flow 2 : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 225 kB ( 0%) ggc
peephole 2 : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall 0 kB ( 0%) ggc
rename registers : 0.41 ( 0%) usr 0.00 ( 0%) sys 0.39 ( 0%) wall 0 kB ( 0%) ggc
scheduling 2 : 75.09 (11%) usr 0.53 (18%) sys 76.86 (12%) wall 206227 kB (51%) ggc
machine dep reorg : 0.36 ( 0%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall 0 kB ( 0%) ggc
reorder blocks : 0.22 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall 15 kB ( 0%) ggc
reg stack : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 37 kB ( 0%) ggc
final : 0.66 ( 0%) usr 0.02 ( 1%) sys 0.74 ( 0%) wall 1156 kB ( 0%) ggc
TOTAL : 659.57 2.99 668.06 407297 kB
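To put the remaining hot spots in proportion, the user-time figures for tree PRE, reload CSE regs and scheduling 2 can be taken from the report above and compared with the 659.57 s total. The small Python sketch below is purely illustrative arithmetic and not part of the patch; the names `HOT_PASSES` and `share` are made up for this example.

```python
# User CPU seconds copied from the time report above (after the regmove patch).
TOTAL_USR = 659.57
HOT_PASSES = {
    "tree PRE": 296.46,
    "reload CSE regs": 107.52,
    "scheduling 2": 75.09,
}


def share(seconds: float, total: float = TOTAL_USR) -> float:
    """Return the percentage of total user time spent in one pass."""
    return 100.0 * seconds / total


for name, usr in HOT_PASSES.items():
    print(f"{name:<16} {usr:7.2f} s  {share(usr):5.1f}%")

# Combined share of the three largest remaining passes.
combined = sum(HOT_PASSES.values())
print(f"{'combined':<16} {combined:7.2f} s  {share(combined):5.1f}%")
```

Together these three passes account for roughly 72.6% of the user time, so once regmove is fixed they dominate the profile.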
PRE is somewhat slow, but I will leave that to Danny.
For scheduling, the situation is quite clear: we have huge basic blocks
and produce a huge number of dependencies. For reload, I am not really
surprised either, since the code produced is a regalloc nightmare and
reload manages to create very large bitmaps, which results in quadratic
behaviour.
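Why huge basic blocks hurt the scheduler can be seen in a toy model. The Python sketch below is a hypothetical illustration, not GCC's actual dependence builder: each instruction is reduced to the sets of locations it reads and writes (an assumed encoding), and every ordered pair with a true (RAW), anti (WAR) or output (WAW) dependence gets one edge. When the memory references in a block cannot be told apart, every pair conflicts and a block of n instructions yields n(n-1)/2 edges.

```python
from itertools import combinations


def count_dependencies(insns):
    """Count dependence edges a naive pairwise builder would record.

    Hypothetical model: each insn is a (reads, writes) pair of sets of
    location names.  combinations() yields pairs in program order (i < j).
    """
    edges = 0
    for (reads1, writes1), (reads2, writes2) in combinations(insns, 2):
        raw = writes1 & reads2   # later insn reads what the earlier one wrote
        war = reads1 & writes2   # later insn overwrites what the earlier one read
        waw = writes1 & writes2  # both insns write the same location
        if raw or war or waw:
            edges += 1
    return edges


# A block where every insn reads and writes the same unresolved memory location.
for n in (10, 100, 1000):
    block = [({"mem"}, {"mem"})] * n
    print(n, count_dependencies(block))
```

Ten times more instructions gives roughly a hundred times more edges, which matches the growth in memory that the report attributes to scheduling 2. A chain in which each instruction only reads its predecessor's result, by contrast, gives just n - 1 edges.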
Since Danny asked for the alloc-pool statistics:
Alloc-pool Kind Pools Allocated Peak Leak
-------------------------------------------------------------
Value sets 18 2230608 1929200 0
Bitmap sets 18 9504 8432 0
Value set nodes 18 2032208 1768488 0
Binary tree nodes 18 1291320 783992 0
value 48 3875872 1246744 0
et_occ pool 127 238144 48040 0
et_node pool 127 159680 36024 0
Reference tree nodes 18 1430880 1437864 0
Expression tree nodes 18 426240 428840 0
elt_list 48 3639816 397672 0
List tree nodes 18 511488 516880 0
elt_loc_list 48 14186784 975240 0
Comparison tree nodes 18 4520 4832 0
original_copy 26 48 88 0
Constraint pool 108 4335432 1501136 0
Unary tree nodes 18 96 968 0
Variable info pool 108 12261704 4550848 0
Constraint edges 108 2112 496 0
operand entry pool 36 512 248 0
cselib_val_list 48 11627616 974144 0
-------------------------------------------------------------
Total 994 58264584
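The alloc-pool figures are internally consistent. The Pools and Allocated columns sum to the reported totals (994 pools, 58264584 bytes), and the Leak column is zero for every kind. The short Python check below is illustrative only (the dictionary layout is an assumption made for this example). It re-adds the columns and ranks the pool kinds by bytes allocated.

```python
# (pools, allocated, peak) per alloc-pool kind, copied from the table above.
ALLOC_POOLS = {
    "Value sets": (18, 2230608, 1929200),
    "Bitmap sets": (18, 9504, 8432),
    "Value set nodes": (18, 2032208, 1768488),
    "Binary tree nodes": (18, 1291320, 783992),
    "value": (48, 3875872, 1246744),
    "et_occ pool": (127, 238144, 48040),
    "et_node pool": (127, 159680, 36024),
    "Reference tree nodes": (18, 1430880, 1437864),
    "Expression tree nodes": (18, 426240, 428840),
    "elt_list": (48, 3639816, 397672),
    "List tree nodes": (18, 511488, 516880),
    "elt_loc_list": (48, 14186784, 975240),
    "Comparison tree nodes": (18, 4520, 4832),
    "original_copy": (26, 48, 88),
    "Constraint pool": (108, 4335432, 1501136),
    "Unary tree nodes": (18, 96, 968),
    "Variable info pool": (108, 12261704, 4550848),
    "Constraint edges": (108, 2112, 496),
    "operand entry pool": (36, 512, 248),
    "cselib_val_list": (48, 11627616, 974144),
}

total_pools = sum(pools for pools, _, _ in ALLOC_POOLS.values())
total_allocated = sum(alloc for _, alloc, _ in ALLOC_POOLS.values())
largest = sorted(ALLOC_POOLS, key=lambda kind: ALLOC_POOLS[kind][1], reverse=True)

print("pools:", total_pools)          # reported total: 994
print("allocated:", total_allocated)  # reported total: 58264584
for kind in largest[:3]:
    print(f"{kind:<20} {ALLOC_POOLS[kind][1]:>10}")
```

The three largest allocators are elt_loc_list, Variable info pool and cselib_val_list.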
Memory consumption is now dominated by the scheduler's dependency info:
ggc-common.c:193 (ggc_calloc) 6303224: 1.9% 5139976:12.3% 1863696: 8.8% 1073688:21.8% 530
gimplify.c:453 (create_tmp_var_raw) 7325032: 2.2% 0: 0.0% 889240: 4.2% 0: 0.0% 93344
genrtl.c:17 (gen_rtx_fmt_ee) 9819384: 2.9% 0: 0.0% 138900: 0.7% 0: 0.0% 829857
tree-dfa.c:186 (create_stmt_ann) 9970168: 2.9% 763932: 1.8% 3692: 0.0% 0: 0.0% 206496
tree-ssanames.c:147 (make_ssa_name) 9740544: 2.9% 0: 0.0% 2373936:11.2% 0: 0.0% 252385
bitmap.c:139 (bitmap_element_allocate) 18876340: 5.6% 0: 0.0% 0: 0.0% 0: 0.0% 674155
genrtl.c:32 (gen_rtx_fmt_ue) 193579104:57.2% 0: 0.0% 0: 0.0% 0: 0.0% 16131592
Total 338496482 41839722 21146495 4929007 22457179
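The gen_rtx_fmt_ue line can be checked against the Total line. The Python sketch below assumes that the first figure on each line is bytes allocated and the last figure is the number of allocations made at that site. That reading is an assumption, though it is consistent with the printed 57.2%. Under it, the site performs over 16 million small, fixed-size allocations.

```python
# Figures copied from the gen_rtx_fmt_ue line and the Total line above.
# Assumption: first figure = bytes allocated, last figure = allocation count.
UE_BYTES = 193_579_104
UE_CALLS = 16_131_592
TOTAL_BYTES = 338_496_482

bytes_per_call = UE_BYTES / UE_CALLS       # average size of one allocation
share_pct = 100.0 * UE_BYTES / TOTAL_BYTES  # share of the first column's total

print(f"bytes per gen_rtx_fmt_ue call: {bytes_per_call:.1f}")
print(f"share of first column: {share_pct:.1f}%")
```

Every call allocates exactly 12 bytes on average. The volume therefore comes from the sheer number of small list nodes, not from large objects, which fits the dependency explosion described above.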
I am now looking into the -O3 compilation, which crashes in into-ssa due to
an overly large stack.
Honza
------- Comment #15 from hubicka at ucw dot cz 2006-07-22 19:30 -------
Created an attachment (id=11920)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11920&action=view)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071