Using head from today to build a telecom stack on the ARM port. It spends nearly 100% of time in tree call clobbering for a particularly nasty testcase which I am attaching. Almost every time I do a Ctrl-C in gdb I hit myself in the function set_initial_properties with a similar backtrace. . #0 0x0847f0da in var_ann (t=0xb6f433c8) at ../../gnu/gcc/tree-flow-inline.h:204 #1 0x0847eee5 in mark_call_clobbered (var=0xb6f433c8, escape_type=256) at ../../gnu/gcc/tree-flow-inline.h:878 #2 0x08480ce6 in set_initial_properties (ai=0x8a8b280) at ../../gnu/gcc/tree-ssa-alias.c:565 #3 0x08481198 in compute_call_clobbered (ai=0x8a8b280) at ../../gnu/gcc/tree-ssa-alias.c:619 in the hunk if (TREE_CODE (alias) == STRUCT_FIELD_TAG) { subvar_t svars; svars = get_subvars_for_var (SFT_PARENT_VAR (alias)); for (; svars; svars = svars->next) if (!unmodifiable_var_p (alias)) mark_call_clobbered (svars->var, pi->escape_mask); } else if (!unmodifiable_var_p (alias)) mark_call_clobbered (alias, pi->escape_mask); I tried by reverting the patch from r127629 but that didn't help straight off. Doing a binary search I got to the versions r127831-37 : r127831 is the last revision of the tree that builds this testcase without any compile time regressions. r127836 is the first revision of the tree where this goes awry with a huge compile time regression . versions (r127832-35) do not build for undeclared references to forbidden_auto_inc_dec_classes in regclass.c. I haven't yet seen what's gone into these commits to make any informed decisions. This is the statistic with the garbage collection : 0.43 ( 0%) usr 0.03 ( 0%) sys 4.51 ( 0%) wall 0 kB ( 0%) ggc callgraph construction: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 69 kB ( 0%) ggc callgraph optimization: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 51 kB ( 0%) ggc ipa type escape : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc cfg cleanup : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 2 kB ( 0%) ggc CFG verifier : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc trivially dead code : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc df live regs : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc df live&initialized regs: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc df use-def / def-use chains: 0.00 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc df reg dead/unused notes: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 19 kB ( 0%) ggc register information : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc alias analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 88 kB ( 0%) ggc preprocessing : 0.13 ( 0%) usr 0.04 ( 0%) sys 0.19 ( 0%) wall 1636 kB ( 4%) ggc lexical analysis : 0.05 ( 0%) usr 0.10 ( 1%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc parser : 0.22 ( 0%) usr 0.09 ( 1%) sys 0.39 ( 0%) wall 7237 kB (19%) ggc inline heuristics : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall 325 kB ( 1%) ggc integration : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 91 kB ( 0%) ggc tree CFG cleanup : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 3 kB ( 0%) ggc tree VRP : 0.04 ( 0%) usr 0.00 ( 0%) sys 1.11 ( 0%) wall 338 kB ( 1%) ggc tree copy propagation : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 20 kB ( 0%) ggc tree PTA : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 123 kB ( 0%) ggc tree alias analysis : 0.53 ( 0%) usr 0.00 ( 0%) sys 1.27 ( 0%) wall 23 kB ( 0%) ggc tree call clobbering :3604.47 (100%) usr 12.38 (95%) sys4166.01 (99%) wall 0 kB ( 0%) ggc tree flow sensitive alias: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 150 kB ( 0%) ggc tree flow insensitive alias: 0.54 ( 0%) usr 0.00 ( 0%) sys 0.73 ( 0%) wall 0 kB ( 0%) ggc tree memory partitioning: 1.91 ( 0%) usr 0.04 ( 0%) sys 3.32 ( 0%) wall 2 kB ( 0%) ggc tree SSA rewrite : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.27 ( 0%) wall 338 kB ( 1%) ggc tree SSA incremental : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall 17 kB ( 0%) ggc tree operand scan : 1.96 ( 0%) usr 0.04 ( 0%) sys 4.16 ( 0%) wall 507 kB ( 1%) ggc dominator optimization: 0.58 ( 0%) usr 0.00 ( 0%) sys 0.78 ( 0%) wall 234 kB ( 1%) ggc tree SRA : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 1 kB ( 0%) ggc tree CCP : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 17 kB ( 0%) ggc tree reassociation : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall 11 kB ( 0%) ggc tree PRE : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall 192 kB ( 1%) ggc tree FRE : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.42 ( 0%) wall 113 kB ( 0%) ggc tree code sinking : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall 5 kB ( 0%) ggc tree forward propagate: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 40 kB ( 0%) ggc tree conservative DCE : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree loop bounds : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 4 kB ( 0%) ggc tree iv optimization : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 38 kB ( 0%) ggc tree loop init : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 12 kB ( 0%) ggc tree SSA to normal : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.52 ( 0%) wall 31 kB ( 0%) ggc tree SSA verifier : 4.07 ( 0%) usr 0.27 ( 2%) sys 5.43 ( 0%) wall 99 kB ( 0%) ggc tree STMT verifier : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall 0 kB ( 0%) ggc dominance computation : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc expand : 0.96 ( 0%) usr 0.06 ( 0%) sys 9.88 ( 0%) wall 25260 kB (66%) ggc forward prop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 32 kB ( 0%) ggc CSE : 0.06 ( 0%) usr 0.01 ( 0%) sys 0.46 ( 0%) wall 18 kB ( 0%) ggc dead code elimination : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc dead store elim1 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 17 kB ( 0%) ggc dead store elim2 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 24 kB ( 0%) ggc loop analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 15 kB ( 0%) ggc CPROP 1 : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 15 kB ( 0%) ggc PRE : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall 6 kB ( 0%) ggc CPROP 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 15 kB ( 0%) ggc bypass jumps : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 14 kB ( 0%) ggc auto inc dec : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 7 kB ( 0%) ggc CSE 2 : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 8 kB ( 0%) ggc combiner : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.38 ( 0%) wall 37 kB ( 0%) ggc regmove : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 2 kB ( 0%) ggc scheduling : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall 39 kB ( 0%) ggc local alloc : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 98 kB ( 0%) ggc global alloc : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall 35 kB ( 0%) ggc reload CSE regs : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall 52 kB ( 0%) ggc thread pro- & epilogue: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 34 kB ( 0%) ggc peephole 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc rename registers : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc scheduling 2 : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 7 kB ( 0%) ggc machine dep reorg : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 4 kB ( 0%) ggc final : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.74 ( 0%) wall 0 kB ( 0%) ggc symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall 0 kB ( 0%) ggc TOTAL :3617.35 13.09 4205.33 38504 kB
Created attachment 14229 [details] testcase.
At least the unmodifiable_var_p (alias) call looks suspicious (but probably unrelated to the problem): Index: tree-ssa-alias.c =================================================================== --- tree-ssa-alias.c (revision 128618) +++ tree-ssa-alias.c (working copy) @@ -561,7 +561,7 @@ set_initial_properties (struct alias_inf subvar_t svars; svars = get_subvars_for_var (SFT_PARENT_VAR (alias)); for (; svars; svars = svars->next) - if (!unmodifiable_var_p (alias)) + if (!unmodifiable_var_p (svars->var)) mark_call_clobbered (svars->var, pi->escape_mask); } else if (!unmodifiable_var_p (alias)) what we have is that SFT_PARENT_VAR is of type 'union Meep' which has quite a lot of fields and the above loop is quadratic in the number of fields.
(gdb) call bitmap_count_bits (pi->pt_vars) $23 = 10307 that is, we have all SFTs in pi->pt_vars and as we process them we loop over all of them again.
Created attachment 14230 [details] patch fixing the problem This fixes it. The idea is to keep track of which parent vars we need to add all subvars to the call clobbered list in a bitmap and process them after the first walk.
4.2 doesn't have this extra loop.
(In reply to comment #4) > Created an attachment (id=14230) [edit] > patch fixing the problem > > This fixes it. The idea is to keep track of which parent vars we need to add > all subvars to the call clobbered list in a bitmap and process them after the > first walk. > Yep it does - Thanks for the quick fix. I am testing it now and will let you know in a bit . (In reply to comment #5) > 4.2 doesn't have this extra loop. >
Subject: Re: [4.3 Regression] tree struct aliasing goes into a loop marking call clobbers. On 20 Sep 2007 13:52:11 -0000, ramana dot radhakrishnan at celunite dot com <gcc-bugzilla@gcc.gnu.org> wrote: > > > ------- Comment #6 from ramana dot radhakrishnan at celunite dot com 2007-09-20 13:52 ------- > (In reply to comment #4) > > Created an attachment (id=14230) > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14230&action=view) [edit] > > patch fixing the problem > > > > This fixes it. The idea is to keep track of which parent vars we need to add > > all subvars to the call clobbered list in a bitmap and process them after the > > first walk. I don't have enough context handy in the patch to see if you moved it outside the transitive loop, or just the initial loop. You can move it outside the initial loop, but not the transitive loop, because the SFT's may themselves be pointers (obviously for non-pointer SFT's, you can certainly ignore them). You could also speed this up dramatically by not queuing non-pointers.
Subject: Re: [4.3 Regression] tree struct aliasing goes into a loop marking call clobbers. On Thu, 20 Sep 2007, dberlin at dberlin dot org wrote: > ------- Comment #7 from dberlin at gcc dot gnu dot org 2007-09-20 15:12 ------- > Subject: Re: [4.3 Regression] tree struct aliasing goes into a loop marking > call clobbers. > > On 20 Sep 2007 13:52:11 -0000, ramana dot radhakrishnan at celunite > dot com <gcc-bugzilla@gcc.gnu.org> wrote: > > > > > > ------- Comment #6 from ramana dot radhakrishnan at celunite dot com 2007-09-20 13:52 ------- > > (In reply to comment #4) > > > Created an attachment (id=14230) > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14230&action=view) > > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14230&action=view) [edit] > > > patch fixing the problem > > > > > > This fixes it. The idea is to keep track of which parent vars we need to add > > > all subvars to the call clobbered list in a bitmap and process them after the > > > first walk. > > > I don't have enough context handy in the patch to see if you moved it > outside the transitive loop, or just the initial loop. > You can move it outside the initial loop, but not the transitive loop, > because the SFT's may themselves be pointers (obviously for > non-pointer SFT's, you can certainly ignore them). I only moved it outside of the innermost EXECUTE_IF_SET_IN_BITMAP loop on aliases, which should be safe. I don't completely understand what you mean with non queuing non-pointers - I suppose queuing non-pointers in the worklist in mark_aliases_call_clobbered? Richard.
Subject: Bug 33508 Author: rguenth Date: Fri Sep 21 09:36:52 2007 New Revision: 128645 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128645 Log: 2007-09-21 Richard Guenther <rguenther@suse.de> PR tree-optimization/33508 * tree-ssa-alias.c (mark_aliases_call_clobbered): Avoid quadratic loop by keeping a bitmap of variables we have to clobber all subvariables for. (set_initial_properties): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-ssa-alias.c
Fixed.