Bug 19097 - [4.3/4.4/4.5 regression] Quadratic behavior with many sets for the same register in VRP
Summary: [4.3/4.4/4.5 regression] Quadratic behavior with many sets for the same regis...
Status: RESOLVED WORKSFORME
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.0.0
: P2 normal
Target Milestone: 4.3.5
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog
Depends on: 21430
Blocks:
  Show dependency treegraph
 
Reported: 2004-12-21 01:56 UTC by James A. Morrison
Modified: 2009-12-24 13:02 UTC (History)
7 users (show)

See Also:
Host:
Target:
Build:
Known to work: 3.3.4 4.0.0
Known to fail: 4.1.0 4.2.0
Last reconfirmed: 2007-09-17 13:11:02


Attachments
patch for operand scan (609 bytes, patch)
2005-10-18 12:25 UTC, Andrew Macleod
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description James A. Morrison 2004-12-21 01:56:54 UTC
The following bit of code doesn't compile on my laptop (at -O2) because gcc gets
killed. This happens with gcc 3.3.5 and gcc 3.4.3 and gcc 4.0

#define CL0(a) else if (b == a) { f(); }
#define CL1(a) CL0(a##0) CL0(a##1) CL0(a##2) CL0(a##3) CL0(a##4) CL0(a##5) \
 CL0(a##6) CL0(a##7) CL0(a##8) CL0(a##9)
#define CL2(a) CL1(a##0) CL1(a##1) CL1(a##2) CL1(a##3) CL1(a##4) CL1(a##5) \
 CL1(a##6) CL1(a##7) CL1(a##8) CL1(a##9)
#define CL3(a) CL2(a##0) CL2(a##1) CL2(a##2) CL2(a##3) CL2(a##4) CL2(a##5) \
 CL2(a##6) CL2(a##7) CL2(a##8) CL2(a##9)
#define CL4(a) CL3(a##0) CL3(a##1) CL3(a##2) CL3(a##3) CL3(a##4) CL3(a##5) \
 CL3(a##6) CL3(a##7) CL3(a##8) CL3(a##9)
#define CL5(a) CL4(a##0) CL4(a##1) CL4(a##2) CL4(a##3) CL4(a##4) CL4(a##5) \
 CL4(a##6) CL4(a##7) CL4(a##8) CL4(a##9)

void f();

void a() {
  int b;
  if (b == 1) { f(); }
  CL4(1)
}

 Changing if (b == 1) to if (b) fixes the large compile time on gcc 4.0, but
keeps the large compile time problem with gcc 3.3 and 3.4.
Comment 1 Andrew Pinski 2004-12-21 02:39:43 UTC
I think DOM is "fixing" the compile-time/memory-hog on the mainline :).
Comment 2 Steven Bosscher 2004-12-21 08:38:14 UTC
This is what I get (with checking enabled) for an C5 with two C4s with 
"GNU C version 4.0.0 20041220 (experimental) (x86_64-unknown-linux-gnu)": 
 
$ ./cc1 t.c -O2 
 a 
 {GC 14385k -> 11467k} 
Analyzing compilation unit 
Performing intraprocedural optimizations 
Assembling functions: 
 a 
 {GC 68010k -> 54166k} {GC 74855k -> 56923k} {GC 91181k -> 71784k} {GC 99265k 
-> 71726k} 
Execution times (seconds) 
 garbage collection    :   0.77 ( 0%) usr   0.01 ( 0%) sys   0.77 ( 0%) wall 
 callgraph construction:   3.03 ( 2%) usr   0.06 ( 2%) sys   3.09 ( 2%) wall 
 cfg construction      :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall 
 cfg cleanup           :   1.55 ( 1%) usr   0.00 ( 0%) sys   1.52 ( 1%) wall 
 CFG verifier          :   3.63 ( 2%) usr   0.04 ( 2%) sys   3.65 ( 2%) wall 
 trivially dead code   :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall 
 life analysis         :   0.84 ( 0%) usr   0.00 ( 0%) sys   0.84 ( 0%) wall 
 life info update      :   0.29 ( 0%) usr   0.00 ( 0%) sys   0.29 ( 0%) wall 
 alias analysis        :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall 
 register scan         :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall 
 rebuild jump labels   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall 
 preprocessing         :   0.17 ( 0%) usr   0.07 ( 3%) sys   0.34 ( 0%) wall 
 lexical analysis      :   0.09 ( 0%) usr   0.19 ( 8%) sys   0.26 ( 0%) wall 
 parser                :   0.35 ( 0%) usr   0.10 ( 4%) sys   0.44 ( 0%) wall 
 tree gimplify         :   0.10 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall 
 tree eh               :   0.05 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall 
 tree CFG construction :   0.17 ( 0%) usr   0.03 ( 1%) sys   0.20 ( 0%) wall 
 tree CFG cleanup      :   0.85 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 1%) wall 
 tree find referenced vars:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) 
wall 
 tree PTA              :   0.19 ( 0%) usr   0.01 ( 0%) sys   0.19 ( 0%) wall 
 tree alias analysis   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall 
 tree PHI insertion    :   0.08 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall 
 tree SSA rewrite      :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall 
 tree SSA other        :   0.58 ( 0%) usr   0.17 ( 7%) sys   0.65 ( 0%) wall 
 tree operand scan     :   0.19 ( 0%) usr   0.10 ( 4%) sys   0.38 ( 0%) wall 
 dominator optimization:   2.12 ( 1%) usr   0.03 ( 1%) sys   2.16 ( 1%) wall 
 tree CCP              :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall 
 tree split crit edges :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall 
 tree PRE              :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall 
 tree remove redundant PHIs:   0.23 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) 
wall 
 tree linearize phis   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall 
 tree forward propagate:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall 
 tree conservative DCE :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall 
 tree aggressive DCE   :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall 
 tree DSE              :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall 
 tree loop init        :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall 
 tree copy headers     :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall 
 tree SSA to normal    :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall 
 tree rename SSA copies:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall 
 tree SSA verifier     :   1.99 ( 1%) usr   0.00 ( 0%) sys   1.95 ( 1%) wall 
 tree STMT verifier    :   0.43 ( 0%) usr   0.02 ( 1%) sys   0.44 ( 0%) wall 
 callgraph verifier    :   8.76 ( 5%) usr   0.05 ( 2%) sys   8.81 ( 5%) wall 
 dominance frontiers   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall 
 control dependences   :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall 
 expand                :   1.09 ( 1%) usr   0.03 ( 1%) sys   1.20 ( 1%) wall 
 jump                  :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall 
 CSE                   :   9.12 ( 5%) usr   0.00 ( 0%) sys   9.12 ( 5%) wall 
 loop analysis         :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall 
 global CSE            :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall 
 CPROP 1               :  55.78 (33%) usr   0.57 (23%) sys  56.38 (32%) wall 
 PRE                   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall 
 CPROP 2               :  28.46 (17%) usr   0.41 (17%) sys  28.91 (17%) wall 
 bypass jumps          :  28.96 (17%) usr   0.40 (16%) sys  29.39 (17%) wall 
 CSE 2                 :   8.25 ( 5%) usr   0.00 ( 0%) sys   8.25 ( 5%) wall 
 branch prediction     :   0.95 ( 1%) usr   0.01 ( 0%) sys   0.95 ( 1%) wall 
 flow analysis         :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall 
 combiner              :   0.36 ( 0%) usr   0.01 ( 0%) sys   0.38 ( 0%) wall 
 if-conversion         :   0.45 ( 0%) usr   0.00 ( 0%) sys   0.48 ( 0%) wall 
 regmove               :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall 
 local alloc           :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall 
 global alloc          :   0.55 ( 0%) usr   0.02 ( 1%) sys   0.58 ( 0%) wall 
 reload CSE regs       :   3.45 ( 2%) usr   0.00 ( 0%) sys   3.46 ( 2%) wall 
 flow 2                :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall 
 if-conversion 2       :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall 
 peephole 2            :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall 
 rename registers      :   0.93 ( 1%) usr   0.02 ( 1%) sys   0.98 ( 1%) wall 
 scheduling 2          :   1.07 ( 1%) usr   0.02 ( 1%) sys   1.13 ( 1%) wall 
 machine dep reorg     :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall 
 reorder blocks        :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall 
 shorten branches      :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall 
 final                 :   0.18 ( 0%) usr   0.01 ( 0%) sys   0.19 ( 0%) wall 
 rest of compilation   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall 
 TOTAL                 : 171.18             2.43           174.57 
 
 
My bet: known non-linear behavior in compute_transp.  I'm curous why gcse's 
"is_too_expensive ()" thinks GCSE is _not_ too expensive for this test case. 
 
Comment 3 Steven Bosscher 2004-12-21 08:47:28 UTC
Hmm no, it's not compute_transp: 
 
CPU: P4 / Xeon with 2 hyper-threads, speed 3194.18 MHz (estimated) 
Counted GLOBAL_POWER_EVENTS events (time during which processor is not 
stopped) with a unit mask of 0x01 (mandatory) count 100000 
samples  %        symbol name 
343490   22.3269  exp_equiv_p 
298238   19.3856  next_set 
63817     4.1481  find_avail_set 
62787     4.0812  insert_set_in_table 
55981     3.6388  cgraph_edge 
35531     2.3095  cse_insn 
18770     1.2201  cgraph_create_edge 
17784     1.1560  for_each_rtx 
15956     1.0371  fold_rtx 
15164     0.9857  expr_equiv_p 
15081     0.9803  hash_rtx 
12551     0.8158  validate_value_data 
12380     0.8047  get_cse_reg_info 
 
Comment 4 Steven Bosscher 2004-12-23 01:43:36 UTC
This looks like a problem with the hash function for a REG when we 
have many implicit sets: 
 
Found 6001 implicit sets 
SET hash table (6001 buckets, 6001 entries) 
Index 0 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 1 [0x1])) 
Index 1 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 10000 [0x2710])) 
Index 2 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 10001 [0x2711])) 
Index 3 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 10002 [0x2712])) 
Index 4 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 10003 [0x2713])) 
Index 5 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 10004 [0x2714])) 
Index 6 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 10005 [0x2715])) 
Index 7 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 10006 [0x2716])) 
Index 8 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 10007 [0x2717])) 
Index 9 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 10008 [0x2718])) 
Index 10 (hash value 58) 
  (set (reg/v:SI 58 [ b ]) 
    (const_int 10009 [0x2719])) 
(etc.) 
 
Needless to say, this results in truely dramatically bad compile 
time behavior of the hash table. 
 
Comment 5 Steven Bosscher 2004-12-24 01:00:41 UTC
We did not have implicit sets in 3.3, so it doesn't show this behavior. 
On the other hand, someone could feed RTL to 3.3 where a single register 
is set many times, and I'm sure it will show the same behavior.  But in 
this case, with implicit sets, we could have it for normal code too, for 
example an implicit set from a switch condition.  So I'm marking this as a 
regression from the last release without implicit sets. 
 
The memory issue is different, I'm not sure if/where/why we allocate too 
much.  It's probably just bad cache behavior that makes this a "memory-hog": 
walk a very long linked list that doesn't fit in RAM, well, that sends any 
machine to swapping hell.  I'm not seeing memory consumption grow in a non 
linear way, so I'm removing that keyword. 
 
 
Comment 6 Paolo Bonzini 2004-12-28 18:48:44 UTC
Interestingly, on arm-elf the hog is CSE, because each copy of b is given a
different pseudo.
Comment 7 Steven Bosscher 2005-03-05 23:37:04 UTC
I don't think this will be fixed for 4.1, unless we kick out implicit 
sets from gcse, or all of gcse.  The former may be possible if our 
const/copy prop at the tree level is good enough, but I wouldn't count 
on it. 
Comment 8 James A. Morrison 2005-06-08 03:32:06 UTC
 The problem with the hash table seems to be fixed in gcc 4.1, but not gcc 3.4
or 4.0.  In gcc 4.1 hash_rtx is used for the implicit sets instead of the really
dumb hash_set.
Comment 9 Steven Bosscher 2005-06-08 09:38:40 UTC
Can you try and figure out which patch changed this for GCC 4.1?  Bonus 
points if you can see if backporting that patch gives GCC 4.0 a speed-up, 
because in that case you may have something to go to Mark with for the 
next GCC 4.0.x (x>1) release ;-) 
Comment 10 James A. Morrison 2005-06-08 12:43:56 UTC
 Ok, I seem to be wrong, hash_set still seems to be used for implicit sets. 
However, the destination registers in gcc 4.1 are all different:
SET hash table (1251 buckets, 1001 entries)
Index 0 (hash value 339)
  (set (reg/v:SI 339 [ b ])
    (const_int 1 [0x1]))
Index 1 (hash value 342)
  (set (reg:SI 342)
    (const_int 1000 [0x3e8]))
Index 2 (hash value 344)
  (set (reg:SI 344)
    (const_int 1001 [0x3e9]))
Index 3 (hash value 346)
  (set (reg:SI 346)
    (const_int 1002 [0x3ea]))
Index 4 (hash value 348)
  (set (reg:SI 348)
    (const_int 1003 [0x3eb]))
Index 5 (hash value 350)
  (set (reg:SI 350)
    (const_int 1004 [0x3ec]))
Index 6 (hash value 352)
  (set (reg:SI 352)
    (const_int 1005 [0x3ed]))
Index 7 (hash value 354)
...
Index 999 (hash value 1087)
  (set (reg:SI 2338)
    (const_int 1998 [0x7ce]))
Index 1000 (hash value 1089)
  (set (reg:SI 2340)
    (const_int 1999 [0x7cf]))

 And the time report for -O2 using CL4 on ia64-linux
Execution times (seconds)
 garbage collection    :   1.98 ( 1%) usr   0.00 ( 0%) sys   1.98 ( 1%) wall   
   0 kB ( 0%) ggc
 dump files            :   1.03 ( 0%) usr   0.06 ( 4%) sys   1.08 ( 0%) wall   
   0 kB ( 0%) ggc
 callgraph construction:   4.20 ( 1%) usr   0.01 ( 1%) sys   4.21 ( 1%) wall  
12188 kB ( 7%) ggc
 cfg construction      :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall   
2131 kB ( 1%) ggc
 cfg cleanup           :   1.56 ( 0%) usr   0.00 ( 0%) sys   1.56 ( 0%) wall   
 568 kB ( 0%) ggc
 CFG verifier          :   7.19 ( 2%) usr   0.02 ( 1%) sys   7.21 ( 2%) wall   
   0 kB ( 0%) ggc
 trivially dead code   :   0.54 ( 0%) usr   0.00 ( 0%) sys   0.54 ( 0%) wall   
   0 kB ( 0%) ggc
 life analysis         :   4.12 ( 1%) usr   0.00 ( 0%) sys   4.12 ( 1%) wall   
4688 kB ( 3%) ggc
 life info update      :   0.83 ( 0%) usr   0.00 ( 0%) sys   0.83 ( 0%) wall   
1250 kB ( 1%) ggc
 alias analysis        :   1.02 ( 0%) usr   0.00 ( 0%) sys   1.02 ( 0%) wall   
4096 kB ( 2%) ggc
 register scan         :   0.42 ( 0%) usr   0.00 ( 0%) sys   0.42 ( 0%) wall   
   0 kB ( 0%) ggc
 rebuild jump labels   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall   
   0 kB ( 0%) ggc
 preprocessing         :   0.11 ( 0%) usr   0.07 ( 4%) sys   0.20 ( 0%) wall   
 702 kB ( 0%) ggc
 lexical analysis      :   0.08 ( 0%) usr   0.14 ( 8%) sys   0.21 ( 0%) wall   
   0 kB ( 0%) ggc
 parser                :   0.22 ( 0%) usr   0.06 ( 4%) sys   0.28 ( 0%) wall   
5672 kB ( 3%) ggc
 tree gimplify         :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall   
 625 kB ( 0%) ggc
 tree eh               :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall   
   0 kB ( 0%) ggc
 tree CFG construction :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall  
17418 kB (10%) ggc
 tree CFG cleanup      :   1.64 ( 0%) usr   0.00 ( 0%) sys   1.65 ( 0%) wall   
   0 kB ( 0%) ggc
 tree VRP              :  30.28 ( 9%) usr   0.02 ( 1%) sys  30.31 ( 9%) wall   
5568 kB ( 3%) ggc
 tree copy propagation :   0.60 ( 0%) usr   0.01 ( 1%) sys   0.61 ( 0%) wall   
   2 kB ( 0%) ggc
 tree store copy prop  :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall   
   0 kB ( 0%) ggc
 tree find ref. vars   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall   
   0 kB ( 0%) ggc
 tree PTA              :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall   
   0 kB ( 0%) ggc
 tree alias analysis   :   0.29 ( 0%) usr   0.10 ( 6%) sys   0.42 ( 0%) wall   
   2 kB ( 0%) ggc
 tree PHI insertion    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall   
   0 kB ( 0%) ggc
 tree SSA rewrite      :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall   
   0 kB ( 0%) ggc
 tree SSA other        :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 0%) wall   
   2 kB ( 0%) ggc
 tree SSA incremental  :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall   
   0 kB ( 0%) ggc
 tree operand scan     :  58.50 (18%) usr   0.17 (11%) sys  58.63 (17%) wall   
1354 kB ( 1%) ggc
 dominator optimization:   1.51 ( 0%) usr   0.00 ( 0%) sys   1.51 ( 0%) wall  
10313 kB ( 6%) ggc
 tree STORE-CCP        :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall   
   0 kB ( 0%) ggc
 tree CCP              :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall   
   0 kB ( 0%) ggc
 tree split crit edges :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall   
   0 kB ( 0%) ggc
 tree reassociation    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall   
   0 kB ( 0%) ggc
 tree PRE              :   0.28 ( 0%) usr   0.01 ( 0%) sys   0.29 ( 0%) wall   
   0 kB ( 0%) ggc
 tree FRE              :   0.17 ( 0%) usr   0.02 ( 1%) sys   0.19 ( 0%) wall   
   0 kB ( 0%) ggc
 tree code sinking     :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall   
   0 kB ( 0%) ggc
 tree linearize phis   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall   
   0 kB ( 0%) ggc
 tree forward propagate:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall   
   0 kB ( 0%) ggc
 tree conservative DCE :   0.29 ( 0%) usr   0.00 ( 0%) sys   0.29 ( 0%) wall   
   0 kB ( 0%) ggc
 tree aggressive DCE   :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall   
   0 kB ( 0%) ggc
 tree DSE              :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall   
   0 kB ( 0%) ggc
 PHI merge             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall   
   0 kB ( 0%) ggc
 tree loop init        :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall   
   0 kB ( 0%) ggc
 tree copy headers     :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall   
   0 kB ( 0%) ggc
 tree SSA uncprop      :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall   
   0 kB ( 0%) ggc
 tree SSA to normal    :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall   
   2 kB ( 0%) ggc
 tree rename SSA copies:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall   
   0 kB ( 0%) ggc
 tree SSA verifier     :   2.57 ( 1%) usr   0.00 ( 0%) sys   2.57 ( 1%) wall   
   0 kB ( 0%) ggc
 tree STMT verifier    :   7.09 ( 2%) usr   0.07 ( 4%) sys   7.15 ( 2%) wall   
   0 kB ( 0%) ggc
 callgraph verifier    :  14.63 ( 4%) usr   0.00 ( 0%) sys  14.63 ( 4%) wall   
   0 kB ( 0%) ggc
 dominance frontiers   :  27.54 ( 8%) usr   0.00 ( 0%) sys  27.54 ( 8%) wall   
   0 kB ( 0%) ggc
 control dependences   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall   
   0 kB ( 0%) ggc
 expand                :   0.69 ( 0%) usr   0.01 ( 0%) sys   0.70 ( 0%) wall  
21941 kB (13%) ggc
 jump                  :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall   
   0 kB ( 0%) ggc
 CSE                   :  11.83 ( 4%) usr   0.01 ( 0%) sys  11.84 ( 4%) wall   
   0 kB ( 0%) ggc
 loop analysis         :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall   
3750 kB ( 2%) ggc
 global CSE            :   0.50 ( 0%) usr   0.08 ( 5%) sys   0.58 ( 0%) wall   
   0 kB ( 0%) ggc
 CPROP 1               :   1.29 ( 0%) usr   0.07 ( 4%) sys   1.36 ( 0%) wall   
2656 kB ( 2%) ggc
 PRE                   :   1.65 ( 0%) usr   0.26 (16%) sys   1.91 ( 1%) wall   
   0 kB ( 0%) ggc
 CPROP 2               :   1.33 ( 0%) usr   0.07 ( 4%) sys   1.39 ( 0%) wall   
2031 kB ( 1%) ggc
 bypass jumps          :   1.35 ( 0%) usr   0.10 ( 6%) sys   1.45 ( 0%) wall   
2031 kB ( 1%) ggc
 CSE 2                 :  13.92 ( 4%) usr   0.00 ( 0%) sys  13.92 ( 4%) wall   
   0 kB ( 0%) ggc
 branch prediction     :   0.92 ( 0%) usr   0.00 ( 0%) sys   0.92 ( 0%) wall   
2187 kB ( 1%) ggc
 flow analysis         :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall   
   0 kB ( 0%) ggc
 combiner              :   0.79 ( 0%) usr   0.00 ( 0%) sys   0.79 ( 0%) wall   
1875 kB ( 1%) ggc
 if-conversion         :   0.52 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%) wall   
   0 kB ( 0%) ggc
 regmove               :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall   
   0 kB ( 0%) ggc
 scheduling            :  31.43 ( 9%) usr   0.01 ( 1%) sys  31.44 ( 9%) wall   
4073 kB ( 2%) ggc
 local alloc           :   0.61 ( 0%) usr   0.00 ( 0%) sys   0.61 ( 0%) wall   
1193 kB ( 1%) ggc
 global alloc          :   1.57 ( 0%) usr   0.04 ( 3%) sys   1.61 ( 0%) wall   
 312 kB ( 0%) ggc
 reload CSE regs       :   1.47 ( 0%) usr   0.00 ( 0%) sys   1.47 ( 0%) wall   
1875 kB ( 1%) ggc
 flow 2                :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall   
3751 kB ( 2%) ggc
 if-conversion 2       :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall   
   0 kB ( 0%) ggc
 peephole 2            :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall   
   0 kB ( 0%) ggc
 rename registers      :  18.93 ( 6%) usr   0.05 ( 3%) sys  18.98 ( 6%) wall   
   0 kB ( 0%) ggc
 scheduling 2          :  69.37 (21%) usr   0.12 ( 7%) sys  69.48 (21%) wall  
34853 kB (21%) ggc
 machine dep reorg     :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall   
   0 kB ( 0%) ggc
 reorder blocks        :   0.75 ( 0%) usr   0.00 ( 0%) sys   0.75 ( 0%) wall  
17522 kB (10%) ggc
 shorten branches      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall   
   0 kB ( 0%) ggc
 final                 :   0.52 ( 0%) usr   0.01 ( 0%) sys   0.52 ( 0%) wall   
   0 kB ( 0%) ggc
 rest of compilation   :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall   
   1 kB ( 0%) ggc
 TOTAL                 : 333.44             1.64           335.08            
167877 kB
Comment 11 James A. Morrison 2005-08-23 06:46:59 UTC
Here is an updated time-report with the main time contributors on ppc-linux with
checking disabled.
 tree VRP              :  75.60 (11%) usr   0.39 ( 0%) sys  89.92 ( 9%) wall  
13693 kB ( 6%) ggc
 tree operand scan     : 256.77 (37%) usr   1.36 ( 2%) sys 287.58 (28%) wall
2122 kB ( 1%) ggc
 scheduling            : 274.05 (39%) usr  73.82 (89%) sys 381.13 (37%) wall
6251 kB ( 3%) ggc
 global alloc          :   6.98 ( 1%) usr   3.48 ( 4%) sys 166.93 (16%) wall
5469 kB ( 3%) ggc

 global alloc was really slow because it took a lot of memory, more than my
machine had.
Comment 12 James A. Morrison 2005-08-25 05:24:08 UTC
 If I use C5(1) with two C4's in C5 then I get a stack overflow in VRP between
find_assert_locations and find_conditional_asserts.
Comment 13 James A. Morrison 2005-08-25 06:03:11 UTC
 DOM also eats a metric ton of memory on this testcase. 900MB with C3(1).
Comment 14 James A. Morrison 2005-08-25 07:03:07 UTC
 FOO!  The exact testcase I have been testing for the last couple days has been
#define CL0(a) if (b == a) { foo (); } , so all the work DOM was doing converts
the if's to else if's .
Comment 15 Andrew Pinski 2005-09-19 00:14:03 UTC
For -O1 on x86_64-pc-linux-gnu:

For 4.1.0:
 tree operand scan     :  18.28 (50%) usr   0.10 (18%) sys  18.33 (50%) wall     402 kB ( 0%) ggc
That is the same issue as PR 21430

For both 4.0.0 and 4.1.0:
4.0.0:
 dominance frontiers   :  12.80 (66%) usr   0.00 ( 0%) sys  12.82 (64%) wall
4.1.0:
 dominance frontiers   :  10.87 (30%) usr   0.00 ( 0%) sys  10.89 (30%) wall       0 kB ( 0%) ggc

Comment 16 Steven Bosscher 2005-09-19 00:15:21 UTC
Another SSA operands cache slowness example... 
Comment 17 Steven Bosscher 2005-10-16 23:20:29 UTC
On AMD64, I now get the following timings:

                           -O1     -O2
3.3 (profilebootstrapped)  46.64   46.90
4.1 (checking=release)     72.82   156.43

In 4.1, the Big Spenders are "dominance frontiers" (41% usr)
and "tree operand scan" (also 41%) for -O1.  For -O2 those two
also are big time black holes, and the 3 gcse.c CPROP passes
join them at the top of the profile ("dominance frontiers" 18%,
"tree operand scan" 18%, "CPROP 1" 17%, "CPROP 2" 9%, "bypass
jumps" 10%).

So this is still a regression from GCC 3.3.
Comment 18 Paolo Bonzini 2005-10-18 08:36:24 UTC
Steven, how does your df.c-based cprop fare on this testcase?
Comment 19 Andrew Macleod 2005-10-18 12:25:20 UTC
Created attachment 10017 [details]
patch for operand scan

Now that correct_use_link is *only* used for real uses, it is no longer profitable to try to "shortcut" the search for the owner of a use list. The shortcut use to look at each previous node as the list was traversed, and check to see if the stmt was modified. If it wasn't, we knew that node was in the correct list and wouldnt have to scan all the way back to the owner.

well, this testcase was spending almost all its time checking for stmt_modified_p....  something like 250,000,000 checks on 50,000 calls.

I've removed the no longer useful shortcut, and the results are as follows:

bootstrapped and no new regressions on i686-pc-linux-gnu.

Andrew


                    x86-64:
-O1 on testcase:
before patch     tree operand scan     :   8.00 (36%)
                 TOTAL                 :  22.18

after patch:     tree operand scan     :   1.41 ( 9%)
                 TOTAL                 :  15.62

-O2 on testcase:
before patch     tree operand scan     :   7.88 (15%)
                 TOTAL                 :  53.08

after patch:     tree operand scan     :   1.42 ( 3%)
                 TOTAL                 :  46.94


                    x86:
-O1 on testcase:
before patch     tree operand scan     :   2.54 (17%)
                 TOTAL                 :  14.60


after patch:     tree operand scan     :   1.01 ( 8%)
                 TOTAL                 :  12.51

-O2 on testcase:
before patch     tree operand scan     :   2.95 ( 8%)
                 TOTAL                 :  39.20

after patch:     tree operand scan     :   1.06 ( 3%)
                 TOTAL                 :  38.03


pretty much a wash on cc1-i and cpgram.cc testcases on both targets.
Comment 20 Steven Bosscher 2005-10-29 22:38:11 UTC
amacleod, are you going to post your patch and/or commit it??
Comment 21 Mark Mitchell 2005-10-31 02:02:17 UTC
I'm going to leave this as P2, since we've got a proposed patch in Comment #19.  Andrew, do you need a review on that patch?  Or, is there any other reason it hasn't been committed?
Comment 22 Andrew Macleod 2005-10-31 13:33:30 UTC
It will be checked in shortly. I got your OK for this stage last week, and I was merely waiting for the SVN switchover freeze to expire, trying a new build and getting back to work today.
Comment 23 Andrew Macleod 2005-10-31 14:41:22 UTC
Hmm. This has been committed, but the commit hasn't shown up yet. Perhaps because I tagged it as a tree-optimization PR and I now notice that its marked as rtl-optimization?
Comment 24 Daniel Berlin 2005-10-31 15:04:02 UTC
I fixed the bug that was preventing it from sending it to this bug, it should pop up in a second
Comment 25 Andrew Macleod 2005-10-31 15:19:38 UTC
Subject: Bug 19097

Author: amacleod
Date: Mon Oct 31 13:38:05 2005
New Revision: 106272

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=106272
Log:

2005-10-31  Andrew MacLeod  <amacleod@redhat.com>
	
	PR tree-optimization/19097
	* tree-ssa-operands.c (correct_use_link): Don't look for modified stmts.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-operands.c

Comment 26 Steven Bosscher 2005-10-31 17:12:20 UTC
Moving back to new, because I don't know if the GCSE CPROP issue with implicit sets is also already fixed.
Comment 27 Steven Bosscher 2005-11-08 00:18:15 UTC
On AMD64, revision 106596M (the M is for a local loop-invariant.c
patch, nothing special), compiler built with --enable-checking=release:

at -O1:
 tree operand scan     :   1.50 (10%) usr   0.09 (17%) sys   1.62 (10%) wall      
 dominance frontiers   :   9.09 (60%) usr   0.00 ( 0%) sys   9.20 (58%) wall      
 TOTAL                 :  15.05             0.53            15.80

at -O2:
 tree VRP              :  12.20 (23%) usr   0.03 ( 3%) sys  12.44 (23%) wall
 dominance frontiers   :   9.17 (18%) usr   0.01 ( 1%) sys   9.30 (17%) wall       
 CPROP 1               :   8.17 (16%) usr   0.16 (16%) sys   8.44 (16%) wall
 CPROP 2               :   5.54 (11%) usr   0.11 (11%) sys   5.72 (11%) wall
 bypass jumps          :   5.57 (11%) usr   0.11 (11%) sys   5.75 (11%) wall
 TOTAL                 :  52.31             1.00            53.98

For GCC 3.3.5 at -O1 the total time is 26s, and at -O2 it is 31s.



For AMD64 -m32 -march=i686:

at -O1:
 tree operand scan     :   1.48 (10%) usr   0.09 (18%) sys   1.59 (10%) wall
 dominance frontiers   :   9.03 (61%) usr   0.00 ( 0%) sys   9.14 (59%) wall
 TOTAL                 :  14.70             0.49            15.39

at -O2:
 tree VRP              :  11.84 (24%) usr   0.04 ( 4%) sys  12.02 (24%) wall
 dominance frontiers   :   9.11 (19%) usr   0.02 ( 2%) sys   9.25 (18%) wall
 CPROP 1               :   7.54 (15%) usr   0.10 (11%) sys   7.74 (15%) wall
 CPROP 2               :   4.99 (10%) usr   0.10 (11%) sys   5.15 (10%) wall
 bypass jumps          :   4.96 (10%) usr   0.10 (11%) sys   5.12 (10%) wall
 TOTAL                 :  48.97             0.93            50.54

For GCC 3.3.5 at -O1 the total time is 25s, and at -O2 it is 28s.

Compared to my measurements from comment #17, this is good progress.  James, do you think we can close this bug now?
Comment 28 James A. Morrison 2005-11-08 06:48:38 UTC
Steven: How long does 4.0 take to compile this function on your box?
Comment 29 Steven Bosscher 2006-01-08 16:02:06 UTC
User times:

optimization    |       GCC version
level           |       3.3-hammer      4.0             4.1
----------------+------------------------------------------------
-O0             |       0m7.248s        0m1.684s        0m1.800s
-O1             |       0m24.786s       0m18.489s       0m18.749s
-O2             |       0m30.390s       0m54.147s       1m9.636s

The top 5 time consumers for each compiler:
timevar         |  GCC version
at -O2          |  3.3-hammer           4.0                 4.1
----------------+--------------------------------------------------
1.              |  cfg cleanup (45%)    dom. front. (24%)   tree VRP (27%)
2.              |  branch pred. (21%)   CPROP 1 (17%)       dom. front. (14%)
3.              |  rest of comp. (10%)  CPROP 2 (13%)       CPROP 1 (14%)
4.              |  CSE 2 (8%)           bypass jumps (13%)  CPROP 2 (10%)
5.              |  global CSE (3%)      CSE (9%)            bypass jumps (9%)

Comment 30 Steven Bosscher 2006-01-08 16:08:03 UTC
Other than VRP taking so much time for GCC 4.1, there are no surprises in the timings I just added in comment #29 for GCC 4.0 and GCC 4.1.

Computing dominance frontiers is just not a linear operation (especially not for a function with this many of them :-).  "CPROP 1", "CPROP 2", and "bypass jumps" are all the same algorithm (const/copy propagation in gcse.c) which also has known non-linear algorithms for strange test cases like the one for this bug report.

If only VRP wouldn't suck up so much time here, I would call this one FIXED...
Comment 31 Steven Bosscher 2006-01-08 16:14:32 UTC
Re. the timings in comment #29, I should have said that my GCC 3.3 was bootstrapped, but the GCC 4.0 and GCC 4.1 I used were built with "-O0 -g".  I added 3.3 numbers for "ballpark" reference, not for actual comparison.
Comment 32 Steven Bosscher 2006-01-08 18:16:04 UTC
I have bootstrapped 4.0 and 4.1 and re-timed:
GCC 4.0 0m38.118s
GCC 4.1 0m51.059s

The distribution of the compile time is not significantly different from the timings of the -O0 compilers.
Comment 33 Steven Bosscher 2006-01-08 18:32:31 UTC
So where are we wrt. GCC 3.3-hammer at -O2?

compiler     absolute time       relative to 3.3-hammer
GCC 3.3      0m30.390s           1.00
GCC 4.0      0m38.118s           1.25
GCC 4.1      0m51.059s           1.68

Jump bypassing is new in GCC 3.4, that accounts for a 5.47s slowdown.  Without jump bypassing, GCC 4.0 would be ~7% slower than GCC 3.3.

The time for "tree VRP" in GCC 4.1 is 12.12s.  Together with jump bypassing, this is a ~18s slowdown for GCC 4.1 wrt. GCC 3.3 just due to new passes.  Without VRP and jump bypassing, GCC 4.1 would be 11% slower than GCC 3.3.

From the GCC 4.1 .vrp tree dump: "Number of ASSERT_EXPR expressions inserted: 10000".  This is exactly the number of "else" clauses in the test case.  This requires a huge effort on update_ssa (which is slow already anyway) to fix up SSA form for all these pseudo-copies (ASSERT_EXPRs are really just copies).  The resulting SSA replacement table is therefore also just _huge_ (also from the .vrp dump):

SSA replacement table
N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j

b_2 -> { b_1 }
b_3 -> { b_1 }
b_4 -> { b_1 }
b_5 -> { b_1 }
b_6 -> { b_1 }
b_7 -> { b_1 }
b_8 -> { b_1 }
b_9 -> { b_1 }
(etc.)

This is unique behavior that can only be triggered for unreasonable code like this test case.  Eventually it doesn't help to insert all those ASSERT_EXPRs, so it would still be nice, perhaps, to fix ASSERT_EXPR insertion to better understand that it is acting stupidly.

Comment 34 Steven Bosscher 2006-01-08 18:40:36 UTC
Another factor contributing to the huge compile time requirements of VRP for this test case is the number of equivalences recorded:

Value ranges after VRP ("..." meaning I cut away a *cough* few b_i SSA names ;-):

b_1: VARYING
b_2: ~[19998, 19998]  EQUIVALENCES: { b_1 b_3 ... b_10001 } (10000 elements)
b_3: ~[19997, 19997]  EQUIVALENCES: { b_1 b_4 ... b_10001 } (9999 elements)
b_4: ~[19996, 19996]  EQUIVALENCES: { b_1 b_5 ... b_10001 } (9998 elements)
...
b_9999: ~[10001, 10001]  EQUIVALENCES: { b_1 b_10000 b_10001 } (3 elements)
b_10000: ~[10000, 10000]  EQUIVALENCES: { b_1 b_10001 } (2 elements)
b_10001: ~[1, 1]  EQUIVALENCES: { b_1 } (1 elements)

I think that, if this is something we should worry about at all, this belongs in a separate bug report about VRP.

Comment 35 Steven Bosscher 2006-01-09 22:26:30 UTC
...and so we go blame Diego for the 4.1/4.2 problem, because gcse.c CPROP is no longer a problem here for GCC 4.0/4.1/4.2.
Comment 36 Steven Bosscher 2006-02-05 21:37:28 UTC
I think the easiest way to fix this is to limit the length of the EQUIVALENCE chains somehow.  I've collected some numbers about the number of elements in the EQUIVALENCE chains of GCC 2.7.2 (which is the version of GCC in SPEC2000):

            
frequency   # of elements
  20342          0
  43484          1
  13230          2
   1701          3
    456          4
    207          5
     81          6
     44          7
     24          8
     18          9
      4          10
      4          11
      4          12
      4          13
      2          14
      2          15
      2          16

So if we cut off the element lists at, say, 20 elements, we shouldn't be missing a significant number of optimizations.

I haven't looked at how such a cut-off should be implemented.
Comment 37 Steven Bosscher 2006-02-05 22:47:01 UTC
At least I get VRP time down to nothing with a patchlet like this one:

Index: tree-vrp.c
===================================================================
--- tree-vrp.c  (revision 110617)
+++ tree-vrp.c  (working copy)
@@ -326,7 +326,7 @@ add_equivalence (bitmap equiv, tree var)
   value_range_t *vr = vr_value[ver];

   bitmap_set_bit (equiv, ver);
-  if (vr && vr->equiv)
+  if (vr && vr->equiv && bitmap_count_bits (vr->equiv) < 20)
     bitmap_ior_into (equiv, vr->equiv);
 }


For a proper patch, the number should of course be a PARAM and I think using bitmap_count_bits penalizes the common case too much (walking the whole bitmap is cache expensive even for small bitmaps).
Comment 38 Mark Mitchell 2006-02-24 00:25:29 UTC
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
Comment 39 Mark Mitchell 2006-05-25 02:32:19 UTC
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Comment 40 Jorn Wolfgang Rennecke 2006-08-24 20:50:14 UTC
(In reply to comment #37)
> For a proper patch, the number should of course be a PARAM and I think using
> bitmap_count_bits penalizes the common case too much (walking the whole bitmap
> is cache expensive even for small bitmaps).

How about:

 -  if (vr && vr->equiv)
 +  if (vr && vr->equiv && vr->equiv->first
        && (!vr->equiv->first->next
            || bitmap_count_bits (vr->equiv) < VRP_EQUIVALENCE_THRESHOLD)))
      bitmap_ior_into (equiv, vr->equiv);

The 0 bit case is sped up, and the 1 bit case is guaranteed to only need two
pointer dereferences to check that there are not too many bits.
Other small bit numbers should still have a high likelyhood to he handled
without a bitmap_count_bits call.
The quick & dirty check might allow more than VRP_EQUIVALENCE_THRESHOLD
bits through if they are tightly packed, but that is still imited to 64
for a host with 32 bit longs, 128 for a host with 64 bit longs.
Comment 41 Richard Biener 2006-10-10 11:19:37 UTC
We're still 100% slower than 4.0.3 on the mainline:

Execution times (seconds)
 callgraph construction:   0.21 ( 0%) usr   0.03 ( 2%) sys   0.23 ( 0%) wall   11563 kB (13%) ggc
 callgraph optimization:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     937 kB ( 1%) ggc
 ipa reference         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 ipa type escape       :   0.03 ( 0%) usr   0.02 ( 1%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 cfg cleanup           :   0.57 ( 1%) usr   0.02 ( 1%) sys   0.62 ( 1%) wall    1250 kB ( 1%) ggc
 trivially dead code   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 life analysis         :   0.33 ( 0%) usr   0.01 ( 1%) sys   0.35 ( 0%) wall    1250 kB ( 1%) ggc
 life info update      :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall       1 kB ( 0%) ggc
 register scan         :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 rebuild jump labels   :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing         :   0.11 ( 0%) usr   0.03 ( 2%) sys   0.14 ( 0%) wall     704 kB ( 1%) ggc
 lexical analysis      :   0.07 ( 0%) usr   0.21 (12%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.09 ( 0%) usr   0.06 ( 3%) sys   0.23 ( 0%) wall    5224 kB ( 6%) ggc
 integration           :   0.08 ( 0%) usr   0.01 ( 1%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 tree gimplify         :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     625 kB ( 1%) ggc
 tree eh               :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.12 ( 0%) usr   0.03 ( 2%) sys   0.15 ( 0%) wall   14606 kB (16%) ggc
 tree CFG cleanup      :   0.67 ( 1%) usr   0.01 ( 1%) sys   0.65 ( 1%) wall       0 kB ( 0%) ggc
 tree VRP              :  40.92 (46%) usr   0.24 (13%) sys  41.26 (46%) wall   10356 kB (11%) ggc
 tree copy propagation :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 tree store copy prop  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree find ref. vars   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree PTA              :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       6 kB ( 0%) ggc
 tree alias analysis   :   0.14 ( 0%) usr   0.10 ( 6%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA rewrite      :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA other        :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA incremental  :   0.07 ( 0%) usr   0.01 ( 1%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree operand scan     :   0.29 ( 0%) usr   0.17 ( 9%) sys   0.35 ( 0%) wall    2306 kB ( 3%) ggc
 dominator optimization:   0.47 ( 1%) usr   0.01 ( 1%) sys   0.52 ( 1%) wall    7031 kB ( 8%) ggc
 tree STORE-CCP        :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree CCP              :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree PHI const/copy prop:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree reassociation    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 tree PRE              :   0.09 ( 0%) usr   0.01 ( 1%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 tree FRE              :   0.05 ( 0%) usr   0.01 ( 1%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 tree code sinking     :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree linearize phis   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 tree aggressive DCE   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 PHI merge             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree loop init        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree copy headers     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA uncprop      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA to normal    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   0.48 ( 1%) usr   0.03 ( 2%) sys   0.47 ( 1%) wall       0 kB ( 0%) ggc
 control dependences   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   0.81 ( 1%) usr   0.06 ( 3%) sys   0.99 ( 1%) wall   19285 kB (21%) ggc
 jump                  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 CSE                   :   2.18 ( 2%) usr   0.01 ( 1%) sys   2.18 ( 2%) wall       0 kB ( 0%) ggc
 loop analysis         :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 global CSE            :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 CPROP 1               :  14.66 (17%) usr   0.26 (14%) sys  14.93 (17%) wall    2031 kB ( 2%) ggc
 PRE                   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 CPROP 2               :  10.01 (11%) usr   0.19 (10%) sys  10.21 (11%) wall    1406 kB ( 2%) ggc
 bypass jumps          :   9.88 (11%) usr   0.21 (12%) sys  10.10 (11%) wall    1406 kB ( 2%) ggc
 CSE 2                 :   2.08 ( 2%) usr   0.01 ( 1%) sys   2.09 ( 2%) wall       0 kB ( 0%) ggc
 branch prediction     :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall     625 kB ( 1%) ggc
 flow analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 combiner              :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall     312 kB ( 0%) ggc
 if-conversion         :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 regmove               :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 local alloc           :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       1 kB ( 0%) ggc
 global alloc          :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
 reload CSE regs       :   0.13 ( 0%) usr   0.01 ( 1%) sys   0.14 ( 0%) wall    2502 kB ( 3%) ggc
 flow 2                :   0.03 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 if-conversion 2       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2            :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall    2187 kB ( 2%) ggc
 rename registers      :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :   0.27 ( 0%) usr   0.01 ( 1%) sys   0.28 ( 0%) wall     314 kB ( 0%) ggc
 machine dep reorg     :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    1250 kB ( 1%) ggc
 reorder blocks        :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 0%) wall    1875 kB ( 2%) ggc
 final                 :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall     512 kB ( 1%) ggc
 TOTAL                 :  88.19             1.81            90.10              91385 kB
Comment 42 Richard Biener 2007-09-17 13:11:02 UTC
Still slow.

 cfg cleanup           :   7.10 (10%) usr   0.16 (10%) sys   7.26 (10%) wall     625 kB ( 1%) ggc
 tree VRP              :  27.40 (39%) usr   0.39 (23%) sys  27.91 (38%) wall    7602 kB (11%) ggc
 CPROP 1               :  12.32 (17%) usr   0.26 (15%) sys  12.58 (17%) wall    1562 kB ( 2%) ggc
 CPROP 2               :   8.79 (12%) usr   0.20 (12%) sys   9.03 (12%) wall    1406 kB ( 2%) ggc
 bypass jumps          :   8.73 (12%) usr   0.20 (12%) sys   8.96 (12%) wall    1406 kB ( 2%) ggc
 TOTAL                 :  70.94             1.68            73.05              71071 kB
Comment 43 Steven Bosscher 2007-11-09 22:08:45 UTC
IMVHO this should be closed as WONTFIX.
Comment 44 sebpop@gmail.com 2007-11-11 05:16:58 UTC
Subject: Re:  [4.1/4.2/4.3 regression] Quadratic behavior with many sets for the same register in VRP

> IMVHO this should be closed as WONTFIX.

Steven, why isn't your patch from comment #37 not a candidate for
fixing this bug?
Comment 45 stevenb.gcc@gmail.com 2007-11-11 09:23:47 UTC
Subject: Re:  [4.1/4.2/4.3 regression] Quadratic behavior with many sets for the same register in VRP

Because it costs more than it brings: compile time on average goes
_up_ with that patch.
Comment 46 Richard Biener 2007-11-11 12:16:51 UTC
Note that on the mainline equiv processing can be completely turned off by
just tweaking add_equivalence () to do nothing.

There are a few missed optimizations if you do that, but I belive the important
kind of things there should be done in a separate pass (or DOM, which at least
in the past could do these).
Comment 47 Steven Bosscher 2007-12-18 13:46:48 UTC
Comment on attachment 10017 [details]
patch for operand scan

Patch is obsolete because it was commited.
Comment 48 Steven Bosscher 2008-01-23 08:52:55 UTC
The priority of this bug is too high.  If it wasn't just for recording that this behavior exists in VRP and CPROP, I would propose to just close this bug as WONTFIX.  The test case is just too obscure, no man or machine would normally generate anything like this.

So can a release manager please lower the priority of this bug report?

For GCC 4.4 I hopefully will nuke the CPROP problem.
Comment 49 Richard Biener 2008-01-23 09:06:51 UTC
I think it is reasonable to build in limits into passes that do work not
linear (or at least quadratic) in the number of statements, BBs or edges.

The testcase is probably just extracting the core problem from a real-world
testcase (where the situation is probably also not that bad).
Comment 50 Joseph S. Myers 2008-07-04 16:47:12 UTC
Closing 4.1 branch.
Comment 51 Joseph S. Myers 2009-03-31 16:42:39 UTC
Closing 4.2 branch.
Comment 52 Richard Biener 2009-08-04 12:26:11 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 53 Steven Bosscher 2009-12-24 13:02:47 UTC
I cannot reproduce this anymore on ia64 -- at least not for VRP and CPROP. The slowest pass is now expand:

 dominator optimization:   0.40 ( 1%) usr   0.00 ( 0%) sys   0.40 ( 1%) wall    2188 kB ( 3%) ggc
 tree CCP              :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
 tree PHI const/copy prop:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree split crit edges :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree reassociation    :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 tree PRE              :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall       1 kB ( 0%) ggc
 tree FRE              :   0.12 ( 0%) usr   0.00 ( 1%) sys   0.12 ( 0%) wall       1 kB ( 0%) ggc
 tree code sinking     :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 tree linearize phis   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.08 ( 0%) usr   0.01 ( 2%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 tree aggressive DCE   :   0.28 ( 0%) usr   0.00 ( 1%) sys   0.28 ( 0%) wall       3 kB ( 0%) ggc
 tree buildin call DCE :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 PHI merge             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 complete unrolling    :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree loop init        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree copy headers     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA uncprop      :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 tree switch initialization conversion:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   0.62 ( 1%) usr   0.00 ( 1%) sys   0.64 ( 1%) wall       0 kB ( 0%) ggc
 control dependences   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   3.13 ( 5%) usr   0.06 ( 7%) sys   3.21 ( 5%) wall   17093 kB (26%) ggc
 forward prop          :   0.79 ( 1%) usr   0.00 ( 1%) sys   0.79 ( 1%) wall     703 kB ( 1%) ggc
 CSE                   :   1.36 ( 2%) usr   0.00 ( 0%) sys   1.36 ( 2%) wall       0 kB ( 0%) ggc
 dead code elimination :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1      :   0.39 ( 1%) usr   0.00 ( 0%) sys   0.39 ( 1%) wall     390 kB ( 1%) ggc
 loop analysis         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 CPROP                 :   1.96 ( 3%) usr   0.18 (23%) sys   2.15 ( 3%) wall    6094 kB ( 9%) ggc
 PRE                   :   1.26 ( 2%) usr   0.20 (27%) sys   1.46 ( 2%) wall       0 kB ( 0%) ggc
 auto inc dec          :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 CSE 2                 :   1.69 ( 3%) usr   0.00 ( 0%) sys   1.69 ( 3%) wall    1288 kB ( 2%) ggc
 branch prediction     :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall       0 kB ( 0%) ggc
 combiner              :   0.62 ( 1%) usr   0.00 ( 0%) sys   0.62 ( 1%) wall     234 kB ( 0%) ggc
 if-conversion         :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 scheduling            :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     512 kB ( 1%) ggc
 reload                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       8 kB ( 0%) ggc
 tree if-combine       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  61.41             0.77            62.18              66790 kB

But this expand slowness also manifests itself at -O0 (IRA and gimplifier slow), and at -O1 and -O3 (expand). I don't find this unreasnable at all, for 20000 conditionals and 20000 function calls (and all the stack and temp slots saving/restoring involved). Closing.