Bug 46590 - long compile time with -O2 and many loops
Summary: long compile time with -O2 and many loops
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.0
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog, memory-hog
: 54337 (view as bug list)
Depends on: 61515
Blocks: 47344
  Show dependency treegraph
 
Reported: 2010-11-21 13:21 UTC by Thomas Koenig
Modified: 2014-10-21 09:24 UTC (History)
7 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.4.1
Known to fail: 4.6.0
Last reconfirmed: 2010-11-22 13:13:03


Attachments
gzipped test case (63.31 KB, application/octet-stream)
2010-11-21 13:22 UTC, Thomas Koenig
Details
Test case from comment #5 and comment #6 (16.15 KB, text/plain)
2010-11-21 21:07 UTC, Thomas Koenig
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Koenig 2010-11-21 13:21:12 UTC
The attached program, which is a stress-test to test array assignment
dependencies for gfortran, compiles just about forever at -O2;
so far, it's 90 minutes and still no sign of finishing.

Compilation time without -O2 is 45 seconds.

Here is the perl script used to generate the test case.

#! /usr/bin/perl                                                         

$cnt = 0;
# @ind = (-6, -3, -1, 1, 3, 6);
@ind = (-5, -3, -1, 1, 3, 5);  
$n = 6;                        

print <<EOF;
program main
  implicit none
  integer, dimension(-100:100) :: a,b,original
  integer :: i
  original = [(15*i+2,i=-100,100)]

EOF
foreach $stride_l (@ind) {
    foreach $stride_r (@ind) {
        for ($start_l = 1; $start_l < $n; $start_l ++) {
            for ($start_r = 1; $start_r < $n; $start_r ++) {
                for ($times = 0; $times < $n; $times ++) {
                    $end_l = $start_l + ($times-1)*$stride_l;
                    $end_r = $start_r + ($times-1)*$stride_r;
                    print <<EOF;
  a = original
  b = original
  a($start_l:$end_l:$stride_l) =  a($start_r:$end_r:$stride_r)
  b($start_l:$end_l:$stride_l) = original($start_r:$end_r:$stride_r)
  if (any (a /= b)) then
    print *,$start_l, $end_l,$stride_l,$start_r,$end_r, $stride_r
    call abort
  end if

EOF
                }
            }
        }
    }
}
print "end program main\n";
Comment 1 Thomas Koenig 2010-11-21 13:22:50 UTC
Created attachment 22475 [details]
gzipped test case

Test case.
Comment 2 Dominique d'Humieres 2010-11-21 16:59:39 UTC
On my laptop the test in comment #1 takes ~4mn to compile and at -O1 I had to interrupt the compilation after ~20mn during which all the time was spent in memory pagination!-(with 4Gb of RAM).
Comment 3 Thomas Koenig 2010-11-21 18:49:20 UTC
Yes, this is a memory hog as well:

time ~/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951 -O2 gener-max.f90
 MAIN__ main
Analyzing compilation unit
 {GC 183863k -> 103486k}Performing interprocedural optimizations
 <*free_lang_data> <visibility> <early_local_cleanups> {GC 156854k -> 151699k} <whole-program> <ipa-profile> <cp> <inline> <pure-const> <static-var>Assembling functions:
 MAIN__ {GC 281998k -> 189095k} {GC 393064k -> 341214k} {GC 554029k -> 337458k} {GC 477071k -> 337199k} {GC 557246k -> 473808k}
f951: out of memory allocating 968552 bytes after a total of 4310765568 bytes

real    150m58.614s
user    150m16.083s
sys     0m11.924s
ig25@linux-fd1f:~/Krempel/Dep-c>
Comment 4 Richard Biener 2010-11-21 18:56:58 UTC
Please strip down the testcase and provide -ftime-report output (I suppose
you hit IVOPTs slowness, so try -fno-ivopts).
Comment 5 Thomas Koenig 2010-11-21 19:39:23 UTC
With -fno-ivopts, the alias statement walking is the clear culprit
(a somewhat shorter test case):

ig25@linux-fd1f:~/Krempel/Dep-c> time ~/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951 -ftime-report -O2 -fno-ivopts gener-4.f90                                                                       
 MAIN__ main                                                                                       
Analyzing compilation unit                                                                         
 {GC 44026k -> 25146k}Performing interprocedural optimizations                                     
 <*free_lang_data> <visibility> <early_local_cleanups> {GC 37683k -> 36011k} <whole-program> <ipa-profile> <cp> <inline> <pure-const> <static-var>Assembling functions:                                   
 MAIN__ {GC 54739k -> 37948k} {GC 61997k -> 51931k} {GC 77889k -> 51575k} {GC 103199k -> 77477k} {GC 100723k -> 74180k} {GC 98147k -> 73521k} main                                                        
Execution times (seconds)                                                                            
 garbage collection    :   0.79 ( 0%) usr   0.00 ( 0%) sys   0.81 ( 0%) wall       0 kB ( 0%) ggc    
 callgraph construction:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    2465 kB ( 1%) ggc    
 callgraph optimization:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       2 kB ( 0%) ggc    
 ipa cp                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     256 kB ( 0%) ggc    
 ipa profile           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc    
 ipa pure const        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc    
 cfg construction      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc    
 cfg cleanup           :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall     344 kB ( 0%) ggc    
 CFG verifier          :   1.40 ( 0%) usr   0.01 ( 0%) sys   1.35 ( 0%) wall       0 kB ( 0%) ggc    
 trivially dead code   :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc    
 df scan insns         :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc    
 df multiple defs      :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc    
 df reaching defs      :   2.12 ( 1%) usr   0.01 ( 0%) sys   2.15 ( 1%) wall       0 kB ( 0%) ggc    
 df live regs          :   0.93 ( 0%) usr   0.00 ( 0%) sys   1.00 ( 0%) wall       0 kB ( 0%) ggc    
 df live&initialized regs:   0.26 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall       0 kB ( 0%) ggc  
 df use-def / def-use chains:   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc                                                                                                    
 df reg dead/unused notes:   0.44 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall    2460 kB ( 1%) ggc  
 register information  :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc    
 alias analysis        :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall    2816 kB ( 1%) ggc    
 alias stmt walking    : 205.44 (72%) usr   1.33 (58%) sys 207.39 (72%) wall    7023 kB ( 3%) ggc    
 register scan         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc    
 rebuild jump labels   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc    
 parser                :   0.44 ( 0%) usr   0.03 ( 1%) sys   0.47 ( 0%) wall   20481 kB ( 8%) ggc    
 inline heuristics     :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc    
 tree gimplify         :   0.16 ( 0%) usr   0.01 ( 0%) sys   0.18 ( 0%) wall   17257 kB ( 6%) ggc    
 tree eh               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc    
 tree CFG construction :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    5915 kB ( 2%) ggc    
 tree CFG cleanup      :   1.08 ( 0%) usr   0.02 ( 1%) sys   1.15 ( 0%) wall     602 kB ( 0%) ggc    
 tree VRP              :   0.44 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall    4778 kB ( 2%) ggc    
 tree copy propagation :   0.23 ( 0%) usr   0.01 ( 0%) sys   0.25 ( 0%) wall      75 kB ( 0%) ggc    
 tree find ref. vars   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     865 kB ( 0%) ggc    
 tree PTA              :   2.31 ( 1%) usr   0.06 ( 3%) sys   2.38 ( 1%) wall    2253 kB ( 1%) ggc    
 tree PHI insertion    :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    2203 kB ( 1%) ggc    
 tree SSA rewrite      :   3.08 ( 1%) usr   0.01 ( 0%) sys   5.10 ( 2%) wall    6503 kB ( 2%) ggc    
 tree SSA other        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      32 kB ( 0%) ggc    
 tree SSA incremental  :   6.20 ( 2%) usr   0.03 ( 1%) sys   7.43 ( 3%) wall    1372 kB ( 1%) ggc    
 tree operand scan     :   0.13 ( 0%) usr   0.10 ( 4%) sys   0.19 ( 0%) wall   13430 kB ( 5%) ggc    
 dominator optimization:   0.40 ( 0%) usr   0.01 ( 0%) sys   0.41 ( 0%) wall   22013 kB ( 8%) ggc    
 tree SRA              :   0.12 ( 0%) usr   0.02 ( 1%) sys   0.14 ( 0%) wall   11123 kB ( 4%) ggc    
 tree CCP              :   0.47 ( 0%) usr   0.02 ( 1%) sys   0.52 ( 0%) wall     838 kB ( 0%) ggc    
 tree PHI const/copy prop:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc  
 tree reassociation    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     559 kB ( 0%) ggc    
 tree PRE              :   0.65 ( 0%) usr   0.05 ( 2%) sys   0.70 ( 0%) wall    7992 kB ( 3%) ggc    
 tree FRE              :   0.43 ( 0%) usr   0.05 ( 2%) sys   0.38 ( 0%) wall     530 kB ( 0%) ggc    
 tree code sinking     :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall     360 kB ( 0%) ggc    
 tree linearize phis   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc    
 tree forward propagate:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     337 kB ( 0%) ggc    
 tree phiprop          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc    
 tree conservative DCE :   0.10 ( 0%) usr   0.04 ( 2%) sys   0.11 ( 0%) wall      64 kB ( 0%) ggc    
 tree aggressive DCE   :   0.25 ( 0%) usr   0.01 ( 0%) sys   0.29 ( 0%) wall    4194 kB ( 2%) ggc    
 tree DSE              :   0.90 ( 0%) usr   0.01 ( 0%) sys   0.91 ( 0%) wall       0 kB ( 0%) ggc    
 tree loop bounds      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     821 kB ( 0%) ggc    
 tree loop invariant motion:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 tree canonical iv     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1012 kB ( 0%) ggc    
 complete unrolling    :   1.38 ( 0%) usr   0.05 ( 2%) sys   1.84 ( 1%) wall   16838 kB ( 6%) ggc    
 tree loop init        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     394 kB ( 0%) ggc    
 tree loop fini        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc    
 tree copy headers     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     613 kB ( 0%) ggc    
 tree SSA uncprop      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc    
 tree rename SSA copies:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc    
 tree SSA verifier     :   2.85 ( 1%) usr   0.01 ( 0%) sys   2.99 ( 1%) wall       0 kB ( 0%) ggc    
 tree STMT verifier    :   5.48 ( 2%) usr   0.00 ( 0%) sys   5.55 ( 2%) wall       0 kB ( 0%) ggc    
 tree switch initialization conversion:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc                                                                                          
 callgraph verifier    :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc    
 dominance frontiers   :   6.84 ( 2%) usr   0.01 ( 0%) sys   4.58 ( 2%) wall       0 kB ( 0%) ggc    
 dominance computation :   4.91 ( 2%) usr   0.00 ( 0%) sys   3.66 ( 1%) wall       0 kB ( 0%) ggc    
 control dependences   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc    
 out of ssa            :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc    
 expand vars           :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    1109 kB ( 0%) ggc    
 expand                :   0.54 ( 0%) usr   0.02 ( 1%) sys   0.56 ( 0%) wall   38882 kB (14%) ggc    
 post expand cleanups  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       1 kB ( 0%) ggc    
 forward prop          :   0.22 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall    1581 kB ( 1%) ggc    
 CSE                   :   1.07 ( 0%) usr   0.00 ( 0%) sys   1.06 ( 0%) wall    1762 kB ( 1%) ggc    
 dead code elimination :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc    
 dead store elim1      :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall    4983 kB ( 2%) ggc    
 dead store elim2      :   0.52 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall    6295 kB ( 2%) ggc    
 loop analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     836 kB ( 0%) ggc    
 loop invariant motion :   5.52 ( 2%) usr   0.00 ( 0%) sys   5.50 ( 2%) wall       0 kB ( 0%) ggc    
 CPROP                 :   0.89 ( 0%) usr   0.04 ( 2%) sys   0.94 ( 0%) wall    8097 kB ( 3%) ggc    
 PRE                   :  10.12 ( 4%) usr   0.11 ( 5%) sys  10.28 ( 4%) wall    2091 kB ( 1%) ggc    
 CSE 2                 :   0.55 ( 0%) usr   0.00 ( 0%) sys   0.55 ( 0%) wall     547 kB ( 0%) ggc    
 branch prediction     :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall    1542 kB ( 1%) ggc    
 combiner              :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall    4406 kB ( 2%) ggc    
 if-conversion         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     559 kB ( 0%) ggc
 regmove               :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :   2.07 ( 1%) usr   0.18 ( 8%) sys   2.26 ( 1%) wall    5525 kB ( 2%) ggc
 reload                :   0.93 ( 0%) usr   0.00 ( 0%) sys   0.93 ( 0%) wall   14653 kB ( 5%) ggc
 reload CSE regs       :   1.30 ( 0%) usr   0.00 ( 0%) sys   1.32 ( 0%) wall   13495 kB ( 5%) ggc
 zee                   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 thread pro- & epilogue:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       3 kB ( 0%) ggc
 if-conversion 2       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     279 kB ( 0%) ggc
 combine stack adjustments:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2            :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     354 kB ( 0%) ggc
 hard reg cprop        :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.34 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :   1.83 ( 1%) usr   0.02 ( 1%) sys   1.94 ( 1%) wall      71 kB ( 0%) ggc
 machine dep reorg     :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 reorder blocks        :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall     891 kB ( 0%) ggc
 final                 :   0.32 ( 0%) usr   0.01 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
 tree if-combine       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation   :   0.43 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall    3151 kB ( 1%) ggc
 remove unused locals  :   0.40 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall       0 kB ( 0%) ggc
 address taken         :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall      32 kB ( 0%) ggc
 unaccounted todo      :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 verify loop closed    :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall       0 kB ( 0%) ggc
 verify RTL sharing    :   1.59 ( 1%) usr   0.00 ( 0%) sys   1.61 ( 1%) wall       0 kB ( 0%) ggc
 repair loop structures:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     145 kB ( 0%) ggc
 TOTAL                 : 283.50             2.31           286.74             270936 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

real    4m46.813s
user    4m43.509s
sys     0m2.336s
Comment 6 Thomas Koenig 2010-11-21 19:48:41 UTC
Without -fno-ivopts:

ig25@linux-fd1f:~/Krempel/Dep-c> time ~/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951 -ftime-report -O2  gener-4.f90                                                                         
 MAIN__ main                                                                              
Analyzing compilation unit                                                                
 {GC 44026k -> 25146k}Performing interprocedural optimizations                            
 <*free_lang_data> <visibility> <early_local_cleanups> {GC 37683k -> 36011k} <whole-program> <ipa-profile> <cp> <inline> <pure-const> <static-var>Assembling functions:                                   
 MAIN__ {GC 54739k -> 37948k} {GC 61997k -> 51931k} {GC 77889k -> 51575k} {GC 75220k -> 51694k} {GC 89661k -> 77264k} {GC 500191k -> 73724k} {GC 106651k -> 72832k} main                                  
Execution times (seconds)                                                                            
 garbage collection    :   1.05 ( 0%) usr   0.03 ( 2%) sys   1.09 ( 0%) wall       0 kB ( 0%) ggc    
 callgraph construction:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    2465 kB ( 0%) ggc    
 callgraph optimization:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       2 kB ( 0%) ggc    
 ipa pure const        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc    
 cfg construction      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc    
 cfg cleanup           :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall     344 kB ( 0%) ggc    
 CFG verifier          :   1.35 ( 0%) usr   0.00 ( 0%) sys   1.35 ( 0%) wall       0 kB ( 0%) ggc    
 trivially dead code   :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc    
 df scan insns         :   0.17 ( 0%) usr   0.01 ( 1%) sys   0.18 ( 0%) wall       0 kB ( 0%) ggc    
 df multiple defs      :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc    
 df reaching defs      :   2.04 ( 1%) usr   0.01 ( 1%) sys   2.11 ( 1%) wall       0 kB ( 0%) ggc    
 df live regs          :   1.02 ( 0%) usr   0.00 ( 0%) sys   1.06 ( 0%) wall       0 kB ( 0%) ggc    
 df live&initialized regs:   0.29 ( 0%) usr   0.01 ( 1%) sys   0.24 ( 0%) wall       0 kB ( 0%) ggc  
 df use-def / def-use chains:   0.12 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc                                                                                                    
 df reg dead/unused notes:   0.43 ( 0%) usr   0.00 ( 0%) sys   0.43 ( 0%) wall    2460 kB ( 0%) ggc  
 register information  :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc    
 alias analysis        :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall    2816 kB ( 0%) ggc    
 alias stmt walking    : 192.54 (71%) usr   0.22 (16%) sys 193.02 (71%) wall    7023 kB ( 1%) ggc    
 register scan         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall      20 kB ( 0%) ggc    
 rebuild jump labels   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc    
 parser                :   0.43 ( 0%) usr   0.03 ( 2%) sys   0.47 ( 0%) wall   20481 kB ( 3%) ggc    
 inline heuristics     :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc    
 tree gimplify         :   0.16 ( 0%) usr   0.02 ( 1%) sys   0.19 ( 0%) wall   17257 kB ( 3%) ggc    
 tree eh               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc    
 tree CFG construction :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    5915 kB ( 1%) ggc    
 tree CFG cleanup      :   1.06 ( 0%) usr   0.01 ( 1%) sys   1.07 ( 0%) wall     602 kB ( 0%) ggc    
 tree VRP              :   0.40 ( 0%) usr   0.01 ( 1%) sys   0.37 ( 0%) wall    4754 kB ( 1%) ggc    
 tree copy propagation :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall      75 kB ( 0%) ggc    
 tree find ref. vars   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     865 kB ( 0%) ggc    
 tree PTA              :   2.24 ( 1%) usr   0.06 ( 4%) sys   2.30 ( 1%) wall    2253 kB ( 0%) ggc    
 tree PHI insertion    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    2203 kB ( 0%) ggc    
 tree SSA rewrite      :   3.18 ( 1%) usr   0.00 ( 0%) sys   2.75 ( 1%) wall    6503 kB ( 1%) ggc    
 tree SSA other        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall      32 kB ( 0%) ggc    
 tree SSA incremental  :   6.64 ( 2%) usr   0.01 ( 1%) sys   6.73 ( 2%) wall    1372 kB ( 0%) ggc    
 tree operand scan     :   0.17 ( 0%) usr   0.09 ( 7%) sys   0.28 ( 0%) wall   13590 kB ( 2%) ggc    
 dominator optimization:   0.38 ( 0%) usr   0.01 ( 1%) sys   0.40 ( 0%) wall   22013 kB ( 3%) ggc    
 tree SRA              :   0.09 ( 0%) usr   0.02 ( 1%) sys   0.15 ( 0%) wall   11123 kB ( 2%) ggc    
 tree CCP              :   0.45 ( 0%) usr   0.02 ( 1%) sys   0.43 ( 0%) wall     838 kB ( 0%) ggc    
 tree reassociation    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     559 kB ( 0%) ggc    
 tree PRE              :   0.53 ( 0%) usr   0.04 ( 3%) sys   0.67 ( 0%) wall    7992 kB ( 1%) ggc    
 tree FRE              :   0.34 ( 0%) usr   0.01 ( 1%) sys   0.42 ( 0%) wall     530 kB ( 0%) ggc    
 tree code sinking     :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall     360 kB ( 0%) ggc    
 tree linearize phis   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc    
 tree forward propagate:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     337 kB ( 0%) ggc    
 tree phiprop          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc    
 tree conservative DCE :   0.08 ( 0%) usr   0.03 ( 2%) sys   0.05 ( 0%) wall      64 kB ( 0%) ggc    
 tree aggressive DCE   :   0.23 ( 0%) usr   0.03 ( 2%) sys   0.27 ( 0%) wall    4227 kB ( 1%) ggc    
 tree buildin call DCE :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc    
 tree DSE              :   0.83 ( 0%) usr   0.00 ( 0%) sys   0.85 ( 0%) wall       0 kB ( 0%) ggc    
 tree loop bounds      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     821 kB ( 0%) ggc    
 tree loop invariant motion:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 tree canonical iv     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1012 kB ( 0%) ggc    
 scev constant prop    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     174 kB ( 0%) ggc    
 complete unrolling    :   1.00 ( 0%) usr   0.03 ( 2%) sys   0.86 ( 0%) wall   16838 kB ( 2%) ggc    
 tree iv optimization  :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall   11430 kB ( 2%) ggc    
 tree loop init        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     394 kB ( 0%) ggc    
 tree copy headers     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     613 kB ( 0%) ggc    
 tree SSA uncprop      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc    
 tree rename SSA copies:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc    
 tree SSA verifier     :   2.92 ( 1%) usr   0.00 ( 0%) sys   2.81 ( 1%) wall       0 kB ( 0%) ggc    
 tree STMT verifier    :   5.27 ( 2%) usr   0.00 ( 0%) sys   5.41 ( 2%) wall       0 kB ( 0%) ggc    
 callgraph verifier    :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc    
 dominance frontiers   :   6.46 ( 2%) usr   0.01 ( 1%) sys   7.35 ( 3%) wall       0 kB ( 0%) ggc    
 dominance computation :   5.33 ( 2%) usr   0.01 ( 1%) sys   5.06 ( 2%) wall       0 kB ( 0%) ggc    
 control dependences   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc    
 out of ssa            :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc    
 expand vars           :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall     726 kB ( 0%) ggc    
 expand                :   0.44 ( 0%) usr   0.01 ( 1%) sys   0.45 ( 0%) wall   37845 kB ( 5%) ggc    
 post expand cleanups  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       1 kB ( 0%) ggc    
 forward prop          :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall    1217 kB ( 0%) ggc    
 CSE                   :   0.97 ( 0%) usr   0.00 ( 0%) sys   0.97 ( 0%) wall    2127 kB ( 0%) ggc    
 dead code elimination :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc    
 dead store elim1      :   0.23 ( 0%) usr   0.01 ( 1%) sys   0.23 ( 0%) wall    4659 kB ( 1%) ggc    
 dead store elim2      :   0.50 ( 0%) usr   0.00 ( 0%) sys   0.51 ( 0%) wall    5526 kB ( 1%) ggc    
 loop analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     836 kB ( 0%) ggc    
 loop invariant motion :   5.21 ( 2%) usr   0.01 ( 1%) sys   5.23 ( 2%) wall       0 kB ( 0%) ggc    
 CPROP                 :   0.88 ( 0%) usr   0.03 ( 2%) sys   0.90 ( 0%) wall   13157 kB ( 2%) ggc    
 PRE                   :  10.87 ( 4%) usr   0.33 (24%) sys  11.24 ( 4%) wall  409967 kB (60%) ggc    
 CSE 2                 :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall     729 kB ( 0%) ggc    
 branch prediction     :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall    1542 kB ( 0%) ggc    
 combiner              :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall    1844 kB ( 0%) ggc
 if-conversion         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     559 kB ( 0%) ggc
 regmove               :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :   2.04 ( 1%) usr   0.21 (15%) sys   2.26 ( 1%) wall    5282 kB ( 1%) ggc
 reload                :   0.90 ( 0%) usr   0.00 ( 0%) sys   0.90 ( 0%) wall   13255 kB ( 2%) ggc
 reload CSE regs       :   1.28 ( 0%) usr   0.00 ( 0%) sys   1.28 ( 0%) wall   13060 kB ( 2%) ggc
 zee                   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 thread pro- & epilogue:   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       3 kB ( 0%) ggc
 if-conversion 2       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     279 kB ( 0%) ggc
 combine stack adjustments:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2            :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     354 kB ( 0%) ggc
 hard reg cprop        :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.34 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :   1.84 ( 1%) usr   0.02 ( 1%) sys   1.85 ( 1%) wall      71 kB ( 0%) ggc
 machine dep reorg     :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 reorder blocks        :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall     891 kB ( 0%) ggc
 final                 :   0.31 ( 0%) usr   0.01 ( 1%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation   :   0.39 ( 0%) usr   0.01 ( 1%) sys   0.38 ( 0%) wall    3151 kB ( 0%) ggc
 remove unused locals  :   0.43 ( 0%) usr   0.00 ( 0%) sys   0.38 ( 0%) wall       0 kB ( 0%) ggc
 address taken         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall      32 kB ( 0%) ggc
 unaccounted todo      :   0.07 ( 0%) usr   0.01 ( 1%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 verify loop closed    :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
 verify RTL sharing    :   1.55 ( 1%) usr   0.00 ( 0%) sys   1.54 ( 1%) wall       0 kB ( 0%) ggc
 repair loop structures:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     145 kB ( 0%) ggc
 TOTAL                 : 270.29             1.38           272.32             688523 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

real    4m32.395s
user    4m30.299s
sys     0m1.406s
Comment 7 Thomas Koenig 2010-11-21 21:07:08 UTC
Created attachment 22478 [details]
Test case from comment #5 and comment #6

Here is a test case that is a little shorter.

A bit hard to trim it down further when the bug depends on test
case size...
Comment 8 Thomas Koenig 2010-11-21 21:13:24 UTC
4.4 compile time is much lower:

ig25@linux-fd1f:~/Krempel/Dep-c> time /usr/bin/gfortran-4.4 -O2 gener-4.f90

real    0m59.893s
user    0m58.172s
sys     0m0.664s
Comment 9 Richard Biener 2010-11-22 13:13:03 UTC
It's a very large monolithic function.  And as usual we have a gazillion
amount of local IO state variables.

On trunk with release checking I see:

A quarter of the testcase:

 alias stmt walking    :  15.94 (59%) usr   0.03 ( 8%) sys  16.00 (57%) wall    1845 kB ( 2%) ggc
 TOTAL                 :  27.08             0.38            27.89              96306 kB

Half of the testcase:

 alias stmt walking    :  63.31 (68%) usr   0.51 (31%) sys  64.06 (67%) wall    3684 kB ( 2%) ggc
 TOTAL                 :  93.52             1.66            95.57             241871 kB

All of the testcase:

 alias stmt walking    : 259.19 (73%) usr   0.78 (26%) sys 261.79 (72%) wall    7023 kB ( 1%) ggc
 TOTAL                 : 356.27             2.98           361.57             690719 kB

so it's definitely nearly quadratic (but that's expected).

4.5.x for a quarter of the testcase has:

 alias stmt walking    :  93.10 (88%) usr   0.03 ( 8%) sys  93.31 (87%) wall       0 kB ( 0%) ggc
 TOTAL                 : 106.11             0.40           106.93              87895 kB

so trunk is already a lot better.

Removing the alias stmt walk timevar gets us to the following on trunk
(quarter of the testcase again):

 tree PRE              :  12.93 (47%) usr   0.00 ( 0%) sys  12.98 (46%) wall    3607 kB ( 4%) ggc
 TOTAL                 :  27.57             0.34            27.99              96324 kB

What is costly is translating things through the loop bodies.  We can
improve this a lot by properly marking the I/O structs as dead once
they are no longer used and before they are used first.  The proposed
virtual kill stmts could be used for that.  We could also build this
kind of lifeness information up-front and use it to limit the walking
(but that again is only trivial for non-address taken variables, which
the I/O structs are not).

Anyway, confirmed.  Fortran I/O and array descriptor temporaries really
need re-use (I proposed a patch for I/O ones once but it was shot down
because of async-I/O).

Removing all prints from the testcase gives:

 alias stmt walking    : 132.44 (61%) usr   0.79 (30%) sys 133.47 (61%) wall    7023 kB ( 1%) ggc
 TOTAL                 : 216.55             2.67           219.78             645229 kB

As all arrays are not address-taken we really look for CSE opportunities
up to the very start of the function (PRE translates the in-loop
references from the any (a /= b) loop to the loop header using the
constant initial index and tries to CSE that, but it doesn't actually
succeed - which is another bug of course, it should look it up from
original resp. A.0).
Comment 10 Richard Biener 2010-11-22 13:29:56 UTC
At -O1 we have

 tree SSA rewrite      :  23.32 (13%) usr   0.06 ( 3%) sys  22.80 (13%) wall    6392 kB ( 3%) ggc
 tree SSA incremental  :  24.05 (14%) usr   0.07 ( 4%) sys  25.27 (14%) wall    3533 kB ( 2%) ggc
 tree FRE              :  45.17 (25%) usr   0.51 (29%) sys  45.83 (26%) wall    2048 kB ( 1%) ggc
 dominance frontiers   :  29.37 (17%) usr   0.01 ( 1%) sys  29.30 (16%) wall       0 kB ( 0%) ggc
 dominance computation :  19.49 (11%) usr   0.02 ( 1%) sys  19.29 (11%) wall       0 kB ( 0%) ggc
 loop invariant motion :  12.25 ( 7%) usr   0.01 ( 1%) sys  12.21 ( 7%) wall     648 kB ( 0%) ggc
 TOTAL                 : 177.31             1.74           179.57             196657 kB

the testcase is simply large.  And yes, the FRE/PRE alias stmt walks are
not limited (compared to the DCE/DSE ones).  Maybe it's time to add this
to cover this kind of artificial testcases.
Comment 11 Richard Biener 2010-11-22 16:19:37 UTC
(In reply to comment #10)
> At -O1 we have
> 
>  tree SSA rewrite      :  23.32 (13%) usr   0.06 ( 3%) sys  22.80 (13%) wall   
> 6392 kB ( 3%) ggc
>  tree SSA incremental  :  24.05 (14%) usr   0.07 ( 4%) sys  25.27 (14%) wall   
> 3533 kB ( 2%) ggc

The above is almost completely loop header copying.
Comment 12 Richard Biener 2010-11-24 12:22:48 UTC
Mine for now.
Comment 13 Richard Biener 2010-11-24 13:50:21 UTC
The I/O parts of the FRE cost are due to value-numbering stores in
visit_reference_op_store.  They can be drastically cut by an equivalent
of (not generating code)

Index: trans-io.c
===================================================================
--- trans-io.c  (revision 167111)
+++ trans-io.c  (working copy)
@@ -1670,6 +1670,7 @@ build_dt (tree function, gfc_code * code
   gfc_init_block (&post_iu_block);
 
   var = gfc_create_var (st_parameter[IOPARM_ptype_dt].type, "dt_parm");
+  gfc_add_modify (&block, var, build_constructor (TREE_TYPE (var), NULL));
 
   set_error_locus (&block, var, &code->loc);
 
which constrains lifetime begin of the dt_parm structs (which have their
address taken and thus lifetime analysis will have a hard time).  That
prevents store value numbering to consider all dominated blocks (which
only contain the loops).  The above brings down the previous -O1 numbers to

 tree FRE              :  28.34 (18%) usr   0.39 (18%) sys  28.83 (18%) wall    1906 kB ( 1%) ggc
 TOTAL                 : 159.87             2.17           162.72             201091 kB

all other variables are re-used and thus their lifetime isn't limited.

The loads from original[] are the only unconstrained ones, handling
those exhibits quadratic behavior.  It walks up to the first statement
in the function which is

  MEM[(c_char * {ref-all})&original] = MEM[(c_char * {ref-all})&A.0];

and then fails to constant fold using the static initializer of A.0.

FRE optimizes away the self-assignments

  a(1:6:-5) =  a(1:6:-5)

it doesn't do anything useful to the other loops.

Overall it's not completely unreasonable what FRE does for a, b and original
(we could have propagated A.0 to all uses of original).  The I/O struct
walks are the only thing that would be nice to fix.

And of course maybe limit walking in general.
Comment 14 Jakub Jelinek 2011-03-25 19:52:13 UTC
GCC 4.6.0 is being released, adjusting target milestone.
Comment 15 Jakub Jelinek 2011-06-27 12:32:42 UTC
GCC 4.6.1 is being released.
Comment 16 Jakub Jelinek 2011-10-26 17:13:27 UTC
GCC 4.6.2 is being released.
Comment 17 Andrew Pinski 2012-01-05 01:47:57 UTC
On the trunk most of the time at -O0 is spent doing add_scope_conflicts.
Comment 18 Michael Matz 2012-01-19 15:06:14 UTC
Author: matz
Date: Thu Jan 19 15:06:04 2012
New Revision: 183305

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183305
Log:
        PR tree-optimization/46590
	* cfgexpand.c (add_scope_conflicts_1): New old_conflicts argument,
	use it in remembering which conflicts we already created.
	(add_scope_conflicts): Adjust call to above, (de)allocate helper
	bitmap.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
Comment 19 Michael Matz 2012-01-19 15:10:43 UTC
The var-expansion slowness is fixed again.  The rest of course still applies.
Comment 20 Michael Matz 2012-01-26 15:50:43 UTC
Author: matz
Date: Thu Jan 26 15:50:33 2012
New Revision: 183566

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183566
Log:
	PR tree-optimization/46590
	* cfgexpand.c: Revert last change (r183305).
	* gimplify.c (gimplify_bind_expr): Add clobbers for all non-gimple
	regs.
	* tree-eh.c (cleanup_empty_eh): Try to optimize clobbers before
	checking for emptiness.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/gimplify.c
    trunk/gcc/tree-eh.c
Comment 21 Jakub Jelinek 2012-03-01 14:38:15 UTC
GCC 4.6.3 is being released.
Comment 22 Richard Biener 2012-08-21 08:09:55 UTC
*** Bug 54337 has been marked as a duplicate of this bug. ***
Comment 23 Steven Bosscher 2012-08-21 09:42:42 UTC
(In reply to comment #13)
> The I/O parts of the FRE cost are due to value-numbering stores in
> visit_reference_op_store.  They can be drastically cut by an equivalent
> of (not generating code)
> 
> Index: trans-io.c
> ===================================================================
> --- trans-io.c  (revision 167111)
> +++ trans-io.c  (working copy)
> @@ -1670,6 +1670,7 @@ build_dt (tree function, gfc_code * code
>    gfc_init_block (&post_iu_block);
> 
>    var = gfc_create_var (st_parameter[IOPARM_ptype_dt].type, "dt_parm");
> +  gfc_add_modify (&block, var, build_constructor (TREE_TYPE (var), NULL));
> 
>    set_error_locus (&block, var, &code->loc);
> 

You didn't post/commit this, but it looks like a reasonable change to me.
Comment 24 rguenther@suse.de 2012-08-21 09:59:41 UTC
On Tue, 21 Aug 2012, steven at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
> 
> Steven Bosscher <steven at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |steven at gcc dot gnu.org
> 
> --- Comment #23 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-21 09:42:42 UTC ---
> (In reply to comment #13)
> > The I/O parts of the FRE cost are due to value-numbering stores in
> > visit_reference_op_store.  They can be drastically cut by an equivalent
> > of (not generating code)
> > 
> > Index: trans-io.c
> > ===================================================================
> > --- trans-io.c  (revision 167111)
> > +++ trans-io.c  (working copy)
> > @@ -1670,6 +1670,7 @@ build_dt (tree function, gfc_code * code
> >    gfc_init_block (&post_iu_block);
> > 
> >    var = gfc_create_var (st_parameter[IOPARM_ptype_dt].type, "dt_parm");
> > +  gfc_add_modify (&block, var, build_constructor (TREE_TYPE (var), NULL));
> > 
> >    set_error_locus (&block, var, &code->loc);
> > 
> 
> You didn't post/commit this, but it looks like a reasonable change to me.

I think this is obsolete now with Michas change to have CLOBBERs.  Or
if still useful, the above should be a CLOBBER now.

Richard.
Comment 25 Richard Biener 2012-08-21 14:10:38 UTC
I have a patch for the SCCVN issue, but trying to gather current trunk status
first.
Comment 26 Richard Biener 2012-08-21 14:56:41 UTC
For a somewhat reduced testcase I now get at -O1:

 alias stmt walking      : 105.51 (45%) usr   0.33 (24%) sys
 tree SSA rewrite        :  22.01 ( 9%) usr   0.04 ( 3%) sys
 tree SSA incremental    :  25.25 (11%) usr   0.07 ( 5%) sys
 dominance frontiers     :  35.35 (15%) usr   0.02 ( 1%) sys
 dominance computation   :  14.60 ( 6%) usr   0.09 ( 7%) sys
 TOTAL                 : 234.28             1.38

as previously said most of the non-alias-stmt walk time is spent
on loop header copying.  WIth -O1 -fno-tree-ch we have

 alias stmt walking      : 101.52 (68%) usr   0.37 (34%) sys
 tree SSA rewrite        :   4.14 ( 3%) usr   0.01 ( 1%) sys
 tree SSA incremental    :   8.00 ( 5%) usr   0.02 ( 2%) sys
 dominance frontiers     :   6.14 ( 4%) usr   0.01 ( 1%) sys
 dominance computation   :   4.74 ( 3%) usr   0.06 ( 6%) sys
 TOTAL                 : 150.14             1.09

limiting stmt walk results in the ability to arbitrarily scale down its cost
with a param (we can either limit alias oracle query numbers or SCCVN
table lookups).  With 100 alias oracle queries per load/store we end up with

 alias stmt walking      :   1.60 ( 3%) usr   0.05 ( 5%) sys

with 100 SCCVN table lookups the figure is

 alias stmt walking      :   1.60 ( 3%) usr   0.06 ( 6%) sys

one assumes the lookups are expensive, the other one assumes the walk itself is.
Increasing the latter to 1000 SCCVN table lookup produces

 alias stmt walking      :   9.24 (16%) usr   0.18 (19%) sys

which is around the expected 10-fold increase (but still reasonable given
the artificial nature of the testcase).  We could also, instead of
limiting each walk to a constant cost, have a per-function budget that
we can use up first before limiting further walks individually (helps
to not regress reasonably sized cases).
Comment 27 Richard Biener 2012-08-22 11:39:56 UTC
The issue with loop header copying is that we use incremental SSA updating
with PHI insertion for each individual loop header copied.  That computes
dominance frontiers which is always for the whole function.

I thought that loop header copying wouldn't need to insert new PHI nodes
and thus can do with TODO_update_ssa_no_phi if we are in loop-closed SSA form,
but appearantly that is not the case (I didn't inverstigate further here,
but possibly that's just because virtual operands are not in loop-closed
SSA form).
Comment 28 Richard Biener 2012-08-22 13:17:42 UTC
Author: rguenth
Date: Wed Aug 22 13:17:26 2012
New Revision: 190594

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190594
Log:
2012-08-22  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/46590
	* tree-ssa-alias.h (get_continuation_for_phi): Add alias query
	counter output argument.
	(walk_non_aliased_vuses): Add alias query counter argument
	to the walker callback.
	* tree-ssa-alias.c (maybe_skip_until): Add alias query counter
	output argument and count alias queries.
	(get_continuation_for_phi_1): Likewise.
	(get_continuation_for_phi): Likewise.
	(walk_non_aliased_vuses): Add alias query counter argument
	to the walker callback and allow it to abort the walk by
	returning -1.
	* tree-ssa-pre.c (translate_vuse_through_block): Adjust.
	* tree-ssa-sccvn.c (vn_reference_lookup_2): Add alias query
	counter parmeter, abort walk if that is bigger than
	--param sccvn-max-alias-queries-per-access.
	* params.def (sccvn-max-alias-queries-per-access): New param.
	* doc/invoke.texi (sccvn-max-alias-queries-per-access): Document.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/invoke.texi
    trunk/gcc/params.def
    trunk/gcc/tree-ssa-alias.c
    trunk/gcc/tree-ssa-alias.h
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-ssa-sccvn.c
Comment 29 stevenb.gcc@gmail.com 2012-08-22 17:33:00 UTC
> I thought that loop header copying wouldn't need to insert new PHI nodes
> and thus can do with TODO_update_ssa_no_phi if we are in loop-closed SSA form,
> but appearantly that is not the case (I didn't inverstigate further here,
> but possibly that's just because virtual operands are not in loop-closed
> SSA form).

I'll experiment with VOPs in LC-SSA.
Comment 30 rguenther@suse.de 2012-08-23 07:13:13 UTC
On Wed, 22 Aug 2012, stevenb.gcc at gmail dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
> 
> --- Comment #29 from stevenb.gcc at gmail dot com <stevenb.gcc at gmail dot com> 2012-08-22 17:33:00 UTC ---
> > I thought that loop header copying wouldn't need to insert new PHI nodes
> > and thus can do with TODO_update_ssa_no_phi if we are in loop-closed SSA form,
> > but appearantly that is not the case (I didn't inverstigate further here,
> > but possibly that's just because virtual operands are not in loop-closed
> > SSA form).
> 
> I'll experiment with VOPs in LC-SSA.

You might have seen the patch I posted - it passed testing and I will
install it today.
Comment 31 Richard Biener 2012-08-23 11:41:50 UTC
Btw, I have experimented with

Index: tree-into-ssa.c
===================================================================
--- tree-into-ssa.c     (revision 190594)
+++ tree-into-ssa.c     (working copy)
@@ -3224,11 +3224,35 @@ update_ssa (unsigned update_flags)
       bitmap_head *dfs;
 
       /* If the caller requested PHI nodes to be added, compute
-        dominance frontiers.  */
+        dominance frontiers for all interesting blocks
+        up to the dominating start_bb.  */
       dfs = XNEWVEC (bitmap_head, last_basic_block);
       FOR_EACH_BB (bb)
        bitmap_initialize (&dfs[bb->index], &bitmap_default_obstack);
-      compute_dominance_frontiers (dfs);
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, i, bi)
+       {
+         basic_block b = BASIC_BLOCK (i);
+         edge_iterator ei;
+         edge p;
+         if (EDGE_COUNT (b->preds) >= 2)
+           FOR_EACH_EDGE (p, ei, b->preds)
+             {
+               basic_block runner = p->src;
+               basic_block domsb;
+               if (runner == start_bb)
+                 continue;
+
+               domsb = get_immediate_dominator (CDI_DOMINATORS, b);
+               while (runner != domsb)
+                 {
+                   if (!bitmap_set_bit (&dfs[runner->index],
+                                        b->index))
+                     break;
+                   runner = get_immediate_dominator (CDI_DOMINATORS,
+                                                     runner);
+                 }
+             }
+       }
 
       if (sbitmap_first_set_bit (old_ssa_names) >= 0)
        {

which helps reducing the time spent in computing dominance frontiers.  But
as we no longer have bitmaps but bitmap_heads in dfs it's hard to verify
we only ever access dfs for entries we computed ... but we should,
looking at how compute_idf works(?).
Comment 32 Steven Bosscher 2012-08-23 13:44:53 UTC
(In reply to comment #31)
> which helps reducing the time spent in computing dominance frontiers.  But
> as we no longer have bitmaps but bitmap_heads in dfs it's hard to verify
> we only ever access dfs for entries we computed ... but we should,
> looking at how compute_idf works(?).

I don't understand this comment. You can still always do:
bitmap_empty_p (&dfs[bb->index]) to see if something was
computed for bb.
Comment 33 Richard Biener 2012-09-03 10:45:06 UTC
Not mine anymore.
Comment 34 Michael Matz 2012-09-03 15:39:20 UTC
Author: matz
Date: Mon Sep  3 15:39:15 2012
New Revision: 190897

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190897
Log:
	PR tree-optimization/46590
	* tree-cfg.c (gimple_duplicate_sese_region): Don't update
	SSA web here ...
	* tree-ssa-loop-ch.c (copy_loop_headers): ... but here.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-ssa-loop-ch.c
Comment 35 Richard Biener 2012-09-04 13:17:53 UTC
With -O1 FRE uses loads of memory and compile-time.  Not the value-numbering
itself but compute_avail ()!  Bah.  Of course AVAIL sets get bigger and
bigger going down the dominator tree.  For FRE a single domwalk after
SCCVN computing "avail" and doing elimiation would be enough.  Especially
as all the overhead in AVAIL is just SSA DEFs and their value.
Comment 36 Richard Biener 2012-09-05 10:59:52 UTC
If I fix that (PR54489) by iterating over immediate dominators when querying AVAIL_OUT
instead of accumulating then other loop opts quickly take over in compile-time,
but memory usage stays reasonable at -O1.  LIM is now the pass that pushes
memory usage to 1.8GB - all other optimization passes are happy with just
~800MB.  The issue with LIM is that it analyzes the whole function instead
of working on outermost loops at a time (PR54488).  Then of course IRA
comes along and wrecks memory usage again ... (create_loop_tree_nodes).
One can tame down IRA a bit using -fno-ira-loop-pressure -fira-region=one.
We then arrive at roughly a constant 900MB memory usage for the full(!)
testcase at -O1 and

Execution times (seconds)
 phase opt and generate  : 495.90 (99%) usr   1.98 (98%) sys 499.91 (99%) wall  870508 kB (92%) ggc
 df reaching defs        :  19.16 ( 4%) usr   0.06 ( 3%) sys  19.18 ( 4%) wall       0 kB ( 0%) ggc
 alias stmt walking      :  28.75 ( 6%) usr   0.21 (10%) sys  29.12 ( 6%) wall    2336 kB ( 0%) ggc
 tree SSA rewrite        :  63.42 (13%) usr   0.02 ( 1%) sys  63.77 (13%) wall   18830 kB ( 2%) ggc
 tree SSA incremental    :  74.64 (15%) usr   0.03 ( 1%) sys  74.44 (15%) wall   25886 kB ( 3%) ggc
 dominance frontiers     : 101.71 (20%) usr   0.09 ( 4%) sys 102.17 (20%) wall       0 kB ( 0%) ggc
 dominance computation   :  52.56 (11%) usr   0.09 ( 4%) sys  53.35 (11%) wall       0 kB ( 0%) ggc
 loop invariant motion   : 101.20 (20%) usr   0.10 ( 5%) sys 101.75 (20%) wall    2700 kB ( 0%) ggc
 TOTAL                 : 498.79             2.03           502.87             947764 kB

(all entries > 10s)

The incremental SSA stuff is complete loop unrolling / IV canonicalization
which does SSA update once per loop (similar to what loop header copying
formerly did).  Fixing that leads to

Execution times (seconds)
 phase opt and generate  : 214.62 (99%) usr   1.53 (96%) sys 217.41 (99%) wall  870508 kB (92%) ggc
 df reaching defs        :  23.07 (11%) usr   0.01 ( 1%) sys  23.10 (10%) wall       0 kB ( 0%) ggc
 alias stmt walking      :  28.51 (13%) usr   0.23 (14%) sys  28.93 (13%) wall    2336 kB ( 0%) ggc
 loop invariant motion   : 105.43 (48%) usr   0.01 ( 1%) sys 106.22 (48%) wall    2700 kB ( 0%) ggc
 TOTAL                 : 217.56             1.59           220.44             947764 kB

so RTL invariant motion is now the main offender ;)
Comment 37 Richard Biener 2012-09-05 13:29:20 UTC
Author: rguenth
Date: Wed Sep  5 13:29:13 2012
New Revision: 190978

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190978
Log:
2012-09-05  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/46590
	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not
	update SSA form here.
	(canonicalize_induction_variables): Assert we do not need to
	update SSA form.
	(tree_unroll_loops_completely): Update SSA form here.
	* tree-ssa-loop-manip.c (gimple_duplicate_loop_to_header_edge):
	Do not verify loop-closed SSA form if SSA form is not up-to-date.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-ivcanon.c
    trunk/gcc/tree-ssa-loop-manip.c
Comment 38 Richard Biener 2012-12-03 15:51:47 UTC
Generic compile-time/memory-usage umbrella regression tracking bug.
Comment 39 Richard Biener 2014-01-16 13:49:24 UTC
Author: rguenth
Date: Thu Jan 16 13:48:51 2014
New Revision: 206663

URL: http://gcc.gnu.org/viewcvs?rev=206663&root=gcc&view=rev
Log:
2014-01-16  Richard Biener  <rguenther@suse.de>

	PR rtl-optimization/46590
	* lcm.c (compute_antinout_edge): Use postorder iteration.
	(compute_laterin): Use inverted postorder iteration.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lcm.c
Comment 40 Richard Biener 2014-01-16 15:51:42 UTC
On the full testcase tree LIM uses too much memory (I didn't merge some of
the patches that only benefit this kind of testcase ... bah).  Without LIM
we use around 1GB of memory at -O2 and

 df reaching defs        :  19.42 (14%) usr  35.49 (82%) sys  55.13 (30%) wall       0 kB ( 0%) ggc
 alias stmt walking      :  34.25 (25%) usr   0.47 ( 1%) sys  34.76 (19%) wall    1451 kB ( 0%) ggc
 tree CFG cleanup        :   7.75 ( 6%) usr   0.04 ( 0%) sys   7.73 ( 4%) wall    5002 kB ( 1%) ggc
 tree PTA                :  22.32 (16%) usr   0.10 ( 0%) sys  22.43 (12%) wall    7882 kB ( 1%) ggc
 TOTAL                 : 137.59            43.38           180.98             866898 kB

-O3 is similar.

I have a patch for the memory usage issue.
Comment 41 Richard Biener 2014-01-17 11:37:08 UTC
After I will have committed the patch the original testcase will use about 1GB
of memory (regardless of optimization level) on x86_64, 122s at -O1 and 161s
at -Ofast with the following main contributors:

> ./f951 -quiet -Ofast t2.f90 -ftime-report 

Execution times (seconds)
 phase opt and generate  : 159.14 (98%) usr 116.92 (100%) sys 276.41 (99%) wall  821437 kB (92%) ggc
 df reaching defs        :  36.37 (22%) usr  91.52 (78%) sys 128.33 (46%) wall       0 kB ( 0%) ggc
 df live regs            :   2.60 ( 2%) usr   0.19 ( 0%) sys   2.84 ( 1%) wall       0 kB ( 0%) ggc
 alias stmt walking      :  36.03 (22%) usr   0.46 ( 0%) sys  36.53 (13%) wall    1451 kB ( 0%) ggc
 parser (global)         :   2.65 ( 2%) usr   0.08 ( 0%) sys   2.74 ( 1%) wall   70763 kB ( 8%) ggc
 tree CFG cleanup        :   8.23 ( 5%) usr   0.03 ( 0%) sys   8.19 ( 3%) wall    4982 kB ( 1%) ggc
 tree PTA                :  22.95 (14%) usr   0.17 ( 0%) sys  23.12 ( 8%) wall    7882 kB ( 1%) ggc
 complete unrolling      :   4.88 ( 3%) usr   0.14 ( 0%) sys   5.08 ( 2%) wall   80683 kB ( 9%) ggc
 loop init               :   5.93 ( 4%) usr   0.01 ( 0%) sys   5.92 ( 2%) wall   25175 kB ( 3%) ggc
 integrated RA           :   2.97 ( 2%) usr   0.03 ( 0%) sys   3.02 ( 1%) wall   64778 kB ( 7%) ggc
 TOTAL                 : 161.83           117.01           279.22             892406 kB

(everything > 1% usr listed)
Comment 42 Richard Biener 2014-01-17 12:08:59 UTC
Another interesting bit - -O0 takes 17s (and also ~1GB memory) but -Og is
not much faster than -O1 (116s).

> ./f951 -quiet -Og t2.f90 -ftime-report 

Execution times (seconds)
 df reaching defs        :  62.51 (54%) usr 104.47 (85%) sys 167.73 (70%) wall       0 kB ( 0%) ggc

bah.  I didn't constrain RTL optimizers with -Og (seems to be RTL loop
invariant motion here - after all at -Og no gimple loop opts run and
it _can_ move quite some invariants).

Disabling loop2 for -Og improves this to 50s

Execution times (seconds)
 alias stmt walking      :  10.11 (20%) usr   0.10 (14%) sys  10.29 (20%) wall     338 kB ( 0%) ggc
 parser (global)         :   2.34 ( 5%) usr   0.06 ( 8%) sys   2.39 ( 5%) wall   70763 kB (11%) ggc
 tree PTA                :   6.12 (12%) usr   0.04 ( 6%) sys   6.16 (12%) wall    2735 kB ( 0%) ggc
 loop init               :   5.78 (11%) usr   0.01 ( 1%) sys   5.78 (11%) wall   24078 kB ( 4%) ggc
 integrated RA           :   2.68 ( 5%) usr   0.04 ( 6%) sys   2.73 ( 5%) wall   64596 kB (10%) ggc
 TOTAL                 :  50.76             0.71            51.71             646243 kB

I'll push a patch to do that (the above shows PTA which also can be
disabled, and alias stmt walking could be limited some more).
Comment 43 Richard Biener 2014-01-17 14:40:43 UTC
Author: rguenth
Date: Fri Jan 17 14:40:11 2014
New Revision: 206709

URL: http://gcc.gnu.org/viewcvs?rev=206709&root=gcc&view=rev
Log:
2014-01-17  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/46590
	* vec.h (vec<>::bseach): New member function implementing
	binary search according to C89 bsearch.
	(vec<>::qsort): Avoid calling ::qsort for vectors with sizes 0 or 1.
	* tree-ssa-loop-im.c (struct mem_ref): Make stored member a
	bitmap pointer again.  Make accesses_in_loop a flat array.
	(mem_ref_obstack): New global.
	(outermost_indep_loop): Adjust for mem_ref->stored changes.
	(mark_ref_stored): Likewise.
	(ref_indep_loop_p_2): Likewise.
	(set_ref_stored_in_loop): New helper function.
	(mem_ref_alloc): Allocate mem_refs on the mem_ref_obstack obstack.
	(memref_free): Adjust.
	(record_mem_ref_loc): Simplify.
	(gather_mem_refs_stmt): Adjust.
	(sort_locs_in_loop_postorder_cmp): New function.
	(analyze_memory_references): Sort accesses_in_loop after
	loop postorder number.
	(find_ref_loc_in_loop_cmp): New function.
	(for_all_locs_in_loop): Find relevant cluster of locs in
	accesses_in_loop and iterate without recursion.
	(execute_sm): Avoid uninit warning.
	(struct ref_always_accessed): Simplify.
	(ref_always_accessed::operator ()): Likewise.
	(ref_always_accessed_p): Likewise.
	(tree_ssa_lim_initialize): Initialize mem_ref_obstack, compute
	loop postorder numbers here.
	(tree_ssa_lim_finalize): Free mem_ref_obstack and loop postorder
	numbers.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-im.c
    trunk/gcc/vec.h
Comment 44 Richard Biener 2014-01-17 14:49:49 UTC
Author: rguenth
Date: Fri Jan 17 14:49:18 2014
New Revision: 206714

URL: http://gcc.gnu.org/viewcvs?rev=206714&root=gcc&view=rev
Log:
2014-01-17  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/46590
	* opts.c (default_options_table): Add entries for
	OPT_fbranch_count_reg, OPT_fmove_loop_invariants and OPT_ftree_pta,
	all enabled at -O1 but not for -Og.
	* common.opt (fbranch-count-reg): Remove Init(1).
	(fmove-loop-invariants): Likewise.
	(ftree-pta): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/common.opt
    trunk/gcc/opts.c
Comment 45 Richard Biener 2014-10-21 09:22:27 UTC
Btw, now DOM and incremental SSA are the new main offenders (regression from 4.8).
This is maybe the same as PR61515.  Compile-time is up to 9 minutes (from two).

Jeff - any progress on PR61515?  This is really a major regression for -O1
compile-time.  Can we at least disable the offending code at !flag_expensive_optimizations on trunk and the 4.9 branch?  And fix it properly
for GCC 5?