Bug 26854 - Inordinate compile times on large routines
Summary: Inordinate compile times on large routines
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.3.0
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog, memory-hog
Duplicates: 39157
Depends on: 66760
Blocks: 47344 27004 34400
 
Reported: 2006-03-24 20:25 UTC by lucier
Modified: 2015-07-04 10:03 UTC
CC List: 15 users

See Also:
Host: powerpc-apple-darwin8.5.0
Target: powerpc-apple-darwin8.5.0
Build: powerpc-apple-darwin8.5.0
Known to work:
Known to fail: 4.6.0
Last reconfirmed: 2007-11-14 14:08:44


Attachments
detailed memory usage report (20.59 KB, text/plain)
2007-12-20 02:29 UTC, lucier
memory details for an unpatched mainline (20.53 KB, text/plain)
2007-12-20 03:52 UTC, lucier
patch to count different types of def-use chains (652 bytes, patch)
2007-12-20 17:28 UTC, Kenneth Zadeck
memory details for 131610 (20.52 KB, text/plain)
2008-01-17 22:39 UTC, lucier
statistics for ira branch with -fno-ira (20.82 KB, text/plain)
2008-05-15 02:50 UTC, lucier
statistics for ira branch with -fira (20.83 KB, text/plain)
2008-05-15 02:51 UTC, lucier
detailed memory stats for trunk revision 137644 (20.60 KB, text/plain)
2008-07-10 17:36 UTC, lucier
statistics with checking enabled and using longs to count bytes (23.90 KB, text/plain)
2008-09-18 01:19 UTC, lucier
memory and cpu time statistics for 2008-09-19 (24.19 KB, text/plain)
2008-09-26 15:43 UTC, lucier
memory and cpu statistics for 9/25 (23.98 KB, text/plain)
2008-09-26 15:44 UTC, lucier
Memory and cpu statistics from 9/16 (23.67 KB, text/plain)
2008-09-26 15:45 UTC, lucier
Memory and CPU statistics for 2009/02/04 (24.38 KB, text/plain)
2009-02-04 17:27 UTC, lucier
Memory and CPU statistics when compiling _num.i with -O2 (29.78 KB, text/plain)
2009-02-20 19:54 UTC, lucier
time/mem report compiling compiler.i (25.09 KB, text/plain)
2010-03-27 04:27 UTC, lucier
time/mem report compiling compiler.i (27.36 KB, text/plain)
2010-03-27 04:59 UTC, lucier
time/mem report compiling compiler.i with -O1 (24.70 KB, text/plain)
2010-03-27 05:20 UTC, lucier
time/memory report compiling all.i with -O3 (25.45 KB, text/plain)
2010-03-27 16:44 UTC, lucier

Description lucier 2006-03-24 20:25:12 UTC
At this location:

http://www.math.purdue.edu/~lucier/gcc/test-files/bugzilla/1/all.i.gz

is the file

-rw-r--r--    1 lucier  lucier  6556015 Mar 22 20:28 all.i.gz

On a 2GHz Mac G5, this file took 3.4GB and 30 minutes to compile using gcc-4.1.0 configured and compiled with

/bin/rm -rf *; env CC='gcc -mcpu=970 -m64' ../configure --prefix=/pkgs/gcc-4.1.0 --with-gmp=/opt/local/ --with-mpfr=/opt/local/ --with-as=/usr/local/odcctools-20060123/bin/as --with-ld=/usr/local/odcctools-20060123/bin/ld --enable-languages=c; make -j 8 bootstrap BOOT_CFLAGS='-mcpu=970 -m64 -O2 -g' >& build.log 

I had to build gcc as a 64-bit binary so it could allocate more than 2GB at some point in the compilation process.  (The compiler itself, however, builds 32-bit binaries by default.)

The compile options were

/pkgs/gcc-4.1.0/bin/gcc -mcpu=970 -m64 -no-cpp-precomp -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -bundle -flat_namespace -undefined suppress -I/usr/local/Gambit-C/include/ -ftime-report -fmem-report all.i

The reports were

Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8             16k         14k        480 
16            52k         12k       1144 
64          1276k       1239k         19k
256          484k        452k       6776 
512           36k         25k        504 
1024         220k        216k       3080 
2048          24k         20k        336 
4096          68k         68k        952 
8192          56k         56k        392 
16384         16k         16k         56 
32768        288k        288k        504 
65536         64k         64k         56 
131072        128k        128k         56 
262144        512k        512k        112 
524288        512k        512k         56 
1048576       1024k       1024k         56 
2097152       4096k       4096k        112 
112           34M         16M        484k
208           40k         38k        560 
192         3344k       3287k         45k
160           28k       6240         392 
176          564k        261k       7896 
48          2088k       1165k         32k
32           148k         68k       2664 
80            35M       2063k        495k
Total         84M         32M       1104k

String pool
entries         158178
identifiers     158178 (100.00%)
slots           262144
bytes           1982k (168k overhead)
table size      2048k
coll/search     1.1104
ins/search      0.1934
avg. entry      12.83 bytes (+/- 7.81)
longest entry   67

??? tree nodes created

(No per-node statistics)
Type hash: size 1021, 598 elements, 0.900368 collisions
DECL_DEBUG_EXPR  hash: size 8191, 0 elements, 1.307819 collisions
DECL_VALUE_EXPR  hash: size 1021, 0 elements, 0.000000 collisions

Execution times (seconds)
 garbage collection    :   1.87 ( 0%) usr   0.03 ( 0%) sys   2.38 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction:   1.78 ( 0%) usr   0.36 ( 0%) sys   2.60 ( 0%) wall   21241 kB ( 2%) ggc
 callgraph optimization:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 ipa reference         :   0.42 ( 0%) usr   0.14 ( 0%) sys   0.65 ( 0%) wall       7 kB ( 0%) ggc
 cfg construction      :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.43 ( 0%) wall    8421 kB ( 1%) ggc
 cfg cleanup           : 103.72 ( 7%) usr   0.71 ( 1%) sys 128.58 ( 7%) wall    2170 kB ( 0%) ggc
 trivially dead code   :   2.60 ( 0%) usr   0.06 ( 0%) sys   3.32 ( 0%) wall       0 kB ( 0%) ggc
 life analysis         :   6.42 ( 0%) usr   3.91 ( 3%) sys  12.64 ( 1%) wall   19365 kB ( 2%) ggc
 life info update      :   1.04 ( 0%) usr   0.03 ( 0%) sys   1.32 ( 0%) wall     525 kB ( 0%) ggc
 alias analysis        :   1.70 ( 0%) usr   0.07 ( 0%) sys   2.13 ( 0%) wall   16385 kB ( 2%) ggc
 register scan         :   0.96 ( 0%) usr   0.02 ( 0%) sys   1.28 ( 0%) wall       4 kB ( 0%) ggc
 rebuild jump labels   :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.38 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing         :   8.05 ( 1%) usr  13.05 (10%) sys  25.54 ( 1%) wall    2197 kB ( 0%) ggc
 lexical analysis      :  12.59 ( 1%) usr  25.04 (19%) sys  46.78 ( 2%) wall       0 kB ( 0%) ggc
 parser                :   9.97 ( 1%) usr  13.25 (10%) sys  28.99 ( 1%) wall   72677 kB ( 7%) ggc
 tree gimplify         :   1.50 ( 0%) usr   0.07 ( 0%) sys   1.93 ( 0%) wall   30969 kB ( 3%) ggc
 tree eh               :   0.17 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.55 ( 0%) usr   0.13 ( 0%) sys   0.81 ( 0%) wall   76077 kB ( 8%) ggc
 tree CFG cleanup      :   5.82 ( 0%) usr   0.08 ( 0%) sys   7.25 ( 0%) wall     955 kB ( 0%) ggc
 tree copy propagation :   5.42 ( 0%) usr   0.44 ( 0%) sys   7.11 ( 0%) wall   12020 kB ( 1%) ggc
 tree store copy prop  :   0.75 ( 0%) usr   0.05 ( 0%) sys   0.97 ( 0%) wall    1600 kB ( 0%) ggc
 tree find ref. vars   :   0.23 ( 0%) usr   0.01 ( 0%) sys   0.24 ( 0%) wall    2502 kB ( 0%) ggc
 tree PTA              :   5.85 ( 0%) usr   0.60 ( 0%) sys   7.92 ( 0%) wall   16435 kB ( 2%) ggc
 tree alias analysis   :   6.98 ( 0%) usr  11.11 ( 8%) sys  15.83 ( 1%) wall   11736 kB ( 1%) ggc
 tree PHI insertion    :   1.06 ( 0%) usr   0.21 ( 0%) sys   1.59 ( 0%) wall   24377 kB ( 2%) ggc
 tree SSA rewrite      :   2.46 ( 0%) usr   0.16 ( 0%) sys   3.29 ( 0%) wall   39166 kB ( 4%) ggc
 tree SSA other        :   1.22 ( 0%) usr   1.51 ( 1%) sys   3.38 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA incremental  :  14.15 ( 1%) usr   3.76 ( 3%) sys  22.12 ( 1%) wall   19167 kB ( 2%) ggc
 tree operand scan     : 628.65 (44%) usr  12.18 ( 9%) sys 814.05 (42%) wall   23896 kB ( 2%) ggc
 dominator optimization: 307.01 (21%) usr   2.88 ( 2%) sys 380.36 (20%) wall   63887 kB ( 7%) ggc
 tree STORE-CCP        :   0.67 ( 0%) usr   0.02 ( 0%) sys   0.87 ( 0%) wall     513 kB ( 0%) ggc
 tree CCP              :   0.74 ( 0%) usr   0.03 ( 0%) sys   0.95 ( 0%) wall     514 kB ( 0%) ggc
 tree split crit edges :   0.37 ( 0%) usr   0.24 ( 0%) sys   0.74 ( 0%) wall   40362 kB ( 4%) ggc
 tree reassociation    :   0.57 ( 0%) usr   0.02 ( 0%) sys   0.73 ( 0%) wall       0 kB ( 0%) ggc
 tree FRE              :  12.97 ( 1%) usr   0.75 ( 1%) sys  17.03 ( 1%) wall   40945 kB ( 4%) ggc
 tree code sinking     :   0.98 ( 0%) usr   0.06 ( 0%) sys   1.27 ( 0%) wall       0 kB ( 0%) ggc
 tree linearize phis   :   0.15 ( 0%) usr   0.01 ( 0%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate:   0.16 ( 0%) usr   0.01 ( 0%) sys   0.18 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   1.86 ( 0%) usr   0.03 ( 0%) sys   2.39 ( 0%) wall       0 kB ( 0%) ggc
 tree aggressive DCE   :   0.86 ( 0%) usr   0.02 ( 0%) sys   1.09 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.73 ( 0%) usr   0.03 ( 0%) sys   0.95 ( 0%) wall       0 kB ( 0%) ggc
 PHI merge             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      49 kB ( 0%) ggc
 tree loop bounds      :   0.32 ( 0%) usr   0.01 ( 0%) sys   0.39 ( 0%) wall       0 kB ( 0%) ggc
 loop invariant motion :   0.57 ( 0%) usr   0.01 ( 0%) sys   0.70 ( 0%) wall       0 kB ( 0%) ggc
 tree canonical iv     :   0.15 ( 0%) usr   0.01 ( 0%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc
 scev constant prop    :   1.47 ( 0%) usr   0.04 ( 0%) sys   1.87 ( 0%) wall    1973 kB ( 0%) ggc
 complete unrolling    :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 tree loop init        :   5.14 ( 0%) usr   5.80 ( 4%) sys  13.36 ( 1%) wall   58726 kB ( 6%) ggc
 tree loop fini        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree copy headers     :   0.24 ( 0%) usr   0.01 ( 0%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA uncprop      :   0.43 ( 0%) usr   0.01 ( 0%) sys   0.54 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA to normal    : 172.90 (12%) usr   1.50 ( 1%) sys 215.20 (11%) wall   92392 kB ( 9%) ggc
 tree rename SSA copies:   0.60 ( 0%) usr   0.08 ( 0%) sys   0.78 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.64 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   7.54 ( 1%) usr   4.20 ( 3%) sys  14.47 ( 1%) wall  129307 kB (13%) ggc
 varconst              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       7 kB ( 0%) ggc
 jump                  :   0.73 ( 0%) usr   0.03 ( 0%) sys   0.90 ( 0%) wall       0 kB ( 0%) ggc
 CSE                   :   2.19 ( 0%) usr   1.67 ( 1%) sys   4.78 ( 0%) wall    2169 kB ( 0%) ggc
 loop analysis         :   1.69 ( 0%) usr   0.15 ( 0%) sys   2.25 ( 0%) wall    8645 kB ( 1%) ggc
 branch prediction     :   3.22 ( 0%) usr   0.22 ( 0%) sys   4.17 ( 0%) wall    5998 kB ( 1%) ggc
 flow analysis         :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall       0 kB ( 0%) ggc
 combiner              :   4.03 ( 0%) usr   0.11 ( 0%) sys   5.07 ( 0%) wall   31600 kB ( 3%) ggc
 if-conversion         :   2.01 ( 0%) usr   0.11 ( 0%) sys   2.58 ( 0%) wall     344 kB ( 0%) ggc
 local alloc           :   2.87 ( 0%) usr   0.10 ( 0%) sys   3.63 ( 0%) wall   13500 kB ( 1%) ggc
 global alloc          :  25.97 ( 2%) usr  21.91 (17%) sys  57.82 ( 3%) wall   31337 kB ( 3%) ggc
 reload CSE regs       :  42.61 ( 3%) usr   0.54 ( 0%) sys  52.45 ( 3%) wall   12846 kB ( 1%) ggc
 flow 2                :   0.66 ( 0%) usr   0.00 ( 0%) sys   0.82 ( 0%) wall      19 kB ( 0%) ggc
 if-conversion 2       :   0.77 ( 0%) usr   0.06 ( 0%) sys   1.03 ( 0%) wall      14 kB ( 0%) ggc
 rename registers      :   0.95 ( 0%) usr   0.20 ( 0%) sys   1.37 ( 0%) wall      24 kB ( 0%) ggc
 scheduling 2          :   3.51 ( 0%) usr   0.18 ( 0%) sys   4.47 ( 0%) wall   36103 kB ( 4%) ggc
 shorten branches      :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
 final                 :   2.35 ( 0%) usr   0.14 ( 0%) sys   3.36 ( 0%) wall    4096 kB ( 0%) ggc
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :1438.01           131.69          1949.05             979456 kB

So this is a very large file (containing only one C routine), but the times for the various passes are very unbalanced; in particular, the following three stand out:

 tree operand scan     : 628.65 (44%) usr  12.18 ( 9%) sys 814.05 (42%) wall   23896 kB ( 2%) ggc
 dominator optimization: 307.01 (21%) usr   2.88 ( 2%) sys 380.36 (20%) wall   63887 kB ( 7%) ggc
 tree SSA to normal    : 172.90 (12%) usr   1.50 ( 1%) sys 215.20 (11%) wall   92392 kB ( 9%) ggc
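A quick check of how lopsided this is: these three passes account for 628.65 + 307.01 + 172.90 = 1108.56 of the 1438.01 seconds of user time, about 77%. The sketch below does that arithmetic directly from -ftime-report lines like the ones quoted above. The helper names are hypothetical (not part of GCC), and it assumes the first number after the ':' is the usr column, which holds for the 4.1.0 report layout shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the user-time column of one -ftime-report line, i.e. the first
   number after the ':' separator, or -1.0 if the line has no ':'.
   Assumes the 4.1-era layout "name : usr (pct) usr sys (pct) sys ...". */
static double report_usr_seconds(const char *line)
{
    const char *colon;

    assert(line != NULL);
    colon = strchr(line, ':');
    if (colon == NULL)
        return -1.0;
    return strtod(colon + 1, NULL);   /* strtod skips leading blanks */
}

/* Add up the usr seconds of N report lines and express the sum as a
   percentage of TOTAL_USR (taken from the TOTAL line). */
static double usr_share_percent(const char *const *lines, size_t n,
                                double total_usr)
{
    double sum = 0.0;
    size_t i;

    assert(total_usr > 0.0);
    for (i = 0; i < n; i++)
        sum += report_usr_seconds(lines[i]);
    return 100.0 * sum / total_usr;
}
```

Fed the three lines above plus the TOTAL line (whose usr figure, 1438.01, follows the ':' with no space), usr_share_percent comes out at roughly 77.09.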

I don't know what the compile times are with 4.2; perhaps someone with a 64-bit profiled gcc would like to investigate further.
Comment 1 Richard Biener 2006-03-25 16:21:48 UTC
Can you do a comparison to 4.0.3?
Comment 2 lucier 2006-03-25 22:22:27 UTC
Subject: Re:  Inordinate compile times on large routines

[lindv2:~/Desktop] lucier% /pkgs/gcc-4.0.3/bin/gcc -mcpu=970 -m64 -no-cpp-precomp -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -bundle -flat_namespace -undefined suppress -I/usr/local/Gambit-C/include/ -ftime-report -fmem-report all.i
gcc: unrecognized option '-no-cpp-precomp'
Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8             16k         11k        480
16            52k         12k       1144
64            10M       1841k        167k
256         4096         512          56
512           12k       4608         168
1024          96k         95k       1344
2048        4096        2048          56
4096          64k         64k        896
8192          16k         16k        112
32768        288k        288k        504
131072        128k        128k         56
1048576       3072k       3072k        168
2097152       4096k       4096k        112
112           19M         16M        272k
208         6360k       4213k         86k
48          7344k       4315k        114k
32           148k         74k       2664
80            16M       1336k        232k
Total         67M         35M        881k

String pool
entries         155812
identifiers     155812 (100.00%)
slots           262144
bytes           1952k (167k overhead)
table size      2048k
coll/search     0.8640
ins/search      0.1923
avg. entry      12.83 bytes (+/- 7.87)
longest entry   67

??? tree nodes created

(No per-node statistics)
Type hash: size 1021, 551 elements, 0.816291 collisions

Execution times (seconds)
garbage collection    :   2.11 ( 0%) usr   0.04 ( 0%) sys   2.71 ( 0%) wall
cfg construction      :   0.68 ( 0%) usr   1.22 ( 0%) sys   2.29 ( 0%) wall
cfg cleanup           :  94.99 ( 9%) usr   0.54 ( 0%) sys 120.62 ( 7%) wall
trivially dead code   :   2.87 ( 0%) usr   0.06 ( 0%) sys   3.83 ( 0%) wall
life analysis         :   6.78 ( 1%) usr   3.26 ( 1%) sys  12.56 ( 1%) wall
life info update      :   1.09 ( 0%) usr   0.01 ( 0%) sys   1.34 ( 0%) wall
alias analysis        :   1.89 ( 0%) usr   0.04 ( 0%) sys   2.55 ( 0%) wall
register scan         :   1.25 ( 0%) usr   0.02 ( 0%) sys   1.62 ( 0%) wall
rebuild jump labels   :   0.34 ( 0%) usr   0.01 ( 0%) sys   0.42 ( 0%) wall
preprocessing         :   7.70 ( 1%) usr  12.37 ( 4%) sys  25.83 ( 2%) wall
lexical analysis      :  13.19 ( 1%) usr  25.54 ( 9%) sys  48.16 ( 3%) wall
parser                :  11.06 ( 1%) usr  13.13 ( 5%) sys  30.20 ( 2%) wall
tree gimplify         :   1.61 ( 0%) usr   0.07 ( 0%) sys   2.14 ( 0%) wall
tree eh               :   0.18 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall
tree CFG construction :   0.63 ( 0%) usr   0.16 ( 0%) sys   0.97 ( 0%) wall
tree CFG cleanup      :   2.09 ( 0%) usr   0.02 ( 0%) sys   2.62 ( 0%) wall
tree find referenced vars:   0.25 ( 0%) usr   0.01 ( 0%) sys   0.37 ( 0%) wall
tree PTA              : 615.45 (59%) usr 155.84 (55%) sys 967.56 (58%) wall
tree alias analysis   :   0.63 ( 0%) usr   0.00 ( 0%) sys   0.73 ( 0%) wall
tree PHI insertion    :   4.27 ( 0%) usr   5.94 ( 2%) sys  12.63 ( 1%) wall
tree SSA rewrite      :   3.35 ( 0%) usr   0.10 ( 0%) sys   4.61 ( 0%) wall
tree SSA other        :   8.35 ( 1%) usr   7.78 ( 3%) sys  19.75 ( 1%) wall
tree operand scan     :   5.80 ( 1%) usr   7.91 ( 3%) sys  17.53 ( 1%) wall
dominator optimization:   5.62 ( 1%) usr   0.45 ( 0%) sys   7.42 ( 0%) wall
tree CCP              :   1.78 ( 0%) usr   0.02 ( 0%) sys   2.18 ( 0%) wall
tree split crit edges :   0.30 ( 0%) usr   0.04 ( 0%) sys   0.41 ( 0%) wall
tree remove redundant PHIs:   3.92 ( 0%) usr   0.14 ( 0%) sys   4.96 ( 0%) wall
tree linearize phis   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
tree forward propagate:   1.22 ( 0%) usr   0.01 ( 0%) sys   1.51 ( 0%) wall
tree conservative DCE :   1.94 ( 0%) usr   0.01 ( 0%) sys   2.51 ( 0%) wall
tree aggressive DCE   :   0.82 ( 0%) usr   0.06 ( 0%) sys   1.05 ( 0%) wall
tree DSE              :   1.35 ( 0%) usr   0.05 ( 0%) sys   1.74 ( 0%) wall
PHI merge             :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall
tree record loop bounds:   0.29 ( 0%) usr   0.01 ( 0%) sys   0.37 ( 0%) wall
loop invariant motion :   1.25 ( 0%) usr   0.02 ( 0%) sys   1.58 ( 0%) wall
tree canonical iv creation:   0.26 ( 0%) usr   0.01 ( 0%) sys   0.34 ( 0%) wall
tree loop init        :   8.65 ( 1%) usr   2.11 ( 1%) sys  13.35 ( 1%) wall
tree copy headers     :   3.03 ( 0%) usr   1.35 ( 0%) sys   5.42 ( 0%) wall
tree SSA to normal    : 139.82 (13%) usr   1.01 ( 0%) sys 176.26 (11%) wall
tree rename SSA copies:   0.72 ( 0%) usr   0.10 ( 0%) sys   0.97 ( 0%) wall
dominance frontiers   :   0.76 ( 0%) usr   0.01 ( 0%) sys   0.94 ( 0%) wall
expand                :   5.16 ( 0%) usr   1.32 ( 0%) sys   8.31 ( 0%) wall
varconst              :   0.13 ( 0%) usr   0.02 ( 0%) sys   0.25 ( 0%) wall
jump                  :   0.80 ( 0%) usr   0.03 ( 0%) sys   1.00 ( 0%) wall
CSE                   :   2.27 ( 0%) usr   1.09 ( 0%) sys   4.26 ( 0%) wall
loop analysis         :   2.00 ( 0%) usr   0.15 ( 0%) sys   2.60 ( 0%) wall
branch prediction     :   3.36 ( 0%) usr   0.21 ( 0%) sys   4.44 ( 0%) wall
flow analysis         :   0.28 ( 0%) usr   0.01 ( 0%) sys   0.33 ( 0%) wall
combiner              :   3.82 ( 0%) usr   0.09 ( 0%) sys   4.97 ( 0%) wall
if-conversion         :   2.49 ( 0%) usr   0.08 ( 0%) sys   3.27 ( 0%) wall
local alloc           :   2.85 ( 0%) usr   0.11 ( 0%) sys   3.78 ( 0%) wall
global alloc          :  27.34 ( 3%) usr  23.42 ( 8%) sys  61.76 ( 4%) wall
reload CSE regs       :  27.92 ( 3%) usr   0.77 ( 0%) sys  36.15 ( 2%) wall
flow 2                :   1.80 ( 0%) usr   2.14 ( 1%) sys   4.79 ( 0%) wall
if-conversion 2       :   1.00 ( 0%) usr   0.06 ( 0%) sys   1.26 ( 0%) wall
rename registers      :   0.94 ( 0%) usr   0.19 ( 0%) sys   1.41 ( 0%) wall
scheduling 2          :   3.49 ( 0%) usr   0.19 ( 0%) sys   4.40 ( 0%) wall
shorten branches      :   0.90 ( 0%) usr   0.03 ( 0%) sys   1.28 ( 0%) wall
final                 :   1.68 ( 0%) usr   0.10 ( 0%) sys   2.08 ( 0%) wall
rest of compilation   :   1.52 ( 0%) usr   1.26 ( 0%) sys   3.34 ( 0%) wall
TOTAL                 :1048.39           280.86          1665.66

Comment 3 Jeffrey A. Law 2006-04-19 06:43:30 UTC
I'm peeking at DOM.  

jeff
Comment 4 Jeffrey A. Law 2006-04-19 15:32:31 UTC
OK, as expected, DOM was doing something totally stupid with immediate uses.  On my x86 box I've got a patch which takes us from ~250 seconds in DOM to around 5.
I'm going to get this fix bootstrapped and regression tested, then port it to mainline (where things are slightly different/rearranged, but the same core problem exists).

Unfortunately, those gains are dwarfed by the wall-clock time burned swapping/paging due to memory usage in other passes.

The worst memory offenders (in pain order) are:

  reorder blocks (possible given the number of blocks/edges in this code)
  expand (???  possibly being charged for some other passes time)
  global-alloc

Mainline has a different memory pain profile -- the new RTL invariant code motion pass goes absolutely nuts memory-wise.

I'm not planning to work on any of the memory consumption issues.

Comment 5 Jeffrey A. Law 2006-04-19 22:34:46 UTC
Subject: Bug 26854

Author: law
Date: Wed Apr 19 22:34:41 2006
New Revision: 113099

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=113099
Log:
        PR tree-optimization/26854
        * tree-ssa-dse.c (dse_optimize_stmt): Use has_single_use rather
        than num_imm_uses.
        * tree-ssa-dom.c (simplify_rhs_and_lookup_avail_expr): Similarly.



Modified:
    branches/gcc-4_1-branch/gcc/ChangeLog
    branches/gcc-4_1-branch/gcc/tree-ssa-dom.c
    branches/gcc-4_1-branch/gcc/tree-ssa-dse.c
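The ChangeLog above swaps num_imm_uses for has_single_use. A simplified model of why that matters: if an SSA name's immediate uses hang off a circular doubly linked list with a sentinel root, counting them means walking every node, while "exactly one use?" only needs two link reads. The struct and function names below are hypothetical and this is not GCC's actual code; count_uses plays the role of num_imm_uses and single_use_p the role of has_single_use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per use; the SSA name itself owns the sentinel ROOT node.
   An empty list is a root that points at itself. */
struct use_node {
    struct use_node *prev;
    struct use_node *next;
};

static void use_list_init(struct use_node *root)
{
    root->prev = root;
    root->next = root;
}

/* Link USE in right after ROOT. */
static void use_list_add(struct use_node *root, struct use_node *use)
{
    assert(root != NULL && use != NULL);
    use->next = root->next;
    use->prev = root;
    root->next->prev = use;
    root->next = use;
}

/* Analogue of num_imm_uses: visits every use, O(n). */
static size_t count_uses(const struct use_node *root)
{
    size_t n = 0;
    const struct use_node *p;

    for (p = root->next; p != root; p = p->next)
        n++;
    return n;
}

/* Analogue of has_single_use: non-empty, and the first use links
   straight back to the root.  Reads two links, O(1). */
static bool single_use_p(const struct use_node *root)
{
    return root->next != root && root->next->next == root;
}
```

With 1000 uses linked in, count_uses touches all 1000 nodes, while single_use_p answers false after two reads.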

Comment 6 lucier 2006-04-20 03:18:00 UTC
Subject: Re:  Inordinate compile times on large routines

Thanks a lot.  Here are the timing statistics (with --disable-checking) after your patch.

PS:  I'm sorry it took 9 hours to compile on your box.

Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8             16k         14k        480
16            52k         12k       1144
64          1276k       1239k         19k
256          484k        452k       6776
512           36k         25k        504
1024         220k        216k       3080
2048          24k         20k        336
4096          68k         68k        952
8192          56k         56k        392
16384         16k         16k         56
32768        288k        288k        504
65536         64k         64k         56
131072        128k        128k         56
262144        512k        512k        112
524288        512k        512k         56
1048576       1024k       1024k         56
2097152       4096k       4096k        112
112           34M         16M        484k
208           40k         38k        560
192         3344k       3287k         45k
160           28k       6240         392
176          564k        261k       7896
48          2088k       1165k         32k
32           144k         68k       2592
80            35M       2063k        499k
Total         85M         32M       1107k

String pool
entries         158128
identifiers     158128 (100.00%)
slots           262144
bytes           1981k (169k overhead)
table size      2048k
coll/search     1.1434
ins/search      0.1946
avg. entry      12.83 bytes (+/- 7.82)
longest entry   67

??? tree nodes created

(No per-node statistics)
Type hash: size 1021, 598 elements, 0.900368 collisions
DECL_DEBUG_EXPR  hash: size 8191, 0 elements, 1.140991 collisions
DECL_VALUE_EXPR  hash: size 1021, 0 elements, 0.000000 collisions

Execution times (seconds)
garbage collection    :   1.84 ( 0%) usr   0.04 ( 0%) sys   2.47 ( 0%) wall       0 kB ( 0%) ggc
callgraph construction:   1.79 ( 0%) usr   0.35 ( 0%) sys   2.67 ( 0%) wall   21241 kB ( 2%) ggc
callgraph optimization:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
ipa reference         :   0.42 ( 0%) usr   0.14 ( 0%) sys   0.71 ( 0%) wall       7 kB ( 0%) ggc
cfg construction      :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.48 ( 0%) wall    7224 kB ( 1%) ggc
cfg cleanup           :  95.98 ( 9%) usr   0.62 ( 0%) sys 125.14 ( 8%) wall    2098 kB ( 0%) ggc
trivially dead code   :   2.49 ( 0%) usr   0.06 ( 0%) sys   3.46 ( 0%) wall       0 kB ( 0%) ggc
life analysis         :   5.86 ( 1%) usr   3.35 ( 3%) sys  11.86 ( 1%) wall   18686 kB ( 2%) ggc
life info update      :   0.95 ( 0%) usr   0.02 ( 0%) sys   1.18 ( 0%) wall     526 kB ( 0%) ggc
alias analysis        :   1.67 ( 0%) usr   0.03 ( 0%) sys   2.07 ( 0%) wall   16385 kB ( 2%) ggc
register scan         :   0.93 ( 0%) usr   0.01 ( 0%) sys   1.29 ( 0%) wall       4 kB ( 0%) ggc
rebuild jump labels   :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall       0 kB ( 0%) ggc
preprocessing         :   7.27 ( 1%) usr  13.04 (10%) sys  25.28 ( 2%) wall    2197 kB ( 0%) ggc
lexical analysis      :  13.10 ( 1%) usr  25.59 (20%) sys  47.58 ( 3%) wall       0 kB ( 0%) ggc
parser                :   9.44 ( 1%) usr  12.84 (10%) sys  28.21 ( 2%) wall   72677 kB ( 7%) ggc
tree gimplify         :   1.51 ( 0%) usr   0.08 ( 0%) sys   2.02 ( 0%) wall   30969 kB ( 3%) ggc
tree eh               :   0.17 ( 0%) usr   0.01 ( 0%) sys   0.22 ( 0%) wall       0 kB ( 0%) ggc
tree CFG construction :   0.56 ( 0%) usr   0.14 ( 0%) sys   1.02 ( 0%) wall   76077 kB ( 8%) ggc
tree CFG cleanup      :   5.77 ( 1%) usr   0.06 ( 0%) sys   7.60 ( 0%) wall     955 kB ( 0%) ggc
tree copy propagation :   5.43 ( 0%) usr   0.39 ( 0%) sys   7.83 ( 0%) wall   10484 kB ( 1%) ggc
tree store copy prop  :   0.73 ( 0%) usr   0.04 ( 0%) sys   0.96 ( 0%) wall    1088 kB ( 0%) ggc
tree find ref. vars   :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall    2502 kB ( 0%) ggc
tree PTA              :   5.49 ( 0%) usr   0.57 ( 0%) sys   7.86 ( 0%) wall   16435 kB ( 2%) ggc
tree alias analysis   :   6.82 ( 1%) usr  10.23 ( 8%) sys  18.62 ( 1%) wall   12810 kB ( 1%) ggc
tree PHI insertion    :   1.05 ( 0%) usr   0.21 ( 0%) sys   1.62 ( 0%) wall   24377 kB ( 2%) ggc
tree SSA rewrite      :   2.50 ( 0%) usr   0.16 ( 0%) sys   3.34 ( 0%) wall   39166 kB ( 4%) ggc
tree SSA other        :   1.10 ( 0%) usr   1.49 ( 1%) sys   3.69 ( 0%) wall       0 kB ( 0%) ggc
tree SSA incremental  :  13.99 ( 1%) usr   3.74 ( 3%) sys  22.60 ( 1%) wall   19165 kB ( 2%) ggc
tree operand scan     : 626.32 (57%) usr  12.24 (10%) sys 833.21 (52%) wall   23910 kB ( 2%) ggc
dominator optimization:   6.09 ( 1%) usr   0.35 ( 0%) sys   8.22 ( 1%) wall   63874 kB ( 7%) ggc
tree STORE-CCP        :   0.67 ( 0%) usr   0.02 ( 0%) sys   0.87 ( 0%) wall     513 kB ( 0%) ggc
tree CCP              :   0.74 ( 0%) usr   0.02 ( 0%) sys   1.03 ( 0%) wall     514 kB ( 0%) ggc
tree split crit edges :   0.37 ( 0%) usr   0.21 ( 0%) sys   0.85 ( 0%) wall   40362 kB ( 4%) ggc
tree reassociation    :   0.56 ( 0%) usr   0.02 ( 0%) sys   0.69 ( 0%) wall       0 kB ( 0%) ggc
tree FRE              :  12.83 ( 1%) usr   0.67 ( 1%) sys  17.70 ( 1%) wall   40945 kB ( 4%) ggc
tree code sinking     :   0.98 ( 0%) usr   0.06 ( 0%) sys   1.45 ( 0%) wall       0 kB ( 0%) ggc
tree linearize phis   :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall       0 kB ( 0%) ggc
tree forward propagate:   0.16 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc
tree conservative DCE :   1.87 ( 0%) usr   0.03 ( 0%) sys   2.54 ( 0%) wall       0 kB ( 0%) ggc
tree aggressive DCE   :   0.87 ( 0%) usr   0.01 ( 0%) sys   1.17 ( 0%) wall       0 kB ( 0%) ggc
tree DSE              :   0.73 ( 0%) usr   0.04 ( 0%) sys   0.91 ( 0%) wall       0 kB ( 0%) ggc
PHI merge             :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      49 kB ( 0%) ggc
tree loop bounds      :   0.35 ( 0%) usr   0.01 ( 0%) sys   0.41 ( 0%) wall       0 kB ( 0%) ggc
loop invariant motion :   0.56 ( 0%) usr   0.01 ( 0%) sys   0.72 ( 0%) wall       0 kB ( 0%) ggc
tree canonical iv     :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
scev constant prop    :   1.47 ( 0%) usr   0.04 ( 0%) sys   2.02 ( 0%) wall    1973 kB ( 0%) ggc
complete unrolling    :   0.08 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
tree loop init        :   5.15 ( 0%) usr   5.40 ( 4%) sys  14.04 ( 1%) wall   58726 kB ( 6%) ggc
tree loop fini        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
tree copy headers     :   0.24 ( 0%) usr   0.01 ( 0%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
tree SSA uncprop      :   0.44 ( 0%) usr   0.01 ( 0%) sys   0.52 ( 0%) wall       0 kB ( 0%) ggc
tree SSA to normal    : 171.82 (16%) usr   1.31 ( 1%) sys 224.37 (14%) wall  101554 kB (10%) ggc
tree rename SSA copies:   0.60 ( 0%) usr   0.07 ( 0%) sys   1.05 ( 0%) wall       0 kB ( 0%) ggc
dominance frontiers   :   0.54 ( 0%) usr   0.02 ( 0%) sys   0.67 ( 0%) wall       0 kB ( 0%) ggc
expand                :   7.37 ( 1%) usr   4.05 ( 3%) sys  14.98 ( 1%) wall  122832 kB (13%) ggc
varconst              :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       7 kB ( 0%) ggc
jump                  :   0.66 ( 0%) usr   0.04 ( 0%) sys   0.88 ( 0%) wall       0 kB ( 0%) ggc
CSE                   :   1.98 ( 0%) usr   1.16 ( 1%) sys   4.00 ( 0%) wall    2442 kB ( 0%) ggc
loop analysis         :   1.55 ( 0%) usr   0.13 ( 0%) sys   2.17 ( 0%) wall    7001 kB ( 1%) ggc
branch prediction     :   2.97 ( 0%) usr   0.18 ( 0%) sys   4.00 ( 0%) wall    7022 kB ( 1%) ggc
flow analysis         :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.38 ( 0%) wall       0 kB ( 0%) ggc
combiner              :   3.85 ( 0%) usr   0.10 ( 0%) sys   5.14 ( 0%) wall   31575 kB ( 3%) ggc
if-conversion         :   1.82 ( 0%) usr   0.10 ( 0%) sys   2.31 ( 0%) wall     325 kB ( 0%) ggc
local alloc           :   2.72 ( 0%) usr   0.11 ( 0%) sys   3.65 ( 0%) wall   13500 kB ( 1%) ggc
global alloc          :  25.23 ( 2%) usr  21.24 (17%) sys  58.29 ( 4%) wall   30563 kB ( 3%) ggc
reload CSE regs       :  28.86 ( 3%) usr   0.35 ( 0%) sys  37.86 ( 2%) wall   12947 kB ( 1%) ggc
flow 2                :   0.61 ( 0%) usr   0.00 ( 0%) sys   0.91 ( 0%) wall      19 kB ( 0%) ggc
if-conversion 2       :   0.68 ( 0%) usr   0.06 ( 0%) sys   0.86 ( 0%) wall      14 kB ( 0%) ggc
rename registers      :   0.89 ( 0%) usr   0.17 ( 0%) sys   1.46 ( 0%) wall      24 kB ( 0%) ggc
scheduling 2          :   3.47 ( 0%) usr   0.18 ( 0%) sys   4.52 ( 0%) wall   35672 kB ( 4%) ggc
shorten branches      :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
final                 :   2.08 ( 0%) usr   0.13 ( 0%) sys   2.74 ( 0%) wall    4096 kB ( 0%) ggc
symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
TOTAL                 :1106.93           125.15          1593.33             977727 kB

Comment 7 Jeffrey A. Law 2006-04-20 03:28:08 UTC
Subject: Re:  Inordinate compile times on large routines

On Thu, 2006-04-20 at 03:18 +0000, lucier at math dot purdue dot edu wrote:
> 
> ------- Comment #6 from lucier at math dot purdue dot edu  2006-04-20 03:18 -------
> Subject: Re:  Inordinate compile times on large routines
> 
> Thanks a lot.  Here are the timing statistics (with --disable-checking) after your patch.
> 
> PS:  I'm sorry it took 9 hours to compile on your box.
No worries.  I've got several boxes and having one busy overnight
isn't that big of a deal.

Clearly there's something different between PPC and x86 for
your testcase as you're not getting hit in bb-reorder or
expand.

Operand scanning is clearly the #1 issue when run on PPC (> 50% of
compile time, OUCH).  This may actually be an indication of a pass
going nuts and marking too many things for rescanning though.
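If that guess is right, the scanner's cost would track the number of times statements get marked rather than the number of statements that actually changed. One common way to keep those in line is a per-statement "already queued" flag, sketched below. Everything here (struct stmt, mark_for_rescan, flush_rescans) is hypothetical and is not a GCC API; note that it only absorbs duplicate marks, so a pass that marks too many distinct statements would still pay for each of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement record: only the bookkeeping needed here. */
struct stmt {
    bool modified;      /* already queued for an operand rescan? */
    unsigned scans;     /* how many times its operands were rescanned */
};

struct rescan_queue {
    struct stmt **items;
    size_t len, cap;
};

/* Queue S for rescanning unless it is already queued.  Returns true if
   S was newly queued. */
static bool mark_for_rescan(struct rescan_queue *q, struct stmt *s)
{
    if (s->modified)
        return false;
    assert(q->len < q->cap);
    s->modified = true;
    q->items[q->len++] = s;
    return true;
}

/* Rescan every queued statement exactly once, then empty the queue. */
static void flush_rescans(struct rescan_queue *q)
{
    size_t i;

    for (i = 0; i < q->len; i++) {
        q->items[i]->scans++;          /* stand-in for the real operand scan */
        q->items[i]->modified = false;
    }
    q->len = 0;
}
```

Marking three statements 100 times in total still queues only three entries, so the flush performs three scans rather than 100.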

You'll likely get radically different pain points with mainline
as well.  The RTL loop invariant code goes crazy memory-wise
for me, tree PRE and FRE also suck up large amounts of time.

Jeff

Comment 8 lucier 2006-04-20 03:39:08 UTC
Subject: Re:  Inordinate compile times on large routines


On Apr 19, 2006, at 10:28 PM, law at redhat dot com wrote:

> You'll likely get radically different pain points with mainline
> as well.  The RTL loop invariant code goes crazy memory-wise
> for me, tree PRE and FRE also suck up large amounts of time.

Mainline doesn't build with -m64 -mcpu=970; this was reported as

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26892

which is still marked as UNCONFIRMED; I just realized today that this  
could be listed as a 4.1 regression.  In my limited understanding, I  
suspect it's a configure problem, as I mentioned in

http://gcc.gnu.org/ml/gcc/2006-04/msg00265.html

Brad
Comment 9 Jeffrey A. Law 2006-04-20 16:13:17 UTC
Subject: Bug 26854

Author: law
Date: Thu Apr 20 16:13:12 2006
New Revision: 113120

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=113120
Log:

	PR tree-optimization/26854
	* tree-ssa-dse.c (dse_optimize_stmt): Avoid num_imm_uses when
	checking for zero or one use.
	* tree-ssa-dom.c (propagate_rhs_into_lhs): Similarly.
	* tree-cfgcleanup.c (merge_phi_nodes): Similarly.
	* tree-ssa-reassoc.c (negate_value): Similarly.
	(reassociate_bb): Similarly.



Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-cfgcleanup.c
    trunk/gcc/tree-ssa-dom.c
    trunk/gcc/tree-ssa-dse.c
    trunk/gcc/tree-ssa-reassoc.c

Comment 10 Jeffrey A. Law 2006-04-20 16:17:11 UTC
PRE/FRE for mainline need some TLC on their compile-time performance as indicated by this PR as well.  They're #3 & #4 respectively behind the operator scanning code and store-ccp and way out of line when compared with the rest of the tree optimization passes.
Comment 11 Daniel Berlin 2006-04-20 16:21:29 UTC
(In reply to comment #10)
> PRE/FRE for mainline need some TLC on their compile-time performance as
> indicated by this PR as well.  They're #3 & #4 respectively behind the operator
> scanning code and store-ccp and way out of line when compared with the rest of
> the tree optimization passes.
> 

I'll look into this in the next few weeks.
Comment 12 Andrew Macleod 2006-04-26 18:59:29 UTC
I have a forthcoming patch that changes the implementation of immediate uses and, as a side effect, cleans up the operand scanner time in this file.

On my x86-hosted cross compiler to powerpc64:

before patch:
tree operand scan     : 366.20 (31%) usr   2.59 (18%) sys 371.20 (31%) wall
TOTAL                 :1177.57            14.10          1200.53

after patch:
tree operand scan     :   3.07 ( 0%) usr   1.72 (12%) sys   4.69 ( 1%) wall
TOTAL                 : 829.50            14.13           866.35


I will also take a look at the out-of-SSA time and see what can be done.  Part of the problem there is that a conflict graph is being built with 650,000,000 conflicts... that's not conducive to fast compile times!  That's a lot of SSA_NAME versions of a base variable!
Comment 13 Andrew Macleod 2006-04-27 02:29:12 UTC
The patch for speeding up the operand cache has been posted to gcc-patches:

http://gcc.gnu.org/ml/gcc-patches/2006-04/msg01017.html
Comment 14 Andrew Macleod 2006-04-27 02:30:11 UTC
I should point out that it's a patch for mainline. Conversion to 4.1 requires some minor tweaking.
Comment 15 Andrew Macleod 2006-04-27 20:22:23 UTC
Subject: Bug 26854

Author: amacleod
Date: Thu Apr 27 20:22:17 2006
New Revision: 113321

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=113321
Log:
Implement new immediate use iterators.

2006-04-27  Andrew MacLeod  <amacleod@redhat.com>

	PR tree-optimization/26854
	* tree-vrp.c (remove_range_assertions): Use new Immuse iterator.
	* doc/tree-ssa.texi: Update immuse iterator documentation.
	* tree-ssa-math-opts.c (execute_cse_reciprocals_1): Use new iterator.
	* tree-ssa-dom.c (propagate_rhs_into_lhs): Use new iterator.
	* tree-flow-inline.h (end_safe_imm_use_traverse, end_safe_imm_use_p,
	first_safe_imm_use, next_safe_imm_use): Remove.
	(end_imm_use_stmt_p): New.  Check for end of immuse stmt traversal.
	(end_imm_use_stmt_traverse): New.  Terminate immuse stmt traversal.
	(move_use_after_head): New.  Helper function to sort immuses in a stmt.
	(link_use_stmts_after): New.  Link all immuses in a stmt consecutively.
	(first_imm_use_stmt): New.  Get first stmt in an immuse list.
	(next_imm_use_stmt): New.  Get next stmt in an immuse list.
	(first_imm_use_on_stmt): New.  Get first immuse on a stmt.
	(end_imm_use_on_stmt_p): New.  Check for end of immuses on a stmt.
	(next_imm_use_on_stmt): New.  Move to next immuse on a stmt.
	* tree-ssa-forwprop.c (forward_propagate_addr_expr): Use new iterator.
	* lambda-code.c (lambda_loopnest_to_gcc_loopnest): Use new iterator.
	(perfect_nestify): Use new iterator.
	* tree-vect-transform.c (vect_create_epilog_for_reduction): Use new 
	iterator.
	* tree-flow.h (struct immediate_use_iterator_d): Add comments.
	(next_imm_name): New field in struct immediate_use_iterator_d.
	(FOR_EACH_IMM_USE_SAFE, BREAK_FROM_SAFE_IMM_USE): Remove.
	(FOR_EACH_IMM_USE_STMT, BREAK_FROM_IMM_USE_STMT, 
	FOR_EACH_IMM_USE_ON_STMT): New immediate use iterator macros.
	* tree-cfg.c (replace_uses_by): Use new iterator.
	* tree-ssa-threadedge.c (lhs_of_dominating_assert): Use new iterator.
	* tree-ssa-operands.c (correct_use_link): Remove.
	(finalize_ssa_use_ops): No longer call correct_use_link.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/tree-ssa.texi
    trunk/gcc/lambda-code.c
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-flow-inline.h
    trunk/gcc/tree-flow.h
    trunk/gcc/tree-ssa-dom.c
    trunk/gcc/tree-ssa-forwprop.c
    trunk/gcc/tree-ssa-math-opts.c
    trunk/gcc/tree-ssa-operands.c
    trunk/gcc/tree-ssa-threadedge.c
    trunk/gcc/tree-vect-transform.c
    trunk/gcc/tree-vrp.c

Comment 16 lucier 2006-11-30 04:36:01 UTC
I now get a segfault when trying this with the current 4.2.0 branch:

[descartes:~/Desktop] lucier% time /pkgs/gcc-4.2.0-64-test/bin/gcc -mcpu=970 -m64 -no-cpp-precomp -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -bundle -flat_namespace -undefined suppress -I/usr/local/Gambit-C/include/ -ftime-report -fmem-report all.i
gcc: unrecognized option '-no-cpp-precomp'
all.c: In function '___H__20_all_2e_o1':
all.c:132856: internal compiler error: Bus error
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
2100.522u 139.425s 49:12.72 75.8%       0+0k 0+13io 0pf+0w

Running gdb on it gives no more information.

Some details:

The STAGE1 compiler was host=powerpc64-darwin and target=powerpc64-darwin:

[descartes:~/programs/gcc/4.2.0] gcc-test% /pkgs/gcc-4.2.0/bin/gcc -v
Using built-in specs.
Target: powerpc-apple-darwin8.8.0
Configured with: ../configure --with-gmp=/pkgs/gmp-4.2.1 --with-mpfr=/pkgs/gmp-4.2.1 --prefix=/pkgs/gcc-4.2.0 --enable-languages=c --disable-checking
Thread model: posix
gcc version 4.2.0 20061021 (prerelease)

This was the compiler that segfaulted:

[descartes:~/Desktop] lucier% /pkgs/gcc-4.2.0-64-test/bin/gcc -v
Using built-in specs.
Target: powerpc64-apple-darwin8.8.0
Configured with: ../configure --build=powerpc64-apple-darwin8.8.0 --host=powerpc64-apple-darwin8.8.0 --target=powerpc64-apple-darwin8.8.0 --enable-languages=c --prefix=/pkgs/gcc-4.2.0-64-test --with-gmp=/pkgs/gmp-4.2.1-64/ --with-mpfr=/pkgs/gmp-4.2.1-64/
Thread model: posix
gcc version 4.2.0 20061129 (prerelease)
[descartes:~/programs/gcc/4.2.0] gcc-test% cat gcc/BASE-VER 
4.2.0
[descartes:~/programs/gcc/4.2.0] gcc-test% cat gcc/DATESTAMP 
20061129
[descartes:~/programs/gcc/4.2.0] gcc-test% cat LAST_UPDATED 
Wed Nov 29 17:51:48 EST 2006
Wed Nov 29 22:51:48 UTC 2006 (revision 119334M)
[descartes:~/programs/gcc/4.2.0] gcc-test% gdb -v
GNU gdb 6.3.50-20050815 (Apple version gdb-573) (Fri Oct 20 15:54:33 GMT 2006)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-apple-darwin".


A full bootstrap was done.


Comment 17 Daniel Berlin 2006-11-30 04:54:08 UTC
Subject: Re:  Inordinate compile times on large routines

On 30 Nov 2006 04:36:05 -0000, lucier at math dot purdue dot edu
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #16 from lucier at math dot purdue dot edu  2006-11-30 04:36 -------
> I now get a segfault when trying this with the current 4.2.0 branch:
>
> [descartes:~/Desktop] lucier% time /pkgs/gcc-4.2.0-64-test/bin/gcc -mcpu=970
> -m64 -no-cpp-precomp -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2
> -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC
> -fno-common -bundle -flat_namespace -undefined suppress
> -I/usr/local/Gambit-C/include/ -ftime-report -fmem-report all.i
> gcc: unrecognized option '-no-cpp-precomp'
> all.c: In function '___H__20_all_2e_o1':
> all.c:132856: internal compiler error: Bus error
> Please submit a full bug report,
> with preprocessed source if appropriate

It shouldn't crash, but I'm still finishing the patch to keep it from
taking a ridiculous amount of time, which will need to be applied to the
4.2 branch.
Should be done this week (I sent a preview to the mailing list).
Comment 18 lucier 2006-12-07 17:32:49 UTC
Well, I decided to try it with 4.3.0 on powerpc64-apple-darwin8.8.0 and didn't get any better results:

[descartes:~/Desktop] lucier% time /pkgs/gcc-4.3.0-64/bin/gcc -mcpu=970 -m64 -no-cpp-precomp -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -bundle -flat_namespace -undefined suppress -I/usr/local/Gambit-C/include/ -ftime-report -fmem-report all.i
gcc: unrecognized option '-no-cpp-precomp'
all.c: In function '___H__20_all_2e_o1':
all.c:132856: internal compiler error: Bus error
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
923.482u 110.120s 22:51.89 75.3%        0+0k 0+12io 0pf+0w
[descartes:~/Desktop] lucier% /pkgs/gcc-4.3.0-64/bin/gcc -v
Using built-in specs.
Target: powerpc64-apple-darwin8.8.0
Configured with: ../configure --build=powerpc64-apple-darwin8.8.0 --host=powerpc64-apple-darwin8.8.0 --target=powerpc64-apple-darwin8.8.0 --with-gmp=/pkgs/gmp-4.2.1-64/ --with-mpfr=/pkgs/gmp-4.2.1-64/ --prefix=/pkgs/gcc-4.3.0-64 --enable-languages=c --enable-checking=no
Thread model: posix
gcc version 4.3.0 20061206 (experimental)

This is the branch that you installed your changes on, right Dan?

I suppose I should try it on another architecture to see whether the problem might be darwin-specific, or ppc-specific, or 64-bit specific, or ...

Who knows?
Comment 19 Daniel Berlin 2006-12-07 17:54:23 UTC
Subject: Re:  Inordinate compile times on large routines

> This is the branch that you installed your changes on, right Dan?

yes
>
> I suppose I should try it on another architecture to see whether the problem
> might be darwin-specific, or ppc-specific, or 64-bit specific, or ...
>
> Who knows?
>

We now spend basically no time in PTA, and about 800 seconds in
remove_ssa_form.

Sometime later on, we run out of memory and crash
(i.e., it's somewhere other than the PTA, alias analysis, or tree-ssa
that we run out of memory).
Comment 20 Daniel Berlin 2006-12-07 17:54:41 UTC
Subject: Re:  Inordinate compile times on large routines

>
> We now spend basically no time in PTA, and  about 800 seconds in
> remove_ssa_form.
>
> Sometime later on, we run out of memory and crash.
> (IE it's somewhere other than the PTA, alias analysis, or tree-ssa
> that we run out of memory).
>

Sorry, forgot to mention this is on darwin.
Comment 21 lucier 2006-12-07 21:51:16 UTC
Subject: Re:  Inordinate compile times on large routines

I reran things on mainline on my patched RHEL box.  It took almost  
7GB of memory, peak, to compile this routine (this was very near the  
end of cc1).

All things considered, on mainline the CPU time for this routine is  
not so bad (alias analysis and FRE are two obvious hot-spots), but  
the memory required is very large.

Back to 4.2.0 branch for more testing ...

euler-157% time /pkgs/gcc-mainline/bin/gcc -no-cpp-precomp -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -ftime-report -fmem-report -c all.i
gcc: unrecognized option '-no-cpp-precomp'
Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8             16k         13k        480
16          8472k       8083k        182k
64            39M         22M        625k
256         4096        2816          56
512         4096        1024          56
1024         236k        236k       3304
2048          40k         26k        560
4096          80k         80k       1120
8192          64k         64k        448
16384         16k         16k         56
32768         64k         64k        112
65536        960k        960k        840
131072        512k        512k        224
262144        768k        768k        168
1048576       7168k       5120k        392
2097152       2048k       2048k         56
112          144k         92k       2016
208           44k         41k        616
192           14M      10094k        198k
160           40k         37k        560
176         7972k       4786k        108k
96            18M         16M        263k
448           28k         27k        392
128         9696k       6860k        132k
48            30M         13M        484k
224          424k        385k       5936
32            72M         72M       1296k
80            65M         38M        923k
Total        278M        201M       4231k

String pool
entries         125055
identifiers     125055 (100.00%)
slots           262144
bytes           1675k (137k overhead)
table size      2048k
coll/search     0.8888
ins/search      0.1979
avg. entry      13.72 bytes (+/- 8.99)
longest entry   71

??? tree nodes created

(No per-node statistics)
Type hash: size 1021, 577 elements, 0.695294 collisions
DECL_DEBUG_EXPR  hash: size 8191, 2893 elements, 1.005820 collisions
DECL_VALUE_EXPR  hash: size 1021, 0 elements, 0.000000 collisions

Execution times (seconds)
garbage collection    :   1.01 ( 0%) usr   0.00 ( 0%) sys   1.01 ( 0%) wall       0 kB ( 0%) ggc
callgraph construction:   0.61 ( 0%) usr   0.09 ( 1%) sys   0.72 ( 0%) wall   17017 kB ( 2%) ggc
callgraph optimization:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
ipa reference         :   0.17 ( 0%) usr   0.05 ( 0%) sys   0.23 ( 0%) wall       7 kB ( 0%) ggc
cfg cleanup           :   7.69 ( 2%) usr   0.00 ( 0%) sys   8.03 ( 2%) wall      37 kB ( 0%) ggc
trivially dead code   :   0.96 ( 0%) usr   0.00 ( 0%) sys   1.00 ( 0%) wall       0 kB ( 0%) ggc
life analysis         :  19.95 ( 4%) usr   0.01 ( 0%) sys  20.77 ( 4%) wall   12767 kB ( 2%) ggc
life info update      :   0.57 ( 0%) usr   0.00 ( 0%) sys   0.57 ( 0%) wall       0 kB ( 0%) ggc
alias analysis        :   0.80 ( 0%) usr   0.00 ( 0%) sys   0.80 ( 0%) wall    7174 kB ( 1%) ggc
register scan         :   0.42 ( 0%) usr   0.00 ( 0%) sys   0.46 ( 0%) wall       1 kB ( 0%) ggc
rebuild jump labels   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
preprocessing         :   0.51 ( 0%) usr   0.90 ( 8%) sys   1.26 ( 0%) wall    1794 kB ( 0%) ggc
lexical analysis      :   0.57 ( 0%) usr   1.42 (12%) sys   2.33 ( 0%) wall       0 kB ( 0%) ggc
parser                :   1.29 ( 0%) usr   0.95 ( 8%) sys   2.29 ( 0%) wall   59589 kB ( 8%) ggc
integration           :   0.24 ( 0%) usr   0.09 ( 1%) sys   0.35 ( 0%) wall       0 kB ( 0%) ggc
tree gimplify         :   0.74 ( 0%) usr   0.04 ( 0%) sys   0.83 ( 0%) wall   42732 kB ( 6%) ggc
tree eh               :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
tree CFG construction :   0.30 ( 0%) usr   0.05 ( 0%) sys   0.35 ( 0%) wall   59312 kB ( 8%) ggc
tree CFG cleanup      :   3.53 ( 1%) usr   0.00 ( 0%) sys   3.51 ( 1%) wall    3716 kB ( 1%) ggc
tree copy propagation :   1.21 ( 0%) usr   0.00 ( 0%) sys   1.22 ( 0%) wall    2220 kB ( 0%) ggc
tree store copy prop  :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall     576 kB ( 0%) ggc
tree find ref. vars   :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    1186 kB ( 0%) ggc
tree PTA              :   2.52 ( 1%) usr   0.03 ( 0%) sys   2.57 ( 1%) wall    2280 kB ( 0%) ggc
tree alias analysis   : 121.67 (27%) usr   0.50 ( 4%) sys 123.10 (26%) wall   18481 kB ( 3%) ggc
tree PHI insertion    :   1.40 ( 0%) usr   0.07 ( 1%) sys   1.54 ( 0%) wall   69532 kB ( 9%) ggc
tree SSA rewrite      :   2.40 ( 1%) usr   0.02 ( 0%) sys   2.47 ( 1%) wall   28127 kB ( 4%) ggc
tree SSA other        :   0.09 ( 0%) usr   0.10 ( 1%) sys   0.23 ( 0%) wall       0 kB ( 0%) ggc
tree SSA incremental  :   8.18 ( 2%) usr   0.09 ( 1%) sys   8.20 ( 2%) wall   19181 kB ( 3%) ggc
tree operand scan     :   1.47 ( 0%) usr   0.58 ( 5%) sys   2.01 ( 0%) wall   26491 kB ( 4%) ggc
dominator optimization:   2.51 ( 1%) usr   0.01 ( 0%) sys   2.55 ( 1%) wall   46004 kB ( 6%) ggc
tree STORE-CCP        :   0.58 ( 0%) usr   0.00 ( 0%) sys   0.58 ( 0%) wall    1024 kB ( 0%) ggc
tree CCP              :   0.61 ( 0%) usr   0.00 ( 0%) sys   0.61 ( 0%) wall    1024 kB ( 0%) ggc
tree PHI const/copy prop:   0.19 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       9 kB ( 0%) ggc
tree split crit edges :   0.09 ( 0%) usr   0.02 ( 0%) sys   0.12 ( 0%) wall   27005 kB ( 4%) ggc
tree reassociation    :   0.45 ( 0%) usr   0.01 ( 0%) sys   0.45 ( 0%) wall       0 kB ( 0%) ggc
tree FRE              : 194.08 (42%) usr   0.18 ( 2%) sys 202.72 (42%) wall   23470 kB ( 3%) ggc
tree code sinking     :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.48 ( 0%) wall       0 kB ( 0%) ggc
tree linearize phis   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
tree forward propagate:   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
tree conservative DCE :   1.14 ( 0%) usr   0.00 ( 0%) sys   1.15 ( 0%) wall       0 kB ( 0%) ggc
tree aggressive DCE   :   0.40 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall       0 kB ( 0%) ggc
tree DSE              :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
PHI merge             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       2 kB ( 0%) ggc
tree loop bounds      :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
loop invariant motion :   0.29 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall       0 kB ( 0%) ggc
tree canonical iv     :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
scev constant prop    :   0.58 ( 0%) usr   0.00 ( 0%) sys   0.57 ( 0%) wall    1756 kB ( 0%) ggc
complete unrolling    :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
tree iv optimization  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
tree loop init        :   2.07 ( 0%) usr   0.07 ( 1%) sys   2.17 ( 0%) wall   41825 kB ( 6%) ggc
tree loop fini        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
tree copy headers     :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       1 kB ( 0%) ggc
tree SSA uncprop      :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
tree SSA to normal    :  30.22 ( 7%) usr   0.07 ( 1%) sys  30.74 ( 6%) wall   54480 kB ( 7%) ggc
tree rename SSA copies:   0.33 ( 0%) usr   0.00 ( 0%) sys   0.34 ( 0%) wall       0 kB ( 0%) ggc
dominance frontiers   :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall       0 kB ( 0%) ggc
dominance computation :   2.01 ( 0%) usr   0.01 ( 0%) sys   2.04 ( 0%) wall       0 kB ( 0%) ggc
expand                :   5.02 ( 1%) usr   0.12 ( 1%) sys   5.29 ( 1%) wall   93938 kB (13%) ggc
varconst              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       6 kB ( 0%) ggc
jump                  :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
CSE                   :   0.55 ( 0%) usr   0.01 ( 0%) sys   0.58 ( 0%) wall     130 kB ( 0%) ggc
loop analysis         :  17.46 ( 4%) usr   5.59 (48%) sys  24.20 ( 5%) wall    5483 kB ( 1%) ggc
branch prediction     :   0.75 ( 0%) usr   0.00 ( 0%) sys   0.74 ( 0%) wall    1532 kB ( 0%) ggc
flow analysis         :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
combiner              :   1.66 ( 0%) usr   0.01 ( 0%) sys   1.68 ( 0%) wall   21082 kB ( 3%) ggc
if-conversion         :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.54 ( 0%) wall     350 kB ( 0%) ggc
mode switching        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
local alloc           :   1.29 ( 0%) usr   0.00 ( 0%) sys   1.30 ( 0%) wall    7039 kB ( 1%) ggc
global alloc          :   7.10 ( 2%) usr   0.41 ( 3%) sys   7.52 ( 2%) wall    6574 kB ( 1%) ggc
reload CSE regs       :   0.75 ( 0%) usr   0.00 ( 0%) sys   0.73 ( 0%) wall   10842 kB ( 1%) ggc
flow 2                :   0.37 ( 0%) usr   0.00 ( 0%) sys   0.38 ( 0%) wall    3114 kB ( 0%) ggc
if-conversion 2       :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall       9 kB ( 0%) ggc
rename registers      :   0.54 ( 0%) usr   0.04 ( 0%) sys   0.58 ( 0%) wall      24 kB ( 0%) ggc
scheduling 2          :   2.36 ( 1%) usr   0.04 ( 0%) sys   2.40 ( 0%) wall   10954 kB ( 1%) ggc
machine dep reorg     :   0.44 ( 0%) usr   0.00 ( 0%) sys   0.44 ( 0%) wall     135 kB ( 0%) ggc
final                 :   1.06 ( 0%) usr   0.01 ( 0%) sys   1.08 ( 0%) wall    2050 kB ( 0%) ggc
TOTAL                 : 456.98            11.72           481.71             734771 kB
457.718u 12.529s 8:03.35 97.2%  0+0k 0+0io 0pf+0w
euler-158% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/pkgs/gcc-mainline --with-gmp=/pkgs/gmp-4.2.1 --with-mpfr=/pkgs/gmp-4.2.1 --enable-checking=no --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061207 (experimental)

Comment 22 lucier 2006-12-08 01:24:05 UTC
Subject: Re:  Inordinate compile times on large routines

And here's the same data for 4.2.0 branch; Dan, your changes have  
clearly helped a lot.

It seems to take about 5% more memory at the maximum, though, on  
4.3.0 (6.9GB vs 6.6GB); but both these numbers are just from visual  
inspection of "top" as things were running, so they are likely not  
accurate.

Brad

euler-56% time /pkgs/gcc-4.2.0-test/bin/gcc -no-cpp-precomp -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -ftime-report -fmem-report -c all.i
gcc: unrecognized option '-no-cpp-precomp'
Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8             16k         13k        480
16          8556k       7998k        183k
64            36M         22M        587k
256         4096        2816          56
512         4096        1536          56
1024         224k        222k       3136
2048          40k         26k        560
4096          76k         76k       1064
8192          64k         64k        448
16384         16k         16k         56
32768         64k         64k        112
65536        704k        704k        616
131072        512k        512k        224
262144        768k        768k        168
1048576       6144k       5120k        336
2097152       2048k       2048k         56
112          144k         92k       2016
208           44k         40k        616
192           14M         10M        209k
160           40k         37k        560
176         7996k       4802k        109k
96            19M         16M        267k
416           28k         23k        392
128           10M       7187k        142k
48            18M       9508k        302k
224          420k        384k       5880
32            71M         71M       1279k
80            62M         46M        880k
Total        261M        205M       3979k

String pool
entries         125047
identifiers     125047 (100.00%)
slots           262144
bytes           1675k (137k overhead)
table size      2048k
coll/search     0.8967
ins/search      0.1974
avg. entry      13.72 bytes (+/- 8.99)
longest entry   71

??? tree nodes created

(No per-node statistics)
Type hash: size 1021, 573 elements, 0.687728 collisions
DECL_DEBUG_EXPR  hash: size 8191, 2981 elements, 0.983076 collisions
DECL_VALUE_EXPR  hash: size 1021, 0 elements, 0.000000 collisions

Execution times (seconds)
garbage collection    :   1.10 ( 0%) usr   0.00 ( 0%) sys   1.10 ( 0%) wall       0 kB ( 0%) ggc
callgraph construction:   0.60 ( 0%) usr   0.12 ( 1%) sys   0.74 ( 0%) wall   17017 kB ( 2%) ggc
callgraph optimization:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
ipa reference         :   0.17 ( 0%) usr   0.05 ( 0%) sys   0.25 ( 0%) wall       7 kB ( 0%) ggc
cfg cleanup           :   7.68 ( 0%) usr   0.00 ( 0%) sys   7.69 ( 0%) wall      38 kB ( 0%) ggc
trivially dead code   :   1.13 ( 0%) usr   0.00 ( 0%) sys   1.10 ( 0%) wall       0 kB ( 0%) ggc
life analysis         :  20.05 ( 1%) usr   0.01 ( 0%) sys  20.10 ( 1%) wall   12032 kB ( 2%) ggc
life info update      :   0.61 ( 0%) usr   0.00 ( 0%) sys   0.61 ( 0%) wall       0 kB ( 0%) ggc
alias analysis        :   0.92 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 0%) wall    7174 kB ( 1%) ggc
register scan         :   0.51 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall       1 kB ( 0%) ggc
rebuild jump labels   :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc
preprocessing         :   0.63 ( 0%) usr   0.92 ( 8%) sys   1.50 ( 0%) wall    1794 kB ( 0%) ggc
lexical analysis      :   0.61 ( 0%) usr   1.67 (14%) sys   2.39 ( 0%) wall       0 kB ( 0%) ggc
parser                :   1.27 ( 0%) usr   0.82 ( 7%) sys   2.30 ( 0%) wall   59584 kB ( 8%) ggc
integration           :   0.27 ( 0%) usr   0.10 ( 1%) sys   0.39 ( 0%) wall       0 kB ( 0%) ggc
tree gimplify         :   0.68 ( 0%) usr   0.02 ( 0%) sys   0.75 ( 0%) wall   23041 kB ( 3%) ggc
tree eh               :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
tree CFG construction :   0.32 ( 0%) usr   0.06 ( 0%) sys   0.40 ( 0%) wall   59313 kB ( 8%) ggc
tree CFG cleanup      :   4.18 ( 0%) usr   0.00 ( 0%) sys   4.20 ( 0%) wall    3716 kB ( 1%) ggc
tree copy propagation :   1.51 ( 0%) usr   0.01 ( 0%) sys   1.55 ( 0%) wall    2219 kB ( 0%) ggc
tree store copy prop  :   0.50 ( 0%) usr   0.00 ( 0%) sys   0.51 ( 0%) wall     576 kB ( 0%) ggc
tree find ref. vars   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall    1186 kB ( 0%) ggc
tree PTA              : 857.74 (53%) usr   0.47 ( 4%) sys 859.20 (53%) wall    2331 kB ( 0%) ggc
tree alias analysis   : 383.35 (24%) usr   0.64 ( 5%) sys 385.19 (24%) wall   15963 kB ( 2%) ggc
tree PHI insertion    :   1.57 ( 0%) usr   0.10 ( 1%) sys   1.75 ( 0%) wall   69532 kB (10%) ggc
tree SSA rewrite      :   3.05 ( 0%) usr   0.03 ( 0%) sys   3.07 ( 0%) wall   28127 kB ( 4%) ggc
tree SSA other        :   0.18 ( 0%) usr   0.10 ( 1%) sys   0.26 ( 0%) wall       0 kB ( 0%) ggc
tree SSA incremental  :   9.45 ( 1%) usr   0.07 ( 1%) sys   9.53 ( 1%) wall   20443 kB ( 3%) ggc
tree operand scan     :   6.22 ( 0%) usr   0.53 ( 4%) sys   7.09 ( 0%) wall   26490 kB ( 4%) ggc
dominator optimization:   3.86 ( 0%) usr   0.05 ( 0%) sys   3.92 ( 0%) wall   48855 kB ( 7%) ggc
tree STORE-CCP        :   0.47 ( 0%) usr   0.00 ( 0%) sys   0.47 ( 0%) wall       8 kB ( 0%) ggc
tree CCP              :   0.63 ( 0%) usr   0.00 ( 0%) sys   0.63 ( 0%) wall      16 kB ( 0%) ggc
tree PHI const/copy prop:   0.25 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall       9 kB ( 0%) ggc
tree split crit edges :   0.12 ( 0%) usr   0.04 ( 0%) sys   0.17 ( 0%) wall   27005 kB ( 4%) ggc
tree reassociation    :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%) wall       0 kB ( 0%) ggc
tree FRE              : 181.71 (11%) usr   0.02 ( 0%) sys 181.71 (11%) wall   18940 kB ( 3%) ggc
tree code sinking     :   0.57 ( 0%) usr   0.00 ( 0%) sys   0.58 ( 0%) wall       0 kB ( 0%) ggc
tree linearize phis   :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
tree forward propagate:   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
tree conservative DCE :   1.32 ( 0%) usr   0.00 ( 0%) sys   1.31 ( 0%) wall       0 kB ( 0%) ggc
tree aggressive DCE   :   0.52 ( 0%) usr   0.00 ( 0%) sys   0.51 ( 0%) wall       0 kB ( 0%) ggc
tree DSE              :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall       1 kB ( 0%) ggc
PHI merge             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       2 kB ( 0%) ggc
tree loop bounds      :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall       0 kB ( 0%) ggc
loop invariant motion :   0.37 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall       0 kB ( 0%) ggc
tree canonical iv     :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
scev constant prop    :   0.67 ( 0%) usr   0.00 ( 0%) sys   0.66 ( 0%) wall    1756 kB ( 0%) ggc
complete unrolling    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
tree iv optimization  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
tree loop init        :   2.35 ( 0%) usr   0.08 ( 1%) sys   2.47 ( 0%) wall   45903 kB ( 6%) ggc
tree copy headers     :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       1 kB ( 0%) ggc
tree SSA uncprop      :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall       0 kB ( 0%) ggc
tree SSA to normal    :  56.18 ( 4%) usr   0.05 ( 0%) sys  56.26 ( 3%) wall   55617 kB ( 8%) ggc
tree rename SSA copies:   0.45 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall       0 kB ( 0%) ggc
dominance frontiers   :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.51 ( 0%) wall       0 kB ( 0%) ggc
dominance computation :   2.43 ( 0%) usr   0.01 ( 0%) sys   2.39 ( 0%) wall       0 kB ( 0%) ggc
expand                :   7.24 ( 0%) usr   0.08 ( 1%) sys   7.44 ( 0%) wall   95504 kB (13%) ggc
jump                  :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
CSE                   :   0.82 ( 0%) usr   0.00 ( 0%) sys   0.83 ( 0%) wall    1108 kB ( 0%) ggc
loop analysis         :  18.66 ( 1%) usr   5.45 (45%) sys  24.87 ( 2%) wall    5844 kB ( 1%) ggc
branch prediction     :   0.92 ( 0%) usr   0.01 ( 0%) sys   0.93 ( 0%) wall    1532 kB ( 0%) ggc
flow analysis         :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
combiner              :   1.60 ( 0%) usr   0.02 ( 0%) sys   1.63 ( 0%) wall   19153 kB ( 3%) ggc
if-conversion         :   0.63 ( 0%) usr   0.01 ( 0%) sys   0.65 ( 0%) wall     365 kB ( 0%) ggc
mode switching        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
local alloc           :   1.32 ( 0%) usr   0.01 ( 0%) sys   1.32 ( 0%) wall    5154 kB ( 1%) ggc
global alloc          :   7.14 ( 0%) usr   0.37 ( 3%) sys   7.57 ( 0%) wall    9514 kB ( 1%) ggc
reload CSE regs       :   0.82 ( 0%) usr   0.00 ( 0%) sys   0.83 ( 0%) wall   11516 kB ( 2%) ggc
flow 2                :   0.40 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall    2940 kB ( 0%) ggc
if-conversion 2       :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall       1 kB ( 0%) ggc
rename registers      :   0.58 ( 0%) usr   0.03 ( 0%) sys   0.60 ( 0%) wall      15 kB ( 0%) ggc
scheduling 2          :   2.33 ( 0%) usr   0.03 ( 0%) sys   2.38 ( 0%) wall   10644 kB ( 1%) ggc
machine dep reorg     :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.49 ( 0%) wall      91 kB ( 0%) ggc
final                 :   1.23 ( 0%) usr   0.02 ( 0%) sys   1.25 ( 0%) wall    2050 kB ( 0%) ggc
TOTAL                 :1604.05            12.16          1620.25             716501 kB
1604.799u 13.117s 27:05.13 99.5%        0+0k 0+0io 0pf+0w
euler-57% /pkgs/gcc-4.2.0-test/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/pkgs/gcc-4.2.0-test --with-gmp=/pkgs/gmp-4.2.1 --with-mpfr=/pkgs/gmp-4.2.1 --enable-checking=no --enable-languages=c
Thread model: posix
gcc version 4.2.0 20061207 (prerelease)

Comment 23 lucier 2006-12-11 06:27:48 UTC
Subject: Re:  Inordinate compile times on large routines

After Andrew MacLeod's changes here

http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00691.html

I see

tree SSA to normal    :   5.23 ( 1%) usr   0.06 ( 0%) sys   5.30 ( 1%) wall   52594 kB ( 7%) ggc

instead of

tree SSA to normal    :  30.22 ( 7%) usr   0.07 ( 1%) sys  30.74 ( 6%) wall   54480 kB ( 7%) ggc

Very nice.

Other passes with noticeable run times remaining are:

tree alias analysis   : 125.17 (28%) usr   0.52 ( 4%) sys 126.90 (27%) wall   18481 kB ( 3%) ggc
tree FRE              : 207.84 (46%) usr   0.19 ( 1%) sys 208.56 (45%) wall   23470 kB ( 3%) ggc

Brad
Comment 24 lucier 2007-01-10 18:49:35 UTC
Tried it again with today's 4.2.0:

euler-34% /pkgs/gcc-4.2.0-test/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/pkgs/gcc-4.2.0-test --with-gmp=/pkgs/gmp-4.2.1 --with-mpfr=/pkgs/gmp-4.2.1
Thread model: posix
gcc version 4.2.0 20070110 (prerelease)

The two hot spots were

 tree SSA to normal    :  52.63 (16%) usr   0.04 ( 0%) sys  52.69 (15%) wall   55617 kB ( 8%) ggc
 tree FRE              : 150.81 (46%) usr   0.20 ( 2%) sys 154.00 (45%) wall   18940 kB ( 3%) ggc

while 

 tree alias analysis   :   1.94 ( 1%) usr   0.58 ( 5%) sys   2.18 ( 1%) wall     387 kB ( 0%) ggc

is now very low.
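Picking hot spots like these out of a long -ftime-report table can be scripted. The following is a minimal C sketch, assuming each row has been unwrapped onto a single line in the format quoted in this report; `struct pass_time`, `parse_time_line` and `is_hot_spot` are hypothetical names for illustration, not GCC interfaces.

```c
#include <stdio.h>
#include <string.h>

/* One row of a -ftime-report table (hypothetical type, not part of GCC). */
struct pass_time {
    char name[64];  /* pass name with the column padding removed */
    double usr;     /* user CPU seconds */
    int usr_pct;    /* user-time share printed by GCC, in percent */
};

/* Parse one unwrapped row such as
   " tree FRE              : 150.81 (46%) usr   0.20 ( 2%) sys ...".
   Returns 1 on success and 0 for lines that do not fit the pattern
   (e.g. the TOTAL line, which carries no percentage). */
int parse_time_line(const char *line, struct pass_time *out)
{
    char name[64];
    double usr;
    int pct;

    /* " %63[^:]" skips leading blanks, then reads the name up to the ':'.
       "(%d%%)" accepts both "(46%)" and "( 2%)" because %d skips blanks. */
    if (sscanf(line, " %63[^:]: %lf (%d%%)", name, &usr, &pct) != 3)
        return 0;

    /* Strip the padding between the pass name and the ':' column. */
    size_t len = strlen(name);
    while (len > 0 && name[len - 1] == ' ')
        name[--len] = '\0';

    memcpy(out->name, name, len + 1);
    out->usr = usr;
    out->usr_pct = pct;
    return 1;
}

/* Treat a pass as a hot spot when its user-time share reaches the threshold. */
int is_hot_spot(const struct pass_time *p, int threshold_pct)
{
    return p->usr_pct >= threshold_pct;
}
```

Fed the `tree FRE` row above, this yields a 46% user-time share, which `is_hot_spot(&p, 10)` flags; the `tree alias analysis` row (1%) is not flagged.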

Is there a patch that can be back-ported from mainline to fix tree SSA to normal?  On mainline there is the tremendous result of

tree SSA to normal    :   5.23 ( 1%) usr   0.06 ( 0%) sys   5.30 ( 1%) wall   52594 kB ( 7%) ggc

On another note, can I change this to be reported against 4.2.0?
Comment 25 Andrew Macleod 2007-01-10 19:47:59 UTC
There were numerous factors in the mainline speedup of SSA->normal, including a massive rewrite, but a couple of the big wins are backportable and were in fact considered; they were simply too late in stage 3 at the time:

I don't remember which one(s) affected this test case the most.

live range speedup:
http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00895.html

TER speedup:
http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00896.html

This one was applied later and may or may not make any difference whatsoever. It changes the coalesce list from a linked list to a hash table. I seem to recall only one test case it affected.
http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01515.html
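To illustrate why such a change can matter on huge functions, here is a hypothetical sketch (not GCC's actual coalescing code; every name is invented) of a coalesce list kept in an open-addressing hash table keyed on the unordered pair of partition numbers. With a plain linked list, recording a pair means scanning the list to see whether it is already present, so n insertions can cost O(n²); with hashing each lookup is expected O(1).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical coalesce table: each entry is an unordered pair of
   non-negative partition numbers plus the accumulated coalesce cost. */
struct coalesce_entry {
    int p1, p2;   /* normalized so that p1 < p2; p1 == -1 marks an empty slot */
    long cost;
};

struct coalesce_table {
    struct coalesce_entry *slots;
    size_t capacity;  /* must be a power of two */
    size_t count;     /* number of occupied slots */
};

bool coalesce_table_init(struct coalesce_table *t, size_t capacity)
{
    t->slots = malloc(capacity * sizeof *t->slots);
    if (t->slots == NULL)
        return false;
    for (size_t i = 0; i < capacity; i++)
        t->slots[i].p1 = -1;
    t->capacity = capacity;
    t->count = 0;
    return true;
}

void coalesce_table_free(struct coalesce_table *t)
{
    free(t->slots);
    t->slots = NULL;
}

static size_t pair_hash(int p1, int p2)
{
    return (size_t)p1 * 2654435761u ^ (size_t)p2 * 40503u;
}

/* Linear probing: returns the slot holding (p1, p2), or the empty slot
   where it would go.  Terminates because one slot is always kept empty. */
static struct coalesce_entry *find_slot(struct coalesce_table *t, int p1, int p2)
{
    size_t mask = t->capacity - 1;
    size_t i = pair_hash(p1, p2) & mask;
    while (t->slots[i].p1 != -1 &&
           (t->slots[i].p1 != p1 || t->slots[i].p2 != p2))
        i = (i + 1) & mask;
    return &t->slots[i];
}

/* Add 'cost' to the pair (a, b), creating the entry if needed.
   Expected O(1), versus an O(n) scan to find the pair in a linked list. */
bool coalesce_add(struct coalesce_table *t, int a, int b, long cost)
{
    int p1 = a < b ? a : b;
    int p2 = a < b ? b : a;
    struct coalesce_entry *e = find_slot(t, p1, p2);
    if (e->p1 == -1) {
        if (t->count + 1 >= t->capacity)
            return false;  /* would fill the last empty slot */
        e->p1 = p1;
        e->p2 = p2;
        e->cost = 0;
        t->count++;
    }
    e->cost += cost;
    return true;
}

/* Accumulated cost for the pair (a, b), or 0 if it was never added. */
long coalesce_cost(struct coalesce_table *t, int a, int b)
{
    int p1 = a < b ? a : b;
    int p2 = a < b ? b : a;
    struct coalesce_entry *e = find_slot(t, p1, p2);
    return e->p1 == -1 ? 0 : e->cost;
}
```

Partition numbers are assumed non-negative, since -1 marks an empty slot; a production version would grow the table instead of refusing insertions once it is nearly full.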
Comment 26 Steven Bosscher 2007-11-14 09:56:01 UTC
Could someone test this with GCC 4.3, and report the results here?
Comment 27 Richard Biener 2007-11-14 10:07:00 UTC
http://www.suse.de/~gcctest/c++bench/random/ tracks this testcase (on x86_64 that is).
Comment 28 Steven Bosscher 2007-11-14 12:04:19 UTC
Then I suggest we close this bug report.
Comment 29 lucier 2007-11-14 12:40:07 UTC
Subject: Re:  Inordinate compile times on large routines

It appears to me from the raw logs at

http://www.suse.de/~gcctest/c++bench/random/

that all runs except for the -O0 fail with an out-of-memory failure, so I don't know what this is really testing.

Relevant excerpt from the logs follows.

> TEST: pr26854.c
> total: 782967 kB
>
> Execution times (seconds)
>  garbage collection    :   0.88 ( 2%) usr   0.01 ( 0%) sys   0.90 ( 2%) wall       0 kB ( 0%) ggc
>  callgraph construction:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
>  cfg cleanup           :   3.13 ( 7%) usr   0.00 ( 0%) sys   3.14 ( 7%) wall     186 kB ( 0%) ggc
>  trivially dead code   :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
>  df live regs          :   0.30 ( 1%) usr   0.01 ( 0%) sys   0.31 ( 1%) wall       0 kB ( 0%) ggc
>  df reg dead/unused notes:   0.26 ( 1%) usr   0.01 ( 0%) sys   0.27 ( 1%) wall   12048 kB ( 2%) ggc
>  register information  :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall       0 kB ( 0%) ggc
>  alias analysis        :   0.24 ( 1%) usr   0.00 ( 0%) sys   0.24 ( 1%) wall    4096 kB ( 1%) ggc
>  rebuild jump labels   :   0.27 ( 1%) usr   0.00 ( 0%) sys   0.28 ( 1%) wall       0 kB ( 0%) ggc
>  preprocessing         :   0.83 ( 2%) usr   0.98 (19%) sys   1.87 ( 4%) wall    2978 kB ( 1%) ggc
>  lexical analysis      :   0.64 ( 2%) usr   1.80 (35%) sys   2.57 ( 5%) wall       0 kB ( 0%) ggc
>  parser                :   2.57 ( 6%) usr   1.31 (26%) sys   3.70 ( 8%) wall  106641 kB (22%) ggc
>  inline heuristics     :   0.78 ( 2%) usr   0.18 ( 4%) sys   0.97 ( 2%) wall       0 kB ( 0%) ggc
>  tree gimplify         :   1.21 ( 3%) usr   0.11 ( 2%) sys   1.32 ( 3%) wall   90819 kB (19%) ggc
>  tree eh               :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
>  tree CFG construction :   0.48 ( 1%) usr   0.05 ( 1%) sys   0.54 ( 1%) wall   68530 kB (14%) ggc
>  tree CFG cleanup      :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
>  dominance computation :   0.16 ( 0%) usr   0.03 ( 1%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc
>  expand                :   3.52 ( 8%) usr   0.30 ( 6%) sys   3.82 ( 8%) wall  130942 kB (27%) ggc
>  varconst              :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    1571 kB ( 0%) ggc
>  jump                  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
>  local alloc           :   2.49 ( 6%) usr   0.04 ( 1%) sys   2.53 ( 5%) wall    4099 kB ( 1%) ggc
>  global alloc          :  21.22 (50%) usr   0.13 ( 3%) sys  21.36 (45%) wall   48602 kB (10%) ggc
>  thread pro- & epilogue:   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       3 kB ( 0%) ggc
>  final                 :   2.14 ( 5%) usr   0.09 ( 2%) sys   2.21 ( 5%) wall     763 kB ( 0%) ggc
>  symout                :   0.19 ( 0%) usr   0.02 ( 0%) sys   0.22 ( 0%) wall   16699 kB ( 3%) ggc
>  TOTAL                 :  42.26             5.08            47.39             488944 kB
> TIME: 44.33
> FILESIZE: text data bss dec hex filename 2899378 423808 3040 3326226 32c112 ./out.o
>
> cc1: out of memory allocating 4064 bytes after a total of 1020792832 bytes
> total: 1884587 kB
>
> cc1: out of memory allocating 4064 bytes after a total of 1020895232 bytes
> Command exited with non-zero status 1
> TIME: 79.98
> FILESIZE: text data bss dec hex filename 12492 872 336 13700 3584 ./out.o
>
> cc1: out of memory allocating 4064 bytes after a total of 993718272 bytes
> total: 1884827 kB
>
> cc1: out of memory allocating 4064 bytes after a total of 993726464 bytes
> Command exited with non-zero status 1
> TIME: 132.93
> FILESIZE: text data bss dec hex filename 12492 872 336 13700 3584 ./out.o
>
> cc1: out of memory allocating 4064 bytes after a total of 916152320 bytes
> total: 1884835 kB
>
> cc1: out of memory allocating 4064 bytes after a total of 916217856 bytes
> Command exited with non-zero status 1
> TIME: 143.76
> FILESIZE: text data bss dec hex filename 12492 872 336 13700 3584 ./out.o
Comment 30 Richard Biener 2007-11-14 13:13:57 UTC
Right: the tester is artificially limited to 1GB of RAM. I probably need
to fix the setup to report errors instead of "so far" numbers in the OOM cases.
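One common way to impose an artificial cap like this is `ulimit -v` in the shell, which sets RLIMIT_AS. The sketch below does the same from C under stated assumptions: a POSIX host, and a hypothetical helper `child_allocation_succeeds` that forks, lowers RLIMIT_AS in the child only, and reports whether a single allocation succeeded. It is not the actual tester setup. Under such a cap malloc simply returns NULL, which is consistent with the cc1 failures quoted above: 1020792832 bytes is about 973.5 MiB, just under 1 GiB.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdbool.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: attempt an allocation of 'alloc_bytes' in a child
   process whose address space is capped at 'limit_bytes' via RLIMIT_AS.
   Returns true if the child's allocation succeeded. */
bool child_allocation_succeeds(rlim_t limit_bytes, size_t alloc_bytes)
{
    pid_t pid = fork();
    if (pid < 0)
        return false;
    if (pid == 0) {
        /* Child: lower both soft and hard limits; only this process
           (and anything it would exec) is affected. */
        struct rlimit rl;
        rl.rlim_cur = limit_bytes;
        rl.rlim_max = limit_bytes;
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);
        /* Stand-in for the real job (e.g. exec'ing cc1): malloc returns
           NULL once the request would exceed the address-space cap. */
        char *raw = malloc(alloc_bytes);
        if (raw == NULL)
            _exit(1);
        volatile char *p = raw;  /* volatile store keeps the allocation alive */
        p[0] = 1;
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```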
Comment 31 lucier 2007-11-14 13:37:54 UTC
Subject: Re:  Inordinate compile times on large routines

To answer Steven's original question, here is a run with

euler-20% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2 --with-mpfr=/pkgs/gmp-4.2.2
Thread model: posix
gcc version 4.3.0 20071026 (experimental) [trunk revision 129664] (GCC)

Memory usage peaked at 10.3GB (just from monitoring top).
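Watching top only gives a rough reading of the peak. A minimal sketch of a more precise alternative, assuming a Linux host, is to let a parent process ask the kernel for the peak resident set size of its waited-for children with getrusage(RUSAGE_CHILDREN). Here the child merely touches a buffer as a stand-in; a real measurement would exec cc1 instead, which is left out to keep the example self-contained. `child_peak_rss_kb` is a hypothetical helper.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that touches 'touch_bytes' of heap
   memory, wait for it, and return the peak RSS of waited-for children.
   Assumption: Linux, where ru_maxrss is reported in kilobytes.
   Returns -1 on error. */
long child_peak_rss_kb(size_t touch_bytes)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: stand-in for the real workload (e.g. an exec of cc1). */
        char *buf = malloc(touch_bytes);
        if (buf == NULL)
            _exit(1);
        long page = sysconf(_SC_PAGESIZE);
        if (page <= 0)
            page = 4096;
        volatile char *vbuf = buf;  /* volatile: stores cannot be optimized away */
        for (size_t i = 0; i < touch_bytes; i += (size_t)page)
            vbuf[i] = 1;            /* one write per page makes it resident */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    struct rusage ru;
    if (getrusage(RUSAGE_CHILDREN, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}
```

On Linux, ru_maxrss for RUSAGE_CHILDREN is the peak of the largest single child, not a sum over children, and it measures resident memory, so it can be lower than the virtual size top shows in its VIRT column.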

Brad

euler-19% time /pkgs/gcc-mainline/bin/gcc -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -I/usr/local/Gambit-C/include/ -ftime-report -fmem-report -c all.i
Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8           4096          16         120
16           108k         18k       2376
128         8192        2816         112
256          504k        464k       7056
512         4096        1024          56
1024         112k        110k       1568
2048          28k         22k        392
4096          76k         76k       1064
8192          48k         48k        336
16384         32k         32k        112
32768         32k         32k         56
131072        256k        256k        112
262144        512k        512k        112
524288       1024k       1024k        112
1048576       2048k       2048k        112
160         2764k       2669k         37k
176          144k        126k       2016
432           28k         21k        392
96            65M         14M        918k
48          2100k       1171k         32k
208          688k        325k       9632
64          1288k       1237k         20k
32           172k         64k       3096
80            30M       2060k        421k
Total        107M         26M       1459k

String pool
entries         159078
identifiers     159078 (100.00%)
slots           262144
bytes           1992k (170k overhead)
table size      2048k
coll/search     0.8632
ins/search      0.2065
avg. entry      12.83 bytes (+/- 7.80)
longest entry   67

??? tree nodes created

(No per-node statistics)
Type hash: size 2039, 919 elements, 0.860792 collisions
DECL_DEBUG_EXPR  hash: size 16381, 0 elements, 1.211012 collisions
DECL_VALUE_EXPR  hash: size 1021, 0 elements, 0.000000 collisions

Execution times (seconds)
garbage collection    :   1.19 ( 0%) usr   0.00 ( 0%) sys   1.19 ( 0%) wall       0 kB ( 0%) ggc
callgraph construction:   0.76 ( 0%) usr   0.11 ( 1%) sys   0.88 ( 0%) wall   33780 kB ( 4%) ggc
callgraph optimization:   1.23 ( 1%) usr   0.00 ( 0%) sys   1.23 ( 0%) wall       6 kB ( 0%) ggc
ipa reference         :   0.22 ( 0%) usr   0.03 ( 0%) sys   0.25 ( 0%) wall       7 kB ( 0%) ggc
cfg cleanup           :   2.17 ( 1%) usr   0.01 ( 0%) sys   2.17 ( 1%) wall     162 kB ( 0%) ggc
trivially dead code   :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall       0 kB ( 0%) ggc
df reaching defs      :  10.08 ( 4%) usr   4.09 (24%) sys  14.18 ( 6%) wall       0 kB ( 0%) ggc
df live regs          :   7.77 ( 3%) usr   0.01 ( 0%) sys   7.77 ( 3%) wall       0 kB ( 0%) ggc
df live&initialized regs:  82.60 (35%) usr   2.60 (15%) sys  85.23 (33%) wall       0 kB ( 0%) ggc
df use-def / def-use chains:   8.23 ( 3%) usr   2.51 (14%) sys  10.73 ( 4%) wall       0 kB ( 0%) ggc
df reg dead/unused notes:   0.97 ( 0%) usr   0.00 ( 0%) sys   0.97 ( 0%) wall   10939 kB ( 1%) ggc
register information  :   0.52 ( 0%) usr   0.00 ( 0%) sys   0.55 ( 0%) wall       0 kB ( 0%) ggc
alias analysis        :   0.90 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 0%) wall    7168 kB ( 1%) ggc
register scan         :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       4 kB ( 0%) ggc
rebuild jump labels   :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.34 ( 0%) wall       0 kB ( 0%) ggc
preprocessing         :   0.62 ( 0%) usr   0.96 ( 6%) sys   1.66 ( 1%) wall    2932 kB ( 0%) ggc
lexical analysis      :   0.62 ( 0%) usr   1.98 (11%) sys   2.31 ( 1%) wall       0 kB ( 0%) ggc
parser                :   1.29 ( 1%) usr   0.86 ( 5%) sys   2.37 ( 1%) wall   68897 kB ( 8%) ggc
inline heuristics     :   0.67 ( 0%) usr   0.17 ( 1%) sys   0.84 ( 0%) wall       0 kB ( 0%) ggc
tree gimplify         :   1.11 ( 0%) usr   0.06 ( 0%) sys   1.16 ( 0%) wall   63192 kB ( 8%) ggc
tree eh               :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
tree CFG construction :   0.51 ( 0%) usr   0.06 ( 0%) sys   0.57 ( 0%) wall   68527 kB ( 8%) ggc
tree CFG cleanup      :   7.12 ( 3%) usr   0.00 ( 0%) sys   7.10 ( 3%) wall    3525 kB ( 0%) ggc
tree copy propagation :   2.01 ( 1%) usr   0.05 ( 0%) sys   2.06 ( 1%) wall    5125 kB ( 1%) ggc
tree store copy prop  :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.49 ( 0%) wall     576 kB ( 0%) ggc
tree find ref. vars   :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall    1826 kB ( 0%) ggc
tree PTA              :   1.93 ( 1%) usr   0.13 ( 1%) sys   2.06 ( 1%) wall    3734 kB ( 0%) ggc
tree alias analysis   :   0.11 ( 0%) usr   0.08 ( 0%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc
tree call clobbering  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
tree flow sensitive alias:   0.17 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall    2146 kB ( 0%) ggc
tree memory partitioning:   1.24 ( 1%) usr   0.00 ( 0%) sys   1.25 ( 0%) wall       0 kB ( 0%) ggc
tree PHI insertion    :   0.61 ( 0%) usr   0.04 ( 0%) sys   0.65 ( 0%) wall   18541 kB ( 2%) ggc
tree SSA rewrite      :   1.94 ( 1%) usr   0.03 ( 0%) sys   1.95 ( 1%) wall   35021 kB ( 4%) ggc
tree SSA other        :   0.17 ( 0%) usr   0.12 ( 1%) sys   0.30 ( 0%) wall       0 kB ( 0%) ggc
tree SSA incremental  :   8.55 ( 4%) usr   0.08 ( 0%) sys   8.64 ( 3%) wall   14256 kB ( 2%) ggc
tree operand scan     :   0.71 ( 0%) usr   0.22 ( 1%) sys   0.91 ( 0%) wall   28110 kB ( 3%) ggc
dominator optimization:   2.73 ( 1%) usr   0.02 ( 0%) sys   2.75 ( 1%) wall   42635 kB ( 5%) ggc
tree SRA              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
tree STORE-CCP        :   0.57 ( 0%) usr   0.00 ( 0%) sys   0.57 ( 0%) wall    1024 kB ( 0%) ggc
tree CCP              :   1.18 ( 0%) usr   0.01 ( 0%) sys   1.19 ( 0%) wall    1537 kB ( 0%) ggc
tree PHI const/copy prop:   0.24 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall      11 kB ( 0%) ggc
tree split crit edges :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.13 ( 0%) wall   33706 kB ( 4%) ggc
tree reassociation    :   0.61 ( 0%) usr   0.00 ( 0%) sys   0.62 ( 0%) wall       1 kB ( 0%) ggc
tree FRE              :   2.72 ( 1%) usr   0.06 ( 0%) sys   2.77 ( 1%) wall   49006 kB ( 6%) ggc
tree code sinking     :   0.47 ( 0%) usr   0.00 ( 0%) sys   0.48 ( 0%) wall       6 kB ( 0%) ggc
tree linearize phis   :   0.29 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall       0 kB ( 0%) ggc
tree forward propagate:   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall     426 kB ( 0%) ggc
tree conservative DCE :   1.60 ( 1%) usr   0.00 ( 0%) sys   1.61 ( 1%) wall       0 kB ( 0%) ggc
tree aggressive DCE   :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall       0 kB ( 0%) ggc
tree DSE              :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall       1 kB ( 0%) ggc
PHI merge             :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    7192 kB ( 1%) ggc
tree loop bounds      :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       2 kB ( 0%) ggc
loop invariant motion :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall       0 kB ( 0%) ggc
tree canonical iv     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
scev constant prop    :   0.63 ( 0%) usr   0.01 ( 0%) sys   0.64 ( 0%) wall   17787 kB ( 2%) ggc
complete unrolling    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
tree loop init        :   3.12 ( 1%) usr   0.08 ( 0%) sys   3.22 ( 1%) wall   45438 kB ( 6%) ggc
tree loop fini        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
tree copy headers     :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
tree SSA uncprop      :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall       0 kB ( 0%) ggc
tree SSA to normal    :  11.49 ( 5%) usr   0.08 ( 0%) sys  11.56 ( 5%) wall   83279 kB (10%) ggc
tree rename SSA copies:   0.53 ( 0%) usr   0.02 ( 0%) sys   0.56 ( 0%) wall       0 kB ( 0%) ggc
dominance frontiers   :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.47 ( 0%) wall       0 kB ( 0%) ggc
dominance computation :   2.40 ( 1%) usr   0.03 ( 0%) sys   2.40 ( 1%) wall       0 kB ( 0%) ggc
expand                :  14.26 ( 6%) usr   1.89 (11%) sys  16.13 ( 6%) wall   92077 kB (11%) ggc
lower subreg          :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall       0 kB ( 0%) ggc
jump                  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
CSE                   :   0.78 ( 0%) usr   0.00 ( 0%) sys   0.77 ( 0%) wall    1426 kB ( 0%) ggc
dead code elimination :   0.51 ( 0%) usr   0.00 ( 0%) sys   0.51 ( 0%) wall       0 kB ( 0%) ggc
dead store elim1      :   0.42 ( 0%) usr   0.06 ( 0%) sys   0.48 ( 0%) wall    7944 kB ( 1%) ggc
dead store elim2      :   0.48 ( 0%) usr   0.02 ( 0%) sys   0.49 ( 0%) wall    8878 kB ( 1%) ggc
loop analysis         :   0.60 ( 0%) usr   0.01 ( 0%) sys   0.63 ( 0%) wall      70 kB ( 0%) ggc
branch prediction     :   0.96 ( 0%) usr   0.02 ( 0%) sys   0.97 ( 0%) wall    1541 kB ( 0%) ggc
combiner              :   2.64 ( 1%) usr   0.04 ( 0%) sys   2.67 ( 1%) wall   27876 kB ( 3%) ggc
if-conversion         :   1.36 ( 1%) usr   0.01 ( 0%) sys   1.37 ( 1%) wall     667 kB ( 0%) ggc
local alloc           :   4.09 ( 2%) usr   0.02 ( 0%) sys   4.11 ( 2%) wall    7074 kB ( 1%) ggc
global alloc          :  26.15 (11%) usr   0.38 ( 2%) sys  26.54 (10%) wall    5112 kB ( 1%) ggc
reload CSE regs       :   1.20 ( 1%) usr   0.01 ( 0%) sys   1.21 ( 0%) wall   12243 kB ( 1%) ggc
thread pro- & epilogue:   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       4 kB ( 0%) ggc
if-conversion 2       :   0.38 ( 0%) usr   0.00 ( 0%) sys   0.38 ( 0%) wall      82 kB ( 0%) ggc
rename registers      :   0.61 ( 0%) usr   0.04 ( 0%) sys   0.65 ( 0%) wall      31 kB ( 0%) ggc
scheduling 2          :   2.61 ( 1%) usr   0.07 ( 0%) sys   2.70 ( 1%) wall       0 kB ( 0%) ggc
machine dep reorg     :   0.51 ( 0%) usr   0.00 ( 0%) sys   0.51 ( 0%) wall     146 kB ( 0%) ggc
reorder blocks        :   0.26 ( 0%) usr   0.01 ( 0%) sys   0.26 ( 0%) wall    6770 kB ( 1%) ggc
final                 :   1.20 ( 1%) usr   0.03 ( 0%) sys   1.22 ( 0%) wall       0 kB ( 0%) ggc
tree if-combine       :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall     228 kB ( 0%) ggc
TOTAL                 : 238.24            17.40           255.72             824659 kB
239.030u 17.901s 4:17.09 99.9%  0+0k 0+0io 0pf+0w
euler-20% 
Comment 32 Richard Biener 2007-11-14 14:08:44 UTC
So, re-confirmed then.
Comment 33 Daniel Berlin 2007-11-14 16:57:48 UTC
Subject: Re:  Inordinate compile times on large routines

On 14 Nov 2007 13:37:54 -0000, lucier at math dot purdue dot edu
<gcc-bugzilla@gcc.gnu.org> wrote:
> Memory usage peaked at 10.3GB (just from monitoring top).
>

Any idea where?

None of the numbers below give any interesting suspects, IMHO.
Comment 34 lucier 2007-11-14 19:04:59 UTC
Subject: Re:  Inordinate compile times on large routines


On Nov 14, 2007, at 11:57 AM, dberlin at dberlin dot org wrote:

>> Memory usage peaked at 10.3GB (just from monitoring top).
>
> Any idea where?

Not really, but I ran cc1 through gdb to generate the following data; I hope it's helpful.

The first interrupt was when top was reporting:

30359 lucier    25   0 9935m 9.6g 4128 T    0 61.7   2:19.65 cc1

At the second point in the compile (relatively stable top reports of memory usage):

30359 lucier    25   0 4121m 4.0g 4352 T   21 25.4   2:58.86 cc1

This is with

euler-24% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2 --with-mpfr=/pkgs/gmp-4.2.2
Thread model: posix
gcc version 4.3.0 20071113 (experimental) [trunk revision 130159] (GCC)

Brad


euler-23% !gdb
gdb /pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1
GNU gdb Red Hat Linux (6.3.0.0-1.143.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

(gdb) run -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -ftime-report -fmem-report all.i
Starting program: /export/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1 -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -ftime-report -fmem-report all.i
__sputc __istype __isctype __wcwidth ___H__20_all_2e_o1 ___init_proc ____20_all_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
<visibility> <early_local_cleanups> {GC 294991k -> 188566k} <inline> <static-var> <pure-const>Assembling functions:
___H__20_all_2e_o1 {GC 382279k -> 277065k}
Program received signal SIGINT, Interrupt.
free_alloc_pool (pool=0xe7d8f60) at ../../../mainline/gcc/alloc-pool.c:199
199           free (block);
(gdb) where
#0  free_alloc_pool (pool=0xe7d8f60) at ../../../mainline/gcc/alloc-pool.c:199
#1  0x00000000004b12d3 in df_chain_remove_problem () at ../../../mainline/gcc/df-problems.c:1935
#2  0x00000000004b1569 in df_chain_fully_remove_problem () at ../../../mainline/gcc/df-problems.c:1981
#3  0x00000000004ad1a0 in df_finish_pass (verify=Variable "verify" is not available.
) at ../../../mainline/gcc/df-core.c:663
#4  0x000000000058791a in execute_one_pass (pass=0xc46960) at ../../../mainline/gcc/passes.c:1140
#5  0x0000000000587a60 in execute_pass_list (pass=0xc46960) at ../../../mainline/gcc/passes.c:1171
#6  0x0000000000587a75 in execute_pass_list (pass=0xc46840) at ../../../mainline/gcc/passes.c:1172
#7  0x0000000000587a75 in execute_pass_list (pass=0xc46d60) at ../../../mainline/gcc/passes.c:1172
#8  0x000000000062fae4 in tree_rest_of_compilation (fndecl=0x2a990e84e0) at ../../../mainline/gcc/tree-optimize.c:404
#9  0x000000000073a232 in cgraph_expand_function (node=0x2a9865da00) at ../../../mainline/gcc/cgraphunit.c:1151
#10 0x000000000073bc64 in cgraph_optimize () at ../../../mainline/gcc/cgraphunit.c:1214
#11 0x000000000041225b in c_write_global_declarations () at ../../../mainline/gcc/c-decl.c:8081
#12 0x00000000005fcfac in toplev_main (argc=Variable "argc" is not available.
) at ../../../mainline/gcc/toplev.c:1055
#13 0x00000030fd11c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#14 0x000000000040423a in _start ()
#15 0x0000007fbffff4e8 in ?? ()
#16 0x000000000000001c in ?? ()
#17 0x000000000000000f in ?? ()
#18 0x0000007fbffff7b2 in ?? ()
#19 0x0000007fbffff7fb in ?? ()
#20 0x0000007fbffff801 in ?? ()
#21 0x0000007fbffff804 in ?? ()
#22 0x0000007fbffff810 in ?? ()
#23 0x0000007fbffff814 in ?? ()
#24 0x0000007fbffff824 in ?? ()
#25 0x0000007fbffff836 in ?? ()
#26 0x0000007fbffff849 in ?? ()
#27 0x0000007fbffff85e in ?? ()
#28 0x0000007fbffff866 in ?? ()
#29 0x0000007fbffff87b in ?? ()
#30 0x0000007fbffff881 in ?? ()
#31 0x0000007fbffff88f in ?? ()
#32 0x0000007fbffff89c in ?? ()
#33 0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x00000000004687c9 in bitmap_elt_insert_after (head=0x963b0f0, elt=0xd30a7a70, indx=561) at ../../../mainline/gcc/bitmap.c:203
203             if (element->next)
(gdb) where
#0  0x00000000004687c9 in bitmap_elt_insert_after (head=0x963b0f0, elt=0xd30a7a70, indx=561) at ../../../mainline/gcc/bitmap.c:203
#1  0x000000000046a19b in bitmap_ior_into (a=0x963b0f0, b=Variable "b" is not available.
) at ../../../mainline/gcc/bitmap.c:913
#2  0x00000000004adce6 in df_worklist_dataflow (dataflow=0x7829f20, blocks_to_consider=0x9c1f250, blocks_in_postorder=0x2ab81c6010, n_blocks=Variable "n_blocks" is not available.
)
     at ../../../mainline/gcc/df-core.c:875
#3  0x00000000004acd7e in df_analyze_problem (dflow=0x7829f20, blocks_to_consider=0x9c1f250, postorder=0x2ab81c6010, n_blocks=59465)
     at ../../../mainline/gcc/df-core.c:1060
#4  0x00000000004ad00a in df_analyze () at ../../../mainline/gcc/df-core.c:1150
#5  0x00000000008faee7 in if_convert () at ../../../mainline/gcc/ifcvt.c:4045
#6  0x00000000008fb429 in rest_of_handle_if_after_combine () at ../../../mainline/gcc/ifcvt.c:4161
#7  0x00000000005878c2 in execute_one_pass (pass=0xc4b620) at ../../../mainline/gcc/passes.c:1118
#8  0x0000000000587a60 in execute_pass_list (pass=0xc4b620) at ../../../mainline/gcc/passes.c:1171
#9  0x0000000000587a75 in execute_pass_list (pass=0xc46d60) at ../../../mainline/gcc/passes.c:1172
#10 0x000000000062fae4 in tree_rest_of_compilation (fndecl=0x2a990e84e0) at ../../../mainline/gcc/tree-optimize.c:404
#11 0x000000000073a232 in cgraph_expand_function (node=0x2a9865da00) at ../../../mainline/gcc/cgraphunit.c:1151
#12 0x000000000073bc64 in cgraph_optimize () at ../../../mainline/gcc/cgraphunit.c:1214
#13 0x000000000041225b in c_write_global_declarations () at ../../../mainline/gcc/c-decl.c:8081
#14 0x00000000005fcfac in toplev_main (argc=Variable "argc" is not available.
) at ../../../mainline/gcc/toplev.c:1055
#15 0x00000030fd11c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#16 0x000000000040423a in _start ()
#17 0x0000007fbffff4e8 in ?? ()
#18 0x000000000000001c in ?? ()
#19 0x000000000000000f in ?? ()
#20 0x0000007fbffff7b2 in ?? ()
#21 0x0000007fbffff7fb in ?? ()
#22 0x0000007fbffff801 in ?? ()
#23 0x0000007fbffff804 in ?? ()
#24 0x0000007fbffff810 in ?? ()
#25 0x0000007fbffff814 in ?? ()
#26 0x0000007fbffff824 in ?? ()
#27 0x0000007fbffff836 in ?? ()
#28 0x0000007fbffff849 in ?? ()
#29 0x0000007fbffff85e in ?? ()
#30 0x0000007fbffff866 in ?? ()
#31 0x0000007fbffff87b in ?? ()
#32 0x0000007fbffff881 in ?? ()
#33 0x0000007fbffff88f in ?? ()
#34 0x0000007fbffff89c in ?? ()
#35 0x0000000000000000 in ?? ()
(gdb) c
Continuing.
___init_proc ____20_all_2e_o1 {GC 466968k -> 26603k}
Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8           4096          32         120
16            72k         18k       1584
128         2144k       2135k         29k
256         8192        1536         112
512         4096        1024          56
1024         112k        110k       1568
2048          28k         22k        392
4096          76k         76k       1064
8192          48k         48k        336
16384         32k         32k        112
32768         32k         32k         56
131072        256k        256k        112
262144        512k        512k        112
524288       1024k       1024k        112
1048576       2048k       2048k        112
192          616k        300k       8624
144           20k       3024         280
160          132k        115k       1848
432           28k         21k        392
96            66M         14M        925k
48          2100k       1172k         32k
208          420k        375k       5880
64          1288k       1237k         20k
32           176k         72k       3168
80            30M       2060k        422k
Total        107M         25M       1455k

String pool
entries         159225
identifiers     159225 (100.00%)
slots           262144
bytes           1995k (172k overhead)
table size      2048k
coll/search     0.8692
ins/search      0.2066
avg. entry      12.83 bytes (+/- 7.80)
longest entry   67

??? tree nodes created

(No per-node statistics)
Type hash: size 2039, 920 elements, 0.860000 collisions
DECL_DEBUG_EXPR  hash: size 16381, 0 elements, 1.303078 collisions
DECL_VALUE_EXPR  hash: size 1021, 0 elements, 0.000000 collisions

Execution times (seconds)
garbage collection    :   1.17 ( 0%) usr   0.00 ( 0%) sys   1.17 ( 0%) wall       0 kB ( 0%) ggc
callgraph construction:   0.79 ( 0%) usr   0.11 ( 1%) sys   0.92 ( 0%) wall   31928 kB ( 4%) ggc
callgraph optimization:   1.18 ( 0%) usr   0.00 ( 0%) sys   1.16 ( 0%) wall       6 kB ( 0%) ggc
ipa reference         :   0.22 ( 0%) usr   0.03 ( 0%) sys   0.25 ( 0%) wall       7 kB ( 0%) ggc
cfg cleanup           :   2.16 ( 1%) usr   0.00 ( 0%) sys   2.16 ( 0%) wall     162 kB ( 0%) ggc
trivially dead code   :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall       0 kB ( 0%) ggc
df reaching defs      :  10.01 ( 4%) usr   3.74 (22%) sys  13.81 ( 3%) wall       0 kB ( 0%) ggc
df live regs          :   8.10 ( 3%) usr   0.01 ( 0%) sys   8.13 ( 2%) wall       0 kB ( 0%) ggc
df live&initialized regs:  93.27 (37%) usr   2.67 (16%) sys 204.59 (41%) wall       0 kB ( 0%) ggc
df use-def / def-use chains:   8.56 ( 3%) usr   2.67 (16%) sys  11.27 ( 2%) wall       0 kB ( 0%) ggc
df reg dead/unused notes:   1.00 ( 0%) usr   0.01 ( 0%) sys   1.00 ( 0%) wall   10937 kB ( 1%) ggc
register information  :   0.52 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall       0 kB ( 0%) ggc
alias analysis        :   0.93 ( 0%) usr   0.01 ( 0%) sys   0.91 ( 0%) wall    7168 kB ( 1%) ggc
register scan         :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       4 kB ( 0%) ggc
rebuild jump labels   :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall       0 kB ( 0%) ggc
preprocessing         :   0.71 ( 0%) usr   1.05 ( 6%) sys   1.69 ( 0%) wall    2918 kB ( 0%) ggc
lexical analysis      :   0.54 ( 0%) usr   1.82 (11%) sys   2.39 ( 0%) wall       0 kB ( 0%) ggc
parser                :   1.34 ( 1%) usr   1.00 ( 6%) sys   2.37 ( 0%) wall   66046 kB ( 8%) ggc
inline heuristics     :   0.67 ( 0%) usr   0.16 ( 1%) sys   0.83 ( 0%) wall       0 kB ( 0%) ggc
tree gimplify         :   1.07 ( 0%) usr   0.04 ( 0%) sys   1.13 ( 0%) wall   62339 kB ( 8%) ggc
tree eh               :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
tree CFG construction :   0.51 ( 0%) usr   0.07 ( 0%) sys   0.57 ( 0%) wall   68526 kB ( 8%) ggc
tree CFG cleanup      :   7.11 ( 3%) usr   0.00 ( 0%) sys   7.16 ( 1%) wall    3524 kB ( 0%) ggc
tree copy propagation :   2.52 ( 1%) usr   0.06 ( 0%) sys   2.61 ( 1%) wall    5702 kB ( 1%) ggc
tree find ref. vars   :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall    1819 kB ( 0%) ggc
tree PTA              :   1.96 ( 1%) usr   0.12 ( 1%) sys   2.08 ( 0%) wall    3734 kB ( 0%) ggc
tree alias analysis   :   0.06 ( 0%) usr   0.12 ( 1%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
tree call clobbering  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
tree flow sensitive alias:   0.17 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall    2146 kB ( 0%) ggc
tree memory partitioning:   1.30 ( 1%) usr   0.00 ( 0%) sys   1.30 ( 0%) wall       0 kB ( 0%) ggc
tree PHI insertion    :   0.64 ( 0%) usr   0.04 ( 0%) sys   0.69 ( 0%) wall   18541 kB ( 2%) ggc
tree SSA rewrite      :   1.93 ( 1%) usr   0.03 ( 0%) sys   1.95 ( 0%) wall   35021 kB ( 4%) ggc
tree SSA other        :   0.23 ( 0%) usr   0.09 ( 1%) sys   0.24 ( 0%) wall       0 kB ( 0%) ggc
tree SSA incremental  :   8.52 ( 3%) usr   0.08 ( 0%) sys   8.57 ( 2%) wall   14256 kB ( 2%) ggc
tree operand scan     :   0.73 ( 0%) usr   0.24 ( 1%) sys   0.97 ( 0%) wall   28110 kB ( 3%) ggc
dominator optimization:   2.75 ( 1%) usr   0.03 ( 0%) sys   2.80 ( 1%) wall   42635 kB ( 5%) ggc
tree STORE-CCP        :   0.59 ( 0%) usr   0.00 ( 0%) sys   0.59 ( 0%) wall    1024 kB ( 0%) ggc
tree CCP              :   1.20 ( 0%) usr   0.01 ( 0%) sys   1.21 ( 0%) wall    1537 kB ( 0%) ggc
tree PHI const/copy prop:   0.24 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall      11 kB ( 0%) ggc
tree split crit edges :   0.11 ( 0%) usr   0.03 ( 0%) sys   0.13 ( 0%) wall   33706 kB ( 4%) ggc
tree reassociation    :   0.63 ( 0%) usr   0.00 ( 0%) sys   0.64 ( 0%) wall       1 kB ( 0%) ggc
tree FRE              :   2.66 ( 1%) usr   0.05 ( 0%) sys   2.77 ( 1%) wall   49006 kB ( 6%) ggc
tree code sinking     :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.48 ( 0%) wall       6 kB ( 0%) ggc
tree linearize phis   :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.28 ( 0%) wall       0 kB ( 0%) ggc
tree forward propagate:   0.34 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall     426 kB ( 0%) ggc
tree conservative DCE :   1.60 ( 1%) usr   0.00 ( 0%) sys   1.62 ( 0%) wall       0 kB ( 0%) ggc
tree aggressive DCE   :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall       0 kB ( 0%) ggc
tree DSE              :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall       1 kB ( 0%) ggc
PHI merge             :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    7192 kB ( 1%) ggc
tree loop bounds      :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.17  
( 0%) wall       2 kB ( 0%) ggc
loop invariant motion :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.32  
( 0%) wall       0 kB ( 0%) ggc
tree canonical iv     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03  
( 0%) wall       0 kB ( 0%) ggc
scev constant prop    :   0.63 ( 0%) usr   0.00 ( 0%) sys   0.64  
( 0%) wall   17787 kB ( 2%) ggc
complete unrolling    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01  
( 0%) wall       0 kB ( 0%) ggc
tree loop init        :   3.07 ( 1%) usr   0.09 ( 1%) sys   3.21  
( 1%) wall   45438 kB ( 6%) ggc
tree loop fini        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01  
( 0%) wall       0 kB ( 0%) ggc
tree copy headers     :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07  
( 0%) wall       0 kB ( 0%) ggc
tree SSA uncprop      :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.26  
( 0%) wall       0 kB ( 0%) ggc
tree SSA to normal    :  11.15 ( 4%) usr   0.07 ( 0%) sys  11.26  
( 2%) wall   81126 kB (10%) ggc
tree rename SSA copies:   0.55 ( 0%) usr   0.01 ( 0%) sys   0.56  
( 0%) wall       0 kB ( 0%) ggc
dominance frontiers   :   0.44 ( 0%) usr   0.00 ( 0%) sys   0.48  
( 0%) wall       0 kB ( 0%) ggc
dominance computation :   2.49 ( 1%) usr   0.05 ( 0%) sys   2.52  
( 1%) wall       0 kB ( 0%) ggc
expand                :  14.26 ( 6%) usr   1.80 (10%) sys 144.06  
(29%) wall   92074 kB (11%) ggc
lower subreg          :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.24  
( 0%) wall       0 kB ( 0%) ggc
jump                  :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05  
( 0%) wall       0 kB ( 0%) ggc
CSE                   :   0.77 ( 0%) usr   0.00 ( 0%) sys   0.78  
( 0%) wall    1426 kB ( 0%) ggc
dead code elimination :   0.50 ( 0%) usr   0.00 ( 0%) sys   0.52  
( 0%) wall       0 kB ( 0%) ggc
dead store elim1      :   0.43 ( 0%) usr   0.06 ( 0%) sys   0.49  
( 0%) wall    7944 kB ( 1%) ggc
dead store elim2      :   0.49 ( 0%) usr   0.01 ( 0%) sys   0.51  
( 0%) wall    8877 kB ( 1%) ggc
loop analysis         :   0.60 ( 0%) usr   0.01 ( 0%) sys   0.61  
( 0%) wall      70 kB ( 0%) ggc
branch prediction     :   0.95 ( 0%) usr   0.02 ( 0%) sys   0.98  
( 0%) wall    1541 kB ( 0%) ggc
combiner              :   2.65 ( 1%) usr   0.04 ( 0%) sys   2.70  
( 1%) wall   27893 kB ( 3%) ggc
if-conversion         :   1.55 ( 1%) usr   0.00 ( 0%) sys   1.55  
( 0%) wall     655 kB ( 0%) ggc
local alloc           :   4.01 ( 2%) usr   0.02 ( 0%) sys   4.05  
( 1%) wall    7074 kB ( 1%) ggc
global alloc          :  25.75 (10%) usr   0.36 ( 2%) sys  26.20  
( 5%) wall    5111 kB ( 1%) ggc
reload CSE regs       :   1.21 ( 0%) usr   0.01 ( 0%) sys   1.24  
( 0%) wall   12243 kB ( 1%) ggc
thread pro- & epilogue:   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10  
( 0%) wall       4 kB ( 0%) ggc
if-conversion 2       :   0.39 ( 0%) usr   0.00 ( 0%) sys   0.36  
( 0%) wall      82 kB ( 0%) ggc
rename registers      :   0.62 ( 0%) usr   0.04 ( 0%) sys   0.65  
( 0%) wall      31 kB ( 0%) ggc
scheduling 2          :   2.69 ( 1%) usr   0.05 ( 0%) sys   2.77  
( 1%) wall       0 kB ( 0%) ggc
machine dep reorg     :   0.52 ( 0%) usr   0.00 ( 0%) sys   0.52  
( 0%) wall     149 kB ( 0%) ggc
reorder blocks        :   0.25 ( 0%) usr   0.01 ( 0%) sys   0.26  
( 0%) wall    6758 kB ( 1%) ggc
final                 :   1.26 ( 1%) usr   0.01 ( 0%) sys   1.27  
( 0%) wall       0 kB ( 0%) ggc
tree if-combine       :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06  
( 0%) wall     223 kB ( 0%) ggc
TOTAL                 : 249.32            17.19            
503.80             816827 kB

Program exited normally.
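The -ftime-report entries in this thread were wrapped by the mail gateway, so each pass spans two lines. Once an entry is rejoined onto one line, its fields can be extracted mechanically. The helper below is a hypothetical sketch, not part of GCC. It assumes the 4.3-era layout shown in these reports (`name : usr (pct) usr  sys (pct) sys  wall (pct) wall  N kB (pct) ggc`) and discards the percentages.

```c
#include <stdio.h>
#include <string.h>

/* One rejoined -ftime-report entry, e.g.
   "expand : 13.82 ( 6%) usr 1.53 ( 9%) sys 15.43 ( 6%) wall 91541 kB (12%) ggc" */
struct time_report_entry
{
  char name[64];
  double usr, sys, wall;   /* seconds */
  long ggc_kb;             /* GC memory attributed to the pass, in kB */
};

/* Returns 1 on success, 0 if LINE is not a per-pass entry (e.g. TOTAL). */
int
parse_time_report_entry (const char *line, struct time_report_entry *e)
{
  const char *colon = strchr (line, ':');
  if (!colon)
    return 0;

  /* Pass name: the text before the colon, with surrounding blanks trimmed. */
  const char *start = line;
  while (start < colon && (*start == ' ' || *start == '\t'))
    start++;
  const char *end = colon;
  while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
    end--;
  size_t len = (size_t) (end - start);
  if (len >= sizeof e->name)
    len = sizeof e->name - 1;
  memcpy (e->name, start, len);
  e->name[len] = '\0';

  /* Percentages are matched but not stored (%*d). */
  int n = sscanf (colon + 1,
                  " %lf (%*d%%) usr %lf (%*d%%) sys %lf (%*d%%) wall %ld kB (%*d%%) ggc",
                  &e->usr, &e->sys, &e->wall, &e->ggc_kb);
  return n == 4;
}
```

The TOTAL line has no parenthesized percentages, so `sscanf` stops after the first number and the function reports it as a non-entry.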


Comment 35 lucier 2007-11-14 19:06:41 UTC
Subject: Re:  Inordinate compile times on large routines

PS:  Should the "Reported against" field in bugzilla be changed to  
4.3.0?

Comment 36 lucier 2007-12-19 21:48:58 UTC
I changed the "reported against" field to 4.3.0 (see my previous comments).
Comment 37 Steven Bosscher 2007-12-19 22:13:02 UTC
Brad,

I am looking at your dump and your backtraces (many many thanks!!!) and I think I have an idea how to improve the situation a bit here:

> Program received signal SIGINT, Interrupt.
> 0x00000000004687c9 in bitmap_elt_insert_after (head=0x963b0f0,  
> elt=0xd30a7a70, indx=561) at ../../../mainline/gcc/bitmap.c:203
> 203             if (element->next)
> (gdb) where
> #0  0x00000000004687c9 in bitmap_elt_insert_after (head=0x963b0f0,  
> elt=0xd30a7a70, indx=561) at ../../../mainline/gcc/bitmap.c:203
> #1  0x000000000046a19b in bitmap_ior_into (a=0x963b0f0, b=Variable  
> "b" is not available.
> ) at ../../../mainline/gcc/bitmap.c:913
> #2  0x00000000004adce6 in df_worklist_dataflow (dataflow=0x7829f20,  
> blocks_to_consider=0x9c1f250, blocks_in_postorder=0x2ab81c6010,  
> n_blocks=Variable "n_blocks" is not available.
> )
>      at ../../../mainline/gcc/df-core.c:875
> #3  0x00000000004acd7e in df_analyze_problem (dflow=0x7829f20,  
> blocks_to_consider=0x9c1f250, postorder=0x2ab81c6010, n_blocks=59465)
>      at ../../../mainline/gcc/df-core.c:1060

... and ...

> df live&initialized regs:  93.27 (37%) usr   2.67 (16%) sys 204.59 (41%) wall       0 kB ( 0%) ggc

I have seen this before :-)  In fact, I already attached a patch implementing this idea in another bug report, bug 34400.
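The backtrace above stops in df_worklist_dataflow while it merges bitmaps (bitmap_ior_into). As a rough illustration of a "double queue" worklist solver, which is what the name df_double_queue_worklist.diff suggests, here is a toy backward liveness solver. Blocks whose inputs change during a round are queued for the *next* round, and each round visits its queue in postorder. This is a sketch under stated assumptions, not the patch from bug 34400. `toy_block`, `live_double_queue` and the 64-register `uint64_t` sets are all invented for illustration, whereas GCC uses sparse bitmaps.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy CFG for illustration only: at most 64 blocks and 64 registers,
   so a register set fits in one uint64_t.  */
#define MAX_BLOCKS 64
#define MAX_SUCCS 2

struct toy_block
{
  int n_succs;
  int succs[MAX_SUCCS];
  uint64_t use;   /* registers read before being written in the block */
  uint64_t def;   /* registers written in the block */
  uint64_t in;    /* solution: live on entry */
  uint64_t out;   /* solution: live on exit */
};

/* Backward liveness with two queues.  POSTORDER lists the N blocks
   (N <= MAX_BLOCKS) in postorder, so successors are mostly visited
   before their predecessors.  Returns the number of rounds taken.  */
int
live_double_queue (struct toy_block *bb, int n, const int *postorder)
{
  bool current[MAX_BLOCKS], pending[MAX_BLOCKS];
  bool have_pending = n > 0;
  int rounds = 0;

  for (int i = 0; i < n; i++)
    {
      bb[i].in = bb[i].out = 0;
      pending[i] = true;
    }

  while (have_pending)
    {
      /* This round's queue is whatever the previous round queued.  */
      memcpy (current, pending, n * sizeof (bool));
      memset (pending, 0, n * sizeof (bool));
      have_pending = false;
      rounds++;

      for (int k = 0; k < n; k++)
        {
          int b = postorder[k];
          if (!current[b])
            continue;

          /* Confluence: OUT is the union of the successors' IN sets.  */
          uint64_t out = 0;
          for (int s = 0; s < bb[b].n_succs; s++)
            out |= bb[bb[b].succs[s]].in;
          bb[b].out = out;

          /* Transfer function: IN = USE | (OUT & ~DEF).  */
          uint64_t in = bb[b].use | (out & ~bb[b].def);
          if (in == bb[b].in)
            continue;
          bb[b].in = in;

          /* IN changed: queue every predecessor for the next round.  */
          for (int p = 0; p < n; p++)
            for (int s = 0; s < bb[p].n_succs; s++)
              if (bb[p].succs[s] == b)
                {
                  pending[p] = true;
                  have_pending = true;
                }
        }
    }
  return rounds;
}
```

Deferring changed blocks to the next round keeps each round a single ordered sweep. A self-loop or back edge therefore costs an extra round rather than an immediate re-visit in the middle of the sweep.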

This may be asking a lot, but could you do something for me please?  Could you install the patches df_hack2.diff and df_double_queue_worklist.diff, and redo the timings?  Both patches are attached to bug 34400.

If adding the patches I mentioned does not help, could you interrupt the compiler in gdb a few more times and, each time, check in df_analyze_problem which problem dflow is?  I.e. "p dflow" or "p (timevar_id_t) dflow->tv_id", or whatever works to see which problem we are in.  I suspect we may be creating dataflow problems that are too large to handle.

Many thanks for your help!
Comment 38 lucier 2007-12-19 23:31:29 UTC
Subject: Re:  Inordinate compile times on large routines


On Dec 19, 2007, at 5:13 PM, steven at gcc dot gnu dot org wrote:

> This may be asking a lot, but could you do something for me  
> please?  Could you
> install the patches df_hack2.diff and  
> df_double_queue_worklist.diff, and redo
> the timings?  Both patches are attached to bug 34400.

Your patches definitely help, for some value of "help".  The top  
memory usage (just from watching "top") went from 9998 MB to 6803 MB  
(of course I could have missed the peak memory usage of both jobs),  
and the CPU time went down, too.  Here are details.

Before your patches:

euler-32% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2 --with-mpfr=/pkgs/gmp-4.2.2
Thread model: posix
gcc version 4.3.0 20071219 (experimental) [trunk revision 131091] (GCC)
euler-33% /pkgs/gcc-mainline/bin/gcc -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -ftime-report -fmem-report -c all.i
Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8           4096          16         120
16            72k         18k       1584
128         2144k       2135k         29k
256         4096        1536          56
512         4096        1024          56
1024         112k        110k       1568
2048          28k         22k        392
4096          76k         76k       1064
8192          48k         48k        336
16384         32k         32k        112
32768         32k         32k         56
131072        256k        256k        112
262144        512k        512k        112
524288       1024k       1024k        112
1048576       2048k       2048k        112
192          616k        300k       8624
144           20k       3024         280
160          132k        115k       1848
432           28k         21k        392
96            15M         14M        215k
48          2136k       1171k         33k
208          420k        375k       5880
64          1288k       1237k         20k
32           164k         64k       2952
80            29M       2060k        417k
Total         56M         25M        741k

String pool
entries         159286
identifiers     159286 (100.00%)
slots           262144
bytes           1995k (171k overhead)
table size      2048k
coll/search     0.9209
ins/search      0.2067
avg. entry      12.83 bytes (+/- 7.80)
longest entry   67

??? tree nodes created

(No per-node statistics)
Type hash: size 2039, 920 elements, 0.860000 collisions
DECL_DEBUG_EXPR  hash: size 16381, 0 elements, 1.332565 collisions
DECL_VALUE_EXPR  hash: size 1021, 0 elements, 0.000000 collisions

Execution times (seconds)
  garbage collection    :   1.05 ( 0%) usr   0.00 ( 0%) sys   1.06  
( 0%) wall       0 kB ( 0%) ggc
  callgraph construction:   0.79 ( 0%) usr   0.09 ( 1%) sys   0.89  
( 0%) wall   31928 kB ( 4%) ggc
  callgraph optimization:   1.02 ( 0%) usr   0.00 ( 0%) sys   1.03  
( 0%) wall       6 kB ( 0%) ggc
  ipa reference         :   0.21 ( 0%) usr   0.03 ( 0%) sys   0.24  
( 0%) wall       7 kB ( 0%) ggc
  cfg cleanup           :   2.16 ( 1%) usr   0.00 ( 0%) sys   2.16  
( 1%) wall     164 kB ( 0%) ggc
  trivially dead code   :   0.35 ( 0%) usr   0.01 ( 0%) sys   0.35  
( 0%) wall       0 kB ( 0%) ggc
  df reaching defs      :   9.53 ( 4%) usr   3.29 (20%) sys  12.83  
( 5%) wall       0 kB ( 0%) ggc
  df live regs          :   8.09 ( 3%) usr   0.01 ( 0%) sys   8.11  
( 3%) wall       0 kB ( 0%) ggc
  df live&initialized regs:  98.09 (41%) usr   2.81 (17%) sys 100.95  
(39%) wall       0 kB ( 0%) ggc
  df use-def / def-use chains:   8.16 ( 3%) usr   2.38 (15%) sys   
10.53 ( 4%) wall       0 kB ( 0%) ggc
  df reg dead/unused notes:   0.95 ( 0%) usr   0.00 ( 0%) sys   0.95  
( 0%) wall   10801 kB ( 1%) ggc
  register information  :   0.52 ( 0%) usr   0.01 ( 0%) sys   0.51  
( 0%) wall       0 kB ( 0%) ggc
  alias analysis        :   0.85 ( 0%) usr   0.01 ( 0%) sys   0.87  
( 0%) wall    7168 kB ( 1%) ggc
  register scan         :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10  
( 0%) wall       4 kB ( 0%) ggc
  rebuild jump labels   :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.33  
( 0%) wall       0 kB ( 0%) ggc
  preprocessing         :   0.68 ( 0%) usr   0.90 ( 6%) sys   1.66  
( 1%) wall    2918 kB ( 0%) ggc
  lexical analysis      :   0.55 ( 0%) usr   1.97 (12%) sys   2.18  
( 1%) wall       0 kB ( 0%) ggc
  parser                :   1.29 ( 1%) usr   0.90 ( 6%) sys   2.45  
( 1%) wall   66023 kB ( 8%) ggc
  inline heuristics     :   0.66 ( 0%) usr   0.15 ( 1%) sys   0.82  
( 0%) wall       0 kB ( 0%) ggc
  tree gimplify         :   1.08 ( 0%) usr   0.06 ( 0%) sys   1.14  
( 0%) wall   62339 kB ( 8%) ggc
  tree eh               :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.11  
( 0%) wall       0 kB ( 0%) ggc
  tree CFG construction :   0.49 ( 0%) usr   0.05 ( 0%) sys   0.55  
( 0%) wall   68526 kB ( 9%) ggc
  tree CFG cleanup      :   6.94 ( 3%) usr   0.01 ( 0%) sys   6.94  
( 3%) wall    3575 kB ( 0%) ggc
  tree copy propagation :   2.41 ( 1%) usr   0.06 ( 0%) sys   2.47  
( 1%) wall    4818 kB ( 1%) ggc
  tree find ref. vars   :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15  
( 0%) wall    1819 kB ( 0%) ggc
  tree PTA              :   1.93 ( 1%) usr   0.10 ( 1%) sys   2.03  
( 1%) wall    3734 kB ( 0%) ggc
  tree alias analysis   :   0.11 ( 0%) usr   0.08 ( 0%) sys   0.11  
( 0%) wall       0 kB ( 0%) ggc
  tree call clobbering  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02  
( 0%) wall       0 kB ( 0%) ggc
  tree flow sensitive alias:   0.16 ( 0%) usr   0.00 ( 0%) sys   0.17  
( 0%) wall    2146 kB ( 0%) ggc
  tree memory partitioning:   1.25 ( 1%) usr   0.00 ( 0%) sys   1.25  
( 0%) wall       0 kB ( 0%) ggc
  tree PHI insertion    :   0.59 ( 0%) usr   0.03 ( 0%) sys   0.64  
( 0%) wall   18541 kB ( 2%) ggc
  tree SSA rewrite      :   1.94 ( 1%) usr   0.03 ( 0%) sys   1.97  
( 1%) wall   35021 kB ( 5%) ggc
  tree SSA other        :   0.18 ( 0%) usr   0.08 ( 0%) sys   0.26  
( 0%) wall       0 kB ( 0%) ggc
  tree SSA incremental  :   9.06 ( 4%) usr   0.34 ( 2%) sys   9.43  
( 4%) wall   14359 kB ( 2%) ggc
  tree operand scan     :   0.69 ( 0%) usr   0.28 ( 2%) sys   0.98  
( 0%) wall   27918 kB ( 4%) ggc
  dominator optimization:   2.86 ( 1%) usr   0.02 ( 0%) sys   2.96  
( 1%) wall   44597 kB ( 6%) ggc
  tree SRA              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00  
( 0%) wall       0 kB ( 0%) ggc
  tree STORE-CCP        :   0.57 ( 0%) usr   0.00 ( 0%) sys   0.57  
( 0%) wall    1024 kB ( 0%) ggc
  tree CCP              :   1.14 ( 0%) usr   0.00 ( 0%) sys   1.16  
( 0%) wall    1537 kB ( 0%) ggc
  tree PHI const/copy prop:   0.23 ( 0%) usr   0.00 ( 0%) sys   0.23  
( 0%) wall      11 kB ( 0%) ggc
  tree split crit edges :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12  
( 0%) wall   33698 kB ( 4%) ggc
  tree reassociation    :   0.64 ( 0%) usr   0.00 ( 0%) sys   0.62  
( 0%) wall       1 kB ( 0%) ggc
  tree FRE              :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.25  
( 0%) wall       5 kB ( 0%) ggc
  tree code sinking     :   0.47 ( 0%) usr   0.00 ( 0%) sys   0.47  
( 0%) wall       6 kB ( 0%) ggc
  tree linearize phis   :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27  
( 0%) wall       0 kB ( 0%) ggc
  tree forward propagate:   0.33 ( 0%) usr   0.00 ( 0%) sys   0.35  
( 0%) wall     426 kB ( 0%) ggc
  tree conservative DCE :   1.59 ( 1%) usr   0.00 ( 0%) sys   1.59  
( 1%) wall       0 kB ( 0%) ggc
  tree aggressive DCE   :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.34  
( 0%) wall       0 kB ( 0%) ggc
  tree DSE              :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.36  
( 0%) wall       1 kB ( 0%) ggc
  PHI merge             :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07  
( 0%) wall    7192 kB ( 1%) ggc
  tree loop bounds      :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.16  
( 0%) wall       2 kB ( 0%) ggc
  loop invariant motion :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.31  
( 0%) wall       0 kB ( 0%) ggc
  tree canonical iv     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02  
( 0%) wall       0 kB ( 0%) ggc
  scev constant prop    :   0.66 ( 0%) usr   0.01 ( 0%) sys   0.67  
( 0%) wall   17793 kB ( 2%) ggc
  complete unrolling    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02  
( 0%) wall       0 kB ( 0%) ggc
  tree loop init        :   3.15 ( 1%) usr   0.10 ( 1%) sys   3.17  
( 1%) wall   45121 kB ( 6%) ggc
  tree loop fini        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01  
( 0%) wall       0 kB ( 0%) ggc
  tree copy headers     :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07  
( 0%) wall       0 kB ( 0%) ggc
  tree SSA uncprop      :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.26  
( 0%) wall       0 kB ( 0%) ggc
  tree SSA to normal    :  11.37 ( 5%) usr   0.10 ( 1%) sys  11.47  
( 4%) wall   90617 kB (12%) ggc
  tree rename SSA copies:   0.55 ( 0%) usr   0.02 ( 0%) sys   0.56  
( 0%) wall       0 kB ( 0%) ggc
  dominance frontiers   :   0.44 ( 0%) usr   0.00 ( 0%) sys   0.44  
( 0%) wall       0 kB ( 0%) ggc
  dominance computation :   2.38 ( 1%) usr   0.04 ( 0%) sys   2.42  
( 1%) wall       0 kB ( 0%) ggc
  expand                :  13.82 ( 6%) usr   1.53 ( 9%) sys  15.43  
( 6%) wall   91541 kB (12%) ggc
  lower subreg          :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.22  
( 0%) wall       0 kB ( 0%) ggc
  jump                  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04  
( 0%) wall       0 kB ( 0%) ggc
  CSE                   :   0.80 ( 0%) usr   0.00 ( 0%) sys   0.78  
( 0%) wall    1403 kB ( 0%) ggc
  dead code elimination :   0.48 ( 0%) usr   0.00 ( 0%) sys   0.47  
( 0%) wall       0 kB ( 0%) ggc
  dead store elim1      :   0.41 ( 0%) usr   0.03 ( 0%) sys   0.44  
( 0%) wall    7973 kB ( 1%) ggc
  dead store elim2      :   0.47 ( 0%) usr   0.01 ( 0%) sys   0.48  
( 0%) wall    8688 kB ( 1%) ggc
  loop analysis         :   0.57 ( 0%) usr   0.01 ( 0%) sys   0.58  
( 0%) wall      70 kB ( 0%) ggc
  branch prediction     :   0.93 ( 0%) usr   0.01 ( 0%) sys   0.94  
( 0%) wall    1541 kB ( 0%) ggc
  combiner              :   2.62 ( 1%) usr   0.04 ( 0%) sys   2.67  
( 1%) wall   28000 kB ( 4%) ggc
  if-conversion         :   1.55 ( 1%) usr   0.03 ( 0%) sys   1.54  
( 1%) wall     586 kB ( 0%) ggc
  local alloc           :   4.00 ( 2%) usr   0.01 ( 0%) sys   4.01  
( 2%) wall    7070 kB ( 1%) ggc
  global alloc          :  17.58 ( 7%) usr   0.30 ( 2%) sys  17.89  
( 7%) wall    4961 kB ( 1%) ggc
  reload CSE regs       :   1.17 ( 0%) usr   0.02 ( 0%) sys   1.18  
( 0%) wall   12069 kB ( 2%) ggc
  thread pro- & epilogue:   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09  
( 0%) wall       4 kB ( 0%) ggc
  if-conversion 2       :   0.38 ( 0%) usr   0.00 ( 0%) sys   0.37  
( 0%) wall     119 kB ( 0%) ggc
  rename registers      :   0.61 ( 0%) usr   0.02 ( 0%) sys   0.63  
( 0%) wall      29 kB ( 0%) ggc
  scheduling 2          :   2.52 ( 1%) usr   0.04 ( 0%) sys   2.55  
( 1%) wall       0 kB ( 0%) ggc
  machine dep reorg     :   0.50 ( 0%) usr   0.00 ( 0%) sys   0.50  
( 0%) wall     148 kB ( 0%) ggc
  reorder blocks        :   0.28 ( 0%) usr   0.01 ( 0%) sys   0.27  
( 0%) wall    6727 kB ( 1%) ggc
  final                 :   1.19 ( 0%) usr   0.03 ( 0%) sys   1.25  
( 0%) wall       0 kB ( 0%) ggc
  tree if-combine       :   0.05 ( 0%) usr   0.01 ( 0%) sys   0.06  
( 0%) wall     224 kB ( 0%) ggc
  TOTAL                 : 241.56            16.30            
257.94             776880 kB
euler-34%

After your patches:

euler-43% patch < df-prob.patch
patching file df-problems.c
Hunk #1 succeeded at 1329 (offset 6 lines).
Hunk #3 succeeded at 1411 (offset 6 lines).
Hunk #5 succeeded at 1470 (offset 6 lines).
Hunk #7 succeeded at 1536 (offset 6 lines).

(The other one applied cleanly.)

euler-62% /pkgs/gcc-mainline/bin/gcc -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -ftime-report -fmem-report -c all.i
Memory still allocated at the end of the compilation process
Size   Allocated        Used    Overhead
8           4096          16         120
16            72k         18k       1584
128         2144k       2135k         29k
256         4096        1536          56
512         4096        1024          56
1024         112k        110k       1568
2048          28k         22k        392
4096          76k         76k       1064
8192          48k         48k        336
16384         32k         32k        112
32768         32k         32k         56
131072        256k        256k        112
262144        512k        512k        112
524288       1024k       1024k        112
1048576       2048k       2048k        112
192          616k        300k       8624
144           20k       3024         280
160          132k        115k       1848
432           28k         21k        392
96            15M         14M        215k
48          2136k       1171k         33k
208          420k        375k       5880
64          1288k       1237k         20k
32           164k         64k       2952
80            29M       2060k        417k
Total         56M         25M        741k

String pool
entries         159286
identifiers     159286 (100.00%)
slots           262144
bytes           1995k (171k overhead)
table size      2048k
coll/search     0.9209
ins/search      0.2067
avg. entry      12.83 bytes (+/- 7.80)
longest entry   67

??? tree nodes created

(No per-node statistics)
Type hash: size 2039, 920 elements, 0.860000 collisions
DECL_DEBUG_EXPR  hash: size 16381, 0 elements, 1.332565 collisions
DECL_VALUE_EXPR  hash: size 1021, 0 elements, 0.000000 collisions

Execution times (seconds)
  garbage collection    :   1.03 ( 1%) usr   0.00 ( 0%) sys   1.03  
( 1%) wall       0 kB ( 0%) ggc
  callgraph construction:   0.77 ( 0%) usr   0.09 ( 1%) sys   0.88  
( 0%) wall   31928 kB ( 4%) ggc
  callgraph optimization:   1.04 ( 1%) usr   0.00 ( 0%) sys   1.03  
( 1%) wall       6 kB ( 0%) ggc
  ipa reference         :   0.21 ( 0%) usr   0.04 ( 0%) sys   0.24  
( 0%) wall       7 kB ( 0%) ggc
  cfg cleanup           :   2.20 ( 1%) usr   0.00 ( 0%) sys   2.21  
( 1%) wall     164 kB ( 0%) ggc
  trivially dead code   :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.35  
( 0%) wall       0 kB ( 0%) ggc
  df reaching defs      :  18.18 (10%) usr   3.25 (24%) sys  21.44  
(11%) wall       0 kB ( 0%) ggc
  df live regs          :  11.56 ( 7%) usr   0.00 ( 0%) sys  11.53  
( 6%) wall       0 kB ( 0%) ggc
  df live&initialized regs:  15.71 ( 9%) usr   0.02 ( 0%) sys  15.77  
( 8%) wall       0 kB ( 0%) ggc
  df use-def / def-use chains:   8.02 ( 5%) usr   2.28 (17%) sys   
10.30 ( 5%) wall       0 kB ( 0%) ggc
  df reg dead/unused notes:   0.95 ( 1%) usr   0.00 ( 0%) sys   0.95  
( 1%) wall   10801 kB ( 1%) ggc
  register information  :   0.50 ( 0%) usr   0.00 ( 0%) sys   0.52  
( 0%) wall       0 kB ( 0%) ggc
  alias analysis        :   0.87 ( 0%) usr   0.00 ( 0%) sys   0.87  
( 0%) wall    7168 kB ( 1%) ggc
  register scan         :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10  
( 0%) wall       4 kB ( 0%) ggc
  rebuild jump labels   :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.34  
( 0%) wall       0 kB ( 0%) ggc
  preprocessing         :   0.71 ( 0%) usr   1.05 ( 8%) sys   1.61  
( 1%) wall    2918 kB ( 0%) ggc
  lexical analysis      :   0.45 ( 0%) usr   1.86 (14%) sys   2.36  
( 1%) wall       0 kB ( 0%) ggc
  parser                :   1.37 ( 1%) usr   0.90 ( 7%) sys   2.38  
( 1%) wall   66023 kB ( 8%) ggc
  inline heuristics     :   0.69 ( 0%) usr   0.15 ( 1%) sys   0.82  
( 0%) wall       0 kB ( 0%) ggc
  tree gimplify         :   1.08 ( 1%) usr   0.05 ( 0%) sys   1.13  
( 1%) wall   62339 kB ( 8%) ggc
  tree eh               :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11  
( 0%) wall       0 kB ( 0%) ggc
  tree CFG construction :   0.50 ( 0%) usr   0.05 ( 0%) sys   0.54  
( 0%) wall   68526 kB ( 9%) ggc
  tree CFG cleanup      :   6.94 ( 4%) usr   0.00 ( 0%) sys   6.90  
( 4%) wall    3575 kB ( 0%) ggc
  tree copy propagation :   2.39 ( 1%) usr   0.05 ( 0%) sys   2.44  
( 1%) wall    4818 kB ( 1%) ggc
  tree find ref. vars   :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.15  
( 0%) wall    1819 kB ( 0%) ggc
  tree PTA              :   1.93 ( 1%) usr   0.10 ( 1%) sys   2.04  
( 1%) wall    3734 kB ( 0%) ggc
  tree alias analysis   :   0.07 ( 0%) usr   0.10 ( 1%) sys   0.14  
( 0%) wall       0 kB ( 0%) ggc
  tree call clobbering  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02  
( 0%) wall       0 kB ( 0%) ggc
  tree flow sensitive alias:   0.16 ( 0%) usr   0.00 ( 0%) sys   0.17  
( 0%) wall    2146 kB ( 0%) ggc
  tree memory partitioning:   1.25 ( 1%) usr   0.00 ( 0%) sys   1.25  
( 1%) wall       0 kB ( 0%) ggc
  tree PHI insertion    :   0.60 ( 0%) usr   0.03 ( 0%) sys   0.64  
( 0%) wall   18541 kB ( 2%) ggc
  tree SSA rewrite      :   1.92 ( 1%) usr   0.03 ( 0%) sys   1.98  
( 1%) wall   35021 kB ( 5%) ggc
  tree SSA other        :   0.19 ( 0%) usr   0.12 ( 1%) sys   0.29  
( 0%) wall       0 kB ( 0%) ggc
  tree SSA incremental  :   9.05 ( 5%) usr   0.40 ( 3%) sys   9.35  
( 5%) wall   14359 kB ( 2%) ggc
  tree operand scan     :   0.69 ( 0%) usr   0.20 ( 1%) sys   0.90  
( 0%) wall   27918 kB ( 4%) ggc
  dominator optimization:   2.86 ( 2%) usr   0.04 ( 0%) sys   2.93  
( 2%) wall   44597 kB ( 6%) ggc
  tree SRA              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01  
( 0%) wall       0 kB ( 0%) ggc
  tree STORE-CCP        :   0.57 ( 0%) usr   0.00 ( 0%) sys   0.57  
( 0%) wall    1024 kB ( 0%) ggc
  tree CCP              :   1.14 ( 1%) usr   0.01 ( 0%) sys   1.15  
( 1%) wall    1537 kB ( 0%) ggc
  tree PHI const/copy prop:   0.24 ( 0%) usr   0.00 ( 0%) sys   0.22  
( 0%) wall      11 kB ( 0%) ggc
  tree split crit edges :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.11  
( 0%) wall   33698 kB ( 4%) ggc
  tree reassociation    :   0.63 ( 0%) usr   0.01 ( 0%) sys   0.62  
( 0%) wall       1 kB ( 0%) ggc
  tree FRE              :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.26  
( 0%) wall       5 kB ( 0%) ggc
  tree code sinking     :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.47  
( 0%) wall       6 kB ( 0%) ggc
  tree linearize phis   :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.26  
( 0%) wall       0 kB ( 0%) ggc
  tree forward propagate:   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33  
( 0%) wall     426 kB ( 0%) ggc
  tree conservative DCE :   1.58 ( 1%) usr   0.00 ( 0%) sys   1.59  
( 1%) wall       0 kB ( 0%) ggc
  tree aggressive DCE   :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.34  
( 0%) wall       0 kB ( 0%) ggc
  tree DSE              :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.37  
( 0%) wall       1 kB ( 0%) ggc
  PHI merge             :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.07  
( 0%) wall    7192 kB ( 1%) ggc
  tree loop bounds      :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.16  
( 0%) wall       2 kB ( 0%) ggc
  loop invariant motion :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.31  
( 0%) wall       0 kB ( 0%) ggc
  tree canonical iv     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03  
( 0%) wall       0 kB ( 0%) ggc
  scev constant prop    :   0.61 ( 0%) usr   0.00 ( 0%) sys   0.62  
( 0%) wall   17793 kB ( 2%) ggc
  complete unrolling    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01  
( 0%) wall       0 kB ( 0%) ggc
  tree loop init        :   3.13 ( 2%) usr   0.08 ( 1%) sys   3.26  
( 2%) wall   45121 kB ( 6%) ggc
  tree loop fini        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01  
( 0%) wall       0 kB ( 0%) ggc
  tree copy headers     :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07  
( 0%) wall       0 kB ( 0%) ggc
  tree SSA uncprop      :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.26  
( 0%) wall       0 kB ( 0%) ggc
  tree SSA to normal    :  11.37 ( 7%) usr   0.09 ( 1%) sys  11.48  
( 6%) wall   90617 kB (12%) ggc
  tree NRV optimization :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01  
( 0%) wall       0 kB ( 0%) ggc
  tree rename SSA copies:   0.54 ( 0%) usr   0.02 ( 0%) sys   0.55  
( 0%) wall       0 kB ( 0%) ggc
  dominance frontiers   :   0.43 ( 0%) usr   0.00 ( 0%) sys   0.45  
( 0%) wall       0 kB ( 0%) ggc
  dominance computation :   2.37 ( 1%) usr   0.05 ( 0%) sys   2.44  
( 1%) wall       0 kB ( 0%) ggc
  expand                :  13.62 ( 8%) usr   1.64 (12%) sys  15.22  
( 8%) wall   91541 kB (12%) ggc
  lower subreg          :   0.21 ( 0%) usr   0.01 ( 0%) sys   0.23  
( 0%) wall       0 kB ( 0%) ggc
  jump                  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04  
( 0%) wall       0 kB ( 0%) ggc
  CSE                   :   0.76 ( 0%) usr   0.01 ( 0%) sys   0.77  
( 0%) wall    1403 kB ( 0%) ggc
  dead code elimination :   0.47 ( 0%) usr   0.00 ( 0%) sys   0.47  
( 0%) wall       0 kB ( 0%) ggc
  dead store elim1      :   0.42 ( 0%) usr   0.04 ( 0%) sys   0.46  
( 0%) wall    7973 kB ( 1%) ggc
  dead store elim2      :   0.47 ( 0%) usr   0.01 ( 0%) sys   0.48  
( 0%) wall    8688 kB ( 1%) ggc
  loop analysis         :   0.57 ( 0%) usr   0.02 ( 0%) sys   0.57  
( 0%) wall      70 kB ( 0%) ggc
  branch prediction     :   0.93 ( 1%) usr   0.00 ( 0%) sys   0.95  
( 1%) wall    1541 kB ( 0%) ggc
  combiner              :   2.61 ( 1%) usr   0.03 ( 0%) sys   2.64  
( 1%) wall   28000 kB ( 4%) ggc
  if-conversion         :   1.49 ( 1%) usr   0.00 ( 0%) sys   1.51  
( 1%) wall     586 kB ( 0%) ggc
  local alloc           :   7.94 ( 5%) usr   0.02 ( 0%) sys   7.97  
( 4%) wall    7070 kB ( 1%) ggc
  global alloc          :  17.58 (10%) usr   0.29 ( 2%) sys  17.88  
(10%) wall    4961 kB ( 1%) ggc
  reload CSE regs       :   1.18 ( 1%) usr   0.02 ( 0%) sys   1.18  
( 1%) wall   12069 kB ( 2%) ggc
  thread pro- & epilogue:   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09  
( 0%) wall       4 kB ( 0%) ggc
  if-conversion 2       :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.34  
( 0%) wall     119 kB ( 0%) ggc
  rename registers      :   0.61 ( 0%) usr   0.03 ( 0%) sys   0.64  
( 0%) wall      29 kB ( 0%) ggc
  scheduling 2          :   2.51 ( 1%) usr   0.05 ( 0%) sys   2.55  
( 1%) wall       0 kB ( 0%) ggc
  machine dep reorg     :   0.50 ( 0%) usr   0.00 ( 0%) sys   0.50  
( 0%) wall     148 kB ( 0%) ggc
  reorder blocks        :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.25  
( 0%) wall    6727 kB ( 1%) ggc
  final                 :   1.17 ( 1%) usr   0.02 ( 0%) sys   1.19  
( 1%) wall       0 kB ( 0%) ggc
  tree if-combine       :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05  
( 0%) wall     224 kB ( 0%) ggc
  TOTAL                 : 174.55            13.49            
188.09             776880 kB
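To compare the two reports above directly, the relative change of each entry is 100 * (after - before) / before. The small helper below hard-codes a few figures copied from those reports and from the "top" observation. The names `pct_change`, `comment38` and `print_comment38_changes` are illustrative only.

```c
#include <stdio.h>

/* Percentage change from BEFORE to AFTER; negative means less time or memory. */
double
pct_change (double before, double after)
{
  return 100.0 * (after - before) / before;
}

/* Figures copied from the before/after runs in comment 38 (usr seconds
   unless noted; the memory figure is the peak Brad saw in "top").  */
struct comparison
{
  const char *what;
  double before, after;
};

static const struct comparison comment38[] = {
  { "TOTAL usr (s)",                241.56, 174.55 },
  { "TOTAL wall (s)",               257.94, 188.09 },
  { "df live&initialized regs",      98.09,  15.71 },
  { "df reaching defs",               9.53,  18.18 },
  { "df live regs",                   8.09,  11.56 },
  { "local alloc",                    4.00,   7.94 },
  { "peak memory in top (MB)",      9998.0, 6803.0 },
};

void
print_comment38_changes (FILE *out)
{
  for (size_t i = 0; i < sizeof comment38 / sizeof comment38[0]; i++)
    fprintf (out, "%-28s %8.2f -> %8.2f  (%+6.1f%%)\n",
             comment38[i].what, comment38[i].before, comment38[i].after,
             pct_change (comment38[i].before, comment38[i].after));
}
```

Worked through, total usr time falls by about 27.7% and wall time by about 27.1%. The "df live&initialized regs" problem drops by about 84%, and the peak memory seen in top falls by about 32%. Not every pass improves: "df reaching defs" rises by about 91%, "df live regs" by about 43%, and "local alloc" nearly doubles (+98.5%).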


Comment 39 Steven Bosscher 2007-12-20 00:02:38 UTC
We badly need a way to track memory in DF.  Because DF uses alloc_pools for almost all its data structures, the memory statistics are only recorded if you configure with --enable-gather-detailed-mem-stats.  I think it would be good if the DF problems would report an estimate of their memory usage in the dump files, or if a function would be available that you can call from GDB to give such an estimate.
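One way to get always-on numbers, as suggested in the comment above, is for each DF problem to keep its own byte counters alongside its pool allocations. The following is a hypothetical sketch only. `mem_account`, `accounted_alloc`, `accounted_free` and `debug_mem_accounts` are invented names, not GCC interfaces, and plain `malloc` stands in for GCC's alloc_pool.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-problem byte counters, kept even without
   --enable-gather-detailed-mem-stats.  */
struct mem_account
{
  const char *name;
  size_t cur_bytes;   /* currently allocated */
  size_t peak_bytes;  /* high-water mark */
  size_t n_allocs;    /* allocations ever made */
};

void *
accounted_alloc (struct mem_account *a, size_t bytes)
{
  void *p = malloc (bytes);
  if (p)
    {
      a->cur_bytes += bytes;
      a->n_allocs++;
      if (a->cur_bytes > a->peak_bytes)
        a->peak_bytes = a->cur_bytes;
    }
  return p;
}

/* The caller passes the size back, as a fixed-element-size pool would know it. */
void
accounted_free (struct mem_account *a, void *p, size_t bytes)
{
  if (!p)
    return;
  free (p);
  a->cur_bytes -= bytes;
}

/* Non-static so it can be invoked from GDB, e.g. "call debug_mem_accounts (accts, n)". */
void
debug_mem_accounts (const struct mem_account *accts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    fprintf (stderr, "%-24s cur %10zu B  peak %10zu B  allocs %zu\n",
             accts[i].name, accts[i].cur_bytes, accts[i].peak_bytes,
             accts[i].n_allocs);
}
```

Tracking the peak separately matters here because, as comment 38 notes, watching "top" can easily miss the high-water mark.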
Comment 40 lucier 2007-12-20 02:29:32 UTC
Created attachment 14798 [details]
detailed memory usage report

I rebuilt mainline with --enable-gather-detailed-mem-stats and this is the output for the run in comment 38.
Comment 41 Kenneth Zadeck 2007-12-20 03:06:56 UTC
Subject: Re:  Inordinate compile times on large routines

lucier at math dot purdue dot edu wrote:
> ------- Comment #40 from lucier at math dot purdue dot edu  2007-12-20 02:29 -------
> Created an attachment (id=14798)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14798&action=view)
> detailed memory usage report
>
> I rebuilt mainline with --enable-gather-detailed-mem-stats and this is the
> output for the run in comment 38.
You should look at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34400#c42

kenny
Comment 42 lucier 2007-12-20 03:52:38 UTC
Created attachment 14799 [details]
memory details for an unpatched mainline

Here is the same information without Steven's two patches for mainline.
Comment 43 Kenneth Zadeck 2007-12-20 14:49:11 UTC
Subject: Re:  Inordinate compile times on large routines

Could you add the attached patch in and rerun your example?

It will add 4 lines to indicate what kinds of def-use and use-def chains
are being created.
A lot of the space is being used by these chains and I want to find out
how many of those chains are for artificial uses and defs.

thanks

kenny

Index: df-problems.c
===================================================================
--- df-problems.c	(revision 131096)
+++ df-problems.c	(working copy)
@@ -1855,13 +1855,23 @@ df_live_verify_transfer_functions (void)
 
 #define df_chain_problem_p(FLAG) (((enum df_chain_flags)df_chain->local_flags)&(FLAG))
 
+static long df_chain_counters[4];
+
 /* Create a du or ud chain from SRC to DST and link it into SRC.   */
 
 struct df_link *
 df_chain_create (struct df_ref *src, struct df_ref *dst)
 {
   struct df_link *head = DF_REF_CHAIN (src);
-  struct df_link *link = pool_alloc (df_chain->block_pool);;
+  struct df_link *link = pool_alloc (df_chain->block_pool);
+  int index = 0;
+
+  if (!src->insn)
+    index += (src->type == DF_REF_REG_DEF) ? 2 : 1;
+  if (!dst->insn)
+    index += (src->type == DF_REF_REG_DEF) ? 2 : 1;
+
+  df_chain_counters[index]++;
   
   DF_REF_CHAIN (src) = link;
   link->next = head;
@@ -2156,11 +2166,18 @@ df_chain_finalize (bitmap all_blocks)
 {
   unsigned int bb_index;
   bitmap_iterator bi;
-  
+
+  memset (df_chain_counters, 0, 4*sizeof(long));
+
   EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
     {
       df_chain_create_bb (bb_index);
     }
+
+  fprintf (stderr, "real -> real = %ld\n", df_chain_counters[0]);
+  fprintf (stderr, "real -> art  = %ld\n", df_chain_counters[1]);
+  fprintf (stderr, "art  -> real = %ld\n", df_chain_counters[2]);
+  fprintf (stderr, "art  -> art  = %ld\n", df_chain_counters[3]);
 }
 
 
Comment 44 stevenb.gcc@gmail.com 2007-12-20 15:08:16 UTC
Subject: Re:  Inordinate compile times on large routines

On 20 Dec 2007 14:49:12 -0000, zadeck at naturalbridge dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
>  struct df_link *
>  df_chain_create (struct df_ref *src, struct df_ref *dst)
>  {
>    struct df_link *head = DF_REF_CHAIN (src);
> -  struct df_link *link = pool_alloc (df_chain->block_pool);;
> +  struct df_link *link = pool_alloc (df_chain->block_pool);
> +  int index = 0;
> +
> +  if (!src->insn)
> +    index += (src->type == DF_REF_REG_DEF) ? 2 : 1;
> +  if (!dst->insn)
> +    index += (src->type == DF_REF_REG_DEF) ? 2 : 1;
> +
> +  df_chain_counters[index]++;

Watch for segfaults. Index will be 1, 2, 3, or 4.
df_chain_counters[4] does not exist.
Comment 45 Kenneth Zadeck 2007-12-20 15:31:07 UTC
Subject: Re:  Inordinate compile times on large routines

stevenb dot gcc at gmail dot com wrote:
> Watch for segfaults. Index will be 1, 2, 3, or 4.
> df_chain_counters[4] does not exist.

indexes will be 0, 1, 2, 3.

there are no def-def chains, and in particular there are no artificial
def to artificial def chains.  those increments only happen for
artificial defs or uses. Regular uses or defs have an insn.   a normal
def-use chain will have index 0.

Comment 46 Kenneth Zadeck 2007-12-20 16:06:24 UTC
Subject: Re:  Inordinate compile times on large routines


However, there is a bug in the patch that Steven did not notice: the
second test checks src->type where it should check dst->type.  Try
this one instead.


Index: df-problems.c
===================================================================
--- df-problems.c	(revision 131096)
+++ df-problems.c	(working copy)
@@ -1855,13 +1855,23 @@ df_live_verify_transfer_functions (void)
 
 #define df_chain_problem_p(FLAG) (((enum df_chain_flags)df_chain->local_flags)&(FLAG))
 
+static long df_chain_counters[4];
+
 /* Create a du or ud chain from SRC to DST and link it into SRC.   */
 
 struct df_link *
 df_chain_create (struct df_ref *src, struct df_ref *dst)
 {
   struct df_link *head = DF_REF_CHAIN (src);
-  struct df_link *link = pool_alloc (df_chain->block_pool);;
+  struct df_link *link = pool_alloc (df_chain->block_pool);
+  int index = 0;
+
+  if (!src->insn)
+    index += (src->type == DF_REF_REG_DEF) ? 2 : 1;
+  if (!dst->insn)
+    index += (dst->type == DF_REF_REG_DEF) ? 2 : 1;
+
+  df_chain_counters[index]++;
   
   DF_REF_CHAIN (src) = link;
   link->next = head;
@@ -2156,11 +2166,18 @@ df_chain_finalize (bitmap all_blocks)
 {
   unsigned int bb_index;
   bitmap_iterator bi;
-  
+
+  memset (df_chain_counters, 0, 4*sizeof(long));
+
   EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
     {
       df_chain_create_bb (bb_index);
     }
+
+  fprintf (stderr, "real -> real = %ld\n", df_chain_counters[0]);
+  fprintf (stderr, "real -> art  = %ld\n", df_chain_counters[1]);
+  fprintf (stderr, "art  -> real = %ld\n", df_chain_counters[2]);
+  fprintf (stderr, "art  -> art  = %ld\n", df_chain_counters[3]);
 }
 
 
Comment 47 lucier 2007-12-20 16:11:30 UTC
Subject: Re:  Inordinate compile times on large routines

I don't know what's happening here; the patch doesn't apply.  First I get

euler-13% patch < zadeck2.patch
patching file df-problems.c
patch: **** malformed patch at line 8: df_chain_flags)df_chain- 
 >local_flags)&(FLAG))

and then after I join this line to the previous one (I think bugzilla  
reformatted those lines), I get

euler-15% !pa
patch < zadeck2.patch
patching file df-problems.c
Hunk #1 FAILED at 1855.
1 out of 2 hunks FAILED -- saving rejects to file df-problems.c.rej
euler-16% cat df-problems.c.rej
***************
*** 1855,1867 ****

   #define df_chain_problem_p(FLAG) (((enum df_chain_flags)df_chain- 
 >local_flags)&(FLAG))

   /* Create a du or ud chain from SRC to DST and link it into SRC.   */

   struct df_link *
   df_chain_create (struct df_ref *src, struct df_ref *dst)
   {
     struct df_link *head = DF_REF_CHAIN (src);
-   struct df_link *link = pool_alloc (df_chain->block_pool);;

     DF_REF_CHAIN (src) = link;
     link->next = head;
--- 1855,1877 ----

   #define df_chain_problem_p(FLAG) (((enum df_chain_flags)df_chain- 
 >local_flags)&(FLAG))

+ static long df_chain_counters[4];
+
   /* Create a du or ud chain from SRC to DST and link it into SRC.   */

   struct df_link *
   df_chain_create (struct df_ref *src, struct df_ref *dst)
   {
     struct df_link *head = DF_REF_CHAIN (src);
+   struct df_link *link = pool_alloc (df_chain->block_pool);
+   int index = 0;
+
+   if (!src->insn)
+     index += (src->type == DF_REF_REG_DEF) ? 2 : 1;
+   if (!dst->insn)
+     index += (dst->type == DF_REF_REG_DEF) ? 2 : 1;
+
+   df_chain_counters[index]++;

     DF_REF_CHAIN (src) = link;
     link->next = head;

Comment 48 Kenneth Zadeck 2007-12-20 17:28:14 UTC
Created attachment 14801 [details]
patch to count different types of def-use chains

this patch replaces the one munged by bugzilla
Comment 49 lucier 2007-12-20 18:56:02 UTC
Subject: Re:  Inordinate compile times on large routines

I think this is the extra information you wanted:

> real -> real = 163962912
> real -> art  = 0
> art  -> real = 0
> art  -> art  = 0


Brad
Comment 50 Kenneth Zadeck 2008-01-17 21:20:39 UTC
Subject: 

Mark,

Am I allowed to set the target milestone for a patch or is that your job?

26854 is not going to get fixed for 4.3. We made a lot of progress on it
with the patches to 34400, but the largest remaining problem is the space
that the current representation of def-use and use-def chains requires.
I should be able to cut this almost in half if we move to something like
a vec rather than a linked list.

But this is a big patch and I do not want to start it until stage I.

kenny
Comment 51 Richard Biener 2008-01-17 21:43:53 UTC
As this isn't even marked as a regression, you can fix it whenever you like ;)

Only regressions have a target milestone before they are actually fixed, though.
Comment 52 Kenneth Zadeck 2008-01-17 21:46:37 UTC
Subject: Re:  Inordinate compile times on large routines

just between you and me this is most likely a regression, on the other
hand, i think that people who write functions this large should be
thrown into a pit.

kenny
Comment 53 lucier 2008-01-17 21:53:27 UTC
Subject: Re:  Inordinate compile times on large routines


On Jan 17, 2008, at 4:46 PM, zadeck at naturalbridge dot com wrote:

> just between you and me this is most likely a regression,

I, too, believe it is a regression; if you like, I can come up with
results from older compilers.

> on the other
> hand, i think that people who write functions this large should be
> thrown into a pit.

Luckily, it was written by a code-generator, and not by hand. ;-)
Comment 54 lucier 2008-01-17 22:39:31 UTC
Created attachment 14963 [details]
memory details for 131610

This is the detailed memory usage for the compiler

euler-5% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2 --with-mpfr=/pkgs/gmp-4.2.2 --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.3.0 20080117 (experimental) [trunk revision 131610] (GCC) 

The maximum memory I observed in top was 10.2 GB.

Kenny, I can't tell whether your patch from

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34400#c50

has been committed; will that improve the situation, too?
Comment 55 Kenneth Zadeck 2008-01-17 22:57:48 UTC
Subject: Re:  Inordinate compile times on large routines

it could, but it is not the big issue here; the big issue is the size of
the def-use chains.

Comment 56 lucier 2008-01-18 01:38:32 UTC
gcc is now 5-6 times faster than it was nearly two years ago when this was first reported; many changes have made significant improvements in cpu time.

But Steven Bosscher's patch from December still improved things more on this test case.

In particular, on 12/20/2007, without the patch, CPU time from

http://gcc.gnu.org/bugzilla/attachment.cgi?id=14799

was

 TOTAL                 : 300.21            19.16           319.52             778432 kB

After Steven Bosscher's patch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34400#c28

it was

 TOTAL                 : 210.97            15.80           226.88             778432 kB

and today it's

 TOTAL                 : 281.08            18.03           299.41             776514 kB

Would it still be a good idea to apply Steven's patch?
Comment 57 Kenneth Zadeck 2008-01-18 02:10:32 UTC
Subject: Re:  Inordinate compile times on large routines

the plan is to apply all of the patches,  they each deal with a
different problem and the improvement should be additive.

kenny
Comment 58 zadeck@gcc.gnu.org 2008-01-19 00:39:22 UTC
Subject: Bug 26854

Author: zadeck
Date: Sat Jan 19 00:38:34 2008
New Revision: 131649

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131649
Log:
2008-01-18  Kenneth Zadeck  <zadeck@naturalbridge.com>
	    Steven Bosscher  <stevenb.gcc@gmail.com>

	PR rtl-optimization/26854
	PR rtl-optimization/34400
	* df-problems.c (df_live_scratch): New scratch bitmap.
	(df_live_alloc): Allocate df_live_scratch when doing df_live.
	(df_live_reset): Clear the proper bitmaps.
	(df_live_bb_local_compute): Only process the artificial defs once
	since the order is not important.
	(df_live_init): Init the df_live sets only with the variables
	found live by df_lr.
	(df_live_transfer_function): Use the df_lr sets to prune the
	df_live sets as they are being computed.  
	(df_live_free): Free df_live_scratch.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/df-problems.c

Comment 59 zadeck@gcc.gnu.org 2008-01-20 01:49:14 UTC
Subject: Bug 26854

Author: zadeck
Date: Sun Jan 20 01:48:25 2008
New Revision: 131670

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131670
Log:
2008-01-19  Kenneth Zadeck <zadeck@naturalbridge.com>

	PR rtl-optimization/26854
	PR rtl-optimization/34400
	* ddg.c (create_ddg_dep_from_intra_loop_link): Do not use
	DF_RD->gen.
	* df.h (df_changeable_flags.DF_RD_NO_TRIM): New.
	(df_rd_bb_info.expanded_lr_out): New.
	* loop-invariant.c (find_defs): Added DF_RD_NO_TRIM flag.
	* loop-iv.c (iv_analysis_loop_init): Ditto.
	* df-problems.c (df_rd_free_bb_info, df_rd_alloc, df_rd_confluence_n,
	df_rd_bb_local_compute, df_rd_transfer_function, df_rd_free):
	Added code to allocate, initialize or free expanded_lr_out.
	(df_rd_bb_local_compute_process_def): Restructured to make
	more understandable.
	(df_rd_confluence_n): Add code to do nothing with fake edges and
	code to not apply invalidate_by_call sets if the sets are being trimmed.
	(df_lr_local_finalize): Renamed to df_lr_finalize.
	(df_live_local_finalize): Renamed to df_live_finalize.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ddg.c
    trunk/gcc/df-problems.c
    trunk/gcc/df.h
    trunk/gcc/loop-invariant.c
    trunk/gcc/loop-iv.c

Comment 60 zadeck@gcc.gnu.org 2008-01-22 13:57:48 UTC
Subject: Bug 26854

Author: zadeck
Date: Tue Jan 22 13:57:01 2008
New Revision: 131719

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131719
Log:
2008-01-22  Kenneth Zadeck <zadeck@naturalbridge.com>

	PR rtl-optimization/26854
	PR rtl-optimization/34400
	PR rtl-optimization/34884
	* ddg.c (create_ddg_dep_from_intra_loop_link): Use
	DF_RD->gen.
	* df.h (df_changeable_flags.DF_RD_NO_TRIM): Deleted
	(df_rd_bb_info.expanded_lr_out): Deleted
	* loop-invariant.c (find_defs): Deleted DF_RD_NO_TRIM flag.
	* loop-iv.c (iv_analysis_loop_init): Ditto.
	* df-problems.c
	(df_rd_free_bb_info, df_rd_alloc, df_rd_confluence_n,
	df_rd_bb_local_compute, df_rd_transfer_function, df_rd_free):
	Removed code to allocate, initialize or free expanded_lr_out.
	(df_rd_bb_local_compute_process_def): Restructured to make more
	understandable.
	(df_rd_confluence_n): Removed code to not apply invalidate_by_call
	sets if the sets are being trimmed.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ddg.c
    trunk/gcc/df-problems.c
    trunk/gcc/df.h
    trunk/gcc/loop-invariant.c
    trunk/gcc/loop-iv.c

Comment 61 lucier 2008-01-23 15:03:45 UTC
Subject: Re:  Inordinate compile times on large routines

Kenny:

Even after you backed out this latest patch the CPU usage was down  
(to 203 seconds from 280 seconds on my machine) and the maximum  
memory usage was down (to 7.3 GB from 10.2 GB).  That's a big  
improvement.

Brad
Comment 62 lucier 2008-05-15 02:48:17 UTC
I thought I might test the ira branch with

euler-3% /pkgs/gcc-ira/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../ira/configure --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2/ --with-mpfr=/pkgs/gmp-4.2.2/ --prefix=/pkgs/gcc-ira --enable-languages=c --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.4.0 20080328 (experimental) [ira revision 135280] (GCC) 

The command line was

/pkgs/gcc-ira/bin/gcc -fno-ira -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -ftime-report -fmem-report -c all.i

with -fira and with -fno-ira.

The ira branch takes a lot longer to compile this code with -fira than without it; the relevant lines seem to be:

for -fira:

 integrated RA         : 373.36 (66%) usr   0.33 ( 2%) sys 375.87 (64%) wall   12064 kB ( 2%) ggc
 TOTAL                 : 563.85            15.94           582.98             763565 kB

for -fno-ira:

 local alloc           :   8.42 ( 4%) usr   0.03 ( 0%) sys   8.43 ( 4%) wall    7073 kB ( 1%) ggc
 global alloc          :  20.91 (11%) usr   0.30 ( 2%) sys  21.23 (10%) wall    4961 kB ( 1%) ggc
 TOTAL                 : 196.25            17.55           213.84             766052 kB

I'll add the complete reports as the next two attachments.
Comment 63 lucier 2008-05-15 02:50:38 UTC
Created attachment 15639 [details]
statistics for ira branch with -fno-ira

This is the output of the command in the previous comment with -fno-ira
Comment 64 lucier 2008-05-15 02:51:40 UTC
Created attachment 15640 [details]
statistics for ira branch with -fira

This is the output of the command in the previous comment with -fira
Comment 65 Steven Bosscher 2008-05-15 05:59:11 UTC
 integrated RA         : 373.36 (66%) usr   0.33 ( 2%) sys 375.87 (64%) wall   12064 kB ( 2%) ggc

'nuff said.

Oh, not entirely yet: IRA should have more than one timevar.
Comment 66 Vladimir Makarov 2008-05-19 02:00:06 UTC
The problem with IRA was that too many allocnos were candidates for spilling, and most of the time was spent choosing the best allocno to spill.  The patch solving the problem is coming.
Comment 67 Vladimir Makarov 2008-05-19 02:03:37 UTC
Subject: Bug 26854

Author: vmakarov
Date: Mon May 19 02:02:52 2008
New Revision: 135523

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=135523
Log:
2008-05-18  Vladimir Makarov  <vmakarov@redhat.com>
	PR tree-optimization/26854

	* timevar.def (TV_RELOAD): New timer.

	* ira.c (ira): Use TV_IRA and TV_RELOAD.
	(pass_ira): Remove TV_IRA.

	* Makefile.in (ira-color.o): Add SPLAY_TREE_H.

	* ira-conflicts.c (DEF_VEC_P, DEF_ALLOCC_P): Move to ira-int.h.

	* ira-int.h (DEF_VEC_P, DEF_ALLOCC_P): Move from ira-conflicts.c and
	ira-color.c.
	(struct allocno): New bitfield splay_removed_p.
	(ALLOCNO_MAY_BE_SPILLED_P): New macro.

	* ira-color.c (splay-tree.h): Add the header.
	(allocno_spill_priority_compare, splay_tree_allocate,
	splay_tree_free): New functions.
	(DEF_VEC_P, DEF_ALLOCC_P): Move to ira-int.h.
	(sorted_allocnos_for_spilling): Rename to allocnos_for_spilling.
	(splay_tree_node_pool, removed_splay_allocno_vec,
	uncolorable_allocnos_num, uncolorable_allocnos_splay_tree): New
	global variables.
	(add_allocno_to_bucket, add_allocno_to_ordered_bucket,
	delete_allocno_from_bucket): Update uncolorable_allocnos_num.
	(USE_SPLAY_P): New macro.
	(push_allocno_to_stack): Remove allocno from the splay tree.
	(push_allocnos_to_stack): Use the splay trees.
	(do_coloring): Create and finish splay_tree_node_pool.
	Move allocation/deallocation of allocnos_for_spilling to here...
	(initiate_ira_assign, finish_ira_assign): Move
	allocnos_for_spilling from here...
	(ira_color): Allocate/deallocate removed_splay_allocno_vec.

	* ira-build.c (DEF_VEC_P, DEF_ALLOCC_P): Move to ira-int.h.
	(create_allocno): Initiate ALLOCNO_SPLAY_REMOVED_P.


Modified:
    branches/ira/gcc/ChangeLog
    branches/ira/gcc/Makefile.in
    branches/ira/gcc/ira-build.c
    branches/ira/gcc/ira-color.c
    branches/ira/gcc/ira-conflicts.c
    branches/ira/gcc/ira-int.h
    branches/ira/gcc/ira.c
    branches/ira/gcc/timevar.def

Comment 68 Vladimir Makarov 2008-05-19 02:08:24 UTC
The patch solving IRA problem is described in
http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01093.html
Comment 69 lucier 2008-05-19 17:54:19 UTC
That really smashed the problem.  I find the following timings without IRA:

 local alloc           :   8.53 ( 4%) usr   0.01 ( 0%) sys   8.59 ( 3%) wall    7073 kB ( 1%) ggc
 global alloc          :  30.44 (14%) usr   0.33 ( 2%) sys  30.83 (12%) wall    4961 kB ( 1%) ggc

 TOTAL                 : 211.48            17.00           261.74             766052 kB

and with IRA:

 integrated RA         :  10.58 ( 5%) usr   0.37 ( 2%) sys  11.05 ( 5%) wall    7138 kB ( 1%) ggc
 reload                :  11.89 ( 6%) usr   0.01 ( 0%) sys  11.96 ( 5%) wall    4925 kB ( 1%) ggc
 TOTAL                 : 200.18            16.10           221.53             763565 kB

Thanks!

Brad
Comment 70 lucier 2008-07-10 17:36:44 UTC
Created attachment 15893 [details]
detailed memory stats for trunk revision 137644

These are the detailed memory stats for

euler-11% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2/ --with-mpfr=/pkgs/gmp-4.2.2/ --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.4.0 20080708 (experimental) [trunk revision 137644] (GCC) 

applied to this problem, with command line

/pkgs/gcc-mainline/bin/gcc -Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -ftime-report -fmem-report -c all.i >& mainline-stats-O3

The run time isn't so bad, but the memory usage still peaks at 7.3 gigs.

Now that distributions have started shipping 4.2.whatever (Ubuntu 8.04 ships 4.2.3), this problem is showing up more and more as a regression against previous releases of gcc.
Comment 71 lucier 2008-07-10 17:44:57 UTC
Here are additional informal comparisons of 4.2.3 with Apple's 4.0.1 and gcc 3.4.5 on mingw:

https://webmail.iro.umontreal.ca/pipermail/gambit-list/2008-July/002450.html
Comment 72 Richard Biener 2008-07-10 19:37:43 UTC
The memory counters for DF even overflow ;)
Comment 73 Kenneth Zadeck 2008-07-10 19:40:04 UTC
Subject: Re:  Inordinate compile times on large routines

we have our best people working on it.  this is what fuds are supposed 
to fix.

kenny
Comment 74 lucier 2008-09-10 13:39:31 UTC
This need for more memory is a regression from earlier versions of gcc.

Can this bug be marked with

  	[4.3/4.4 Regression]

in the subject line?
Comment 75 lucier 2008-09-18 01:19:06 UTC
Created attachment 16350 [details]
statistics with checking enabled and using longs to count bytes

Using the patch from

http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01270.html

I gathered statistics using 64-bit longs to count bytes for this test case. They show that 10,292,897,120 bytes of bitmaps and 6,449,831,120 bytes in alloc-pools are allocated with mainline (at least when checking is enabled).
Comment 76 lucier 2008-09-26 15:43:38 UTC
Created attachment 16411 [details]
memory and cpu time statistics for 2008-09-19

There has been a 13% compile-time regression on this PR since September 19.  Looking at the statistics, it appears that there is a general increase in cpu time in things that deal with df-chains.

These are the timings from 9/18; I'll include last night's times next.
Comment 77 lucier 2008-09-26 15:44:24 UTC
Created attachment 16412 [details]
memory and cpu statistics for 9/25

Here is a timing report from today.
Comment 78 lucier 2008-09-26 15:45:46 UTC
Created attachment 16413 [details]
Memory and cpu statistics from 9/16

Sorry, I included the wrong file; this should be the correct one from 9/16.
Comment 79 Richard Biener 2009-01-24 10:19:29 UTC
GCC 4.3.3 is being released, adjusting target milestone.
Comment 80 Paolo Bonzini 2009-02-04 12:45:24 UTC
Brad, can you produce new stats?
Comment 81 lucier 2009-02-04 17:27:16 UTC
Created attachment 17243 [details]
Memory and CPU statistics for 2009/02/04
Comment 82 lucier 2009-02-04 17:28:32 UTC
I still have the bitmap.c patch from

http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01270.html

in my tree so I don't get meaningless statistics for bitmaps.  (Kenny installed something like the patch above for alloc-pool.c in the trunk.)

There are more bitmaps allocated than on 2008-09-26 (13GB instead of 12GB).

3GB was allocated in alloc-pool.

Execution time was worse, 228.17 user seconds versus 168 seconds.

I didn't watch top to estimate the maximum memory usage.

This is with

euler-8% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.4.0 20090204 (experimental) [trunk revision 143922] (GCC) 

Brad
Comment 83 Daniel Berlin 2009-02-04 18:24:14 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate compile times on large routines

These numbers claim a leak of the graph->preds bitmap (and related
bitmaps) which are quite clearly freed all the time.
These bitmaps are allocated onto the predbitmap obstack, which is
released through remove_preds_and_fake_succs.
It always executes, so I have trouble understanding why it considers
this a leak.


Comment 84 Richard Biener 2009-02-11 13:22:23 UTC
Btw, for further analysis it would be nice to have a "smaller" testcase:
smaller meaning an order of magnitude fewer states in ___H__20_all_2e_o1()
(an order of magnitude fewer label addresses in ___hlbl_tbl).

The source looks sort-of autogenerated, so, is it possible to produce such
a smaller testcase?  Thanks!
Comment 85 Richard Biener 2009-02-13 11:08:42 UTC
*** Bug 39157 has been marked as a duplicate of this bug. ***
Comment 86 lucier 2009-02-13 15:40:37 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate compile times on large routines

It's unfortunate that the discussion from 39157 will be somewhat hard to
find now that that bug is closed.

Steven wrote in a comment for 39157:

        It's not like there will not be any loop invariant code motion
        (LICM) at all anymore if the RTL LICM pass is disabled.  There
        is an LICM pass on GIMPLE, and there is also PRE for GIMPLE (and
        lazy code motion for RTL but I think it disables itself for your
        test case).
        
        The RTL LICM pass mostly cleans up after expand, i.e. moves
        things that are not exposed in GIMPLE. This is mostly just
        address calculations.


The loop in _num.i that I mentioned in

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157#c19

is the loop in PR 33928 that is no longer fully optimized after Paolo
(and you, I guess, your name is on the patch) added PRE and disabled
some optimizations in CSE, and what is no longer optimized in that loop
are address calculations.  I don't know whether those address
calculations fall under LICM, the only point I'm trying to make right
now is that address calculations are no longer optimized as much as they
were before 

http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118475

and address calculations are an important class of calculations to
optimize.

Comment 87 Paolo Bonzini 2009-02-13 16:54:54 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate
 compile times on large routines

> It's unfortunate that the discussion from 39157 will be somewhat hard to
> find now that that bug is closed.

Well, the patch there is not lost, I suppose Jakub will finish it and
post it.

The problem is that -O1 was never meant to give "very fast" code.  You
are using it only because our throttling of expensive passes is
insufficient.  Fixing that has two sides, as done in PR39157's
discussion: 1) disabling more passes at -O1, 2) establishing some
parameters to throttle down passes at -O2.

Ultimately, the goal should be that you can use -O2.

Paolo

Comment 88 Jakub Jelinek 2009-02-13 17:06:17 UTC
The patch in PR39157 is IMHO finished and has been bootstrapped/regtested on x86_64-linux and i686-linux.  I haven't posted it because it looked like Richard, Zdenek and Steven preferred some other solution for it.  If this isn't solved for 4.4 soon, I'm going to post that patch.
Comment 89 lucier 2009-02-13 17:30:14 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate
 compile times on large routines

I have to leave town within the hour and I may not be able to look at
this properly until Wednesday or so, but it would be interesting to me
to know how large (how many nodes?) are the 139 loops in _num.i referred
to in

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39157#c19

This information may suggest how large the default parameters should be
for -O1 and -O2.  (For example, if all the non-whole-function loops have
< 2000 instructions, then 5000 might be a reasonable limit for -O1
loops.)

Comment 90 lucier 2009-02-13 17:37:28 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate
 compile times on large routines

On Fri, 2009-02-13 at 16:54 +0000, bonzini at gnu dot org wrote:
> The problem is that -O1 was never meant to give "very fast" code.

I'm not looking for "very fast" code, I'm looking for code that doesn't
get > 30% slower from one SVN revision number to the next.

> You
> are using it only because our throttling of expensive passes is
> insufficient.

I am using -O1 because code of this type compiled with -O2 runs
significantly more slowly than code of this type compiled with -O1. I
have never used -O2 on this type of code.

> Fixing that has two sides, as done in PR39157's
> discussion: 1) disabling more passes at -O1, 2) establishing some
> parameters to throttle down passes at -O2.

I don't see (1) and (2) as the main strategy for fixing "that"; it
seems to me that understanding the existing optimizations that are being
disabled in favor of new ones would be a good start, as would, more
generally, ensuring that -O1 code doesn't get significantly slower while
compile times get significantly higher.

Comment 91 lucier 2009-02-13 17:43:51 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate
 compile times on large routines

> I'm not looking for "very fast" code, I'm looking for code that doesn't
> get > 30% slower from one SVN revision number to the next.

Sorry, this comment refers to PR 33928, not this PR.


Comment 92 stevenb.gcc@gmail.com 2009-02-14 14:42:36 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate 
	compile times on large routines

Re: Comment #88

I think the patch is perfectly acceptable as a stop-gap solution.  I
don't think we have anything better for 4.4.  Maybe you can add a
FIXME, though...
Comment 93 lucier 2009-02-14 21:58:10 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate compile times on large routines

I instrumented the compiler and looked at how many nodes were in each  
loop processed by LICM for the Gambit runtime and compiler.

For generated code, except for the "loop" that contained the entire  
function, the greatest number of nodes was 30.  (Because computed  
gotos are used in the code that checks for heap and stack overflows  
after allocations and for waiting interrupts, it's hard to go long in  
Scheme code without hitting the "big loop".)  For hand-written code,  
the greatest number of nodes in a loop was 123.

When bootstrapping gcc with --enable-languages=c, the largest number  
of nodes in a loop was 803, and there were 12 loops detected that had  
over 500 nodes.  548 loops had 100 nodes or greater. (This is a  
bootstrap, so some files were compiled twice with the instrumented  
compiler.)

So perhaps an -O1 default for LICM of 100 nodes is reasonable, or  
perhaps one might up it to 1000 just to catch everything "reasonable".

Brad
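A minimal sketch of the kind of summary described above, assuming the per-loop node counts have already been collected into an array. The function names are hypothetical and this is not the instrumentation Brad used; it only shows how candidate limits such as 100 or 1000 nodes translate into the number of loops LICM would skip.

```c
#include <stddef.h>

/* Hypothetical helpers: sizes[i] is the number of CFG nodes (basic
   blocks) in loop i, as gathered by an instrumented compiler.  */

/* Number of loops that a given size limit would exclude from LICM.  */
static size_t
count_loops_over_limit (const size_t *sizes, size_t n, size_t limit)
{
  size_t over = 0;
  for (size_t i = 0; i < n; i++)
    if (sizes[i] > limit)
      over++;
  return over;
}

/* Largest loop, ignoring anything above IGNORE_ABOVE, e.g. the
   whole-function "loop" created by computed gotos.  */
static size_t
largest_loop (const size_t *sizes, size_t n, size_t ignore_above)
{
  size_t max = 0;
  for (size_t i = 0; i < n; i++)
    if (sizes[i] <= ignore_above && sizes[i] > max)
      max = sizes[i];
  return max;
}
```

With made-up sizes {20000, 30, 123, 803, 501, 99}, a limit of 100 skips four loops, a limit of 1000 skips only the whole-function one, and largest_loop (sizes, 6, 10000) returns 803.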
Comment 94 Daniel Berlin 2009-02-14 23:06:53 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate 
	compile times on large routines

One of the reasons LCM in RTL is so slow is because it uses a crappy
iteration order.
With the right iteration order, it should be fast enough to turn it
back on and remove the address calculations in these testcases.

If it was block based, it would have been converted to use the DF
solver and gotten this automatically, but because it's edge based,
pretty much nobody has touched it since it was created :)

Even adding qsorts in the right place that sort the worklists into the
right order on each iteration would probably help orders of magnitude
here (though moving to the two worklist solver that DF now uses would
be even better).
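To illustrate the iteration-order point, here is a minimal sketch of a block-based, forward iterative dataflow solver that makes round-robin passes over the blocks in reverse postorder (RPO), so that every predecessor reached along a forward (non-back) edge is processed before its successor and few passes are needed. It assumes a toy CFG with at most 64 blocks and 64-bit sets, and solves a reaching-definitions style problem; it is not GCC's lcm.c (which is edge based) nor the DF two-worklist solver.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 64

/* Toy CFG: block 0 is the entry; succ[b][k] for k < n_succ[b].  */
struct cfg
{
  int n_blocks;
  int n_succ[MAX_BLOCKS];
  int succ[MAX_BLOCKS][4];
};

/* Depth-first search from B, appending blocks to ORDER in postorder.  */
static void
dfs_postorder (const struct cfg *g, int b, bool *seen, int *order, int *n)
{
  seen[b] = true;
  for (int k = 0; k < g->n_succ[b]; k++)
    if (!seen[g->succ[b][k]])
      dfs_postorder (g, g->succ[b][k], seen, order, n);
  order[(*n)++] = b;
}

/* Fill RPO with the blocks reachable from the entry, in reverse
   postorder; return how many there are.  */
static int
reverse_postorder (const struct cfg *g, int *rpo)
{
  bool seen[MAX_BLOCKS] = { false };
  int post[MAX_BLOCKS];
  int n = 0;
  dfs_postorder (g, 0, seen, post, &n);
  for (int i = 0; i < n; i++)
    rpo[i] = post[n - 1 - i];
  return n;
}

/* Forward "may" problem over 64-bit sets:
     in[b]  = union of out[p] over predecessors p of b
     out[b] = gen[b] | (in[b] & ~kill[b])
   Returns the number of passes over the RPO until nothing changed.  */
static int
solve_forward_rpo (const struct cfg *g, const uint64_t *gen,
                   const uint64_t *kill, uint64_t *in, uint64_t *out)
{
  int rpo[MAX_BLOCKS];
  int n = reverse_postorder (g, rpo);
  int passes = 0;
  bool changed = true;

  for (int b = 0; b < g->n_blocks; b++)
    in[b] = out[b] = 0;

  while (changed)
    {
      changed = false;
      passes++;
      for (int i = 0; i < n; i++)
        {
          int b = rpo[i];
          uint64_t new_in = 0;
          /* Find predecessors by scanning all edges; fine for a sketch.  */
          for (int p = 0; p < g->n_blocks; p++)
            for (int k = 0; k < g->n_succ[p]; k++)
              if (g->succ[p][k] == b)
                new_in |= out[p];
          uint64_t new_out = gen[b] | (new_in & ~kill[b]);
          if (new_in != in[b] || new_out != out[b])
            {
              in[b] = new_in;
              out[b] = new_out;
              changed = true;
            }
        }
    }
  return passes;
}
```

On a CFG 0->1->2 with a back edge 2->1 and an exit 2->3, where block 0 defines bit 0 and block 2 defines bit 1 while killing bit 0, this solver stabilizes after three passes, the last of which only confirms that nothing changed.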

Comment 95 stevenb.gcc@gmail.com 2009-02-15 11:26:40 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate 
	compile times on large routines

Re: Comment #94
The trouble with LCM in RTL (i.e. GCSE-PRE) is not that it is slow (or
that it is disabled -- istr it is enabled at -O2), and also not that
it is edge based. The problem is that it doesn't handle cascading
expressions, because that just doesn't fit in the LCM framework. You
have to iterate RTL GCSE-PRE to move the same invariants as what RTL
LICM (i.e. loop-invariant.c) can achieve.

(GCSE-PRE is old code from a time when GCC didn't really have a proper
CFG. It is edge based because for block based you need critical edge
splitting, which was prohibitively expensive in the Old Days.
Nowadays, gcse.c+lcm.c works in cfglayout mode and pre-splitting
critical edges would be cheap, so it would be a good idea to
experiment with a block based GCSE-PRE rewrite...)
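For reference, the critical-edge condition behind this can be sketched as follows: an edge is critical when its source has more than one successor and its destination has more than one predecessor, so code placed "on" it cannot simply go at the end of the source or the start of the destination, and block-based PRE splits such edges by inserting an empty block first. The edge-list representation and the function name are assumptions for illustration.

```c
#define MAX_BLOCKS 64

/* Count the critical edges in a CFG given as parallel arrays
   SRC[i] -> DEST[i], i < N_EDGES, with block numbers below MAX_BLOCKS.  */
static int
count_critical_edges (int n_edges, const int *src, const int *dest)
{
  int n_succs[MAX_BLOCKS] = { 0 };
  int n_preds[MAX_BLOCKS] = { 0 };
  int critical = 0;

  for (int i = 0; i < n_edges; i++)
    {
      n_succs[src[i]]++;
      n_preds[dest[i]]++;
    }
  for (int i = 0; i < n_edges; i++)
    /* Critical: several ways out of the source and several ways into
       the destination.  */
    if (n_succs[src[i]] > 1 && n_preds[dest[i]] > 1)
      critical++;
  return critical;
}
```

For a diamond 0->1, 0->2, 1->3, 2->3 plus a shortcut 0->3, only 0->3 is critical.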
Comment 96 Daniel Berlin 2009-02-16 02:07:46 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate 
	compile times on large routines

Uh, it's most certainly disabled on testcases like his.
Look at is_too_expensive in gcse.c.

This is in fact done because LCM iteration takes too long on
flowgraphs like that, because of its iteration order.

Comment 97 Jakub Jelinek 2009-02-20 13:03:25 UTC
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144320
now limits RTL LICM to loops with fewer than 10000 bbs at -O{2,3,s} and fewer than 1000 bbs at -O1.
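The effect of this limit can be sketched as a simple per-loop size gate. The names and the way the optimization level is passed in are hypothetical (this is not the code from r144320); the thresholds are the ones quoted above, with -Os treated like -O2.

```c
#include <stdbool.h>

/* Thresholds quoted above: RTL LICM only runs on loops with fewer
   than this many basic blocks.  */
enum
{
  LICM_MAX_BBS_O1 = 1000,
  LICM_MAX_BBS_O2 = 10000   /* also used for -O3 and -Os */
};

/* Hypothetical gate: OPT_LEVEL is 1 for -O1, 2 or more otherwise.  */
static bool
licm_should_process_loop (int num_bbs_in_loop, int opt_level)
{
  int limit = opt_level <= 1 ? LICM_MAX_BBS_O1 : LICM_MAX_BBS_O2;
  return num_bbs_in_loop < limit;
}
```

A single comparison per loop is cheap, which is the appeal of this kind of throttle: a huge loop, such as the whole-function "loop" created by computed gotos, is skipped outright instead of being analyzed.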
Comment 98 lucier 2009-02-20 19:52:20 UTC
Thank you, that indeed "fixes" the LICM problem.

Based on some comments for this PR and for PR 39157 I thought that a similar patch might apply to PRE.  So with

euler-14% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --prefix=/pkgs/gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.4.0 20090220 (experimental) [trunk revision 144328] (GCC) 

I ran this command

/pkgs/gcc-mainline/bin/gcc -v -c -O2 -fmem-report -ftime-report compiler.i -save-temps > & ! report-compiler

where compiler.i is found at

http://www.math.purdue.edu/~lucier/bugzilla/8/

and I killed the job after it required 17GB of RAM.  This job compiles just fine with

euler-15% /pkgs/gcc-4.1.2/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/pkgs/gcc-4.1.2
Thread model: posix
gcc version 4.1.2

in about 1.5 GB of RAM.

To derive some statistics I ran

/pkgs/gcc-mainline/bin/gcc -v -c -O2 -fmem-report -ftime-report _num.i -save-temps > & ! report-num

where the smaller file _num.i is also found at

http://www.math.purdue.edu/~lucier/bugzilla/8/

I'll attach report-num to this PR.  The highlights are

 PRE                   :  23.28 (24%) usr   0.01 ( 0%) sys  23.51 (24%) wall     681 kB ( 0%) ggc
 integrated RA         :  12.70 (13%) usr   0.00 ( 0%) sys  12.83 (13%) wall    3709 kB ( 2%) ggc
 TOTAL                 :  95.93             2.73            99.72             227422 kB

and that's about it, nothing else above 5%.  There are also accurate memory statistics, as I've added a patch to my local sources so that memory statistics don't overflow 32-bit counters.

I think the -O1 and -O2 limits for LICM are quite reasonable; would it be possible to limit PRE similarly so that one could compile compiler.i with -O2 in a reasonable amount of memory?
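As an illustration of why 32-bit byte counters are inadequate at these sizes: unsigned 32-bit arithmetic wraps modulo 2^32, so any total of 4 GiB or more is silently reduced. The sketch below (hypothetical function name; not Brad's local patch) accumulates 13 GiB in 1 MiB allocations, roughly the bitmap total reported earlier in this PR, and the 32-bit counter ends up reporting only 1 GiB.

```c
#include <stdint.h>

/* Add N allocations of SIZE bytes to both a 32-bit and a 64-bit
   counter.  The 32-bit sum wraps modulo 2^32 (4 GiB).  */
static void
accumulate_bytes (uint64_t n, uint32_t size,
                  uint32_t *total32, uint64_t *total64)
{
  for (uint64_t i = 0; i < n; i++)
    {
      *total32 += size;
      *total64 += size;
    }
}
```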

Comment 99 lucier 2009-02-20 19:54:39 UTC
Created attachment 17336 [details]
Memory and CPU statistics when compiling _num.i with -O2
Comment 100 lucier 2009-02-20 19:56:33 UTC
The large memory requirements for LICM at -O1 and -O2 are still a regression for the 4.2 and 4.3 branches.  Jakub's patch is short and elegant; do you think it would be a good idea to backport it to the other open branches?
Comment 101 Daniel Berlin 2009-02-21 04:13:49 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate 
	compile times on large routines

PRE already gives up on this testcase, at least on my computer, and
takes no memory.
All of the memory here is being eaten by IRA and DF.
The actual time sink is SCCVN's DFS, which builds a large SCC then
counts its size and gives up (which in turn causes PRE to give up).

It's not clear you can really modify this to give up earlier than it
does (since you don't know the size of the SCC until it's already done
all the work anyway) without a ton of work.

I'm replacing this algorithm with a non-SCC based one in 4.5.

Comment 102 lucier 2009-02-21 18:30:54 UTC
Please humor me:

PRE = Partial Redundancy Elimination
IRA = Integrated Register Allocator
DF = ???
SCCVN = ??? Value Numbering?
DFS = ???
SCC = ? Conflict ?
Comment 103 rguenther@suse.de 2009-02-21 18:42:30 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate
 compile times on large routines

http://gcc.gnu.org/wiki/abbreviations_and_acronyms
Comment 104 lucier 2009-02-21 18:56:45 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate
 compile times on large routines

Cool, that leaves me with

> > DFS = ???
> > SCC = ? Confict ?

Comment 105 Steven Bosscher 2009-02-21 19:04:10 UTC
SCC as in SCCVN
DFS = Depth First Search
Comment 106 Daniel Berlin 2009-02-21 22:34:45 UTC
Subject: Re:  [4.3/4.4 Regression] Inordinate 
	compile times on large routines

Right.
Basically, the value numbering PRE uses as a pre-pass is known as SCCVN.
It value numbers by doing a depth first search over the SSA variables,
iterating only over cycles (which end up forming Strongly Connected
Components in this graph).
In your case, you end up with a strongly connected component
containing 46000 variables.  Value numbering gives up at that point
(once value numbering gives up, PRE gives up as well).
The SCC finding algorithm is linear (the value numbering algorithm is
not) but the constant can be large sometimes.
My guess is that in this case, we are wasting time in the vec pushing
or something.
I haven't profiled it.
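A minimal sketch of why the size of an SCC is only known after the work of finding it has been done, using a recursive version of Tarjan's algorithm over a toy use-def graph (edge v -> w when the definition of v uses w). An SCC's size can only be measured when its root is popped, i.e. after the depth-first search has already visited every member, so a size cap can only be checked at that point. The graph representation, names, and cap are assumptions for illustration; this is not GCC's tree-ssa-sccvn.c, and the actual value numbering of each SCC is omitted.

```c
#include <stdbool.h>

#define MAX_NODES 64

/* Toy use-def graph over SSA names.  */
struct graph
{
  int n;
  int n_adj[MAX_NODES];
  int adj[MAX_NODES][4];
};

struct tarjan
{
  const struct graph *g;
  int index[MAX_NODES], low[MAX_NODES];
  bool on_stack[MAX_NODES];
  int stack[MAX_NODES], sp;
  int next_index;
  int max_scc_size;   /* give-up threshold */
  bool too_big;       /* set once an SCC exceeds the threshold */
};

static void
strongconnect (struct tarjan *t, int v)
{
  t->index[v] = t->low[v] = t->next_index++;
  t->stack[t->sp++] = v;
  t->on_stack[v] = true;

  for (int k = 0; k < t->g->n_adj[v] && !t->too_big; k++)
    {
      int w = t->g->adj[v][k];
      if (t->index[w] < 0)
        {
          strongconnect (t, w);
          if (t->low[w] < t->low[v])
            t->low[v] = t->low[w];
        }
      else if (t->on_stack[w] && t->index[w] < t->low[v])
        t->low[v] = t->index[w];
    }

  if (t->low[v] == t->index[v])
    {
      /* V roots an SCC.  Only now, after the DFS has visited every
         member, can its size be measured.  */
      int size = 0, w;
      do
        {
          w = t->stack[--t->sp];
          t->on_stack[w] = false;
          size++;
        }
      while (w != v);
      if (size > t->max_scc_size)
        t->too_big = true;   /* a value numberer would give up here */
    }
}

/* True if every SCC has at most MAX_SCC_SIZE members.  */
static bool
sccs_within_limit (const struct graph *g, int max_scc_size)
{
  struct tarjan t = { .g = g, .max_scc_size = max_scc_size };
  for (int v = 0; v < g->n; v++)
    t.index[v] = -1;
  for (int v = 0; v < g->n && !t.too_big; v++)
    if (t.index[v] < 0)
      strongconnect (&t, v);
  return !t.too_big;
}
```

In a graph where nodes 0, 1 and 2 form a cycle and node 3 uses node 0, a cap of 3 succeeds and a cap of 2 gives up, but only after the whole cycle has been traversed.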

Comment 107 Paolo Bonzini 2009-05-08 12:22:46 UTC
Subject: Bug 26854

Author: bonzini
Date: Fri May  8 12:22:30 2009
New Revision: 147282

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147282
Log:
2009-05-08  Paolo Bonzini  <bonzini@gnu.org>

	PR rtl-optimization/33928
	PR 26854
	* fwprop.c (use_def_ref, get_def_for_use, bitmap_only_bit_bitween,
	process_uses, build_single_def_use_links): New.
	(update_df): Update use_def_ref.
	(forward_propagate_into): Use get_def_for_use instead of use-def
	chains.
	(fwprop_init): Call build_single_def_use_links and let it initialize
	dataflow.
	(fwprop_done): Free use_def_ref.
	(fwprop_addr): Eliminate duplicate call to df_set_flags.
	* df-problems.c (df_rd_simulate_artificial_defs_at_top, 
	df_rd_simulate_one_insn): New.
	(df_rd_bb_local_compute_process_def): Update head comment.
	(df_chain_create_bb): Use the new RD simulation functions.
	* df.h (df_rd_simulate_artificial_defs_at_top, 
	df_rd_simulate_one_insn): New.
	* opts.c (decode_options): Enable fwprop at -O1.
	* doc/invoke.texi (-fforward-propagate): Document this.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/df-problems.c
    trunk/gcc/df.h
    trunk/gcc/doc/invoke.texi
    trunk/gcc/fwprop.c
    trunk/gcc/opts.c

Comment 108 Paolo Bonzini 2009-06-15 16:30:16 UTC
http://gcc.gnu.org/bugzilla/attachment.cgi?id=17968
This is the current state of -ftime-report/-fmem-report after the proposed reimplementation of fwprop's dataflow.

Remaining hogs are:

1) Accounting for TV_ALIAS_STMT_WALKING is expensive.  Waiting for info from Brad as to how much of the cost is paid without -ftime-report.  If it turns out to be noticeable, Richi preapproved the trivial patch to remove the timevar.

2) CFG cleanup makes heavy use of the iterative fixing of dominators in
remove_edge_and_dominated_blocks, which is very expensive on this testcase.  Probably we should make sure dominators are not computed during some key cfgcleanup passes, or just free dominators at the beginning of CFG cleanup if the testcase is particularly large.
Comment 109 Paolo Bonzini 2009-06-27 14:48:48 UTC
Subject: Bug 26854

Author: bonzini
Date: Sat Jun 27 14:48:34 2009
New Revision: 149010

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=149010
Log:
2009-06-07  Paolo Bonzini  <bonzini@gnu.org>

	PR rtl-optimization/26854
        * timevar.def: Remove TV_DF_RU, add TV_DF_MD.
        * df-problems.c (df_rd_add_problem): Fix comment.
        (df_md_set_bb_info, df_md_free_bb_info, df_md_alloc,
        df_md_simulate_artificial_defs_at_top,
        df_md_simulate_one_insn, df_md_bb_local_compute_process_def,
        df_md_bb_local_compute, df_md_local_compute, df_md_reset,
        df_md_transfer_function, df_md_init, df_md_confluence_0,
        df_md_confluence_n, df_md_free, df_md_top_dump, df_md_bottom_dump,
        problem_MD, df_md_add_problem): New.
        * df.h (DF_MD, DF_MD_BB_INFO, struct df_md_bb_info, df_md,
        df_md_get_bb_info): New.
        DF_LAST_PROBLEM_PLUS1): Adjust.

        * Makefile.in (fwprop.o): Include domwalk.h.
        * fwprop.c: Include domwalk.h.
        (reg_defs, reg_defs_stack): New.
        (bitmap_only_bit_between): Remove.
        (process_defs): New.
        (process_uses): Use reg_defs and local_md instead of
        bitmap_only_bit_between and local_rd.
        (single_def_use_enter_block): New, from build_single_def_use_links.
        (single_def_use_leave_block): New.
        (build_single_def_use_links): Remove code moved to
        single_def_use_enter_block, invoke domwalk.
        (use_killed_between): Adjust comment.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/df-problems.c
    trunk/gcc/df.h
    trunk/gcc/fwprop.c
    trunk/gcc/timevar.def

Comment 110 Richard Biener 2009-08-04 12:27:35 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 111 Peter Bergner 2009-10-03 01:39:35 UTC
Subject: Bug 26854

Author: bergner
Date: Sat Oct  3 01:39:14 2009
New Revision: 152430

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152430
Log:
	Backport from mainline.

	2009-08-30  Alan Modra  <amodra@bigpond.net.au>

	PR target/41081
	* fwprop.c (get_reg_use_in): Delete.
	(free_load_extend): New function.
	(forward_propagate_subreg): Use it.

	2009-08-23  Alan Modra  <amodra@bigpond.net.au>

	PR target/41081
	* fwprop.c (try_fwprop_subst): Allow multiple sets.
	(get_reg_use_in): New function.
	(forward_propagate_subreg): Propagate through subreg of zero_extend
	or sign_extend.

	2009-05-08  Paolo Bonzini  <bonzini@gnu.org>

	PR rtl-optimization/33928
	PR 26854
	* fwprop.c (use_def_ref, get_def_for_use, bitmap_only_bit_bitween,
	process_uses, build_single_def_use_links): New.
	(update_df): Update use_def_ref.
	(forward_propagate_into): Use get_def_for_use instead of use-def
	chains.
	(fwprop_init): Call build_single_def_use_links and let it initialize
	dataflow.
	(fwprop_done): Free use_def_ref.
	(fwprop_addr): Eliminate duplicate call to df_set_flags.
	* df-problems.c (df_rd_simulate_artificial_defs_at_top,
	df_rd_simulate_one_insn): New.
	(df_rd_bb_local_compute_process_def): Update head comment.
	(df_chain_create_bb): Use the new RD simulation functions.
	* df.h (df_rd_simulate_artificial_defs_at_top,
	df_rd_simulate_one_insn): New.
	* opts.c (decode_options): Enable fwprop at -O1.
	* doc/invoke.texi (-fforward-propagate): Document this.

Modified:
    branches/ibm/gcc-4_3-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_3-branch/gcc/REVISION
    branches/ibm/gcc-4_3-branch/gcc/df-problems.c
    branches/ibm/gcc-4_3-branch/gcc/df.h
    branches/ibm/gcc-4_3-branch/gcc/doc/invoke.texi
    branches/ibm/gcc-4_3-branch/gcc/fwprop.c
    branches/ibm/gcc-4_3-branch/gcc/opts.c

Comment 112 Jack Howarth 2010-03-26 17:44:40 UTC
What is the status of this bug?
Comment 113 lucier 2010-03-27 04:27:56 UTC
Created attachment 20220 [details]
time/mem report compiling compiler.i

This is the time and detailed memory report for 20100302 compiling compiler.i above with main optimization options -O1 -fschedule-insns2 (precise command line and configuration options are given at the top of the file).

With these optimization levels cpu time and memory don't look too bad to me.  The main routines are

 parser                : 320.93 (59%) usr   1.40 (27%) sys 322.62 (59%) wall  103143 kB (15%) ggc
 tree CFG cleanup      :  73.43 (14%) usr   0.01 ( 0%) sys  73.46 (13%) wall    1388 kB ( 0%) ggc

Nothing else is above 3%.

I'm building today's gcc on an X86-64 RHEL5 machine with more memory to test with -O3 -fschedule-insns, as this set of options now gives about 20% speedup on some of my codes of this type.
Comment 114 lucier 2010-03-27 04:59:30 UTC
Created attachment 20221 [details]
time/mem report compiling compiler.i

This is the time and detailed memory report for compiling compiler.i with today's gcc and optimization level -O3 -fschedule-insns.  Again, the detailed configuration information and command line are contained at the beginning of the file.

Except for taking > 20GB of RAM, this doesn't look too bad, either.  The passes taking the most time are:

 parser                : 222.18 (21%) usr   2.95 (11%) sys 225.37 (21%) wall  103148 kB (11%) ggc
 tree CFG cleanup      :  63.67 ( 6%) usr   0.00 ( 0%) sys  63.60 ( 6%) wall    2467 kB ( 0%) ggc
 scheduling            : 394.04 (37%) usr   0.00 ( 0%) sys 394.04 (36%) wall    5824 kB ( 1%) ggc
 TOTAL                 :1056.69            26.47          1083.41             916872 kB
Comment 115 lucier 2010-03-27 05:20:00 UTC
Created attachment 20222 [details]
time/mem report compiling compiler.i with -O1

Here is the time and memory report with -O1 -fschedule-insns2 on the same machine as the -O3 -fschedule-insns report.

The biggest times are:

 parser                : 224.89 (54%) usr   2.61 (24%) sys 226.97 (53%) wall  103148 kB (15%) ggc
 tree CFG cleanup      :  60.61 (15%) usr   0.00 ( 0%) sys  60.58 (14%) wall    1388 kB ( 0%) ggc
 reload                :  19.17 ( 5%) usr   0.00 ( 0%) sys  19.17 ( 5%) wall    4694 kB ( 1%) ggc
 TOTAL                 : 413.29            10.95           424.28             709657 kB
Comment 116 Richard Biener 2010-03-27 11:14:45 UTC
Given that parsing takes most of the time, the compile time indeed looks
reasonable.  That DF uses >20GB of RAM at -O3 is still unfortunate, but the
-O1 numbers do look good.

I wonder if the parsing numbers are accurate as the initial report has
like 9s parsing while the current ones are >200s.  Can you explain that
difference?  (like, were you testing different source?)

As is the testcase(s) are an interesting source of information - maybe we
should gather those up on a page in the wiki just in case we end up closing
this bug at some point (I suggest not to at the moment, the parsing times
look odd and >20GB memory use doesn't sound reasonable).  Did you ever
test other compilers and see how they perform with respect to memory usage
and compile time?
Comment 117 lucier 2010-03-27 16:38:18 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] Inordinate compile times on large routines


On Mar 27, 2010, at 7:14 AM, rguenth at gcc dot gnu dot org wrote:

> I wonder if the parsing numbers are accurate as the initial report has
> like 9s parsing while the current ones are >200s.  Can you explain that
> difference?  (like, were you testing different source?)

Yes, different source (compiler.i instead of all.i), different
(faster) machine.  Perhaps gathering the detailed memory stats affects
the parser time.

Here are times for the original source file all.i using the same  
machine and compiler as in the immediately previous report for  
compiler.i:

  df live&initialized regs:  45.00 ( 8%) usr   0.00 ( 0%) sys  45.04 ( 8%) wall       0 kB ( 0%) ggc
  parser                :  19.60 ( 3%) usr   1.22 ( 7%) sys  21.25 ( 4%) wall   70217 kB ( 2%) ggc
  scheduling            : 301.86 (52%) usr   0.00 ( 0%) sys 301.87 (51%) wall    8739 kB ( 0%) ggc
  TOTAL                 : 579.88            17.55           597.65            3393985 kB

Glancing at top, the maximum reported memory usage was > 13GB.  I'll  
attach the detailed results for all.i next

> As is the testcase(s) are an interesting source of information - maybe we
> should gather those up on a page in the wiki just in case we end up closing
> this bug at some point (I suggest not to at the moment, the parsing times
> look odd and >20GB memory use doesn't sound reasonable).  Did you ever
> test other compilers and see how they perform with respect to memory usage
> and compile time?

No, none that were not a gcc derivative.

Brad



Comment 118 lucier 2010-03-27 16:44:51 UTC
Created attachment 20224 [details]
time/memory report compiling all.i with -O3

These are the detailed time and memory statistics reported when compiling all.i with -O3 -fschedule-insns on x86-64.
Comment 119 Peter Bergner 2010-04-29 14:34:59 UTC
Subject: Bug 26854

Author: bergner
Date: Thu Apr 29 14:34:35 2010
New Revision: 158902

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158902
Log:
	Backport from mainline.

	2009-08-30  Alan Modra  <amodra@bigpond.net.au>

	PR target/41081
	* fwprop.c (get_reg_use_in): Delete.
	(free_load_extend): New function.
	(forward_propagate_subreg): Use it.

	2009-08-23  Alan Modra  <amodra@bigpond.net.au>

	PR target/41081
	* fwprop.c (try_fwprop_subst): Allow multiple sets.
	(get_reg_use_in): New function.
	(forward_propagate_subreg): Propagate through subreg of zero_extend
	or sign_extend.

	2009-05-08  Paolo Bonzini  <bonzini@gnu.org>

	PR rtl-optimization/33928
	PR 26854
	* fwprop.c (use_def_ref, get_def_for_use, bitmap_only_bit_bitween,
	process_uses, build_single_def_use_links): New.
	(update_df): Update use_def_ref.
	(forward_propagate_into): Use get_def_for_use instead of use-def
	chains.
	(fwprop_init): Call build_single_def_use_links and let it initialize
	dataflow.
	(fwprop_done): Free use_def_ref.
	(fwprop_addr): Eliminate duplicate call to df_set_flags.
	* df-problems.c (df_rd_simulate_artificial_defs_at_top,
	df_rd_simulate_one_insn): New.
	(df_rd_bb_local_compute_process_def): Update head comment.
	(df_chain_create_bb): Use the new RD simulation functions.
	* df.h (df_rd_simulate_artificial_defs_at_top,
	df_rd_simulate_one_insn): New.
	* opts.c (decode_options): Enable fwprop at -O1.
	* doc/invoke.texi (-fforward-propagate): Document this.

Modified:
    branches/ibm/gcc-4_4-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_4-branch/gcc/df-problems.c
    branches/ibm/gcc-4_4-branch/gcc/df.h
    branches/ibm/gcc-4_4-branch/gcc/doc/invoke.texi
    branches/ibm/gcc-4_4-branch/gcc/fwprop.c
    branches/ibm/gcc-4_4-branch/gcc/opts.c

Comment 120 Richard Biener 2010-05-22 18:10:58 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 121 Richard Biener 2011-01-18 14:22:57 UTC
PR47344 now tracks the regression property of this bug.
Comment 122 Jan Hubicka 2011-01-18 14:48:32 UTC
oprofiling shows that 50% of parsing time is in decl_jump_unsafe, a C frontend function used to output some sort of warnings on gotos to VLAs.  This can probably be solved quite easily.

Later we get (at -O2 all.i)
83417    17.0179  cc1                      dominated_by_p
75164    15.3342  cc1                      bitmap_equal_p
38134     7.7797  cc1                      bitmap_set_bit
26144     5.3336  cc1                      bitmap_ior_into
21031     4.2905  cc1                      decl_jump_unsafe
16142     3.2931  cc1                      register_new_assert_for.isra.42
12713     2.5936  cc1                      bitmap_elt_insert_after
11136     2.2719  cc1                      sbitmap_a_or_b
10625     2.1676  cc1                      et_splay
10059     2.0521  cc1                      walk_dominator_tree
6775      1.3822  cc1                      dse_enter_block
5952      1.2143  cc1                      bitmap_bit_p

probably callgrinding to work out who is doing so many dominance tests might be enlightening. I get 5.3GB memory use at that point. 300MB of it seems to be GGC.

Later in RTL compilation DCE seems to get stuck...
62815    22.8048  cc1                      mark_insn
40748    14.7934  cc1                      rest_of_handle_ud_dce
18558     6.7374  cc1                      bitmap_and_into
13046     4.7363  cc1                      bitmap_elt_insert_after
9955      3.6141  cc1                      bitmap_copy
8812      3.1992  cc1                      bitmap_ior_into
7271      2.6397  cc1                      bitmap_set_bit
6952      2.5239  cc1                      bitmap_ior_and_compl
6718      2.4389  cc1                      ira_compress_allocno_live_ranges
6370      2.3126  cc1                      create_start_finish_chains
6123      2.2229  cc1                      bitmap_and
4627      1.6798  cc1                      bitmap_and_compl_into
3688      1.3389  cc1                      regstat_compute_ri
3150      1.1436  cc1                      record_reg_classes.constprop.9

Finally at IRA time we get stuck in 
168451   33.5060  cc1                      ira_build_conflicts
46025     9.1547  cc1                      allocno_spill_priority_compare
18558     3.6913  cc1                      bitmap_and_into
13046     2.5949  cc1                      bitmap_elt_insert_after
12285     2.4436  cc1                      color_pass
9955      1.9801  cc1                      bitmap_copy
8812      1.7528  cc1                      bitmap_ior_into
7689      1.5294  cc1                      splay_tree_splay
Comment 123 Daniel Berlin 2011-01-18 14:54:33 UTC
This looks suspiciously like it's not using the DFS numbers
Comment 124 Jan Hubicka 2011-01-18 15:15:01 UTC
> This looks suspiciously like it's not using the DFS numbers

It seems that they are used; we just do a lot of queries from register_new_assert_for, according to my ^C GDB profiling.

Honza
Comment 125 Daniel Berlin 2011-01-18 15:18:25 UTC
Interesting; I wonder why et_splay shows up at all, then.
Comment 126 Joseph S. Myers 2011-01-24 22:53:09 UTC
Ian added decl_jump_unsafe, though I have no idea whether its performance is better or worse than that of my previous implementation (which he rewrote in order to add various -Wc++-compat cases).
Comment 127 Ian Lance Taylor 2011-01-24 23:37:02 UTC
We could skip some O(n^2) loops that call decl_jump_unsafe if we first check whether there are any decls for which decl_jump_unsafe returns true.
Comment 128 ian@gcc.gnu.org 2011-01-26 01:26:52 UTC
Author: ian
Date: Wed Jan 26 01:26:48 2011
New Revision: 169267

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169267
Log:
	PR tree-optimization/26854
	* c-decl.c (struct c_scope): Add field has_jump_unsafe_decl.
	(decl_jump_unsafe): Move higher in file, with no other change.
	(bind): Set has_jump_unsafe_decl if appropriate.
	(update_label_decls): Test has_jump_unsafe_decl to avoid loop.
	(check_earlier_gotos): Likewise.
	(c_check_switch_jump_warnings): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/c-decl.c
Comment 129 Diego Novillo 2011-02-02 17:50:06 UTC
Author: dnovillo
Date: Wed Feb  2 17:49:54 2011
New Revision: 169601

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169601
Log:
	PR tree-optimization/26854
	* c-decl.c (struct c_scope): Add field has_jump_unsafe_decl.
	(decl_jump_unsafe): Move higher in file, with no other change.
	(bind): Set has_jump_unsafe_decl if appropriate.
	(update_label_decls): Test has_jump_unsafe_decl to avoid loop.
	(check_earlier_gotos): Likewise.
	(c_check_switch_jump_warnings): Likewise.

Modified:
    branches/google/integration/gcc/ChangeLog
    branches/google/integration/gcc/c-decl.c
Comment 130 Paolo Bonzini 2015-07-04 09:59:33 UTC
A late update...

all.i: with GCC 4.8.3 on a Xeon E5 v3, time is taken mostly by alias stmt walking:

 alias stmt walking      : 272.52 (65%)  (-O2)
 alias stmt walking      : 116.06 (67%)  (-O1)

Required memory is 700 MB.

With GCC 5.1 (on a Core i7 Ivy Bridge laptop, so the times are not comparable), time is also taken mostly by alias stmt walking:

 alias stmt walking      : 604.43 (54%) usr (-O1)

and memory usage is also around 700 MB.

Brad was using -fschedule-insns too, and it's pretty expensive:

 scheduling              : 430.61 (38%)    (-O1 -fschedule-insns, 5.1)
 scheduling              : 122.68 (41%)    (-O1 -fschedule-insns, 4.8.3)

It also raises peak memory usage to 1 GB.

---------------------

compile.i:

with GCC 4.8.3, time is taken mostly by alias stmt walking and some tree passes (-O1):

 alias stmt walking      : 206.77 (36%)
 tree CFG cleanup        :  42.66 ( 7%) usr   0.02 ( 0%) sys  42.65 ( 7%) wall    1108 kB ( 0%) ggc
 dominator optimization  :  39.98 ( 7%) usr   0.04 ( 1%) sys  39.97 ( 7%) wall   23123 kB ( 3%) ggc

Required memory is around 1 GB.  I haven't tested 5.1, but alias stmt walking seems to be a common feature of the Gambit testcases.

In both cases, memory usage is at least under control.  However, total compile time has regressed since the previous report in comment 115.  As expected, walk_aliased_vdefs accounts for a high percentage of the time, but that time is spent in bitmap operations rather than in the callbacks!

This is because the callback is the trivial mark_modified function.  The guilty walk_aliased_vdefs invocation is the one in parm_ref_data_preserved_p, which is called from ipa_load_from_parm_agg, which is in turn called from unmodified_parm_or_parm_agg_item.  Memoization via parms_ainfo seems like a plan; I'm opening a separate bug.