Bug 43627 - [4.5 Regression] slow compilation (tree canonical iv takes 75%)
Summary: [4.5 Regression] slow compilation (tree canonical iv takes 75%)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.5.0
Importance: P2 normal
Target Milestone: 4.5.1
Assignee: Richard Biener
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: compile-time-hog, missed-optimization
Depends on:
Blocks: 41043
Reported: 2010-04-02 08:14 UTC by Joost VandeVondele
Modified: 2010-04-15 13:47 UTC

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work: 4.4.3 4.5.1 4.6.0
Known to fail: 4.5.0
Last reconfirmed: 2010-04-02 14:23:22


Attachments
testcase (37.57 KB, application/octet-stream)
2010-04-02 08:16 UTC, Joost VandeVondele
smaller testcase (needs 3s, 80% in tree canonical iv) (2.11 KB, text/plain)
2010-04-02 14:07 UTC, Joost VandeVondele
reduced testcase (1.92 KB, text/plain)
2010-04-02 14:08 UTC, Richard Biener
minimal patch (896 bytes, patch)
2010-04-02 15:10 UTC, Richard Biener

Description Joost VandeVondele 2010-04-02 08:14:09 UTC
The to-be-attached file compiles very slowly with 4.5:

4.3 ([gcc-4_3-branch revision 135036]): 37s
4.4 ([gcc-4_4-branch revision 150482]): 30s
4.5 ([trunk revision 157940]):        6m35s

 gfortran -fbounds-check -g -O3 -ffast-math -funroll-loops -ftree-vectorize -march=native -c hog.f90
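
For scale, the three timings above can be converted to seconds and compared. The following is a minimal sketch in Python; the helper name `to_seconds` is purely illustrative and not part of any GCC tooling, and the figures are copied from the list above:

```python
import re

def to_seconds(text):
    """Convert a duration such as '37s' or '6m35s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))

# Compile times reported above, keyed by GCC branch.
timings = {"4.3": "37s", "4.4": "30s", "4.5": "6m35s"}
seconds = {version: to_seconds(t) for version, t in timings.items()}

# Ratio of the 4.5 compile time to the 4.4 compile time.
slowdown_vs_44 = seconds["4.5"] / seconds["4.4"]
print(f"4.5 takes {seconds['4.5']:.0f}s, {slowdown_vs_44:.1f}x the 4.4 time")
```

With these numbers, 4.5 needs 395 s, roughly 13 times the 4.4 time.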
Comment 1 Joost VandeVondele 2010-04-02 08:16:17 UTC
Created attachment 20287 [details]
testcase

Reproduce with:

gfortran -fbounds-check -g -O3 -ffast-math -funroll-loops -ftree-vectorize -march=native -c hog.f90
Comment 2 Joost VandeVondele 2010-04-02 08:27:40 UTC
And a timing report as well (note that the machine was not fully idle). The major consumer is tree canonical iv.

Execution times (seconds)
 garbage collection    :   7.71 ( 2%) usr   0.07 ( 4%) sys  14.12 ( 2%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.18 ( 0%) usr   0.01 ( 1%) sys   0.24 ( 0%) wall    6675 kB ( 1%) ggc
 callgraph optimization:   0.61 ( 0%) usr   0.03 ( 2%) sys   0.61 ( 0%) wall    1655 kB ( 0%) ggc
 ipa cp                :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall     539 kB ( 0%) ggc
 ipa reference         :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 ipa SRA               :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 cfg cleanup           :   0.78 ( 0%) usr   0.01 ( 1%) sys   1.27 ( 0%) wall    3661 kB ( 0%) ggc
 CFG verifier          :   2.10 ( 1%) usr   0.00 ( 0%) sys   3.40 ( 1%) wall       0 kB ( 0%) ggc
 trivially dead code   :   0.38 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall       0 kB ( 0%) ggc
 df multiple defs      :   0.59 ( 0%) usr   0.00 ( 0%) sys   0.92 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs      :   0.86 ( 0%) usr   0.00 ( 0%) sys   1.83 ( 0%) wall       0 kB ( 0%) ggc
 df live regs          :   4.92 ( 1%) usr   0.01 ( 1%) sys   8.23 ( 1%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   1.48 ( 0%) usr   0.01 ( 1%) sys   3.37 ( 1%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.71 ( 0%) usr   0.00 ( 0%) sys   1.39 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   4.15 ( 1%) usr   0.01 ( 1%) sys   7.47 ( 1%) wall    9314 kB ( 1%) ggc
 register information  :   1.29 ( 0%) usr   0.01 ( 1%) sys   3.00 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.64 ( 0%) usr   0.00 ( 0%) sys   0.74 ( 0%) wall   21770 kB ( 3%) ggc
 alias stmt walking    :   1.94 ( 1%) usr   0.06 ( 4%) sys   3.50 ( 1%) wall       0 kB ( 0%) ggc
 register scan         :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 rebuild jump labels   :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   1.27 ( 0%) usr   0.12 ( 7%) sys   1.50 ( 0%) wall   42200 kB ( 5%) ggc
 inline heuristics     :   0.43 ( 0%) usr   0.02 ( 1%) sys   0.34 ( 0%) wall       0 kB ( 0%) ggc
 tree gimplify         :   0.69 ( 0%) usr   0.03 ( 2%) sys   0.79 ( 0%) wall   52375 kB ( 6%) ggc
 tree eh               :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    9418 kB ( 1%) ggc
 tree CFG cleanup      :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.80 ( 0%) wall     418 kB ( 0%) ggc
 tree VRP              :   2.08 ( 1%) usr   0.05 ( 3%) sys   3.67 ( 1%) wall   54923 kB ( 7%) ggc
 tree copy propagation :   0.37 ( 0%) usr   0.00 ( 0%) sys   0.59 ( 0%) wall     237 kB ( 0%) ggc
 tree find ref. vars   :   0.07 ( 0%) usr   0.02 ( 1%) sys   0.09 ( 0%) wall    3774 kB ( 0%) ggc
 tree PTA              :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall     425 kB ( 0%) ggc
 tree PHI insertion    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     315 kB ( 0%) ggc
 tree SSA rewrite      :   0.44 ( 0%) usr   0.03 ( 2%) sys   0.80 ( 0%) wall   20682 kB ( 3%) ggc
 tree SSA other        :   0.22 ( 0%) usr   0.02 ( 1%) sys   0.23 ( 0%) wall     434 kB ( 0%) ggc
 tree SSA incremental  :   0.62 ( 0%) usr   0.04 ( 2%) sys   0.91 ( 0%) wall     438 kB ( 0%) ggc
 tree operand scan     :   0.27 ( 0%) usr   0.14 ( 8%) sys   0.53 ( 0%) wall   21791 kB ( 3%) ggc
 dominator optimization:   0.42 ( 0%) usr   0.00 ( 0%) sys   0.72 ( 0%) wall    4190 kB ( 1%) ggc
 tree CCP              :   0.56 ( 0%) usr   0.01 ( 1%) sys   0.70 ( 0%) wall    3081 kB ( 0%) ggc
 tree PHI const/copy prop:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      22 kB ( 0%) ggc
 tree split crit edges :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall    3268 kB ( 0%) ggc
 tree reassociation    :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall     161 kB ( 0%) ggc
 tree PRE              :   6.54 ( 2%) usr   0.02 ( 1%) sys  11.71 ( 2%) wall   25200 kB ( 3%) ggc
 tree FRE              :   0.76 ( 0%) usr   0.03 ( 2%) sys   1.15 ( 0%) wall    8100 kB ( 1%) ggc
 tree code sinking     :   0.23 ( 0%) usr   0.04 ( 2%) sys   0.44 ( 0%) wall   12275 kB ( 2%) ggc
 tree linearize phis   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate:   0.19 ( 0%) usr   0.01 ( 1%) sys   0.25 ( 0%) wall    9572 kB ( 1%) ggc
 tree phiprop          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.19 ( 0%) usr   0.02 ( 1%) sys   0.51 ( 0%) wall      17 kB ( 0%) ggc
 tree aggressive DCE   :   0.49 ( 0%) usr   0.01 ( 1%) sys   0.74 ( 0%) wall    2998 kB ( 0%) ggc
 tree buildin call DCE :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      27 kB ( 0%) ggc
 tree loop bounds      :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.47 ( 0%) wall    6310 kB ( 1%) ggc
 tree loop invariant motion:   0.29 ( 0%) usr   0.01 ( 1%) sys   0.45 ( 0%) wall     498 kB ( 0%) ggc
 tree canonical iv     : 230.79 (62%) usr   0.10 ( 6%) sys 393.03 (61%) wall  146373 kB (18%) ggc
 scev constant prop    :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall    5809 kB ( 1%) ggc
 tree loop unswitching :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 complete unrolling    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall    1123 kB ( 0%) ggc
 tree vectorization    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      40 kB ( 0%) ggc
 tree slp vectorization:   0.48 ( 0%) usr   0.00 ( 0%) sys   0.83 ( 0%) wall   19329 kB ( 2%) ggc
 tree iv optimization  :   0.59 ( 0%) usr   0.00 ( 0%) sys   0.77 ( 0%) wall   13315 kB ( 2%) ggc
 predictive commoning  :   1.44 ( 0%) usr   0.00 ( 0%) sys   2.29 ( 0%) wall   40577 kB ( 5%) ggc
 tree loop init        :   0.17 ( 0%) usr   0.01 ( 1%) sys   0.31 ( 0%) wall    5246 kB ( 1%) ggc
 tree loop fini        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree copy headers     :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.07 ( 0%) wall     758 kB ( 0%) ggc
 tree SSA uncprop      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA verifier     :   9.57 ( 3%) usr   0.01 ( 1%) sys  15.09 ( 2%) wall       0 kB ( 0%) ggc
 tree STMT verifier    :  18.08 ( 5%) usr   0.10 ( 6%) sys  30.59 ( 5%) wall       0 kB ( 0%) ggc
 tree switch initialization conversion:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 callgraph verifier    :   1.64 ( 0%) usr   0.00 ( 0%) sys   1.83 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   0.58 ( 0%) usr   0.00 ( 0%) sys   0.84 ( 0%) wall       0 kB ( 0%) ggc
 control dependences   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   8.51 ( 2%) usr   0.05 ( 3%) sys  15.28 ( 2%) wall   76554 kB ( 9%) ggc
 jump                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 forward prop          :   1.18 ( 0%) usr   0.00 ( 0%) sys   2.75 ( 0%) wall    6749 kB ( 1%) ggc
 CSE                   :   1.51 ( 0%) usr   0.01 ( 1%) sys   2.73 ( 0%) wall    1375 kB ( 0%) ggc
 dead code elimination :   0.73 ( 0%) usr   0.00 ( 0%) sys   1.60 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1      :   0.75 ( 0%) usr   0.01 ( 1%) sys   1.18 ( 0%) wall    5337 kB ( 1%) ggc
 dead store elim2      :   1.39 ( 0%) usr   0.00 ( 0%) sys   2.67 ( 0%) wall    6079 kB ( 1%) ggc
 loop analysis         :   0.08 ( 0%) usr   0.01 ( 1%) sys   0.06 ( 0%) wall      61 kB ( 0%) ggc
 loop invariant motion :   0.10 ( 0%) usr   0.01 ( 1%) sys   0.16 ( 0%) wall       1 kB ( 0%) ggc
 loop unswitching      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 loop unrolling        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     190 kB ( 0%) ggc
 CPROP                 :   1.05 ( 0%) usr   0.00 ( 0%) sys   1.94 ( 0%) wall    7896 kB ( 1%) ggc
 PRE                   :   0.29 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall     882 kB ( 0%) ggc
 web                   :   1.08 ( 0%) usr   0.00 ( 0%) sys   1.81 ( 0%) wall      23 kB ( 0%) ggc
 CSE 2                 :   1.53 ( 0%) usr   0.00 ( 0%) sys   2.51 ( 0%) wall     793 kB ( 0%) ggc
 branch prediction     :   0.14 ( 0%) usr   0.01 ( 1%) sys   0.25 ( 0%) wall    4053 kB ( 0%) ggc
 combiner              :   2.39 ( 1%) usr   0.02 ( 1%) sys   4.13 ( 1%) wall   26323 kB ( 3%) ggc
 if-conversion         :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     130 kB ( 0%) ggc
 regmove               :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.47 ( 0%) wall       4 kB ( 0%) ggc
 integrated RA         :   8.51 ( 2%) usr   0.01 ( 1%) sys  14.18 ( 2%) wall    8933 kB ( 1%) ggc
 reload                :   1.93 ( 1%) usr   0.04 ( 2%) sys   3.31 ( 1%) wall    1774 kB ( 0%) ggc
 reload CSE regs       :   0.80 ( 0%) usr   0.01 ( 1%) sys   1.54 ( 0%) wall    9904 kB ( 1%) ggc
 load CSE after reload :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 thread pro- & epilogue:   0.14 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall     572 kB ( 0%) ggc
 if-conversion 2       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      59 kB ( 0%) ggc
 combine stack adjustments:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2            :   0.44 ( 0%) usr   0.00 ( 0%) sys   0.56 ( 0%) wall    2057 kB ( 0%) ggc
 rename registers      :   0.44 ( 0%) usr   0.00 ( 0%) sys   0.85 ( 0%) wall     701 kB ( 0%) ggc
 hard reg cprop        :   0.64 ( 0%) usr   0.00 ( 0%) sys   1.03 ( 0%) wall      35 kB ( 0%) ggc
 scheduling 2          :   1.70 ( 0%) usr   0.03 ( 2%) sys   3.15 ( 0%) wall     257 kB ( 0%) ggc
 machine dep reorg     :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall       0 kB ( 0%) ggc
 reorder blocks        :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall    2145 kB ( 0%) ggc
 final                 :   0.91 ( 0%) usr   0.03 ( 2%) sys   1.67 ( 0%) wall    5904 kB ( 1%) ggc
 symout                :   0.47 ( 0%) usr   0.07 ( 4%) sys   1.15 ( 0%) wall   50781 kB ( 6%) ggc
 variable tracking     :  26.64 ( 7%) usr   0.32 (19%) sys  48.05 ( 7%) wall   38563 kB ( 5%) ggc
 plugin execution      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 374.92             1.71           641.15             817719 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

real    10m46.566s
user    6m17.140s
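
A report of this length is easier to read once the passes are ranked by user time. The sketch below is one hedged way to do that in Python; the regular expression is an assumption based only on the line layout shown above, the function name `top_passes` is hypothetical, and the sample lines are copied from this report:

```python
import re

# Matches lines such as
#  " tree canonical iv     : 230.79 (62%) usr   0.10 ( 6%) sys ..."
# The TOTAL line has no percentage column, so it does not match.
LINE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*(?P<pct>\d+)%\)\s*usr"
)

def top_passes(report, count=3):
    """Return the `count` passes with the largest user time."""
    rows = []
    for line in report.splitlines():
        m = LINE.match(line)
        if m:
            rows.append((float(m.group("usr")), m.group("name")))
    rows.sort(reverse=True)
    return [(name, usr) for usr, name in rows[:count]]

sample = """\
 tree canonical iv     : 230.79 (62%) usr   0.10 ( 6%) sys 393.03 (61%) wall  146373 kB (18%) ggc
 variable tracking     :  26.64 ( 7%) usr   0.32 (19%) sys  48.05 ( 7%) wall   38563 kB ( 5%) ggc
 tree STMT verifier    :  18.08 ( 5%) usr   0.10 ( 6%) sys  30.59 ( 5%) wall       0 kB ( 0%) ggc
 tree PRE              :   6.54 ( 2%) usr   0.02 ( 1%) sys  11.71 ( 2%) wall   25200 kB ( 3%) ggc
 TOTAL                 : 374.92             1.71           641.15             817719 kB
"""

for name, usr in top_passes(sample):
    print(f"{name}: {usr:.2f}s usr")
```

On the sample lines this ranks tree canonical iv first, followed by variable tracking and the tree STMT verifier; the verifier entry only appears because this build has extra checking enabled.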
Comment 3 Steven Bosscher 2010-04-02 09:18:29 UTC
This tells me you are comparing apples and cows: "Extra diagnostic checks enabled; compiler may run slowly."

Could you try again with a compiler configured with --enable-checking=release?
Comment 4 Joost VandeVondele 2010-04-02 09:26:30 UTC
(In reply to comment #3)
> This tells me you are comparing apples and cows: "Extra diagnostic checks
> enabled; compiler may run slowly."
> 
> Could you try again with a compiler configured with --enable-checking=release?
> 

I'll do that now...

For reference, 4.4 has:

> gfortran -ftime-report -fbounds-check -g -O3 -ffast-math -funroll-loops -ftree-vectorize -march=native hog.f90

Execution times (seconds)
 garbage collection    :   0.15 ( 1%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.33 ( 1%) usr   0.03 ( 4%) sys   0.33 ( 1%) wall    9447 kB ( 2%) ggc
 callgraph optimization:   0.46 ( 2%) usr   0.01 ( 1%) sys   0.50 ( 2%) wall     239 kB ( 0%) ggc
 ipa cp                :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.24 ( 1%) wall       0 kB ( 0%) ggc
 ipa reference         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 cfg cleanup           :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall     914 kB ( 0%) ggc
 trivially dead code   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs      :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 df live regs          :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.10 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.11 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.17 ( 1%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall    3443 kB ( 1%) ggc
 register information  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall    6273 kB ( 1%) ggc
 register scan         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 rebuild jump labels   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   1.27 ( 5%) usr   0.12 (16%) sys   1.31 ( 5%) wall   50936 kB ( 9%) ggc
 inline heuristics     :   0.13 ( 1%) usr   0.05 ( 6%) sys   0.25 ( 1%) wall       0 kB ( 0%) ggc
 tree gimplify         :   0.44 ( 2%) usr   0.04 ( 5%) sys   0.54 ( 2%) wall   61550 kB (11%) ggc
 tree eh               :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    9734 kB ( 2%) ggc
 tree CFG cleanup      :   0.28 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall     668 kB ( 0%) ggc
 tree VRP              :   1.21 ( 5%) usr   0.03 ( 4%) sys   1.26 ( 5%) wall   42193 kB ( 8%) ggc
 tree copy propagation :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.24 ( 1%) wall     315 kB ( 0%) ggc
 tree find ref. vars   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    8937 kB ( 2%) ggc
 tree PTA              :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall     758 kB ( 0%) ggc
 tree alias analysis   :   0.12 ( 0%) usr   0.05 ( 6%) sys   0.12 ( 0%) wall      77 kB ( 0%) ggc
 tree call clobbering  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      18 kB ( 0%) ggc
 tree flow sensitive alias:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     121 kB ( 0%) ggc
 tree flow insensitive alias:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree memory partitioning:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      21 kB ( 0%) ggc
 tree PHI insertion    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     201 kB ( 0%) ggc
 tree SSA rewrite      :   0.17 ( 1%) usr   0.01 ( 1%) sys   0.13 ( 1%) wall   19668 kB ( 4%) ggc
 tree SSA other        :   0.11 ( 0%) usr   0.03 ( 4%) sys   0.18 ( 1%) wall     360 kB ( 0%) ggc
 tree SSA incremental  :   0.24 ( 1%) usr   0.02 ( 3%) sys   0.25 ( 1%) wall      40 kB ( 0%) ggc
 tree operand scan     :   0.36 ( 1%) usr   0.15 (19%) sys   0.58 ( 2%) wall   27070 kB ( 5%) ggc
 dominator optimization:   0.26 ( 1%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall    2270 kB ( 0%) ggc
 tree SRA              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CCP              :   0.33 ( 1%) usr   0.01 ( 1%) sys   0.24 ( 1%) wall    4060 kB ( 1%) ggc
 tree reassociation    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall     124 kB ( 0%) ggc
 tree PRE              :   5.18 (21%) usr   0.05 ( 6%) sys   5.07 (20%) wall   87699 kB (16%) ggc
 tree FRE              :   0.51 ( 2%) usr   0.00 ( 0%) sys   0.55 ( 2%) wall    7664 kB ( 1%) ggc
 tree code sinking     :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      75 kB ( 0%) ggc
 tree linearize phis   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate:   0.11 ( 0%) usr   0.02 ( 3%) sys   0.11 ( 0%) wall   11274 kB ( 2%) ggc
 tree phiprop          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       2 kB ( 0%) ggc
 tree aggressive DCE   :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       5 kB ( 0%) ggc
 tree DSE              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall      65 kB ( 0%) ggc
 tree loop bounds      :   0.23 ( 1%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    5701 kB ( 1%) ggc
 loop invariant motion :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall       0 kB ( 0%) ggc
 tree canonical iv     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      23 kB ( 0%) ggc
 scev constant prop    :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall     520 kB ( 0%) ggc
 tree loop unswitching :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 complete unrolling    :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    1121 kB ( 0%) ggc
 tree iv optimization  :   2.30 (10%) usr   0.01 ( 1%) sys   2.33 ( 9%) wall   34677 kB ( 6%) ggc
 predictive commoning  :   0.94 ( 4%) usr   0.02 ( 3%) sys   1.00 ( 4%) wall   42843 kB ( 8%) ggc
 tree loop init        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     232 kB ( 0%) ggc
 tree SSA uncprop      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA to normal    :   0.11 ( 0%) usr   0.02 ( 3%) sys   0.22 ( 1%) wall   17790 kB ( 3%) ggc
 tree rename SSA copies:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 control dependences   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   0.87 ( 4%) usr   0.01 ( 1%) sys   0.89 ( 4%) wall   27910 kB ( 5%) ggc
 forward prop          :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall    3922 kB ( 1%) ggc
 CSE                   :   0.53 ( 2%) usr   0.01 ( 1%) sys   0.65 ( 3%) wall     639 kB ( 0%) ggc
 dead code elimination :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1      :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall    2892 kB ( 1%) ggc
 dead store elim2      :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    2730 kB ( 1%) ggc
 loop analysis         :   0.03 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall     389 kB ( 0%) ggc
 CPROP 1               :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     430 kB ( 0%) ggc
 PRE                   :   0.08 ( 0%) usr   0.01 ( 1%) sys   0.11 ( 0%) wall      15 kB ( 0%) ggc
 CPROP 2               :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1014 kB ( 0%) ggc
 bypass jumps          :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall    1039 kB ( 0%) ggc
 web                   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      14 kB ( 0%) ggc
 CSE 2                 :   0.20 ( 1%) usr   0.00 ( 0%) sys   0.25 ( 1%) wall     228 kB ( 0%) ggc
 branch prediction     :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall    4586 kB ( 1%) ggc
 combiner              :   0.79 ( 3%) usr   0.01 ( 1%) sys   0.78 ( 3%) wall   15629 kB ( 3%) ggc
 if-conversion         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     140 kB ( 0%) ggc
 regmove               :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :   0.81 ( 3%) usr   0.00 ( 0%) sys   1.02 ( 4%) wall    2360 kB ( 0%) ggc
 reload                :   0.57 ( 2%) usr   0.00 ( 0%) sys   0.43 ( 2%) wall    2090 kB ( 0%) ggc
 reload CSE regs       :   0.39 ( 2%) usr   0.01 ( 1%) sys   0.43 ( 2%) wall    4804 kB ( 1%) ggc
 load CSE after reload :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall      31 kB ( 0%) ggc
 thread pro- & epilogue:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     451 kB ( 0%) ggc
 if-conversion 2       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      59 kB ( 0%) ggc
 peephole 2            :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      81 kB ( 0%) ggc
 rename registers      :   0.35 ( 1%) usr   0.00 ( 0%) sys   0.36 ( 1%) wall     379 kB ( 0%) ggc
 scheduling 2          :   0.51 ( 2%) usr   0.00 ( 0%) sys   0.56 ( 2%) wall     342 kB ( 0%) ggc
 machine dep reorg     :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       2 kB ( 0%) ggc
 reorder blocks        :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     129 kB ( 0%) ggc
 final                 :   0.16 ( 1%) usr   0.03 ( 4%) sys   0.23 ( 1%) wall     745 kB ( 0%) ggc
 symout                :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall    3436 kB ( 1%) ggc
 variable tracking     :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall    1278 kB ( 0%) ggc
 TOTAL                 :  24.14             0.77            24.91             537367 kB
/data03/vondele/gcc_4_4_branch/build/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/libgfortranbegin.a(fmain.o): In function `main':
/data03/vondele/gcc_4_4_branch/gcc/libgfortran/fmain.c:21: undefined reference to `MAIN__'
collect2: ld returned 1 exit status

Comment 5 Joost VandeVondele 2010-04-02 09:47:22 UTC
(In reply to comment #3)

Cows with cows now (i.e. --enable-checking=release), on an idle machine.

Execution times (seconds)
 garbage collection    :   0.29 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.11 ( 0%) usr   0.01 ( 1%) sys   0.12 ( 0%) wall    5939 kB ( 1%) ggc
 callgraph optimization:   0.29 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall     184 kB ( 0%) ggc
 ipa cp                :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall     539 kB ( 0%) ggc
 ipa reference         :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 cfg cleanup           :   0.67 ( 0%) usr   0.00 ( 0%) sys   0.83 ( 0%) wall    3661 kB ( 1%) ggc
 trivially dead code   :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 df multiple defs      :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs      :   0.69 ( 0%) usr   0.00 ( 0%) sys   0.65 ( 0%) wall       0 kB ( 0%) ggc
 df live regs          :   3.08 ( 1%) usr   0.00 ( 0%) sys   3.07 ( 1%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   1.17 ( 0%) usr   0.00 ( 0%) sys   1.07 ( 0%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.53 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   2.50 ( 1%) usr   0.00 ( 0%) sys   2.73 ( 1%) wall    9314 kB ( 1%) ggc
 register information  :   1.05 ( 0%) usr   0.00 ( 0%) sys   0.84 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.58 ( 0%) usr   0.00 ( 0%) sys   0.61 ( 0%) wall   21770 kB ( 3%) ggc
 alias stmt walking    :   1.29 ( 0%) usr   0.04 ( 4%) sys   1.36 ( 0%) wall       0 kB ( 0%) ggc
 register scan         :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 rebuild jump labels   :   0.21 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   1.15 ( 0%) usr   0.12 (11%) sys   1.26 ( 0%) wall   42200 kB ( 6%) ggc
 inline heuristics     :   0.24 ( 0%) usr   0.01 ( 1%) sys   0.24 ( 0%) wall       0 kB ( 0%) ggc
 tree gimplify         :   0.43 ( 0%) usr   0.05 ( 4%) sys   0.47 ( 0%) wall   52375 kB ( 8%) ggc
 tree eh               :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall    9418 kB ( 1%) ggc
 tree CFG cleanup      :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.46 ( 0%) wall     418 kB ( 0%) ggc
 tree VRP              :   1.57 ( 1%) usr   0.06 ( 5%) sys   1.60 ( 1%) wall   54731 kB ( 8%) ggc
 tree copy propagation :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.29 ( 0%) wall     237 kB ( 0%) ggc
 tree find ref. vars   :   0.03 ( 0%) usr   0.01 ( 1%) sys   0.10 ( 0%) wall    3774 kB ( 1%) ggc
 tree PTA              :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     423 kB ( 0%) ggc
 tree PHI insertion    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     315 kB ( 0%) ggc
 tree SSA rewrite      :   0.24 ( 0%) usr   0.02 ( 2%) sys   0.19 ( 0%) wall   20682 kB ( 3%) ggc
 tree SSA other        :   0.10 ( 0%) usr   0.04 ( 4%) sys   0.19 ( 0%) wall     434 kB ( 0%) ggc
 tree SSA incremental  :   0.56 ( 0%) usr   0.02 ( 2%) sys   0.66 ( 0%) wall     438 kB ( 0%) ggc
 tree operand scan     :   0.21 ( 0%) usr   0.20 (18%) sys   0.42 ( 0%) wall   21791 kB ( 3%) ggc
 dominator optimization:   0.35 ( 0%) usr   0.01 ( 1%) sys   0.36 ( 0%) wall    4189 kB ( 1%) ggc
 tree SRA              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree CCP              :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.34 ( 0%) wall    3081 kB ( 0%) ggc
 tree PHI const/copy prop:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      22 kB ( 0%) ggc
 tree split crit edges :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    3265 kB ( 0%) ggc
 tree reassociation    :   0.12 ( 0%) usr   0.01 ( 1%) sys   0.11 ( 0%) wall     161 kB ( 0%) ggc
 tree PRE              :   4.88 ( 2%) usr   0.00 ( 0%) sys   4.89 ( 2%) wall   25200 kB ( 4%) ggc
 tree FRE              :   0.65 ( 0%) usr   0.02 ( 2%) sys   0.67 ( 0%) wall    8099 kB ( 1%) ggc
 tree code sinking     :   0.16 ( 0%) usr   0.05 ( 4%) sys   0.17 ( 0%) wall   12275 kB ( 2%) ggc
 tree linearize phis   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate:   0.14 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall    9572 kB ( 1%) ggc
 tree phiprop          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.21 ( 0%) usr   0.03 ( 3%) sys   0.15 ( 0%) wall      17 kB ( 0%) ggc
 tree aggressive DCE   :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall    2998 kB ( 0%) ggc
 tree DSE              :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      26 kB ( 0%) ggc
 PHI merge             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       5 kB ( 0%) ggc
 tree loop bounds      :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall    6263 kB ( 1%) ggc
 tree loop invariant motion:   0.19 ( 0%) usr   0.01 ( 1%) sys   0.19 ( 0%) wall     497 kB ( 0%) ggc
 tree canonical iv     : 223.30 (75%) usr   0.01 ( 1%) sys 223.28 (75%) wall   21873 kB ( 3%) ggc
 scev constant prop    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    5809 kB ( 1%) ggc
 tree loop unswitching :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 complete unrolling    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    1123 kB ( 0%) ggc
 tree slp vectorization:   0.38 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall   19328 kB ( 3%) ggc
 tree iv optimization  :   0.39 ( 0%) usr   0.00 ( 0%) sys   0.47 ( 0%) wall   13309 kB ( 2%) ggc
 predictive commoning  :   1.13 ( 0%) usr   0.01 ( 1%) sys   1.17 ( 0%) wall   40528 kB ( 6%) ggc
 tree loop init        :   0.13 ( 0%) usr   0.01 ( 1%) sys   0.07 ( 0%) wall    5208 kB ( 1%) ggc
 tree copy headers     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     758 kB ( 0%) ggc
 tree SSA uncprop      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies:   0.08 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
 control dependences   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   2.97 ( 1%) usr   0.04 ( 4%) sys   3.11 ( 1%) wall   76883 kB (11%) ggc
 lower subreg          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 forward prop          :   0.89 ( 0%) usr   0.00 ( 0%) sys   0.85 ( 0%) wall    6749 kB ( 1%) ggc
 CSE                   :   1.46 ( 0%) usr   0.01 ( 1%) sys   1.51 ( 1%) wall    1369 kB ( 0%) ggc
 dead code elimination :   0.45 ( 0%) usr   0.00 ( 0%) sys   0.43 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1      :   0.60 ( 0%) usr   0.00 ( 0%) sys   0.44 ( 0%) wall    5337 kB ( 1%) ggc
 dead store elim2      :   0.48 ( 0%) usr   0.00 ( 0%) sys   0.42 ( 0%) wall    6072 kB ( 1%) ggc
 loop invariant motion :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       1 kB ( 0%) ggc
 loop unswitching      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 loop unrolling        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     190 kB ( 0%) ggc
 CPROP                 :   0.84 ( 0%) usr   0.02 ( 2%) sys   0.81 ( 0%) wall    7746 kB ( 1%) ggc
 PRE                   :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall     777 kB ( 0%) ggc
 web                   :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall      16 kB ( 0%) ggc
 CSE 2                 :   1.42 ( 0%) usr   0.00 ( 0%) sys   1.54 ( 1%) wall     793 kB ( 0%) ggc
 branch prediction     :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall    4053 kB ( 1%) ggc
 combiner              :   2.05 ( 1%) usr   0.02 ( 2%) sys   2.10 ( 1%) wall   26058 kB ( 4%) ggc
 if-conversion         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     130 kB ( 0%) ggc
 regmove               :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall       4 kB ( 0%) ggc
 integrated RA         :   4.46 ( 1%) usr   0.00 ( 0%) sys   4.24 ( 1%) wall    8905 kB ( 1%) ggc
 reload                :   1.47 ( 0%) usr   0.00 ( 0%) sys   1.55 ( 1%) wall    1737 kB ( 0%) ggc
 reload CSE regs       :   0.73 ( 0%) usr   0.01 ( 1%) sys   0.76 ( 0%) wall    9904 kB ( 1%) ggc
 load CSE after reload :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 thread pro- & epilogue:   0.09 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall     572 kB ( 0%) ggc
 if-conversion 2       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      59 kB ( 0%) ggc
 combine stack adjustments:   0.07 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2            :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall    2057 kB ( 0%) ggc
 rename registers      :   0.48 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall     701 kB ( 0%) ggc
 hard reg cprop        :   0.29 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall      35 kB ( 0%) ggc
 scheduling 2          :   1.42 ( 0%) usr   0.00 ( 0%) sys   1.42 ( 0%) wall     222 kB ( 0%) ggc
 machine dep reorg     :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall       0 kB ( 0%) ggc
 reorder blocks        :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall    2144 kB ( 0%) ggc
 final                 :   0.56 ( 0%) usr   0.06 ( 5%) sys   0.76 ( 0%) wall    5904 kB ( 1%) ggc
 symout                :   0.39 ( 0%) usr   0.06 ( 5%) sys   0.44 ( 0%) wall   50781 kB ( 7%) ggc
 variable tracking     :  23.48 ( 8%) usr   0.17 (15%) sys  23.48 ( 8%) wall   38556 kB ( 6%) ggc
 plugin execution      :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 298.36             1.14           299.51             690347 kB
COLLECT_GCC_OPTIONS='-v' '-ftime-report' '-fbounds-check' '-g' '-O3' '-ffast-math' '-funroll-loops' '-ftree-vectorize'  '-c'
 as -V -Qy --64 -o hog.o /tmp/cclB9I15.s
GNU assembler version 2.18.50 (x86_64-suse-linux) using BFD version (GNU Binutils; openSUSE 11.0) 2.18.50.20080409-11.1
COMPILER_PATH=/data03/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/:/data03/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/:/data03/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/:/data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/:/data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/
LIBRARY_PATH=/data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/:/data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-ftime-report' '-fbounds-check' '-g' '-O3' '-ffast-math' '-funroll-loops' '-ftree-vectorize'  '-c'


Comment 6 Richard Biener 2010-04-02 12:19:05 UTC
The issue is almost certainly the many manually unrolled loops, and possibly
the new autoinc code.

What's your native arch?  I can't reproduce this on a core i?86.
Comment 7 Joost VandeVondele 2010-04-02 12:28:51 UTC
(In reply to comment #6)
> What's your native arch?  I can't reproduce this on a core i?86.

-v output:

 /data03/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/f951 hog.f90 -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8 -quiet -dumpbase hog.f90 -auxbase hog -g -O3 -version -fbounds-check -ffast-math -funroll-loops -ftree-vectorize -fintrinsic-modules-path /data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/finclude -o /tmp/ccA2YvFn.s
Comment 8 Richard Biener 2010-04-02 14:07:05 UTC
Confirmed on x86_64-linux with -O2 -fbounds-check.

find_loop_niter_by_eval takes a lot of time in each of the ints2bits_*
routines because the loops have a lot of exits (due to -fbounds-check).
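
To illustrate why the number of exits matters, here is a toy C model of brute-force trip-count evaluation. It is a sketch, not GCC's code: the names loop_exit, niter_by_eval, total_evals and the cap MAX_TRACKED_ITERATIONS (with its value of 1000) are assumptions made up for this example. Each exit is simulated separately until it fires or the cap is reached, so an exit that never fires, such as a bounds check that always passes, costs the full cap.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cap on how many iterations are simulated per exit.
   The value 1000 is an illustrative assumption.  */
enum { MAX_TRACKED_ITERATIONS = 1000 };

typedef enum { EXIT_IF_LE, EXIT_IF_GE } exit_kind;

/* One loop exit: leave the loop when "iv <= bound" or "iv >= bound".  */
typedef struct {
  exit_kind kind;
  long bound;
} loop_exit;

/* Evaluate one exit test for a concrete induction variable value.  */
int exit_taken (const loop_exit *e, long iv)
{
  return e->kind == EXIT_IF_LE ? iv <= e->bound : iv >= e->bound;
}

/* Step iv = init, init + step, ... until the exit fires or the cap is hit.
   Returns the iteration at which the exit fires, or -1 if it never did.
   *evals is incremented once per exit test evaluated.  */
long niter_by_eval (const loop_exit *e, long init, long step,
                    unsigned long *evals)
{
  long iv = init;
  for (long n = 0; n < MAX_TRACKED_ITERATIONS; n++)
    {
      ++*evals;
      if (exit_taken (e, iv))
        return n;
      iv += step;
    }
  return -1;
}

/* Total number of exit tests evaluated when every exit is tried in turn.  */
unsigned long total_evals (const loop_exit *exits, size_t count,
                           long init, long step)
{
  unsigned long evals = 0;
  assert (exits != NULL || count == 0);
  for (size_t k = 0; k < count; k++)
    (void) niter_by_eval (&exits[k], init, step, &evals);
  return evals;
}
```

With iv starting at 1 and stepping by 1, two bounds-check exits of the form "iv <= 0" plus one real exit "iv >= 10" cost 2010 evaluations in this model, and 2000 of them are spent on the checks that can never fire.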
Comment 9 Joost VandeVondele 2010-04-02 14:07:23 UTC
Created attachment 20290 [details]
smaller testcase (needs 3s, 80% in tree canonical iv)
Comment 10 Richard Biener 2010-04-02 14:08:31 UTC
Created attachment 20291 [details]
reduced testcase
Comment 11 Richard Biener 2010-04-02 14:13:11 UTC
Compared to 4.4, 4.5 no longer eliminates most of the bounds checks.
Comment 12 Joost VandeVondele 2010-04-02 14:17:17 UTC
(In reply to comment #9)
> Created an attachment (id=20290)
> smaller testcase (needs 3s, 80% in tree canonical iv)

From valgrind, I see some 13000000 calls to get_val_for / fold_binary_loc for the small testcase.

Comment 13 Richard Biener 2010-04-02 14:23:22 UTC
Testcase for that:

MODULE hfx_compression_core_methods

  IMPLICIT NONE

  INTEGER, PARAMETER :: int_8=8

  CONTAINS

  SUBROUTINE ints2bits_3(Ndata,packed_data,full_data)
    INTEGER, INTENT(IN)                      :: Ndata
    INTEGER(KIND=int_8), INTENT(OUT)         :: packed_data(*)
    INTEGER(KIND=int_8), INTENT(IN)          :: full_data(*)

    INTEGER, PARAMETER                       :: Nbits = 3

    INTEGER                                  :: idata, ipack, kdata, Ndata_rep
    INTEGER(KIND=int_8)                      :: data_tmp, pack_tmp

   idata=0
   ipack=0
   Ndata_rep=(Ndata/2)*2
   DO kdata=1,Ndata_rep,2
   pack_tmp=0
     idata=idata+1
        data_tmp = full_data(idata)
        data_tmp = ISHFT(data_tmp,61)
        pack_tmp = IOR(pack_tmp,data_tmp)
        pack_tmp = ISHFT(pack_tmp,-3)
     idata=idata+1
        data_tmp = full_data(idata)
        data_tmp = ISHFT(data_tmp,61)
        pack_tmp = IOR(pack_tmp,data_tmp)
        pack_tmp = ISHFT(pack_tmp,0)
   pack_tmp = ISHFT(pack_tmp,0)
   ipack = ipack + 1
   packed_data(ipack) = pack_tmp
   ENDDO
  END SUBROUTINE ints2bits_3

END MODULE hfx_compression_core_methods


likely caused by

2010-02-16  Richard Guenther  <rguenther@suse.de>

        PR tree-optimization/41043
        * tree-vrp.c  (vrp_var_may_overflow): Only ask SCEV for real loops.
        (vrp_visit_assignment_or_call): Do not ask SCEV for regular
        statements ...
        (vrp_visit_phi_node): ... but only for loop PHI nodes.
Comment 14 Richard Biener 2010-04-02 14:26:58 UTC
Interestingly it works on i?86 ...
Comment 15 Richard Biener 2010-04-02 14:39:17 UTC
C testcase for the missed VRP, fails with long on x86_64 only, with
long long also on i?86:

extern void link_error (void) __attribute__((noreturn));
int n;
float *x;
int main()
{
  if (n > 0)
    {
      int i = 0;
      do
        {
          long index;
          i = i + 1;
          index = i;
          if (index <= 0)
            link_error ();
          x[index] = 0;
          i = i + 1;
          index = i;
          if (index <= 0)
            link_error ();
          x[index] = 0;
        }
      while (i < n);
    }
}
Comment 16 Richard Biener 2010-04-02 14:53:07 UTC
It's the strict-overflow stuff that cripples VRP again here.  I have a kludge.
Comment 17 Richard Biener 2010-04-02 15:10:32 UTC
Created attachment 20292 [details]
minimal patch

I'm testing this minimal patch.
Comment 18 Richard Biener 2010-04-06 11:21:07 UTC
GCC 4.5.0 is being released.  Deferring to 4.5.1.
Comment 19 Richard Biener 2010-04-06 12:32:52 UTC
Subject: Bug 43627

Author: rguenth
Date: Tue Apr  6 12:32:25 2010
New Revision: 157992

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157992
Log:
2010-04-06  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/43627
	* tree-vrp.c (extract_range_from_unary_expr): Widenings
	of [1, +INF(OVF)] go to [1, +INF(OVF)] of the wider type,
	not varying.

	* gcc.dg/tree-ssa/vrp49.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/vrp49.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vrp.c
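
The effect of the ChangeLog entry above can be sketched with a toy value-range model in C. This is an illustration under stated assumptions, not the code in tree-vrp.c: the type value_range and the functions widen_old, widen_new and proves_positive are hypothetical names invented for this example. The point is that dropping to "varying" on widening loses the lower bound of 1, while keeping [1, +INF(OVF)] in the wider type still lets a check such as "index <= 0" be proven false.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy value range.  max_is_ovf_inf marks an upper bound of +INF that was
   derived by assuming signed overflow does not happen.  */
typedef struct {
  bool varying;         /* true: nothing is known about the value */
  long long min, max;   /* meaningful only when !varying */
  bool max_is_ovf_inf;
} value_range;

/* Model of the behaviour before the fix: widening a range whose upper
   bound is +INF(OVF) gives up and returns "varying".  */
value_range widen_old (value_range narrow)
{
  assert (narrow.varying || narrow.min <= narrow.max);
  if (!narrow.varying && narrow.max_is_ovf_inf)
    {
      value_range r = { true, 0, 0, false };
      return r;
    }
  return narrow;
}

/* Model of the behaviour after the fix: keep the lower bound and use
   +INF(OVF) of the wider type (wide_max) as the upper bound.  */
value_range widen_new (value_range narrow, long long wide_max)
{
  assert (narrow.varying || narrow.min <= narrow.max);
  if (!narrow.varying && narrow.max_is_ovf_inf)
    narrow.max = wide_max;
  return narrow;
}

/* True if the range proves the value is >= 1, so that a check like
   "index <= 0" is known to be false.  */
bool proves_positive (value_range r)
{
  return !r.varying && r.min >= 1;
}
```

In the C testcase from comment 15, i has a range of this shape after i = i + 1, and index = i is the widening from int to long. In this model, widen_old leaves the "index <= 0" checks unprovable, while widen_new allows them to be removed.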

Comment 20 Richard Biener 2010-04-06 12:33:00 UTC
Fixed on trunk so far.  Queued for 4.5.1.
Comment 21 Richard Biener 2010-04-15 13:47:02 UTC
Subject: Bug 43627

Author: rguenth
Date: Thu Apr 15 13:46:42 2010
New Revision: 158377

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158377
Log:
2010-04-15  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/43627
	* tree-vrp.c (extract_range_from_unary_expr): Widenings
	of [1, +INF(OVF)] go to [1, +INF(OVF)] of the wider type,
	not varying.

	* gcc.dg/tree-ssa/vrp49.c: New testcase.

Added:
    branches/gcc-4_5-branch/gcc/testsuite/gcc.dg/tree-ssa/vrp49.c
Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_5-branch/gcc/tree-vrp.c

Comment 22 Richard Biener 2010-04-15 13:47:10 UTC
Fixed.