Optimize df_worklist_dataflow

Jan Hubicka hubicka@ucw.cz
Tue Jun 22 16:37:00 GMT 2010


>
> If Steven doesn't complain, go ahead.

Hi,
thanks, I've committed it now.  I can track Steven's comments incrementally, if any.
I would like to experiment with optimizing the dataflow worklist implementation now
(i.e. stealing a bit from age to make it cheap to test if the BB is already in the queue
and perhaps switching from bitmap to array as the worklist implementation).
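
To make the idea concrete, here is a minimal sketch of an array-based worklist
that keeps an "in queue" flag in the top bit of a per-block age word.  This is
not the df code: the struct, the function names and the 31-bit age limit are
all hypothetical, and it assumes n_blocks > 0 and block indices below n_blocks.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: top bit of the age word marks "already queued".  */
#define IN_QUEUE_BIT 0x80000000u
#define AGE_MASK     0x7fffffffu

struct worklist
{
  uint32_t *age;    /* Per block: in-queue flag plus 31-bit age.  */
  unsigned *queue;  /* Circular buffer of block indices.  */
  unsigned n_blocks, head, count;
};

/* Returns nonzero on success.  Assumes n_blocks > 0.  */
static int
worklist_init (struct worklist *w, unsigned n_blocks)
{
  w->age = calloc (n_blocks, sizeof *w->age);
  w->queue = malloc (n_blocks * sizeof *w->queue);
  w->n_blocks = n_blocks;
  w->head = 0;
  w->count = 0;
  return w->age != NULL && w->queue != NULL;
}

static void
worklist_free (struct worklist *w)
{
  free (w->age);
  free (w->queue);
}

/* Membership test is a single mask on the age word, no bitmap lookup.  */
static int
worklist_in_queue (const struct worklist *w, unsigned bb)
{
  return (w->age[bb] & IN_QUEUE_BIT) != 0;
}

static uint32_t
worklist_age (const struct worklist *w, unsigned bb)
{
  return w->age[bb] & AGE_MASK;
}

/* Update the age while preserving the in-queue flag.  */
static void
worklist_set_age (struct worklist *w, unsigned bb, uint32_t new_age)
{
  w->age[bb] = (w->age[bb] & IN_QUEUE_BIT) | (new_age & AGE_MASK);
}

/* Queue BB unless it is already queued.  Returns 1 if it was added.
   A block is queued at most once, so n_blocks slots always suffice.  */
static int
worklist_push (struct worklist *w, unsigned bb)
{
  if (worklist_in_queue (w, bb))
    return 0;
  w->age[bb] |= IN_QUEUE_BIT;
  w->queue[(w->head + w->count) % w->n_blocks] = bb;
  w->count++;
  return 1;
}

/* Remove and return the oldest queued block.  Caller checks count > 0.  */
static unsigned
worklist_pop (struct worklist *w)
{
  unsigned bb = w->queue[w->head];
  w->head = (w->head + 1) % w->n_blocks;
  w->count--;
  w->age[bb] &= ~IN_QUEUE_BIT;
  return bb;
}
```

Pushing the same block twice is a no-op the second time, so the array never
needs more than one slot per block.  Note this sketch processes blocks in FIFO
order; a bitmap worklist visits them in index order, so the iteration order
would differ and that may matter for convergence speed.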

The WHOPR build time went down to 9m36s now; 10s of that gain is due to Jakub's
genattrtab change.  That is a 16% improvement since the first WHOPR builds.

With bitset/test inlined I can get to 9m32s, but I would first like to figure
out whether some of the most abusive bitmap users would be better switched to
something else.

The following are passes taking over 1% of time on CC1 LTO link:

 garbage collection    :  11.32 ( 2%) usr   0.24 ( 3%) sys  11.59 ( 2%) wall       0 kB ( 0%) ggc
 ipa lto gimple in     :   7.23 ( 1%) usr   0.77 ( 9%) sys   8.68 ( 2%) wall  878440 kB (29%) ggc
 ipa lto decl in       :   4.77 ( 1%) usr   0.21 ( 3%) sys   4.98 ( 1%) wall  248056 kB ( 8%) ggc
 cfg cleanup           :   9.17 ( 2%) usr   0.02 ( 0%) sys   9.42 ( 2%) wall   30639 kB ( 1%) ggc
 trivially dead code   :   3.19 ( 1%) usr   0.00 ( 0%) sys   2.96 ( 1%) wall       0 kB ( 0%) ggc
 df reaching defs      :   4.64 ( 1%) usr   0.03 ( 0%) sys   4.75 ( 1%) wall       0 kB ( 0%) ggc
 df live regs          :  24.17 ( 5%) usr   0.02 ( 0%) sys  24.22 ( 5%) wall       0 kB ( 0%) ggc
 df live&initialized regs:  13.53 ( 3%) usr   0.03 ( 0%) sys  13.73 ( 3%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   2.70 ( 1%) usr   0.02 ( 0%) sys   2.50 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   9.41 ( 2%) usr   0.05 ( 1%) sys   9.44 ( 2%) wall   68255 kB ( 2%) ggc
 register information  :   3.12 ( 1%) usr   0.01 ( 0%) sys   3.04 ( 1%) wall       0 kB ( 0%) ggc
 alias analysis        :   9.25 ( 2%) usr   0.03 ( 0%) sys   9.24 ( 2%) wall  197016 kB ( 7%) ggc
 alias stmt walking    :   6.75 ( 1%) usr   0.79 (10%) sys   7.36 ( 1%) wall   18244 kB ( 1%) ggc
 integration           :   7.86 ( 2%) usr   0.55 ( 7%) sys   8.41 ( 2%) wall  722881 kB (24%) ggc
 tree CFG cleanup      :   6.69 ( 1%) usr   0.05 ( 1%) sys   6.71 ( 1%) wall   17282 kB ( 1%) ggc
 tree VRP              :  10.64 ( 2%) usr   0.32 ( 4%) sys  10.85 ( 2%) wall  295619 kB (10%) ggc
 tree PTA              :   4.34 ( 1%) usr   0.01 ( 0%) sys   4.84 ( 1%) wall   40842 kB ( 1%) ggc
 tree SSA rewrite      :   2.94 ( 1%) usr   0.04 ( 0%) sys   2.95 ( 1%) wall   52184 kB ( 2%) ggc
 tree SSA incremental  :   6.18 ( 1%) usr   0.30 ( 4%) sys   6.08 ( 1%) wall   53161 kB ( 2%) ggc
 tree operand scan     :   2.97 ( 1%) usr   1.33 (16%) sys   4.11 ( 1%) wall  442136 kB (15%) ggc
 dominator optimization:   5.28 ( 1%) usr   0.06 ( 1%) sys   5.43 ( 1%) wall  116917 kB ( 4%) ggc
 tree PRE              :  26.72 ( 5%) usr   0.32 ( 4%) sys  26.66 ( 5%) wall  211068 kB ( 7%) ggc
 tree FRE              :   5.33 ( 1%) usr   0.25 ( 3%) sys   6.19 ( 1%) wall   20183 kB ( 1%) ggc
 tree slp vectorization:   3.44 ( 1%) usr   0.06 ( 1%) sys   3.83 ( 1%) wall  277483 kB ( 9%) ggc
 dominance computation :   5.51 ( 1%) usr   0.04 ( 0%) sys   5.75 ( 1%) wall       0 kB ( 0%) ggc
 expand                :  42.65 ( 9%) usr   0.37 ( 5%) sys  43.32 ( 9%) wall  915669 kB (31%) ggc
 forward prop          :   4.34 ( 1%) usr   0.03 ( 0%) sys   4.95 ( 1%) wall   64467 kB ( 2%) ggc
 CSE                   :  11.46 ( 2%) usr   0.02 ( 0%) sys  11.53 ( 2%) wall   18427 kB ( 1%) ggc
 dead store elim1      :   3.67 ( 1%) usr   0.04 ( 0%) sys   4.05 ( 1%) wall   41282 kB ( 1%) ggc
 dead store elim2      :   3.68 ( 1%) usr   0.03 ( 0%) sys   3.82 ( 1%) wall   48489 kB ( 2%) ggc
 CPROP                 :   9.85 ( 2%) usr   0.03 ( 0%) sys   9.98 ( 2%) wall   90860 kB ( 3%) ggc
 PRE                   :  11.78 ( 2%) usr   0.04 ( 0%) sys  11.37 ( 2%) wall   14087 kB ( 0%) ggc
 CSE 2                 :   6.38 ( 1%) usr   0.00 ( 0%) sys   6.25 ( 1%) wall   11012 kB ( 0%) ggc
 combiner              :  14.21 ( 3%) usr   0.03 ( 0%) sys  14.32 ( 3%) wall  234301 kB ( 8%) ggc
 if-conversion         :   3.39 ( 1%) usr   0.01 ( 0%) sys   3.13 ( 1%) wall   29919 kB ( 1%) ggc
 integrated RA         :  27.45 ( 6%) usr   0.02 ( 0%) sys  27.35 ( 5%) wall  119575 kB ( 4%) ggc
 reload                :  12.18 ( 2%) usr   0.02 ( 0%) sys  12.24 ( 2%) wall   39450 kB ( 1%) ggc
 reload CSE regs       :   8.64 ( 2%) usr   0.05 ( 1%) sys   8.59 ( 2%) wall  106855 kB ( 4%) ggc
 hard reg cprop        :   3.14 ( 1%) usr   0.01 ( 0%) sys   3.17 ( 1%) wall    2226 kB ( 0%) ggc
 scheduling 2          :  14.65 ( 3%) usr   0.02 ( 0%) sys  13.78 ( 3%) wall    5666 kB ( 0%) ggc
 final                 :   9.01 ( 2%) usr   0.34 ( 4%) sys  11.21 ( 2%) wall  140923 kB ( 5%) ggc
 symout                :   6.34 ( 1%) usr   0.34 ( 4%) sys   6.52 ( 1%) wall  390733 kB (13%) ggc
 variable tracking     :  51.66 (10%) usr   0.05 ( 1%) sys  52.09 (10%) wall  360699 kB (12%) ggc
 TOTAL                 : 494.52             8.22           505.99            2984350 kB

It seems that df is still one of the lowest-hanging fruits.  The liveness-related
stuff alone is about 10% of compile time.

Honza


