propagate_freq enormous performance hit in 3.1

Brad Lucier lucier@math.purdue.edu
Wed Aug 15 05:53:00 GMT 2001


> > I don't have setup here able to compile your testcase, but this patch should
> > solve it. Now, when we have mark_dfs_back_edges it is nice cleanup anyway

It improves it, but doesn't solve it.

> > Still in extreme case of many nested loops the code may expose quadratic
> > behaviour, but I hope it is not main issue, as it is overall relativly fast.

Perhaps this is what is happening.  The new timings are

 ___H__20_all {GC 72513k -> 24052k} {GC 32111k -> 25325k} {GC 33960k -> 25286k} {GC 40662k -> 24398k} {GC 37453k -> 27411k} {GC 50776k -> 30006k} ___init_proc ____20_all
Execution times (seconds)
 garbage collection    :   6.37 ( 1%) usr   0.01 ( 0%) sys   6.38 ( 1%) wall
 cfg construction      : 184.87 (23%) usr  14.87 (52%) sys 199.76 (24%) wall
 cfg cleanup           : 527.58 (66%) usr   0.00 ( 0%) sys 527.85 (64%) wall
 preprocessing         :   2.57 ( 0%) usr   2.85 (10%) sys   5.39 ( 1%) wall
 lexical analysis      :   3.35 ( 0%) usr   5.72 (20%) sys   9.26 ( 1%) wall
 parser                :  16.29 ( 2%) usr   3.59 (12%) sys  19.85 ( 2%) wall
 varconst              :   0.78 ( 0%) usr   0.02 ( 0%) sys   0.81 ( 0%) wall
 jump                  :  13.96 ( 2%) usr   0.03 ( 0%) sys  13.99 ( 2%) wall
 CSE                   :  12.62 ( 2%) usr   0.00 ( 0%) sys  12.62 ( 2%) wall
 loop analysis         :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 CSE 2                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 flow analysis         :1995.55 (251%) usr   1.02 ( 4%) sys1997.21 (242%) wall
 combiner              :  13.75 ( 2%) usr   0.00 ( 0%) sys  13.75 ( 2%) wall
 if-conversion         :  10.30 ( 1%) usr   0.00 ( 0%) sys  10.30 ( 1%) wall
 scheduling            :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 local alloc           :   6.32 ( 1%) usr   0.00 ( 0%) sys   6.32 ( 1%) wall
 global alloc          :  24.31 ( 3%) usr   0.59 ( 2%) sys  24.90 ( 3%) wall
 reload CSE regs       :  40.56 ( 5%) usr   0.00 ( 0%) sys  40.56 ( 5%) wall
 flow 2                :-2143.-62 (-270%) usr   0.00 ( 0%) sys-2142.-40 (-259%) wall
 if-conversion 2       :  10.81 ( 1%) usr   0.00 ( 0%) sys  10.81 ( 1%) wall
 scheduling 2          :  38.93 ( 5%) usr   0.00 ( 0%) sys  39.11 ( 5%) wall
 delay branch sched    :   6.07 ( 1%) usr   0.00 ( 0%) sys   6.32 ( 1%) wall
 reorder blocks        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 shorten branches      :   0.74 ( 0%) usr   0.00 ( 0%) sys   0.77 ( 0%) wall
 final                 :  17.03 ( 2%) usr   0.01 ( 0%) sys  17.36 ( 2%) wall
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 rest of compilation   :   5.07 ( 1%) usr   0.00 ( 0%) sys   5.07 ( 1%) wall
 TOTAL                 : 794.30            28.72           826.23
5089.43u 28.76s 1:25:23.31 99.9%

Perhaps the only time we can trust is the last one, which is reduced from

12382.95u 32.20s 3:27:02.11 99.9%

Some detailed profiling information is:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 72.97    701.23   701.23     2474   283.44   283.44  propagate_freq
  5.79    756.86    55.63                             internal_mcount
  3.46    790.12    33.26       29  1146.90  1146.90  mark_critical_edges
  2.04    809.73    19.61       15  1307.33  1308.00  calc_idoms
  1.97    828.67    18.94   258060     0.07     0.07  try_forward_edges
  1.75    845.47    16.80        3  5600.00  9094.03  flow_loops_find
  1.73    862.09    16.62       25   664.80   664.80  find_unreachable_blocks
  1.40    875.51    13.42       15   894.67   894.67  calc_dfs_tree_nonrec
  1.25    887.53    12.02 69535200     0.00     0.00  make_edge
  1.02    897.29     9.76        5  1952.00  1952.00  mark_dfs_back_edges
  0.94    906.29     9.00 69475595     0.00     0.00  make_label_edge
  0.81    914.08     7.79        3  2596.67 238666.46  estimate_bb_frequencies
...
-----------------------------------------------
                0.85    0.00       3/2474        estimate_bb_frequencies [9]
              700.38    0.00    2471/2474        estimate_loops_at_level [10]
[11]    73.0  701.23    0.00    2474         propagate_freq [11]
-----------------------------------------------

Michael, you improved the speed of some of the other flow routines
before, do you know whether similar algorithms can apply here?

Brad



More information about the Gcc-patches mailing list