optimization/4313: Inordinately long time in CFG manipulation

lucier@math.purdue.edu lucier@math.purdue.edu
Wed Sep 12 22:06:00 GMT 2001


>Number:         4313
>Category:       optimization
>Synopsis:       Inordinately long time in CFG manipulation
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Sep 12 22:06:01 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     B. Lucier
>Release:        gcc version 3.1 20010912 (experimental)
>Organization:
>Environment:
Red Hat Linux 7.1 on 350 MHz PII
Configured with: ../configure --enable-languages=c --enable-checking=no
Thread model: single
gcc version 3.1 20010912 (experimental)
built with
make bootstrap BOOT_CFLAGS='-O2 -g -pg' BOOT_LDFLAGS='-g -pg'
>Description:
Here are profiled timings for the following command applied to _num.i,
which may be at

http://www.math.purdue.edu/~lucier/_num.i.gz

(the server is down right now, I can't check):

[lucier@curie ~]$ time programs/gcc/gcc-3.1/objdir/gcc/cc1 -fno-math-errno -O1 -fschedule-insns2 -fomit-frame-pointer -D___SINGLE_HOST _num.i
 __sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl logb logbf logbl drem dremf dreml __finite ___H__20___num {GC 25002k -> 7637k} {GC 10676k -> 7660k} {GC 11856k -> 7923k} {GC 11786k -> 8417k} {GC 13063k -> 9614k} {GC 15281k -> 10232k} ___init_proc ____20___num
Execution times (seconds)
 garbage collection    :   3.52 ( 0%) usr   0.01 ( 0%) sys   3.67 ( 0%) wall
 cfg construction      : 643.65 (31%) usr   0.84 ( 0%) sys 663.76 (23%) wall
 cfg cleanup           : 127.71 ( 6%) usr   0.05 ( 0%) sys 136.16 ( 5%) wall
 preprocessing         :   2.43 ( 0%) usr   0.40 ( 0%) sys   2.90 ( 0%) wall
 lexical analysis      :   2.87 ( 0%) usr   0.71 ( 0%) sys   3.57 ( 0%) wall
 parser                :  10.21 ( 0%) usr   0.39 ( 0%) sys  11.01 ( 0%) wall
 varconst              :   0.41 ( 0%) usr   0.02 ( 0%) sys   0.43 ( 0%) wall
 integration           :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 jump                  :   3.99 ( 0%) usr   0.00 ( 0%) sys   4.04 ( 0%) wall
 CSE                   :   6.49 ( 0%) usr   0.00 ( 0%) sys   6.64 ( 0%) wall
 global CSE            :  69.06 ( 3%) usr   1.58 ( 0%) sys  72.31 ( 3%) wall
 loop analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 flow analysis         :  40.20 ( 2%) usr   0.35 ( 0%) sys  41.43 ( 1%) wall
 combiner              :   5.08 ( 0%) usr   0.00 ( 0%) sys   5.27 ( 0%) wall
 if-conversion         :   1.91 ( 0%) usr   0.07 ( 0%) sys   2.06 ( 0%) wall
 local alloc           :   1.80 ( 0%) usr   0.00 ( 0%) sys   1.82 ( 0%) wall
 global alloc          :  11.02 ( 1%) usr   0.06 ( 0%) sys  11.32 ( 0%) wall
 reload CSE regs       :  25.63 ( 1%) usr   0.00 ( 0%) sys  26.21 ( 1%) wall
 flow 2                : 864.58 (41%) usr 677.04 (98%) sys1595.09 (55%) wall
 if-conversion 2       :   1.58 ( 0%) usr   0.06 ( 0%) sys   1.66 ( 0%) wall
 scheduling 2          :  13.27 ( 1%) usr   0.06 ( 0%) sys  13.52 ( 0%) wall
 shorten branches      :   0.67 ( 0%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall
 reg stack             :  37.86 ( 2%) usr   7.50 ( 1%) sys  45.86 ( 2%) wall
 final                 : 213.11 (10%) usr   0.18 ( 0%) sys 232.41 ( 8%) wall
 rest of compilation   :   3.34 ( 0%) usr   0.00 ( 0%) sys   3.36 ( 0%) wall
 TOTAL                 :2090.54           689.32          2885.34
2090.670u 689.440s 48:05.77 96.3%       0+0k 0+0io 1094pf+0w

This file is compiled in a few minutes with gcc-3.0.  Here
are reports of the top times with gprof:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 29.24    199.93   199.93  9319742     0.02     0.02  cached_make_edge
 26.59    381.78   181.85  7449858     0.02     0.02  remove_edge
  6.21    424.22    42.44 72443984     0.00     0.00  bitmap_operation
  4.68    456.23    32.01       13  2462.31  5891.02  calculate_global_regs_live
  2.77    475.14    18.91    67665     0.28     0.71  try_crossjump_bb
  2.52    492.36    17.22     9668     1.78     1.78  sbitmap_vector_alloc
  2.24    507.68    15.32                             htab_traverse
  1.62    518.77    11.09 90441614     0.00     0.00  sbitmap_zero
...
-----------------------------------------------
                0.00    0.25      10/9594        find_basic_blocks [14]
                2.65  240.51    9584/9594        find_sub_basic_blocks [9]
[8]     35.6    2.65  240.76    9594         make_edges [8]
              199.86    0.00 9316700/9319742     cached_make_edge [11]
                9.74   11.05    9584/9614        sbitmap_vector_zero [26]
               17.07    0.00    9584/9668        sbitmap_vector_alloc [27]
                2.89    0.00 9277825/9277825     make_label_edge [69]
                0.02    0.05   44343/95703       returnjump_p [286]
                0.01    0.05   48208/51142       computed_jump_p [412]
                0.01    0.00   75759/433825      next_nonnote_insn [541]
                0.00    0.00   48208/13306482     find_reg_note [98]
-----------------------------------------------
                0.00    2.77     109/9584        commit_edge_insertions [56]
                0.06  240.51    9475/9584        split_all_insns [10]
[9]     35.6    0.06  243.28    9584         find_sub_basic_blocks [9]
                2.65  240.51    9584/9594        make_edges [8]
                0.00    0.07      96/434         split_block [205]
                0.04    0.01    9584/19000       purge_dead_edges [370]
                0.00    0.00      96/7449858     remove_edge [13]
                0.00    0.00    9748/13306482     find_reg_note [98]
-----------------------------------------------
                0.06  241.28      12/12          rest_of_compilation [7]
[10]    35.3    0.06  241.28      12         split_all_insns [10]
                0.06  240.51    9475/9584        find_sub_basic_blocks [9]
                0.06    0.64  252984/252984      split_insn [134]
                0.01    0.00       1/34          compute_bb_for_insn [229]
                0.00    0.00      12/90441614     sbitmap_zero [30]
                0.00    0.00      12/565         sbitmap_alloc [1663]
                0.00    0.00       1/9536        get_max_uid [1509]
-----------------------------------------------
                0.07    0.00    3042/9319742     make_edge [411]
              199.86    0.00 9316700/9319742     make_edges [8]
[11]    29.2  199.93    0.00 9319742         cached_make_edge [11]
-----------------------------------------------
                0.28   54.49       3/10          free_basic_block_vars [20]
                0.65  127.14       7/10          find_basic_blocks [14]
[12]    26.7    0.93  181.63      10         clear_edges [12]
              181.63    0.00 7440953/7449858     remove_edge [13]
-----------------------------------------------
                0.00    0.00       3/7449858     redirect_edge_succ_nodup [809]
                0.00    0.00      18/7449858     purge_dead_edges [370]
                0.00    0.00      36/7449858     try_redirect_by_replacing_jump [337]
                0.00    0.00      96/7449858     find_sub_basic_blocks [9]
                0.02    0.00     703/7449858     merge_blocks_nomove [325]
                0.06    0.00    2620/7449858     try_crossjump_to_edge [24]
                0.13    0.00    5429/7449858     flow_delete_block [41]
              181.63    0.00 7440953/7449858     clear_edges [12]
[13]    26.6  181.85    0.00 7449858         remove_edge [13]
-----------------------------------------------
                0.01  128.33      10/10          rest_of_compilation [7]
[14]    18.8    0.01  128.33      10         find_basic_blocks [14]
                0.65  127.14       7/10          clear_edges [12]
                0.00    0.25      10/9594        make_edges [8]
                0.09    0.02      10/10          find_basic_blocks_1 [333]
                0.08    0.00      10/34          compute_bb_for_insn [229]
                0.06    0.00      10/10          count_basic_blocks [425]
                0.03    0.01      10/14          tidy_fallthru_edges [437]
                0.00    0.00      10/1239490     timevar_push [201]
                0.00    0.00      10/254         varray_init [1743]
                0.00    0.00      10/9536        get_max_uid [1509]
-----------------------------------------------
                0.00   97.09       3/3           rest_of_compilation [7]
[15]    14.2    0.00   97.09       3         schedule_insns [15]
                0.12   89.56    9485/9496        update_life_info [16]
                0.00    2.83    9485/9485        schedule_region [72]
                0.00    2.13       3/3           init_regions [82]
                2.13    0.00    9485/18975       count_or_remove_death_notes [55]
                0.04    0.24       3/3           sched_init [222]
                0.02    0.00       3/34          compute_bb_for_insn [229]
                0.01    0.00       3/13          allocate_reg_life_data [485]
                0.00    0.00    9485/90441614     sbitmap_zero [30]
                0.00    0.00       3/3           reposition_prologue_and_epilogue_notes [1365]
                0.00    0.00       3/3796403     sbitmap_ones [242]
                0.00    0.00       6/565         sbitmap_alloc [1663]
                0.00    0.00       3/9536        get_max_uid [1509]
                0.00    0.00       3/3           sched_finish [2063]
                0.00    0.00       3/35859       get_insns [1454]
-----------------------------------------------
                0.00    0.01       1/9496        if_convert [31]
                0.00    0.03       3/9496        optimize_mode_switching [23]
                0.00    0.07       7/9496        life_analysis [45]
                0.12   89.56    9485/9496        schedule_insns [15]
[16]    13.1    0.12   89.67    9496         update_life_info [16]
               32.01   44.57      13/13          calculate_global_regs_live [17]
                0.00    9.83       3/22          cleanup_cfg [18]
                0.18    3.04   84954/143280      propagate_block [48]
                0.03    0.00   84954/229107      bitmap_copy [362]
                0.00    0.01    9485/9485        verify_local_live_at_start [845]
                0.00    0.00    9496/1465878     bitmap_clear [330]
                0.00    0.00       3/18975       count_or_remove_death_notes [55]
                0.00    0.00    9496/568566      bitmap_initialize [487]
-----------------------------------------------
               32.01   44.57      13/13          update_life_info [16]
[17]    11.2   32.01   44.57      13         calculate_global_regs_live [17]
               42.25    0.00 72111186/72443984     bitmap_operation [22]
                0.12    2.06   57752/143280      propagate_block [48]
                0.04    0.00  101235/229107      bitmap_copy [362]
                0.04    0.00  203195/2166730     bitmap_set_bit [179]
                0.00    0.03   57752/67237       bitmap_equal_p [501]
                0.02    0.00  328090/1465878     bitmap_clear [330]
                0.01    0.00  113217/568566      bitmap_initialize [487]
                0.00    0.00       1/90441614     sbitmap_zero [30]
-----------------------------------------------
                0.00    9.83       3/22          update_life_info [16]
                0.01   62.24      19/22          rest_of_compilation [7]
[18]    10.5    0.01   72.07      22         cleanup_cfg [18]
                0.09   64.62      22/22          try_optimize_cfg [19]
                0.03    7.33      31/31          delete_unreachable_blocks [37]
                0.00    0.00      22/1239490     timevar_pop [197]
                0.00    0.00      22/1239490     timevar_push [201]
                0.00    0.00      44/166659      free_EXPR_LIST_list [995]
-----------------------------------------------
                0.09   64.62      22/22          cleanup_cfg [18]
[19]     9.5    0.09   64.62      22         try_optimize_cfg [19]
               18.91   28.89   67665/67665       try_crossjump_bb [21]
                9.23    0.11  208787/208787      try_forward_edges [33]
                0.00    3.82    3146/5417        flow_delete_block [41]
                0.02    2.75  208787/208787      try_simplify_condjump [73]
                0.00    0.37     890/890         merge_blocks [192]
                0.00    0.32    2015/10723       flow_delete_insn_chain [86]
                0.02    0.12   74300/81241       redirect_edge_and_branch [278]
                0.01    0.02   96070/27562210     forwarder_block_p [35]
                0.02    0.01   86632/12775689     onlyjump_p [58]
                0.00    0.00    2015/138502      reg_mentioned_p [346]
                0.00    0.00     353/7724        redirect_edge_succ_nodup [809]
-----------------------------------------------
                0.00    5.25       7/73          life_analysis [45]
                0.00    6.75       9/73          if_convert [31]
                0.00   42.76      57/73          rest_of_compilation [7]
[20]     8.0    0.00   54.77      73         free_basic_block_vars [20]
                0.28   54.49       3/10          clear_edges [12]
-----------------------------------------------
               18.91   28.89   67665/67665       try_optimize_cfg [19]
[21]     7.0   18.91   28.89   67665         try_crossjump_bb [21]
                2.50   26.39 7261292/7261292     try_crossjump_to_edge [24]
-----------------------------------------------
                0.00    0.00       9/72443984     find_if_case_1 [533]
                0.00    0.00      14/72443984     find_if_case_2 [642]
                0.00    0.00     420/72443984     dead_or_predicable [813]
                0.01    0.00   18758/72443984     update_equiv_regs [208]
                0.04    0.00   67237/72443984     bitmap_equal_p [501]
                0.14    0.00  246360/72443984     finish_spills [181]
               42.25    0.00 72111186/72443984     calculate_global_regs_live [17]
[22]     6.2   42.44    0.00 72443984         bitmap_operation [22]
-----------------------------------------------
                0.88   29.45       3/3           rest_of_compilation [7]
[23]     4.4    0.88   29.45       3         optimize_mode_switching [23]
                0.00   28.35       6/6           pre_edge_lcm [25]
                0.25    0.57       1/5           commit_edge_insertions [56]
                0.02    0.03       3/21          sbitmap_vector_ones [188]
                0.01    0.03   36043/275175      get_attr_type [207]
                0.02    0.02   36284/2348999     note_stores <cycle 7> [140]
                0.00    0.03       3/9496        update_life_info [16]
                0.01    0.01      12/9614        sbitmap_vector_zero [26]
                0.02    0.00   23930/276241      reg_set_to_hard_reg_set [228]
                0.02    0.00      12/9668        sbitmap_vector_alloc [27]
                0.00    0.01    3541/62459       recog_memoized_1 [240]
                0.01    0.00       3/13          allocate_reg_life_data [485]
                0.01    0.00   27290/384090      asm_noperands [331]
                0.00    0.00   14620/86781       gen_sequence [482]
                0.00    0.00   14634/102819      end_sequence [728]
                0.00    0.00   18762/3738844     sbitmap_not [265]
                0.00    0.00   14535/14825       emit_insn_before [1120]
                0.00    0.00       5/5           emit_i387_cw_initialization [1235]
                0.00    0.00      10/47          assign_386_stack_local [1203]
                0.00    0.00      14/123         insert_insn_on_edge [1175]
                0.00    0.00       5/2348999     emit_move_insn <cycle 7> [460]
                0.00    0.00      71/198         emit_block_insn_before [1383]
                0.00    0.00   23420/23420       reg_dies [1471]
                0.00    0.00   14620/102819      start_sequence [1426]
                0.00    0.00    9385/9385        new_seginfo [1512]
                0.00    0.00    9385/9385        add_seginfo [1511]
                0.00    0.00      48/48          make_preds_opaque [1895]
                0.00    0.00       6/6           free_edge_list [1987]
-----------------------------------------------
                2.50   26.39 7261292/7261292     try_crossjump_bb [21]
[24]     4.2    2.50   26.39 7261292         try_crossjump_to_edge [24]
                2.97   14.21 6289011/6289011     flow_find_cross_jump [28]
                2.41    5.63 27025588/27562210     forwarder_block_p [35]
                0.00    0.40    2549/10723       flow_delete_insn_chain [86]
                0.35    0.02 7261156/7261156     outgoing_edges_match [195]
                0.01    0.25     338/434         split_block [205]
                0.06    0.00    2620/7449858     remove_edge [13]
                0.01    0.05    2549/3021        make_single_succ_edge [387]
                0.01    0.00    2549/2631        set_block_for_new_insns [789]
                0.00    0.00    2549/9171        gen_jump [689]
                0.00    0.00    2549/2596        emit_jump_insn_before [949]
                0.00    0.00    2549/13306482     find_reg_note [98]
                0.00    0.00    2549/433825      next_nonnote_insn [541]
                0.00    0.00    2549/8290        block_label [1100]
                0.00    0.00      71/176         emit_barrier_after [1166]
                0.00    0.00      71/165738      gen_rtx_CONST_INT [686]
-----------------------------------------------
                0.00   28.35       6/6           optimize_mode_switching [23]
[25]     4.1    0.00   28.35       6         pre_edge_lcm [25]
                5.85    4.21       6/6           compute_laterin [32]
                1.41    5.29       6/6           compute_antinout_edge [40]
                0.88    4.41       6/6           compute_available [49]
                1.50    1.51       6/6           compute_earliest [66]
                1.16    0.68       6/6           compute_insert_delete [85]
                1.35    0.00       6/6           create_edge_list [100]
                0.10    0.00      54/9668        sbitmap_vector_alloc [27]
-----------------------------------------------
                0.00    0.00       3/9614        init_dependency_caches [698]
                0.01    0.01      12/9614        optimize_mode_switching [23]
                0.02    0.02      15/9614        idoms_to_doms [248]
                9.74   11.05    9584/9614        make_edges [8]
[26]     3.1    9.77   11.09    9614         sbitmap_vector_zero [26]
               11.09    0.00 90422081/90441614     sbitmap_zero [30]
-----------------------------------------------
                0.01    0.00       3/9668        flow_loops_find [63]
                0.01    0.00       3/9668        init_dependency_caches [698]
                0.01    0.00       6/9668        estimate_probability [34]
                0.01    0.00       6/9668        if_convert [31]
                0.02    0.00      12/9668        optimize_mode_switching [23]
                0.10    0.00      54/9668        pre_edge_lcm [25]
               17.07    0.00    9584/9668        make_edges [8]
[27]     2.5   17.22    0.00    9668         sbitmap_vector_alloc [27]
-----------------------------------------------
                2.97   14.21 6289011/6289011     try_crossjump_to_edge [24]
[28]     2.5    2.97   14.21 6289011         flow_find_cross_jump [28]
                7.32    0.00 5587015/5643903     rtx_renumbered_equal_p [36]
                2.31    1.30 12578022/12775689     onlyjump_p [58]
                0.74    2.36 11164966/11164997     find_reg_equal_equiv_note [65]
                0.03    0.08  440578/508111      stack_regs_mentioned [316]
                0.02    0.05   41435/95703       returnjump_p [286]
                0.00    0.00     196/2223309     rtx_equal_p [239]
-----------------------------------------------
                                                 <spontaneous>
[29]     2.5   15.32    1.60                 htab_traverse [29]
                0.00    0.87   43785/43785       rtx_htab_mark_1 [116]
                0.20    0.53  460865/460865      cselib_invalidate_mem_1 [127]
                0.00    0.00    1170/1170        mark_hash_entry [1067]
                0.00    0.00    3386/3386        discard_useless_locs [1565]
                0.00    0.00    1631/1631        discard_useless_values [1598]
-----------------------------------------------
                0.00    0.00       1/90441614     calculate_global_regs_live [17]
                0.00    0.00       1/90441614     flow_depth_first_order_compute [160]
                0.00    0.00       1/90441614     if_convert [31]
                0.00    0.00       2/90441614     flow_loops_find [63]
                0.00    0.00       3/90441614     combine_instructions [74]
                0.00    0.00       6/90441614     mark_dfs_back_edges [80]
                0.00    0.00       6/90441614     compute_antinout_edge [40]
                0.00    0.00       6/90441614     compute_earliest [66]
                0.00    0.00       6/90441614     compute_available [49]
                0.00    0.00      12/90441614     split_all_insns [10]
                0.00    0.00     519/90441614     flow_loop_nodes_find [1183]
                0.00    0.00    9485/90441614     init_regions [82]
                0.00    0.00    9485/90441614     schedule_insns [15]
               11.09    0.00 90422081/90441614     sbitmap_vector_zero [26]
[30]     1.6   11.09    0.00 90441614         sbitmap_zero [30]
-----------------------------------------------

The enormous amount of system time in flow2 is from mmapping
and unmmapping a large bitmap, Daniel Berlin provided the
analysis previously.

I wanted to send this timing data now because of the large
changes being made to the cfg code, with the hope that
there can be some improvement.

Brad Lucier
>How-To-Repeat:

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:



More information about the Gcc-bugs mailing list