This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Timing information for CFG manipulations
- To: jh at suse dot cz (Jan Hubicka)
- Subject: Re: Timing information for CFG manipulations
- From: Brad Lucier <lucier at math dot purdue dot edu>
- Date: Tue, 16 Oct 2001 18:57:40 -0500 (EST)
- Cc: lucier at math dot purdue dot edu (Brad Lucier), jh at suse dot cz (Jan Hubicka), gcc-patches at gcc dot gnu dot org, rth at cygnus dot com, gcc at gcc dot gnu dot org
> Just curious, how does the time compare to the older gcc versions?
Well, 3.0.1 compiles this file in about 1/3 of the time that 3.1 takes:
dino01% /soft/parallelisme/linux/gcc-3.0.1/lib/gcc-lib/i686-pc-linux-gnu/3.0.1/cc1 -fpic -fomit-frame-pointer -O1 -fno-math-errno -fno-strict-aliasing -mcpu=athlon -march=athlon _num.i
__sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl logb logbf logbl drem dremf dreml __finite ___H__20___num {GC 23738k -> 7627k} {GC 11539k -> 7534k} {GC 9859k -> 7972k} {GC 11631k -> 8916k} {GC 14125k -> 9023k} ___init_proc ____20___num
Execution times (seconds)
garbage collection : 0.61 ( 1%) usr 0.00 ( 0%) sys 0.64 ( 1%) wall
preprocessing : 0.14 ( 0%) usr 0.13 (14%) sys 0.27 ( 0%) wall
lexical analysis : 0.34 ( 1%) usr 0.23 (24%) sys 0.56 ( 1%) wall
parser : 1.10 ( 2%) usr 0.15 (16%) sys 1.27 ( 2%) wall
varconst : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
integration : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
jump : 4.04 ( 6%) usr 0.17 (18%) sys 4.36 ( 6%) wall
CSE : 0.75 ( 1%) usr 0.00 ( 0%) sys 0.77 ( 1%) wall
loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
CSE 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
flow analysis : 11.21 (17%) usr 0.08 ( 8%) sys 12.47 (16%) wall
combiner : 1.07 ( 2%) usr 0.01 ( 1%) sys 1.12 ( 1%) wall
if-conversion : 1.12 ( 2%) usr 0.03 ( 3%) sys 1.22 ( 2%) wall
local alloc : 0.41 ( 1%) usr 0.03 ( 3%) sys 0.56 ( 1%) wall
global alloc : 2.58 ( 4%) usr 0.03 ( 3%) sys 3.14 ( 4%) wall
reload CSE regs : 9.85 (15%) usr 0.01 ( 1%) sys 12.76 (16%) wall
flow 2 : 13.72 (21%) usr 0.01 ( 1%) sys 19.63 (25%) wall
if-conversion 2 : 0.96 ( 1%) usr 0.01 ( 1%) sys 1.19 ( 2%) wall
shorten branches : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
reg stack : 15.94 (24%) usr 0.05 ( 5%) sys 17.05 (22%) wall
final : 0.97 ( 1%) usr 0.00 ( 0%) sys 0.97 ( 1%) wall
symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
rest of compilation : 0.45 ( 1%) usr 0.00 ( 0%) sys 0.46 ( 1%) wall
TOTAL : 65.49 0.96 78.82
> I wonder if we can't speed up the flow.c pass considerably.
> What other functions (except for bitmap_operation) have more than 10
> million calls? Do we run into problems with too much RTL traversal,
> or is it purely dominated by the dataflow bitmaps?
Here is the 3.1 timing report, followed by detailed information from gprof:
/u/lucier/local/gcc-3.1/lib/gcc-lib/i686-pc-linux-gnu/3.1/cc1 -fpic -fomit-frame-pointer -O1 -fno-math-errno -fno-strict-aliasing -mcpu=athlon -march=athlon _num.i
__sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl logb logbf logbl drem dremf dreml __finite ___H__20___num {GC 25431k -> 7824k} {GC 10943k -> 7882k} {GC 10372k -> 7769k} {GC 13951k -> 8583k} {GC 14265k -> 9195k} ___init_proc {GC 12103k -> 9289k} ____20___num
Execution times (seconds)
garbage collection : 1.10 ( 1%) usr 0.00 ( 0%) sys 1.09 ( 1%) wall
cfg construction : 8.09 ( 4%) usr 0.40 ( 8%) sys 8.56 ( 4%) wall
cfg cleanup : 52.37 (25%) usr 0.03 ( 1%) sys 52.47 (24%) wall
preprocessing : 0.48 ( 0%) usr 0.10 ( 2%) sys 0.50 ( 0%) wall
lexical analysis : 0.73 ( 0%) usr 0.22 ( 5%) sys 0.94 ( 0%) wall
parser : 2.57 ( 1%) usr 0.21 ( 4%) sys 2.94 ( 1%) wall
varconst : 0.11 ( 0%) usr 0.01 ( 0%) sys 0.16 ( 0%) wall
jump : 0.76 ( 0%) usr 0.02 ( 0%) sys 0.72 ( 0%) wall
CSE : 1.76 ( 1%) usr 0.00 ( 0%) sys 1.75 ( 1%) wall
global CSE : 41.29 (20%) usr 0.53 (11%) sys 41.88 (19%) wall
loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
flow analysis : 24.23 (11%) usr 0.16 ( 3%) sys 24.38 (11%) wall
combiner : 1.76 ( 1%) usr 0.00 ( 0%) sys 1.78 ( 1%) wall
if-conversion : 1.19 ( 1%) usr 0.04 ( 1%) sys 1.25 ( 1%) wall
local alloc : 0.67 ( 0%) usr 0.00 ( 0%) sys 0.69 ( 0%) wall
global alloc : 4.69 ( 2%) usr 0.05 ( 1%) sys 4.75 ( 2%) wall
reload CSE regs : 9.90 ( 5%) usr 0.01 ( 0%) sys 9.88 ( 5%) wall
flow 2 : 33.92 (16%) usr 0.10 ( 2%) sys 34.00 (16%) wall
if-conversion 2 : 0.98 ( 0%) usr 0.03 ( 1%) sys 1.03 ( 0%) wall
shorten branches : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
reg stack : 22.45 (11%) usr 2.86 (60%) sys 25.31 (12%) wall
final : 0.75 ( 0%) usr 0.01 ( 0%) sys 0.78 ( 0%) wall
rest of compilation : 1.30 ( 1%) usr 0.00 ( 0%) sys 1.38 ( 1%) wall
TOTAL : 211.35 4.78 216.50
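For a rough per-phase comparison of the two reports, the sketch below divides the 3.1 user times by the 3.0.1 user times for the timers that appear in both, and totals the 3.1 timers that have no line of their own in the 3.0.1 report. It is plain arithmetic on the numbers shown above; only a subset of the timers is copied in, and the helper names are illustrative.

```python
# User times in seconds, copied from the two -ftime-report listings above
# (only a subset of the timers is included).
gcc_3_0_1 = {
    "flow analysis": 11.21,
    "reload CSE regs": 9.85,
    "flow 2": 13.72,
    "reg stack": 15.94,
    "TOTAL": 65.49,
}
gcc_3_1 = {
    "cfg construction": 8.09,
    "cfg cleanup": 52.37,
    "global CSE": 41.29,
    "flow analysis": 24.23,
    "reload CSE regs": 9.90,
    "flow 2": 33.92,
    "reg stack": 22.45,
    "TOTAL": 211.35,
}


def slowdown(old, new):
    """Return new/old time ratios for every timer present in both reports."""
    return {name: new[name] / old[name] for name in old if name in new}


def unmatched(old, new):
    """Return the timers in `new` that have no line of their own in `old`."""
    return {name: t for name, t in new.items() if name not in old}


for name, ratio in slowdown(gcc_3_0_1, gcc_3_1).items():
    print(f"{name:16s} {ratio:5.2f}x")

extra = unmatched(gcc_3_0_1, gcc_3_1)
print(f"timers with no 3.0.1 line: {sum(extra.values()):.2f} s")
```

With these numbers the TOTAL ratio is about 3.2x, in line with the "about 1/3" estimate, and the three 3.1 timers without a 3.0.1 counterpart add up to roughly 102 of the 211 seconds. The 3.0.1 report may charge some of the same work to other timers (jump or flow analysis, for instance), so this is not a strict like-for-like split.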
Flat profile:
Each sample counts as 0.01 seconds.
     %  cumulative     self                  self     total
  time     seconds  seconds       calls   ms/call   ms/call  name
 16.66       29.10    29.10    72698858      0.00      0.00  bitmap_operation
 12.43       50.81    21.71          13   1670.00   4145.64  calculate_global_regs_live
  9.86       68.04    17.23     9305997      0.00      0.00  cached_make_edge
  5.57       77.77     9.73       67331      0.14      0.38  try_crossjump_bb
  4.03       84.81     7.04                                  htab_traverse
  2.99       90.03     5.22       27855      0.19      0.19  sbitmap_intersection_of_succs
  2.60       94.57     4.54      207472      0.02      0.02  try_forward_edges
  2.31       98.61     4.04           6    673.33    976.87  compute_laterin
  2.21      102.47     3.86        6270      0.62      0.62  expunge_block
  2.18      106.28     3.81         191     19.95     19.95  sbitmap_vector_alloc
  2.15      110.04     3.76     6047223      0.00      0.00  rtx_renumbered_equal_p
  2.12      113.74     3.70          31    119.35    119.35  find_unreachable_blocks
  1.72      116.75     3.01    61978208      0.00      0.00  active_insn_p
  1.67      119.66     2.91     9272730      0.00      0.00  make_label_edge
  1.67      122.57     2.91         451      6.45      6.45  propagate_freq
  1.35      124.93     2.36          15    157.33    158.67  calc_idoms
  1.33      127.26     2.33   101420319      0.00      0.00  bitmap_element_link
  1.33      129.59     2.33       29217      0.08      0.08  sbitmap_intersection_of_preds
  1.11      131.53     1.94    25030880      0.00      0.00  forwarder_block_p
  1.01      133.30     1.77     5845902      0.00      0.00  flow_find_cross_jump
  0.97      134.99     1.69        9385      0.18      0.20  convert_regs_1
  0.95      136.65     1.66          15    110.67    110.67  calc_dfs_tree_nonrec
  0.94      138.29     1.64         120     13.67    201.19  make_edges
  0.87      139.81     1.52    11128643      0.00      0.00  sbitmap_a_and_b
  0.81      141.23     1.42     6407969      0.00      0.00  try_crossjump_to_edge
  0.78      142.60     1.37        5627      0.24      0.24  can_delete_label_p
  0.69      143.81     1.21           6    201.67    281.03  compute_insert_delete
  0.68      144.99     1.18           6    196.67    328.86  compute_earliest
  0.64      146.11     1.12           6    186.67    186.67  mark_dfs_back_edges
  0.58      147.13     1.02    11904476      0.00      0.00  onlyjump_p
  0.56      148.10     0.97          10     97.00    106.99  clear_edges
  0.54      149.05     0.95     7458208      0.00      0.00  sbitmap_difference
  0.54      150.00     0.95           3    316.67    701.77  flow_loops_find
  0.50      150.87     0.87           6    145.00   1027.93  compute_antinout_edge
  0.46      151.68     0.81           6    135.00    135.00  create_edge_list
  0.42      152.42     0.74    13479620      0.00      0.00  find_reg_note
  0.35      153.03     0.61           5    122.00   4570.84  commit_edge_insertions
  0.34      153.62     0.59       10292      0.06      0.06  remove_edge
  0.33      154.19     0.57           6     95.00    496.89  compute_available
  0.33      154.76     0.57           3    190.00   1449.86  estimate_bb_frequencies
  0.30      155.29     0.53    11543061      0.00      0.00  find_reg_equal_equiv_note
  0.30      155.82     0.53       24708      0.02      0.04  try_combine
  0.30      156.34     0.52    12898539      0.00      0.00  side_effects_p
  0.26      156.80     0.46      170126      0.00      0.00  find_reloads
  0.24      157.22     0.42     4200455      0.00      0.00  ggc_set_mark
  0.24      157.64     0.42           1    420.00   2322.97  convert_regs_2
  0.22      158.02     0.38       18788      0.02      0.02  remove_fake_successors
  0.22      158.40     0.38           6     63.33     71.66  thread_jumps
  0.21      158.77     0.37           3    123.33  14103.27  optimize_mode_switching
  0.19      159.10     0.33           3    110.00    110.00  compute_alignments
  0.18      159.42     0.32    11698103      0.00      0.00  single_set_2
...
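The question about which functions exceed 10 million calls can be answered straight from the flat profile. Because the ms/call columns round to 0.00 for such functions, their per-call cost is easier to see as self seconds divided by calls. A minimal sketch, with only a handful of rows copied in; heavy_callees is an illustrative name, not part of gprof.

```python
# A few rows copied from the gprof flat profile above. Columns: % time,
# cumulative seconds, self seconds, calls, self ms/call, total ms/call, name.
FLAT_PROFILE = """\
16.66 29.10 29.10 72698858 0.00 0.00 bitmap_operation
12.43 50.81 21.71 13 1670.00 4145.64 calculate_global_regs_live
9.86 68.04 17.23 9305997 0.00 0.00 cached_make_edge
4.03 84.81 7.04 htab_traverse
1.72 116.75 3.01 61978208 0.00 0.00 active_insn_p
1.33 127.26 2.33 101420319 0.00 0.00 bitmap_element_link
1.11 131.53 1.94 25030880 0.00 0.00 forwarder_block_p
"""


def heavy_callees(profile, min_calls=10_000_000):
    """Yield (name, calls, microseconds of self time per call) for every
    function called more than min_calls times."""
    for line in profile.splitlines():
        fields = line.split()
        if len(fields) != 7:  # rows such as htab_traverse carry no call counts
            continue
        self_seconds, calls, name = float(fields[2]), int(fields[3]), fields[6]
        if calls > min_calls:
            yield name, calls, self_seconds / calls * 1e6


for name, calls, us in heavy_callees(FLAT_PROFILE):
    print(f"{name:20s} {calls:>11,d} calls {us:6.3f} us/call")
```

For bitmap_operation this works out to about 0.40 microseconds of self time per call over roughly 73 million calls. Applied to every row shown in the flat profile (which is cut off at "..."), the same filter picks out ten functions: bitmap_operation, active_insn_p, bitmap_element_link, forwarder_block_p, sbitmap_a_and_b, onlyjump_p, find_reg_note, find_reg_equal_equiv_note, side_effects_p and single_set_2.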
-----------------------------------------------
21.71 32.18 13/13 update_life_info [8]
[9] 30.9 21.71 32.18 13 calculate_global_regs_live [9]
28.90 2.36 72205551/72698858 bitmap_operation [14]
0.03 0.82 56808/132182 propagate_block [55]
0.00 0.02 56808/56808 bitmap_equal_p [376]
0.02 0.00 100467/219660 bitmap_copy [279]
0.01 0.00 328395/1374187 bitmap_clear [304]
0.00 0.00 255232/1933812 bitmap_set_bit [282]
0.00 0.00 112039/518311 bitmap_initialize [420]
0.00 0.00 1828/101420319 bitmap_element_link [52]
0.00 0.00 1/1125177 sbitmap_zero [225]
-----------------------------------------------
0.37 41.94 3/3 rest_of_compilation [7]
[10] 24.2 0.37 41.94 3 optimize_mode_switching [10]
0.00 20.56 6/6 pre_edge_lcm [19]
0.01 16.47 3/11 update_life_info [8]
0.12 4.45 1/5 commit_edge_insertions [17]
0.24 0.00 12/191 sbitmap_vector_alloc [38]
0.01 0.01 3/21 sbitmap_vector_ones [177]
0.01 0.01 37484/2679415 note_stores <cycle 7> [115]
0.00 0.01 37243/80229 get_attr_type [327]
0.01 0.01 12/137 sbitmap_vector_zero [171]
0.01 0.00 48/48 make_preds_opaque [581]
0.00 0.00 3/10 allocate_reg_life_data [457]
0.00 0.00 23837/460580 reg_set_to_hard_reg_set [244]
0.00 0.00 14620/91501 gen_sequence [405]
0.00 0.00 3528/60871 recog_memoized_1 [290]
0.00 0.00 28568/474001 asm_noperands [359]
0.00 0.00 18576/3738386 sbitmap_not [184]
0.00 0.00 14606/14838 emit_insn_before [991]
0.00 0.00 5/5 emit_i387_cw_initialization [1113]
0.00 0.00 14/123 insert_insn_on_edge [1058]
0.00 0.00 10/68 assign_386_stack_local [1079]
0.00 0.00 5/2679415 emit_move_insn <cycle 7> [451]
0.00 0.00 24525/24525 reg_dies [1398]
0.00 0.00 14634/108004 end_sequence [1322]
0.00 0.00 14620/108004 start_sequence [1323]
0.00 0.00 9292/9292 new_seginfo [1466]
0.00 0.00 9292/9292 add_seginfo [1465]
0.00 0.00 6/6 free_edge_list [1968]
-----------------------------------------------
0.00 5.32 3/22 update_life_info [8]
0.00 33.69 19/22 rest_of_compilation [7]
[11] 22.3 0.00 39.01 22 cleanup_cfg [11]
0.08 35.19 22/22 try_optimize_cfg [13]
0.02 3.72 31/31 delete_unreachable_blocks [40]
0.00 0.00 22/1239368 timevar_push [153]
0.00 0.00 22/1239368 timevar_pop [164]
0.00 0.00 44/136318 free_EXPR_LIST_list [1315]
-----------------------------------------------
0.00 5.54 1/7 reg_to_stack [23]
0.00 33.26 6/7 rest_of_compilation [7]
[12] 22.2 0.00 38.81 7 life_analysis [12]
0.03 38.42 7/11 update_life_info [8]
0.05 0.14 6/16 init_alias_analysis [95]
0.04 0.06 7/10 delete_noop_moves [179]
0.00 0.03 7/73 free_basic_block_vars [119]
0.01 0.02 3/3 notice_stack_pointer_modification [365]
0.01 0.01 7/10 allocate_reg_life_data [457]
0.00 0.00 7/7 allocate_bb_life_data [716]
0.00 0.00 7/7 mark_regs_live_at_end [1213]
0.00 0.00 6/16 end_alias_analysis [1922]
-----------------------------------------------
0.08 35.19 22/22 cleanup_cfg [11]
[13] 20.2 0.08 35.19 22 try_optimize_cfg [13]
9.73 15.62 67331/67331 try_crossjump_bb [15]
4.54 0.07 207472/207472 try_forward_edges [33]
0.00 2.40 3138/5348 flow_delete_block [36]
0.06 1.72 207472/207472 try_simplify_condjump [59]
0.00 0.41 1099/1099 merge_blocks [107]
0.00 0.38 6/6 remove_fake_edges [111]
0.00 0.18 2011/12100 delete_insn_chain [73]
0.00 0.05 74089/81227 redirect_edge_and_branch [271]
0.01 0.01 95491/25030880 forwarder_block_p [32]
0.01 0.00 85868/11904476 onlyjump_p [64]
0.00 0.00 350/7709 redirect_edge_succ_nodup [604]
0.00 0.00 2011/157599 reg_mentioned_p [422]
0.00 0.00 6/6 add_noreturn_fake_exit_edges [1965]
-----------------------------------------------
0.00 0.00 5/72698858 find_if_case_1 [620]
0.00 0.00 270/72698858 dead_or_predicable [710]
0.01 0.00 18572/72698858 update_equiv_regs [159]
0.02 0.00 56808/72698858 bitmap_equal_p [376]
0.17 0.01 417652/72698858 finish_spills [114]
28.90 2.36 72205551/72698858 calculate_global_regs_live [9]
[14] 18.0 29.10 2.38 72698858 bitmap_operation [14]
2.31 0.00 100601217/101420319 bitmap_element_link [52]
0.07 0.00 3360247/4560586 bitmap_element_allocate [214]
-----------------------------------------------
9.73 15.62 67331/67331 try_optimize_cfg [13]
[15] 14.5 9.73 15.62 67331 try_crossjump_bb [15]
1.42 14.20 6407969/6407969 try_crossjump_to_edge [21]
-----------------------------------------------
0.14 1.88 10/120 find_basic_blocks [46]
1.50 20.63 110/120 find_sub_basic_blocks [18]
[16] 13.8 1.64 22.50 120 make_edges [16]
17.22 0.00 9301682/9305997 cached_make_edge [20]
2.91 0.00 9272730/9272730 make_label_edge [47]
2.19 0.00 110/191 sbitmap_vector_alloc [38]
0.07 0.06 110/137 sbitmap_vector_zero [171]
0.00 0.02 44026/97363 returnjump_p [317]
0.01 0.00 47889/50845 computed_jump_p [493]
0.01 0.00 75082/93306 next_nonnote_insn [556]
0.00 0.00 47889/13479620 find_reg_note [83]
-----------------------------------------------
0.12 4.45 1/5 optimize_mode_switching [10]
0.12 4.45 1/5 convert_regs [27]
0.37 13.35 3/5 thread_prologue_and_epilogue_insns [22]
[17] 13.1 0.61 22.24 5 commit_edge_insertions [17]
0.27 21.96 109/110 find_sub_basic_blocks [18]
0.00 0.02 109/109 commit_one_edge_insertion [472]
-----------------------------------------------
0.00 0.20 1/110 split_all_insns [96]
0.27 21.96 109/110 commit_edge_insertions [17]
[18] 12.8 0.27 22.16 110 find_sub_basic_blocks [18]
1.50 20.63 110/120 make_edges [16]
0.00 0.03 362/362 find_bb_boundaries [339]
0.00 0.00 362/9683 purge_dead_edges [684]
0.00 0.00 544/13479620 find_reg_note [83]
-----------------------------------------------
0.00 20.56 6/6 optimize_mode_switching [10]
[19] 11.8 0.00 20.56 6 pre_edge_lcm [19]
0.87 5.30 6/6 compute_antinout_edge [29]
4.04 1.82 6/6 compute_laterin [30]
0.57 2.41 6/6 compute_available [43]
1.18 0.79 6/6 compute_earliest [56]
1.21 0.48 6/6 compute_insert_delete [60]
1.08 0.00 54/191 sbitmap_vector_alloc [38]
0.81 0.00 6/6 create_edge_list [79]
-----------------------------------------------
0.01 0.00 4315/9305997 make_edge [613]
17.22 0.00 9301682/9305997 make_edges [16]
[20] 9.9 17.23 0.00 9305997 cached_make_edge [20]
-----------------------------------------------
1.42 14.20 6407969/6407969 try_crossjump_bb [15]
[21] 8.9 1.42 14.20 6407969 try_crossjump_to_edge [21]
1.77 6.66 5845902/5845902 flow_find_cross_jump [24]
1.90 2.93 24480556/25030880 forwarder_block_p [32]
0.00 0.33 3663/12100 delete_insn_chain [73]
0.21 0.01 6407841/6407841 outgoing_edges_match [148]
0.21 0.00 3729/10292 remove_edge [93]
0.00 0.18 517/597 split_block [154]
0.00 0.01 3663/4298 make_single_succ_edge [614]
0.00 0.00 3663/10353 gen_jump [689]
0.00 0.00 3663/3736 emit_jump_insn_after [783]
0.00 0.00 3663/93306 next_nonnote_insn [556]
0.00 0.00 3663/13479620 find_reg_note [83]
0.00 0.00 3729/7450804 free_edge [203]
0.00 0.00 3663/9503 block_label [961]
0.00 0.00 66/153 emit_barrier_after [1036]
0.00 0.00 66/254995 gen_rtx_CONST_INT [519]
-----------------------------------------------
0.00 13.72 3/3 rest_of_compilation [7]
[22] 7.9 0.00 13.72 3 thread_prologue_and_epilogue_insns [22]
0.37 13.35 3/5 commit_edge_insertions [17]
0.00 0.00 3/3 gen_prologue [699]
0.00 0.00 3/3 gen_epilogue [1092]
0.00 0.00 6/91501 gen_sequence [405]
0.00 0.00 6/129874 emit_insn [386]
0.00 0.00 6/123 insert_insn_on_edge [1058]
0.00 0.00 6/51353 emit_note [663]
0.00 0.00 3/17313 emit_jump_insn [703]
0.00 0.00 12/108004 end_sequence [1322]
0.00 0.00 6/108004 start_sequence [1323]
0.00 0.00 6/6 record_insns [1976]
0.00 0.00 3/3 ix86_can_use_return_insn_p [2039]
-----------------------------------------------
0.26 12.66 3/3 rest_of_compilation [7]
[23] 7.4 0.26 12.66 3 reg_to_stack [23]
0.00 6.92 1/1 convert_regs [27]
0.00 5.54 1/7 life_analysis [12]
0.19 0.00 1/6 mark_dfs_back_edges [72]
0.01 0.00 1/5 count_or_remove_death_notes [309]
0.00 0.00 1/7 delete_dead_jumptables [351]
0.00 0.00 1/4 alloc_aux_for_blocks [582]
0.00 0.00 105/152060 gen_raw_REG [366]
0.00 0.00 105/138338 gen_rtx_REG [1314]
0.00 0.00 1/9317 get_max_uid [1463]
0.00 0.00 1/262 varray_init [1700]
-----------------------------------------------
1.77 6.66 5845902/5845902 try_crossjump_to_edge [21]
[24] 4.8 1.77 6.66 5845902 flow_find_cross_jump [24]
3.59 0.00 5771506/6047223 rtx_renumbered_equal_p [39]
0.53 0.96 11543012/11543061 find_reg_equal_equiv_note [65]
1.00 0.47 11691804/11904476 onlyjump_p [64]
0.05 0.04 616039/682505 stack_regs_mentioned [205]
0.00 0.02 46872/97363 returnjump_p [317]
0.00 0.00 188/1264901 rtx_equal_p [442]
0.00 0.00 7/39 remove_note [1892]
-----------------------------------------------
0.02 7.72 9/9 rest_of_compilation [7]
[25] 4.4 0.02 7.72 9 if_convert [25]
0.00 5.49 1/11 update_life_info [8]
0.00 1.65 6/15 calculate_dominance_info [35]
0.01 0.40 28166/28166 find_if_header [106]
0.12 0.00 6/191 sbitmap_vector_alloc [38]
0.00 0.04 9/73 free_basic_block_vars [119]
0.01 0.00 1/5 count_or_remove_death_notes [309]
0.00 0.00 1/26 allocate_reg_info [432]
0.00 0.00 1/1125177 sbitmap_zero [225]
0.00 0.00 2/60023 max_reg_num [1344]
0.00 0.00 1/662 sbitmap_alloc [1612]
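On the question of RTL traversal versus dataflow bitmaps, the caller lines of the bitmap_operation entry [14] show where its calls come from. In gprof's call graph, the called/total field on a caller line gives the number of calls made by that caller over the callee's total call count. The sketch below turns that field into per-caller shares; the caller lines are copied from entry [14] above, and call_shares is an illustrative name.

```python
# Caller lines of the bitmap_operation [14] entry in the call graph above:
# self seconds, children seconds, called/total, caller name, caller index.
CALLERS = """\
0.00 0.00 5/72698858 find_if_case_1 [620]
0.00 0.00 270/72698858 dead_or_predicable [710]
0.01 0.00 18572/72698858 update_equiv_regs [159]
0.02 0.00 56808/72698858 bitmap_equal_p [376]
0.17 0.01 417652/72698858 finish_spills [114]
28.90 2.36 72205551/72698858 calculate_global_regs_live [9]
"""


def call_shares(caller_lines):
    """Map each caller to its share of the callee's total call count,
    using gprof's 'called/total' field on caller lines."""
    shares = {}
    for line in caller_lines.splitlines():
        _self, _children, called_total, name, _index = line.split()
        called, total = map(int, called_total.split("/"))
        shares[name] = called / total
    return shares


shares = call_shares(CALLERS)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {share:8.3%}")
```

The six caller lines add up to exactly the 72698858 total, and calculate_global_regs_live accounts for about 99.3% of those calls.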