This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: if-conversion a performance bottleneck
- To: matzmich at cs dot tu-berlin dot de (Michael Matz)
- Subject: Re: if-conversion a performance bottleneck
- From: Brad Lucier <lucier at math dot purdue dot edu>
- Date: Wed, 3 May 2000 22:05:09 -0500 (EST)
- Cc: lucier at math dot purdue dot edu (Brad Lucier), rth at cygnus dot com (Richard Henderson), gcc at gcc dot gnu dot org
> Please try the attached diff (against actual CVS) if they make also a
> difference for you ;)
Your changes to flow.c have cut the number of calls to
sbitmap_intersection_of_succs from 40604 to 24225, so they are definitely
worthwhile. Bootstrapped on alphaev6-unknown-linux-gnu.
Brad
Here are the new statistics with your changes:
Execution times (seconds)
garbage collection  :   1.30 ( 1%) usr    0.00 ( 0%) sys    1.30 ( 1%) wall
parser              :   6.45 ( 3%) usr    0.19 (15%) sys    6.64 ( 3%) wall
varconst            :   0.01 ( 0%) usr    0.00 ( 0%) sys    0.01 ( 0%) wall
integration         :   0.00 ( 0%) usr    0.00 ( 0%) sys    0.00 ( 0%) wall
jump                :  20.06 ( 9%) usr    0.84 (66%) sys   20.90 ( 9%) wall
CSE                 :   2.77 ( 1%) usr    0.00 ( 0%) sys    2.77 ( 1%) wall
global CSE          :   5.51 ( 2%) usr    0.01 ( 1%) sys    5.52 ( 2%) wall
loop analysis       :   0.24 ( 0%) usr    0.00 ( 0%) sys    0.24 ( 0%) wall
CSE 2               :   2.29 ( 1%) usr    0.00 ( 0%) sys    2.29 ( 1%) wall
flow analysis       :  52.11 (23%) usr    0.06 ( 5%) sys   52.16 (23%) wall
combiner            :   2.80 ( 1%) usr    0.00 ( 0%) sys    2.80 ( 1%) wall
if-conversion       :  44.42 (20%) usr    0.02 ( 2%) sys   44.43 (20%) wall
regmove             :   0.50 ( 0%) usr    0.00 ( 0%) sys    0.50 ( 0%) wall
scheduling          :   6.26 ( 3%) usr    0.01 ( 1%) sys    6.27 ( 3%) wall
local alloc         :   1.49 ( 1%) usr    0.00 ( 0%) sys    1.49 ( 1%) wall
global alloc        :   2.78 ( 1%) usr    0.07 ( 6%) sys    2.86 ( 1%) wall
reload CSE regs     :   7.92 ( 4%) usr    0.01 ( 1%) sys    7.93 ( 3%) wall
flow 2              :  19.36 ( 9%) usr    0.00 ( 0%) sys   19.35 ( 9%) wall
if-conversion 2     :  36.38 (16%) usr    0.01 ( 1%) sys   36.38 (16%) wall
peephole 2          :   0.03 ( 0%) usr    0.00 ( 0%) sys    0.03 ( 0%) wall
schedulding 2       :   8.88 ( 4%) usr    0.00 ( 0%) sys    8.88 ( 4%) wall
shorten branches    :   0.12 ( 0%) usr    0.00 ( 0%) sys    0.12 ( 0%) wall
final               :   3.22 ( 1%) usr    0.00 ( 0%) sys    3.22 ( 1%) wall
symout              :   0.00 ( 0%) usr    0.00 ( 0%) sys    0.00 ( 0%) wall
rest of compilation :   1.07 ( 0%) usr    0.00 ( 0%) sys    1.07 ( 0%) wall
TOTAL               : 226.10              1.27            227.32
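
To put the timing report in terms of the thread subject, here is a small sketch that adds up the user-time share of the two if-conversion passes and the two flow passes. The numbers are copied from the rows above; the dictionary and helper names are mine, chosen only for illustration.

```python
# Per-pass user times (seconds), copied from the timing report.
USER_SECONDS = {
    "flow analysis": 52.11,
    "if-conversion": 44.42,
    "flow 2": 19.36,
    "if-conversion 2": 36.38,
}
TOTAL_USER_SECONDS = 226.10  # TOTAL row, usr column


def share(passes):
    """Return the combined percentage of total user time for the given passes."""
    return 100.0 * sum(USER_SECONDS[p] for p in passes) / TOTAL_USER_SECONDS


if_conv = share(["if-conversion", "if-conversion 2"])
flow = share(["flow analysis", "flow 2"])
print(f"if-conversion passes: {if_conv:.1f}% of user time")
print(f"flow passes:          {flow:.1f}% of user time")
```

Under those figures the two if-conversion passes together take a bit more than a third of the user time, which is slightly more than the two flow passes combined.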
Flat profile:
Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 42.54     69.80    69.80    24225     2.88     2.88  sbitmap_intersection_of_succs
 17.24     98.09    28.29    15025     1.88     1.88  sbitmap_intersection_of_preds
  5.03    106.34     8.25       42   196.36   196.36  mark_critical_edges
  3.98    112.86     6.52 25085231     0.00     0.00  bitmap_operation
  3.56    118.70     5.84    10248     0.57     0.62  compute_block_backward_dependences
  3.44    124.35     5.65        9   627.60 11546.70  compute_flow_dominators
  2.37    128.24     3.89       21   185.36   185.93  delete_unreachable_blocks
  2.31    132.03     3.79        6   631.02  1760.19  calculate_global_regs_live
...
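
A quick cross-check of the sbitmap_intersection_of_succs numbers, as a sketch: the call counts come from the message text and the self time from the flat profile. The estimate of time saved is only a rough guess, since it assumes the avoided calls would have cost the same as the remaining ones, which the profile does not confirm.

```python
calls_before = 40604   # sbitmap_intersection_of_succs calls before the patch
calls_after = 24225    # calls after the patch
self_seconds = 69.80   # self time from the flat profile

# Relative drop in the number of calls.
reduction_pct = 100.0 * (calls_before - calls_after) / calls_before

# Average cost per call; should agree with the ms/call column (2.88).
ms_per_call = 1000.0 * self_seconds / calls_after

# Rough estimate only: assumes a constant cost per avoided call.
est_seconds_saved = (calls_before - calls_after) * ms_per_call / 1000.0

print(f"calls reduced by {reduction_pct:.1f}%")
print(f"{ms_per_call:.2f} ms per call")
print(f"roughly {est_seconds_saved:.0f} s saved under the constant-cost assumption")
```

So the patch removes about two fifths of the calls, yet the remaining ones still account for the largest single entry in the profile.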
-----------------------------------------------
                1.88   32.76        3/9            flow_loops_find [9]
                3.77   65.51        6/9            if_convert [8]
[6]     63.3    5.65   98.27        9          compute_flow_dominators [6]
               69.80    0.00    24225/24225        sbitmap_intersection_of_succs [7]
               28.29    0.00    15025/15025        sbitmap_intersection_of_preds [10]
                0.16    0.00    39259/39259        sbitmap_a_and_b [135]
                0.02    0.00        9/27           sbitmap_vector_alloc [255]
                0.00    0.00        9/18           sbitmap_vector_zero [771]
                0.00    0.00        9/74716        sbitmap_zero [725]
                0.00    0.00        9/9            sbitmap_vector_ones [1365]
-----------------------------------------------
               69.80    0.00    24225/24225        compute_flow_dominators [6]
[7]     42.5   69.80    0.00    24225          sbitmap_intersection_of_succs [7]
                0.00    0.00    24225/39250        sbitmap_copy [612]
-----------------------------------------------
                0.00   69.40       12/12           rest_of_compilation [5]
[8]     42.3    0.00   69.40       12          if_convert [8]
                3.77   65.51        6/9            compute_flow_dominators [6]
                0.00    0.06       12/43           free_basic_block_vars [112]
                0.03    0.00       12/48           compute_bb_for_insn [147]
                0.00    0.01    20509/20509        find_if_header [391]
                0.01    0.00        6/27           sbitmap_vector_alloc [255]
                0.00    0.00        1/10258        update_life_info [12]
                0.00    0.00        1/37           allocate_reg_info [540]
                0.00    0.00        1/20500        count_or_remove_death_notes [36]
                0.00    0.00        1/995          sbitmap_alloc [679]
                0.00    0.00        2/145008       max_reg_num [374]
                0.00    0.00        1/74716        sbitmap_zero [725]
                0.00    0.00       12/5225         get_max_uid [1134]
-----------------------------------------------
                2.96   36.17        3/3            rest_of_compilation [5]
[9]     23.8    2.96   36.17        3          flow_loops_find [9]
                1.88   32.76        3/9            compute_flow_dominators [6]
                0.76    0.00      964/964          flow_loop_exits_find [38]
                0.75    0.00        1/1            flow_depth_first_order_compute [39]
                0.01    0.00        3/27           sbitmap_vector_alloc [255]
                0.00    0.00      964/964          flow_loop_pre_header_find [627]
                0.00    0.00      966/995          sbitmap_alloc [679]
                0.00    0.00        3/3            flow_loops_tree_build [747]
                0.00    0.00      964/964          flow_loop_nodes_find [791]
                0.00    0.00      964/964          sbitmap_last_set_bit [834]
                0.00    0.00        2/74716        sbitmap_zero [725]
                0.00    0.00      964/964          sbitmap_first_set_bit [1174]
                0.00    0.00        3/3            flow_loops_level_compute [1435]
-----------------------------------------------
               28.29    0.00    15025/15025        compute_flow_dominators [6]
[10]    17.2   28.29    0.00    15025          sbitmap_intersection_of_preds [10]
                0.00    0.00    15025/39250        sbitmap_copy [612]
-----------------------------------------------
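
For readers wondering why the intersection routines dominate the call graph above, here is a minimal sketch of the textbook iterative dominator computation, with each dominator set stored as a bit vector (a Python int). This is not the code in flow.c; the function name and the graph representation are made up for illustration, and I am only assuming that compute_flow_dominators follows the same general scheme, with the successor variant presumably serving the post-dominator direction.

```python
def compute_dominators(preds, entry):
    """Iterative dominator sets; each set is an int used as a bit vector.

    preds[b] lists the predecessor block indices of block b.
    Returns (dom, intersections), where bit d of dom[b] is set
    when block d dominates block b.
    """
    n = len(preds)
    all_blocks = (1 << n) - 1
    dom = [all_blocks] * n       # start from "every block dominates"
    dom[entry] = 1 << entry      # the entry block is dominated only by itself
    intersections = 0
    changed = True
    while changed:
        changed = False
        for b in range(n):
            if b == entry:
                continue
            # Intersect the predecessors' sets: the analogue of one
            # sbitmap_intersection_of_preds call in the profile.
            meet = all_blocks
            for p in preds[b]:
                meet &= dom[p]
            intersections += 1
            new = meet | (1 << b)
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom, intersections


# Diamond-shaped CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
preds = [[], [0], [0], [1, 2]]
dom, count = compute_dominators(preds, entry=0)
for b, bits in enumerate(dom):
    print(b, [d for d in range(len(preds)) if bits >> d & 1])
print("intersections:", count)
```

Every visit to a block costs one intersection over bit vectors whose length is the number of basic blocks, and the loop repeats until nothing changes. On a function with many blocks that would be consistent with the millisecond-scale cost per call seen in the flat profile, though the profile alone does not prove it.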