This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Tremendous performance regression in 1.1.2 -> mainline
- To: matzmich at cs dot tu-berlin dot de (Michael Matz)
- Subject: Re: Tremendous performance regression in 1.1.2 -> mainline
- From: Brad Lucier <lucier at math dot purdue dot edu>
- Date: Sun, 9 Apr 2000 01:23:18 -0500 (EST)
- Cc: lucier at math dot purdue dot edu (Brad Lucier), gcc at gcc dot gnu dot org
> With a reduced version of your test I had a big improvement in speed:
> (from 85 to 8 seconds in flow).
This works great, thanks! I'm wondering if the same change in the
order of processing the blocks might work in the lcm routine
compute_pre_data to reduce the runtime there. The time to compile
http://www.math.purdue.edu/~lucier/all.i.gz
has regressed by about a factor of five, and most of that regression
occurs in the gcse pass:
-----------------------------------------------
0.00 151.92 3/3 rest_of_compilation [5]
[6] 49.6 0.00 151.92 3 gcse_main [6]
0.00 132.11 1/1 one_pre_gcse_pass [7]
0.00 19.66 2/2 one_cprop_pass [27]
0.01 0.08 2/2 compute_sets [258]
0.04 0.02 2/2 alloc_gcse_mem [300]
0.00 0.00 1/1 compute_can_copy [1161]
0.00 0.00 6/185600 max_reg_num [440]
0.00 0.00 1/66300 gcse_alloc [467]
0.00 0.00 2/2 alloc_reg_set_mem [1572]
0.00 0.00 2/2 free_reg_set_mem [1581]
0.00 0.00 2/2 free_gcse_mem [1580]
0.00 0.00 1/14 gcc_obstack_init [1420]
-----------------------------------------------
0.00 132.11 1/1 gcse_main [6]
[7] 43.1 0.00 132.11 1 one_pre_gcse_pass [7]
0.00 105.12 1/1 compute_pre_data [8]
0.00 25.12 1/1 pre_gcse [19]
1.03 0.77 1/3 compute_hash_table [35]
0.00 0.07 1/1 alloc_pre_mem [288]
0.00 0.00 1/1 remove_fake_edges [957]
0.00 0.00 1/4728 remove_fake_successors [929]
0.00 0.00 1/1 alloc_expr_hash_table [1590]
0.00 0.00 1/1 add_noreturn_fake_exit_edges [1589]
0.00 0.00 1/1 compute_expr_hash_table [1594]
0.00 0.00 1/1 free_edge_list [1603]
0.00 0.00 1/1 free_pre_mem [1605]
0.00 0.00 1/1 free_expr_hash_table [1604]
-----------------------------------------------
0.00 105.12 1/1 one_pre_gcse_pass [7]
[8] 34.3 0.00 105.12 1 compute_pre_data [8]
0.00 64.15 1/1 pre_edge_lcm [13]
9.05 30.58 1/1 compute_ae_kill [14]
0.02 1.31 1/3 compute_local_properties [39]
0.00 0.00 1/1 compute_transpout [810]
0.00 0.00 1/36 sbitmap_vector_zero [591]
-----------------------------------------------
...
-----------------------------------------------
0.00 64.15 1/1 compute_pre_data [8]
[13] 20.9 0.00 64.15 1 pre_edge_lcm [13]
0.01 34.89 1/1 compute_antinout_edge [15]
0.14 22.10 1/1 compute_laterin [25]
0.01 6.58 1/13 compute_available [10]
0.00 0.20 1/1 compute_earliest [176]
0.00 0.12 1/1 compute_insert_delete [227]
0.10 0.00 9/42 sbitmap_vector_alloc [110]
0.00 0.00 1/1 create_edge_list [847]
-----------------------------------------------
9.05 30.58 1/1 compute_pre_data [8]
[14] 12.9 9.05 30.58 1 compute_ae_kill [14]
30.58 0.00 60619050/60619050 expr_killed_p [17]
-----------------------------------------------
0.01 34.89 1/1 pre_edge_lcm [13]
[15] 11.4 0.01 34.89 1 compute_antinout_edge [15]
34.71 0.00 11204/11204 sbitmap_intersection_of_succs [16]
0.18 0.00 11205/11205 sbitmap_a_or_b_and_c [186]
0.00 0.00 1/20 sbitmap_vector_ones [587]
0.00 0.00 1/182778 sbitmap_zero [733]
-----------------------------------------------
34.71 0.00 11204/11204 compute_antinout_edge [15]
[16] 11.3 34.71 0.00 11204 sbitmap_intersection_of_succs [16]
0.00 0.00 11204/108038 sbitmap_copy [549]
-----------------------------------------------
37606148 expr_killed_p [17]
30.58 0.00 60619050/60619050 compute_ae_kill [14]
[17] 10.0 30.58 0.00 60619050+37606148 expr_killed_p [17]
37606148 expr_killed_p [17]
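To illustrate why the order of processing the blocks could matter for a
backward problem like the one solved in compute_antinout_edge, here is a
minimal sketch. It is not GCC code: the function solve, the one-bit
"bitmaps", and the straight-line flow graph 0 -> 1 -> ... -> NBLOCKS-1 are
all assumptions made for the illustration. It iterates the usual
anticipatability equation to a fixed point and counts the passes over the
blocks, visiting them either in forward or in reverse order.

```c
#include <assert.h>

#define NBLOCKS 8

/* Toy backward dataflow problem on the chain 0 -> 1 -> ... -> NBLOCKS-1.
   One expression is tracked, so each "bitmap" is a single bit.
   Equation:  antin[b] = antloc[b] | (transp[b] & antout[b])
   where antout[b] is antin of the only successor, or 0 for the last block.
   Returns the number of passes over all blocks until nothing changes.
   All names here are hypothetical, not taken from lcm.c.  */
static int solve(int reverse)
{
    unsigned antloc[NBLOCKS] = {0};   /* expression is computed nowhere */
    unsigned transp[NBLOCKS];         /* expression is killed nowhere   */
    unsigned antin[NBLOCKS];
    int passes = 0;
    int changed = 1;

    for (int b = 0; b < NBLOCKS; b++) {
        transp[b] = 1u;
        antin[b] = 1u;                /* optimistic initial value */
    }

    while (changed) {
        changed = 0;
        passes++;
        for (int i = 0; i < NBLOCKS; i++) {
            int b = reverse ? NBLOCKS - 1 - i : i;
            unsigned antout = (b == NBLOCKS - 1) ? 0u : antin[b + 1];
            unsigned in = antloc[b] | (transp[b] & antout);
            if (in != antin[b]) {
                antin[b] = in;
                changed = 1;
            }
        }
    }

    /* Both visiting orders must reach the same fixed point: the
       expression is never computed, so it is anticipatable nowhere.  */
    for (int b = 0; b < NBLOCKS; b++)
        assert(antin[b] == 0u);

    return passes;
}
```

On this chain, solve(0) (forward order) needs NBLOCKS + 1 = 9 passes,
because each pass moves the information back by only one block, while
solve(1) (reverse order) needs 2 passes: one to propagate everything and
one to confirm that nothing changes. Whether the real lcm code behaves this
way on all.i is something the sketch cannot show.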
gcse also spends a fair amount of time (about 66 seconds) in
delete_null_pointer_checks, nearly all of it in compute_available:
-----------------------------------------------
0.00 66.52 6/6 rest_of_compilation [5]
[11] 21.7 0.00 66.52 6 delete_null_pointer_checks [11]
0.10 66.33 10/10 delete_null_pointer_checks_1 [12]
0.09 0.00 8/42 sbitmap_vector_alloc [110]
0.00 0.00 2563/2670 canonicalize_condition [721]
0.00 0.00 8905/421847 condjump_p [381]
0.00 0.00 8903/850373 simplejump_p [279]
0.00 0.00 2563/2670 get_condition [988]
0.00 0.00 2/185600 max_reg_num [440]
0.00 0.00 2/2 get_bitmap_width [1583]
-----------------------------------------------
0.10 66.33 10/10 delete_null_pointer_checks [11]
[12] 21.7 0.10 66.33 10 delete_null_pointer_checks_1 [12]
0.10 65.78 10/13 compute_available [10]
0.05 0.36 461510/2655811 note_stores [44]
0.03 0.00 461510/3538732 single_set [163]
0.00 0.00 20/36 sbitmap_vector_zero [591]
0.00 0.00 99/2670 canonicalize_condition [721]
0.00 0.00 99/2670 get_condition [988]
-----------------------------------------------
but (as usual) I have no idea what's going on there.
Brad