This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
-O2 profile times
- To: gcc at gcc dot gnu dot org
- Subject: -O2 profile times
- From: Brad Lucier <lucier at math dot purdue dot edu>
- Date: Fri, 7 Jul 2000 10:39:03 -0500 (EST)
- Cc: lucier at math dot purdue dot edu
On this file:
http://www.math.purdue.edu/~lucier/all.i.gz
with this compiler:
Reading specs from /export/u10/egcs-profile/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/specs
gcc version 2.96 20000706 (experimental)
plus Michael Matz's patches
http://gcc.gnu.org/ml/gcc-patches/2000-06/msg00792.html
and
http://gcc.gnu.org/ml/gcc-patches/2000-06/msg00794.html
I get the following profile:
Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 28.96     93.96    93.96    48788     1.93     1.93  pre_expr_reaches_here_p_work
 25.31    176.06    82.10    12878     6.38     6.51  compute_block_backward_dependences
  8.44    203.43    27.37 35999163     0.00     0.00  rtx_renumbered_equal_p
  5.62    221.67    18.24                             htab_traverse
  4.92    237.63    15.96 19147664     0.00     0.00  find_cross_jump
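As a rough cross-check of the flat profile, here is a minimal Python sketch that recomputes a few derived figures from the rows above. The row values are copied from the listing; the helper names (implied_total_seconds, top_share, samples) are made up for illustration and are not part of gprof.

```python
# Flat-profile rows copied from the listing: (% time, self seconds, name).
ROWS = [
    (28.96, 93.96, "pre_expr_reaches_here_p_work"),
    (25.31, 82.10, "compute_block_backward_dependences"),
    (8.44, 27.37, "rtx_renumbered_equal_p"),
    (5.62, 18.24, "htab_traverse"),
    (4.92, 15.96, "find_cross_jump"),
]

# Seconds per sample, as printed by gprof in the profile header.
SAMPLE_PERIOD = 0.000976562


def implied_total_seconds(percent, self_seconds):
    """Estimate total profiled time from one row: self / (percent / 100)."""
    return self_seconds / (percent / 100.0)


def top_share(rows, n):
    """Sum the '% time' column over the first n rows."""
    return sum(percent for percent, _, _ in rows[:n])


def samples(self_seconds, period=SAMPLE_PERIOD):
    """Convert self seconds back to an approximate sample count."""
    return round(self_seconds / period)


if __name__ == "__main__":
    print("top two functions, % of time:", round(top_share(ROWS, 2), 2))
    print("implied total seconds:", round(implied_total_seconds(*ROWS[0][:2]), 1))
    print("samples in first row:", samples(ROWS[0][1]))
```

Under these figures the first two functions together account for a little over half of the profiled time, and the implied total is roughly 324 seconds for the whole compilation.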
Michael mentioned pre_expr_reaches_here_p_work as a problem; I don't
know if he has a fix. But compute_block_backward_dependences is a simple
routine that takes up nearly all the time in the scheduler (82.10 of the
90.70 seconds spent under schedule_insns):
-----------------------------------------------
               0.06   90.64        6/6           rest_of_compilation [5]
[12]    28.0   0.06   90.64        6         schedule_insns [12]
               0.02   86.52    12878/12878       schedule_region [13]
               0.04    1.87    12878/12886       update_life_info [50]
               1.48    0.00    25756/25761       count_or_remove_death_notes [59]
               0.16    0.26        6/27          init_alias_analysis [49]
               0.03    0.07        6/9           split_all_insns [251]
               0.08    0.00       12/46          compute_bb_for_insn [177]
               0.04    0.01    12878/12878       find_insn_reg_weight [414]
               0.04    0.00        2/45          sbitmap_vector_alloc [84]
               0.01    0.00        6/12          allocate_reg_life_data [520]
               0.00    0.00        6/6           find_single_block_region [931]
               0.00    0.00    25756/394947      sbitmap_zero [582]
               0.00    0.00        3/3           reposition_prologue_and_epilogue_notes [970]
               0.00    0.00        2/62          sbitmap_vector_zero [512]
               0.00    0.00       12/2555        sbitmap_alloc [998]
               0.00    0.00        3/5679        emit_note_after [841]
               0.00    0.00        6/6657        get_max_uid [988]
               0.00    0.00        6/182901      sbitmap_ones [591]
               0.00    0.00        3/7577        get_insns [1020]
               0.00    0.00        6/27          end_alias_analysis [1477]
               0.00    0.00        1/1           is_cfg_nonregular [1716]
-----------------------------------------------
               0.02   86.52    12878/12878       schedule_insns [12]
[13]    26.7   0.02   86.52    12878         schedule_region [13]
              82.10    1.73    12878/12878       compute_block_backward_dependences [14]
               0.09    1.32    12878/12878       schedule_block [61]
               0.01    1.04    12878/12878       set_priorities [79]
               0.07    0.14    12878/12878       compute_block_forward_dependences [211]
               0.00    0.01    12878/12878       init_deps [594]
               0.00    0.00    12878/12878       free_pending_lists [787]
               0.00    0.00    25756/930862      bitmap_clear [377]
               0.00    0.00    25756/259531      bitmap_initialize [710]
-----------------------------------------------
              82.10    1.73    12878/12878       schedule_region [13]
[14]    25.8  82.10    1.73    12878         compute_block_backward_dependences [14]
               0.03    1.50    12878/12878       sched_analyze [58]
               0.02    0.14    12878/12878       add_branch_dependences [249]
               0.04    0.00   211435/373771      free_list [363]
               0.01    0.00   211435/949058      free_INSN_LIST_list [451]
               0.00    0.00    12878/265952      max_reg_num [468]
-----------------------------------------------
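To put a number on how much of the scheduler's time goes to this one routine, here is a small Python sketch using the self and children seconds from the call-graph entries above. The dictionary layout and the helper names (total, share) are only illustrative assumptions, not anything gprof produces.

```python
# Seconds copied from the call-graph entries for [12] and [14].
SCHEDULE_INSNS = {"self": 0.06, "children": 90.64}
BACKWARD_DEPS = {"self": 82.10, "children": 1.73}


def total(entry):
    """Inclusive time of an entry: its own time plus time in its callees."""
    return entry["self"] + entry["children"]


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    sched = total(SCHEDULE_INSNS)
    # Self time only: the loop inside compute_block_backward_dependences.
    print("self share of scheduler: %.1f%%" % share(BACKWARD_DEPS["self"], sched))
    # Inclusive time: also counts sched_analyze and the other callees.
    print("inclusive share of scheduler: %.1f%%" % share(total(BACKWARD_DEPS), sched))
```

With these figures the routine's own code, excluding sched_analyze and its other callees, accounts for roughly nine tenths of the time spent under schedule_insns.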
In all.i, the floating-point code was written in SSA style within
each block, leading to many pseudos; I presume the same will happen
with any code which has the SSA transform applied to it.
The scheduling passes make a big difference in the performance of
floating-point code on the Alpha, but we're getting killed by
the compilation time of the code.
Brad Lucier