This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


-O2 profile times

To: gcc at gcc dot gnu dot org
Subject: -O2 profile times
From: Brad Lucier <lucier at math dot purdue dot edu>
Date: Fri, 7 Jul 2000 10:39:03 -0500 (EST)
Cc: lucier at math dot purdue dot edu

On this file:

http://www.math.purdue.edu/~lucier/all.i.gz

with this compiler:

Reading specs from /export/u10/egcs-profile/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/specs
gcc version 2.96 20000706 (experimental)                                        

plus Michael Matz's patches

http://gcc.gnu.org/ml/gcc-patches/2000-06/msg00792.html

and

http://gcc.gnu.org/ml/gcc-patches/2000-06/msg00794.html

I get the following profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 28.96     93.96    93.96    48788     1.93     1.93  pre_expr_reaches_here_p_work
 25.31    176.06    82.10    12878     6.38     6.51  compute_block_backward_dependences
  8.44    203.43    27.37 35999163     0.00     0.00  rtx_renumbered_equal_p
  5.62    221.67    18.24                             htab_traverse
  4.92    237.63    15.96 19147664     0.00     0.00  find_cross_jump

Michael mentioned pre_expr_reaches_here_p_work as a problem; I don't
know if he has a fix.  But compute_block_backward_dependences, a simple
routine, accounts for 82.10 of the 90.70 seconds spent in the scheduler
(schedule_insns), almost all of it self time rather than time in its
callees:

-----------------------------------------------
                0.06   90.64       6/6           rest_of_compilation [5]
[12]    28.0    0.06   90.64       6         schedule_insns [12]
                0.02   86.52   12878/12878       schedule_region [13]
                0.04    1.87   12878/12886       update_life_info [50]
                1.48    0.00   25756/25761       count_or_remove_death_notes [59]
                0.16    0.26       6/27          init_alias_analysis [49]
                0.03    0.07       6/9           split_all_insns [251]
                0.08    0.00      12/46          compute_bb_for_insn [177]
                0.04    0.01   12878/12878       find_insn_reg_weight [414]
                0.04    0.00       2/45          sbitmap_vector_alloc [84]
                0.01    0.00       6/12          allocate_reg_life_data [520]
                0.00    0.00       6/6           find_single_block_region [931]
                0.00    0.00   25756/394947      sbitmap_zero [582]
                0.00    0.00       3/3           reposition_prologue_and_epilogue_notes [970]
                0.00    0.00       2/62          sbitmap_vector_zero [512]
                0.00    0.00      12/2555        sbitmap_alloc [998]
                0.00    0.00       3/5679        emit_note_after [841]
                0.00    0.00       6/6657        get_max_uid [988]
                0.00    0.00       6/182901      sbitmap_ones [591]
                0.00    0.00       3/7577        get_insns [1020]
                0.00    0.00       6/27          end_alias_analysis [1477]
                0.00    0.00       1/1           is_cfg_nonregular [1716]
-----------------------------------------------
                0.02   86.52   12878/12878       schedule_insns [12]
[13]    26.7    0.02   86.52   12878         schedule_region [13]
               82.10    1.73   12878/12878       compute_block_backward_dependences [14]
                0.09    1.32   12878/12878       schedule_block [61]
                0.01    1.04   12878/12878       set_priorities [79]
                0.07    0.14   12878/12878       compute_block_forward_dependences [211]
                0.00    0.01   12878/12878       init_deps [594]
                0.00    0.00   12878/12878       free_pending_lists [787]
                0.00    0.00   25756/930862      bitmap_clear [377]
                0.00    0.00   25756/259531      bitmap_initialize [710]
-----------------------------------------------
               82.10    1.73   12878/12878       schedule_region [13]
[14]    25.8   82.10    1.73   12878         compute_block_backward_dependences [14]
                0.03    1.50   12878/12878       sched_analyze [58]
                0.02    0.14   12878/12878       add_branch_dependences [249]
                0.04    0.00  211435/373771      free_list [363]
                0.01    0.00  211435/949058      free_INSN_LIST_list [451]
                0.00    0.00   12878/265952      max_reg_num [468]
-----------------------------------------------
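
The call graph shows compute_block_backward_dependences calling
max_reg_num once per block (12878 calls for 12878 regions), with 82.10 s
of self time against only 1.73 s in its children.  One possible
explanation, which is an assumption and not something the profile
proves, is that the routine walks a per-register table sized by the
total number of pseudos once for every block, so its cost grows as
blocks x pseudos.  The toy Python model below (the functions
clear_whole_table and clear_touched_only are hypothetical and are not
GCC code) contrasts that pattern with resetting only the entries a
block actually wrote:

```python
# Hypothetical cost model, NOT GCC source.  A function has `nblocks`
# blocks; written in SSA style, each block defines `regs_per_block`
# fresh pseudos, so there are nblocks * regs_per_block pseudos in all.
# A per-pseudo table (last_set) must be clean before each block.


def clear_whole_table(nblocks, regs_per_block):
    """Reset every pseudo's entry before each block: O(blocks * pseudos)."""
    nregs = nblocks * regs_per_block
    last_set = [-1] * nregs
    cleared = 0
    for b in range(nblocks):
        last_set[:] = [-1] * nregs          # touches every pseudo
        cleared += nregs
        for i in range(regs_per_block):     # "analyse" the block
            last_set[b * regs_per_block + i] = i
    return cleared


def clear_touched_only(nblocks, regs_per_block):
    """Record the entries a block wrote and reset only those."""
    nregs = nblocks * regs_per_block
    last_set = [-1] * nregs                 # initialised once per function
    cleared = 0
    for b in range(nblocks):
        touched = []
        for i in range(regs_per_block):
            r = b * regs_per_block + i      # fresh pseudos in each block
            last_set[r] = i
            touched.append(r)
        for r in touched:                   # undo only this block's writes
            last_set[r] = -1
            cleared += 1
    return cleared


for nb in (10, 100, 1000):
    print(nb, clear_whole_table(nb, 20), clear_touched_only(nb, 20))
# 10 2000 200
# 100 200000 2000
# 1000 20000000 20000
```

In this model the whole-table strategy does 20 * nblocks^2 resets while
the touched-only strategy does 20 * nblocks, which is the kind of
quadratic blow-up that would match a huge self time on SSA-style input.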

In all.i, the floating-point code was written in SSA style within
each block, leading to a very large number of pseudos; I presume the
same will happen with any code that has had the SSA transform applied
to it.
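
As a hedged illustration of why SSA-style code multiplies pseudos: in
SSA form every assignment defines a fresh name, so a block that reuses
one variable across n assignments ends up with n distinct names, each
of which would presumably get its own pseudo.  The renamer below
(to_ssa is a hypothetical helper) handles straight-line code within a
single block only and omits phi nodes:

```python
def to_ssa(block):
    """Rename a straight-line block of (dest, operands) assignments so
    that every assignment defines a new, numbered name."""
    version = {}

    def current(name):
        # Names never assigned in the block (block inputs) keep their name.
        return f"{name}{version[name]}" if name in version else name

    renamed = []
    for dest, operands in block:
        new_operands = [current(op) for op in operands]  # read old versions
        version[dest] = version.get(dest, 0) + 1         # then define a new one
        renamed.append((current(dest), new_operands))
    return renamed


# x = a*b; x = x + c; x = x * d  uses one variable for three values ...
block = [("x", ["a", "b"]), ("x", ["x", "c"]), ("x", ["x", "d"])]
print(to_ssa(block))
# [('x1', ['a', 'b']), ('x2', ['x1', 'c']), ('x3', ['x2', 'd'])]
```

One destination name becomes three, and the growth is linear in the
number of assignments in the block.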

The scheduling passes make a big difference in the performance of
floating-point code on the alpha, but we're getting killed by the
compilation time.

Brad Lucier
