This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



optimization/2001: Inordinately long compile times in reload CSE regs



>Number:         2001
>Category:       optimization
>Synopsis:       Inordinately long compile times in reload CSE regs
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 15 10:56:01 PST 2001
>Closed-Date:
>Last-Modified:
>Originator:     B. Lucier
>Release:        gcc-3_0-branch 20010212
>Organization:
>Environment:
alphaev6-unknown-linux-gnu
>Description:
On this file

http://www.math.purdue.edu/~lucier/pi.i.gz

compilation takes an inordinately long time with -O2:

popov-684% /export/u10/egcs-test/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.97/cc1 -fPIC -fno-math-errno -O2 -mcpu=ev6 pi.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_pi {GC 5327k -> 1566k} {GC 27247k -> 13569k} {GC 67368k -> 12960k} {GC 23978k -> 17460k} {GC 40715k -> 20602k} {GC 50610k -> 27280k} ___init_proc ____20_pi
Execution times (seconds)
 garbage collection    :   1.83 ( 1%) usr   0.00 ( 1%) sys   1.84 ( 1%) wall
 preprocessing         :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.05 ( 0%) wall
 lexical analysis      :   0.10 ( 0%) usr   0.03 ( 5%) sys   0.13 ( 0%) wall
 parser                :   0.26 ( 0%) usr   0.02 ( 3%) sys   0.28 ( 0%) wall
 varconst              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 integration           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 jump                  :   2.17 ( 1%) usr   0.03 ( 4%) sys   2.21 ( 1%) wall
 CSE                   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 global CSE            :   3.71 ( 2%) usr   0.18 (24%) sys   3.89 ( 2%) wall
 loop analysis         :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall
 CSE 2                 :   5.14 ( 3%) usr   0.00 ( 0%) sys   5.14 ( 3%) wall
 flow analysis         :   2.31 ( 1%) usr   0.00 ( 0%) sys   2.31 ( 1%) wall
 combiner              :   1.22 ( 1%) usr   0.00 ( 0%) sys   1.23 ( 1%) wall
 if-conversion         :   0.39 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall
 regmove               :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%) wall
 scheduling            :   4.14 ( 2%) usr   0.17 (22%) sys   4.32 ( 2%) wall
 local alloc           :   1.23 ( 1%) usr   0.00 ( 0%) sys   1.23 ( 1%) wall
 global alloc          :   3.48 ( 2%) usr   0.04 ( 6%) sys   3.53 ( 2%) wall
 reload CSE regs       : 158.30 (81%) usr   0.10 (13%) sys 158.36 (81%) wall
 flow 2                :   3.23 ( 2%) usr   0.02 ( 3%) sys   3.25 ( 2%) wall
 if-conversion 2       :   0.07 ( 0%) usr   0.00 ( 1%) sys   0.08 ( 0%) wall
 peephole 2            :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall
 scheduling 2          :   4.01 ( 2%) usr   0.09 (12%) sys   4.10 ( 2%) wall
 reorder blocks        :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall
 shorten branches      :   0.26 ( 0%) usr   0.01 ( 1%) sys   0.27 ( 0%) wall
 final                 :   2.17 ( 1%) usr   0.02 ( 3%) sys   2.19 ( 1%) wall
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 rest of compilation   :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall
 TOTAL                 : 195.64             0.79           196.38

And the .s file is quite large:

popov-685% wc pi.s
 108819  217242 1881754 pi.s

This doesn't happen with -O2 -fno-gcse:

popov-686% /export/u10/egcs-test/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.97/cc1 -fPIC -fno-math-errno -O2 -fno-gcse -mcpu=ev6 pi.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_pi {GC 5327k -> 1566k} {GC 5633k -> 2021k} ___init_proc ____20_pi
Execution times (seconds)
 garbage collection    :   0.05 ( 2%) usr   0.00 ( 1%) sys   0.06 ( 2%) wall
 preprocessing         :   0.06 ( 2%) usr   0.01 (12%) sys   0.08 ( 2%) wall
 lexical analysis      :   0.08 ( 3%) usr   0.03 (24%) sys   0.12 ( 4%) wall
 parser                :   0.25 ( 8%) usr   0.03 (20%) sys   0.28 ( 8%) wall
 varconst              :   0.01 ( 0%) usr   0.00 ( 1%) sys   0.01 ( 0%) wall
 integration           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 jump                  :   0.66 (21%) usr   0.03 (24%) sys   0.70 (21%) wall
 CSE                   :   0.10 ( 3%) usr   0.00 ( 1%) sys   0.10 ( 3%) wall
 loop analysis         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 CSE 2                 :   0.08 ( 3%) usr   0.00 ( 0%) sys   0.08 ( 3%) wall
 flow analysis         :   0.42 (13%) usr   0.00 ( 0%) sys   0.42 (13%) wall
 combiner              :   0.09 ( 3%) usr   0.00 ( 2%) sys   0.09 ( 3%) wall
 if-conversion         :   0.15 ( 5%) usr   0.00 ( 0%) sys   0.15 ( 4%) wall
 regmove               :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.03 ( 1%) wall
 scheduling            :   0.13 ( 4%) usr   0.00 ( 2%) sys   0.13 ( 4%) wall
 local alloc           :   0.08 ( 3%) usr   0.00 ( 0%) sys   0.08 ( 3%) wall
 global alloc          :   0.10 ( 3%) usr   0.00 ( 3%) sys   0.11 ( 3%) wall
 reload CSE regs       :   0.17 ( 5%) usr   0.00 ( 4%) sys   0.17 ( 5%) wall
 flow 2                :   0.31 (10%) usr   0.00 ( 0%) sys   0.31 ( 9%) wall
 if-conversion 2       :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 1%) wall
 peephole 2            :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 scheduling 2          :   0.10 ( 3%) usr   0.00 ( 3%) sys   0.11 ( 3%) wall
 reorder blocks        :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 1%) wall
 shorten branches      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 final                 :   0.07 ( 2%) usr   0.00 ( 1%) sys   0.07 ( 2%) wall
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 rest of compilation   :   0.05 ( 2%) usr   0.00 ( 0%) sys   0.05 ( 2%) wall
 TOTAL                 :   3.20             0.15             3.35
popov-687% wc pi.s
   6854   13327  100879 pi.s

A profiled version of cc1 shows the following top routines:

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 47.94     73.50    73.50    80710     0.91     1.36  htab_traverse
  9.11     87.47    13.97 117220084     0.00     0.00  canon_rtx
  5.00     95.14     7.67 19006280     0.00     0.00  find_base_term
  2.90     99.58     4.44 25786729     0.00     0.00  rtx_equal_for_memref_p
  2.81    103.89     4.31  6847873     0.00     0.00  exp_equiv_p
  1.49    106.17     2.28 13642612     0.00     0.00  cselib_invalidate_mem_1
  1.41    108.34     2.17 18915614     0.00     0.00  get_addr
  1.30    110.34     2.00  5934642     0.00     0.01  cselib_mem_conflict_p
  1.21    112.20     1.86  6448431     0.00     0.00  memrefs_conflict_p

The detailed call-graph report includes the following entries:

-----------------------------------------------
                0.02  109.36   80692/80692       cselib_invalidate_rtx [9]
[11]    71.3    0.02  109.36   80692         cselib_invalidate_mem [11]
               73.49   35.87   80692/80710       htab_traverse [10]
-----------------------------------------------
                0.01    2.32       1/28          mark_constant_function [59]
                0.03    6.97       3/28          update_equiv_regs [30]
                0.07   13.95       6/28          cse_main [18]
                0.07   13.95       6/28          life_analysis [20]
                0.07   13.95       6/28          reload_cse_regs_1 [17]
                0.07   13.95       6/28          sched_init [23]
[12]    42.7    0.31   65.08      28         init_alias_analysis [12]
                0.20   64.54 2030264/3481856     note_stores [8]
                0.04    0.26  849476/849476      prologue_epilogue_contains [148]
                0.03    0.00  642902/3044992     find_reg_note [212]
                0.01    0.00   16946/21832       rtx_varies_p [553]
                0.00    0.00   12708/324874      reg_overlap_mentioned_p [401]
                0.00    0.00     780/217683      gen_raw_REG [194]
                0.00    0.00    1040/114328      gen_rtx_fmt_e [296]
                0.00    0.00    3723/7631907     rtx_equal_p [114]
                0.00    0.00     780/215521      gen_rtx_REG [315]
                0.00    0.00      96/7968        plus_constant_wide [458]
                0.00    0.00      84/545563      xcalloc [359]
                0.00    0.00      56/387386      xmalloc [407]
                0.00    0.00      65/254720      get_insns [491]
                0.00    0.00      28/78          ggc_add_root [1132]
                0.00    0.00      28/79188       max_reg_num [650]
                0.00    0.00      44/203         single_set_2 [1232]
                0.00    0.00      96/307         plus_constant_for_output_wide [1401]
                0.00    0.00      28/49          ggc_add_rtx_root [1484]
-----------------------------------------------
                2.28   33.43 13642612/13642612     htab_traverse [10]
[13]    23.3    2.28   33.43 13642612         cselib_invalidate_mem_1 [13]
                2.00   31.42 5934642/5934642     cselib_mem_conflict_p [15]
                0.01    0.00    5796/1253896     cselib_lookup <cycle 16> [102]
                0.00    0.00    5796/105304      unchain_one_elt_loc_list [450]
                0.00    0.00    5796/105304      unchain_one_elt_list [471]
-----------------------------------------------
                0.00    0.00     835/6520637     mark_used_regs [111]
                0.00    0.01    1003/6520637     invalidate_mems_from_set [291]
                0.10    2.98  584157/6520637     sched_analyze_1 [47]
                1.04   30.32 5934642/6520637     cselib_mem_conflict_p [15]
[14]    22.5    1.14   33.31 6520637         write_dependence_p [14]
                1.86   18.88 6442497/6448431     memrefs_conflict_p [19]
                0.53    5.72 6517815/6527929     base_alias_check [37]
                2.40    0.00 5940308/19006280     find_base_term [34]
                1.54    0.00 12884994/117220084     canon_rtx [24]
                1.49    0.00 13035630/18915614     get_addr [61]
                0.39    0.50 6520637/6533770     mems_in_disjoint_alias_sets_p [89]
                0.00    0.00   12118/15918       fixed_scalar_and_varying_struct_p [594]
-----------------------------------------------
                             5928846             cselib_mem_conflict_p [15]
                2.00   31.42 5934642/5934642     cselib_invalidate_mem_1 [13]
[15]    21.8    2.00   31.42 5934642+5928846 cselib_mem_conflict_p [15]
                1.04   30.32 5934642/6520637     write_dependence_p [14]
                0.07    0.00 5934642/5942137     anti_dependence [295]
                             5928846             cselib_mem_conflict_p [15]
-----------------------------------------------
                0.00   26.94       6/6           rest_of_compilation [7]
[16]    17.6    0.00   26.94       6         schedule_insns [16]
                0.12   14.24       6/6           sched_init [23]
                0.00    9.17    1292/1292        schedule_region [29]
                0.00    3.27    1292/1299        update_life_info [54]
                0.00    0.05       6/6           init_regions [324]
                0.04    0.01    1292/2588        count_or_remove_death_notes [253]
                0.04    0.00       6/45          compute_bb_for_insn [147]
                0.00    0.00       6/12          allocate_reg_life_data [790]
                0.00    0.00    1292/94510       sbitmap_zero [649]
                0.00    0.00       3/3           reposition_prologue_and_epilogue_notes [1067]
                0.00    0.00       6/753         get_max_uid [848]
                0.00    0.00      12/290         sbitmap_alloc [1077]
                0.00    0.00       6/80279       sbitmap_ones [614]
                0.00    0.00       3/254720      get_insns [491]
                0.00    0.00       6/6           sched_finish [1651]
-----------------------------------------------
                0.04   12.62       3/6           rest_of_compilation [7]
                0.04   12.62       3/6           reload_cse_regs [21]
[17]    16.5    0.08   25.24       6         reload_cse_regs_1 [17]
                0.07   13.95       6/28          init_alias_analysis [12]
                0.10    7.77  304474/304474      cselib_process_insn [32]
                0.18    3.16  212310/212310      reload_cse_simplify [53]
                0.01    0.00     672/678         clear_table [574]
                0.00    0.00       6/6           cselib_finish [1028]
                0.00    0.00       6/6           cselib_init [1156]
                0.00    0.00       8/8           reload_cse_delete_noop_set [1628]
                0.00    0.00       6/28          end_alias_analysis [1518]
-----------------------------------------------
                0.02   20.92       6/6           rest_of_compilation [7]
[18]    13.7    0.02   20.92       6         cse_main [18]
                0.07   13.95       6/28          init_alias_analysis [12]
                0.03    6.49     731/731         cse_basic_block [35]
                0.00    0.34     277/821         ggc_collect [83]
                0.03    0.00     733/733         cse_end_of_basic_block [363]
                0.01    0.00       6/6           ggc_pop_context [563]
                0.00    0.00      12/753         get_max_uid [848]
                0.00    0.00       6/114328      gen_rtx_fmt_e [296]
                0.00    0.00       6/545563      xcalloc [359]
                0.00    0.00       6/387386      xmalloc [407]
                0.00    0.00       6/22          init_recog [1539]
                0.00    0.00       6/6           ggc_push_context [1636]
                0.00    0.00       6/28          end_alias_analysis [1518]
-----------------------------------------------
                0.00    0.02    5934/6448431     true_dependence [325]
                1.86   18.88 6442497/6448431     write_dependence_p [14]
[19]    13.5    1.86   18.90 6448431         memrefs_conflict_p [19]
                4.44    9.36 25786729/25786729     rtx_equal_for_memref_p [25]
                3.07    0.00 25791747/117220084     canon_rtx [24]
                1.35    0.00 25791664/25791664     addr_side_effect_eval [73]
                0.67    0.00 5859756/18915614     get_addr [61]
-----------------------------------------------
                0.00   17.15       6/6           rest_of_compilation [7]
[20]    11.2    0.00   17.15       6         life_analysis [20]
                0.07   13.95       6/28          init_alias_analysis [12]
                0.00    3.06       3/3           notice_stack_pointer_modification [56]
                0.01    0.04       6/6           delete_noop_moves [305]
                0.00    0.02       6/1299        update_life_info [54]
                0.00    0.00       6/43          free_basic_block_vars [462]
                0.00    0.00       6/12          allocate_reg_life_data [790]
                0.00    0.00       6/6           allocate_bb_life_data [987]
                0.00    0.00       6/6           mark_regs_live_at_end [1098]
                0.00    0.00       6/28          end_alias_analysis [1518]
-----------------------------------------------
                0.00   16.09       3/3           rest_of_compilation [7]
[21]    10.5    0.00   16.09       3         reload_cse_regs [21]
                0.04   12.62       3/6           reload_cse_regs_1 [17]
                0.05    3.38       3/3           reload_cse_move2add [52]
                0.00    0.00       3/3           reload_combine [1705]
-----------------------------------------------
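The call counts above suggest why the time adds up: cselib_invalidate_mem
calls htab_traverse 80692 times, and htab_traverse in turn invokes
cselib_invalidate_mem_1 13642612 times, i.e. roughly 169 callbacks per
traversal.  The following is a toy model of that access pattern, not GCC
source; toy_table, toy_invalidate_mem, toy_total_visits and
avg_callbacks_per_traversal are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (not GCC code): a table of remembered memory values that is
   walked in full every time a store has to be invalidated. */
struct toy_table {
    size_t live_entries;     /* entries the walk visits */
    unsigned long visits;    /* total callbacks made so far */
};

/* One invalidation = one full walk = one callback per live entry. */
static void toy_invalidate_mem(struct toy_table *t)
{
    for (size_t i = 0; i < t->live_entries; i++)
        t->visits++;
}

/* Total callbacks for `stores` invalidations over a table of fixed size:
   the work is stores * entries. */
static unsigned long toy_total_visits(size_t stores, size_t entries)
{
    struct toy_table t = { entries, 0 };
    for (size_t s = 0; s < stores; s++)
        toy_invalidate_mem(&t);
    return t.visits;
}

/* Average callbacks per traversal, computed from gprof call counts. */
static double avg_callbacks_per_traversal(unsigned long callbacks,
                                          unsigned long traversals)
{
    assert(traversals != 0);
    return (double)callbacks / (double)traversals;
}
```

Calling avg_callbacks_per_traversal(13642612UL, 80692UL) with the counts
from the report gives a value just above 169.  Note that the profile also
attributes 73.49 seconds to htab_traverse itself against 35.87 seconds to
its children, so the walk costs more than the callbacks it makes.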

Since compile-time performance is a release criterion for
gcc 3.0, I consider this serious.
>How-To-Repeat:

>Fix:
The root problem is that gcse kills all pseudos in every
basic block that is the target of a computed goto.  This code
has many such blocks, so many pseudos are reloaded at the
beginning of each one.  Implementing Ruething's variant of LCM,
which can handle abnormal edges, would likely fix this.

But until that is done, reload should handle this problem
more gracefully.  I don't know what the fix is.
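
For readers unfamiliar with the code shape described above, here is a
minimal sketch of a dispatch loop built on computed gotos (the GNU C
"labels as values" extension).  It is not taken from pi.i; the opcodes,
the run function and the NEXT macro are hypothetical.  Every handler label
can be reached from every indirect jump, so each handler block is entered
through abnormal edges, and a value such as acc is live on entry to all of
them.

```c
#include <assert.h>

/* Hypothetical opcodes for a miniature interpreter. */
enum { OP_INC, OP_DBL, OP_HALT, OP_COUNT };

static long run(const unsigned char *code)
{
    /* Table of label addresses (GNU C extension: &&label). */
    static void *const dispatch[OP_COUNT] = { &&op_inc, &&op_dbl, &&op_halt };
    long acc = 0;   /* live across every handler block */

/* Fetch the next opcode and jump to its handler with a computed goto. */
#define NEXT() \
    do { assert(*code < OP_COUNT); goto *dispatch[*code++]; } while (0)

    NEXT();

op_inc:             /* target of a computed goto */
    acc += 1;
    NEXT();
op_dbl:             /* target of a computed goto */
    acc *= 2;
    NEXT();
op_halt:            /* target of a computed goto */
    return acc;

#undef NEXT
}
```

With the program { OP_INC, OP_INC, OP_DBL, OP_INC, OP_HALT }, run
returns 5.  A real interpreter of this kind has far more handlers and far
more live values, which is the situation the paragraph above describes.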
>Release-Note:
>Audit-Trail:
>Unformatted:

