This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha


To follow up on my comments that gcc-2.95 is much slower than egcs-1.1.2
on the alpha-ev6 for some files, here are some timing data for a profiled
cc1 on alphaev6-unknown-linux-gnu with glibc-2.1.1, binutils-2.9.5.0.4,
and kernel 2.2.10.

Because the various timings on this machine are unreliable, I don't know
how to interpret the information precisely.  But it can give you
an idea of the relative times that various parts of the process took.

gcc was called with

gcc -fPIC -save-temps -O1

on eight relatively large files (basically, each of these files
contains a 25,000+ line procedure to be compiled, with a total
of 18 local variables and one argument).

The output from cc1 on the first of these files (which is typical) is:

 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 21.472976
time in integration: 0.000000
time in jump: 1.503040
time in cse: 0.000000
time in gcse: 0.000000
time in loop: 0.000000
time in cse2: 0.000000
time in branch-prob: 0.000000
time in flow: 5.866736
time in combine: 0.000000
time in regmove: 0.000000
time in sched: 0.000000
time in local-alloc: 0.000000
time in global-alloc: 44.732032
time in flow2: 0.000000
time in sched2: 0.000000
time in shorten-branch: 1.636752
time in stack-reg: 0.000000
time in final: 7.841184
time in varconst: 0.031232
time in symout: 0.000000
time in dump: 0.000000
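To turn the report above into relative times, here is a small C sketch that sums the "time in PASS: SECONDS" lines and computes one pass's share. The helper names parse_time_line and pass_share are made up for illustration. If you take the reported numbers at face value, the nonzero passes add up to about 83 seconds, with global-alloc at roughly 54% of that and parse at about 26%.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one "time in PASS: SECONDS" line from cc1's report.
   Returns 1 on success and fills in name and seconds. */
static int parse_time_line(const char *line, char name[32], double *seconds)
{
  return sscanf(line, "time in %31[^:]: %lf", name, seconds) == 2;
}

/* Sum all pass times and return the fraction spent in `pass`.
   Lines that do not match the format are skipped. */
static double pass_share(const char *const lines[], size_t n, const char *pass)
{
  double total = 0.0, chosen = 0.0;

  for (size_t i = 0; i < n; i++) {
    char name[32];
    double secs;

    if (!parse_time_line(lines[i], name, &secs))
      continue;
    total += secs;
    if (strcmp(name, pass) == 0)
      chosen = secs;
  }
  assert(total > 0.0);
  return chosen / total;
}
```

Because the zero entries do not affect the sum, you only need the nonzero lines of the report to reproduce the shares.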

If I read this correctly, it's spending a lot of time in reload.
global_alloc isn't in the call graph but reload is; in toplev.c one or
the other is called, but either way the time is reported above as
global-alloc.  Perhaps there's a problem in reload.
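The toplev.c point can be pictured with a small, self-contained C model. This is not the real toplev.c code. The stub functions are hypothetical, and the flag name obey_regdecls is borrowed from gcc's toplev.c as an assumption about which condition selects the path. Whichever branch runs, the elapsed time goes into a single global_alloc_time counter. That is why reload's cost would show up on the global-alloc line.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stubs standing in for the real passes; they only count calls. */
static int global_alloc_calls = 0;
static int reload_calls = 0;

static int stub_global_alloc(void) { global_alloc_calls++; return 0; }
static int stub_reload(void) { reload_calls++; return 0; }

/* Seconds charged to the "global-alloc" line of the timing report. */
static double global_alloc_time = 0.0;

/* Simplified model of the pattern: exactly one of the two allocator
   paths runs, and its CPU time is added to the same counter either way. */
static int run_register_allocation(int obey_regdecls)
{
  clock_t start = clock();
  int failure;

  if (!obey_regdecls)
    failure = stub_global_alloc();
  else
    failure = stub_reload();

  global_alloc_time += (double) (clock() - start) / CLOCKS_PER_SEC;
  assert(global_alloc_time >= 0.0);
  return failure;
}
```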

The gprof output file can be found at

http://www.math.purdue.edu/~lucier/gmon.summary.gz

The summary information from gprof for cc1 begins:

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  7.74      2.74     2.74    90228     0.03     0.06  order_regs_for_reload
  6.94      5.21     2.46   358919     0.01     0.01  find_reloads
  6.82      7.62     2.42        8   302.49  4312.47  yyparse
  6.51      9.93     2.31 42370698     0.00     0.00  bitmap_bit_p
  3.64     11.22     1.29  2455661     0.00     0.00  yylex
  2.60     12.15     0.92   302171     0.00     0.00  record_reg_classes
  2.50     13.03     0.89                             hard_reg_use_compare
  2.32     13.85     0.82       24    34.26    58.61  stupid_life_analysis
  1.89     14.52     0.67   512830     0.00     0.00  for_each_rtx
  1.65     15.11     0.59  8208660     0.00     0.00  count_pseudo
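The flat-profile columns are related by simple arithmetic. The sketch below uses hypothetical helper names to recompute a few entries. For order_regs_for_reload, 2.74 s of self time over 90228 calls is about 0.03 ms per call. Adding the 2.27 s spent in its children, taken from its call-graph entry, gives about 0.06 ms per call. Since 2.74 s is 7.74% of the total, the profiled run covers roughly 35.4 s. The sample period of 0.000976562 s is 1/1024 s, so 2.74 s corresponds to roughly 2,800 samples.

```c
#include <assert.h>

/* gprof "self ms/call": self seconds per call, in milliseconds. */
static double self_ms_per_call(double self_seconds, long calls)
{
  assert(calls > 0);
  return self_seconds / (double) calls * 1000.0;
}

/* gprof "total ms/call": self plus descendant seconds per call, in ms. */
static double total_ms_per_call(double self_seconds, double child_seconds,
                                long calls)
{
  assert(calls > 0);
  return (self_seconds + child_seconds) / (double) calls * 1000.0;
}

/* Total profiled seconds implied by one row's self seconds and "% time". */
static double implied_total_seconds(double self_seconds, double percent_time)
{
  assert(percent_time > 0.0);
  return self_seconds * 100.0 / percent_time;
}
```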

Some selected information from the call graph about reload, which seems
to take a long time:

-----------------------------------------------
                2.74    2.27   90228/90228       find_reload_regs [7]
[8]     14.1    2.74    2.27   90228         order_regs_for_reload [8]
                0.59    0.91 8208660/8208660     count_pseudo [23]
                0.59    0.00 10827360/42370698     bitmap_bit_p [13]
                0.18    0.00 5503908/5503956     bitmap_clear [93]
                0.00    0.00   90228/704750      bitmap_initialize [259]
-----------------------------------------------
                0.17    4.56      16/16          reload [6]
[9]     13.3    0.17    4.56      16         reload_as_needed [9]
                0.26    1.83   90228/90228       emit_reload_insns [15]
                0.38    0.95   90228/90228       choose_reload_regs [30]
                0.70    0.29  102504/358919      find_reloads [11]
                0.04    0.04  256335/864560      note_stores [74]
                0.02    0.00   90228/90228       subst_reloads [247]
                0.00    0.02   12276/268691      eliminate_regs_in_insn [52]
                0.01    0.00   50798/84370       set_offsets_for_label [312]
                0.01    0.00   90228/2814858     asm_noperands [96]
                0.00    0.00   12276/268691      update_eliminable_offsets [216]
                0.00    0.00      16/40          set_initial_elim_offsets [475]
-----------------------------------------------
                0.21    3.32      24/24          reload [6]
[10]    10.0    0.21    3.32      24         calculate_needs_all_insns [10]
                1.76    0.74  256415/358919      find_reloads [11]
                0.06    0.42  256415/268691      eliminate_regs_in_insn [52]
                0.21    0.00   90228/90228       calculate_needs [83]
                0.07    0.01  122572/122572      set_label_offsets [146]
                0.03    0.00  256415/268691      update_eliminable_offsets [216]
                0.02    0.00  256415/1897960     single_set [109]
-----------------------------------------------
                0.70    0.29  102504/358919      reload_as_needed [9]
                1.76    0.74  256415/358919      calculate_needs_all_insns [10]
[11]     9.8    2.46    1.03  358919         find_reloads [11]
                0.19    0.14  219242/243152      push_reload [57]
                0.09    0.14  358911/1750922     extract_insn [33]
                0.13    0.00 1993307/3091999     reg_fits_class_p [88]
                0.08    0.02  310922/310922      combine_reloads [123]
                0.03    0.08   92668/92668       find_reloads_address [125]
                0.08    0.00 1692578/1910746     reg_class_subset_p [136]
                0.03    0.00  358919/1897960     single_set [109]
                0.02    0.00  237120/237388      reg_alternate_class [281]
                0.01    0.00  237120/355215      reg_preferred_class [311]
                0.01    0.00   97392/304810      normal_memory_operand [268]
                0.00    0.00     676/1014        zap_mask [724]
-----------------------------------------------
[12]     6.9    0.92    1.51  218511+1228287 <cycle 3 as a whole> [12]
                0.32    0.42  267378+208665      expand_expr <cycle 3> [40]
                0.10    0.27  246016             gen_movdi <cycle 3> [58]
                0.09    0.14   56286             expand_binop <cycle 3> [80]
                0.06    0.10  246688             emit_move_insn_1 <cycle 3> [99]
                0.02    0.13   25803             do_jump_for_compare <cycle 3> [106]
                0.08    0.05   65220             store_expr <cycle 3> [113]
                0.04    0.07  125120             emit_move_insn <cycle 3> [120]
                0.03    0.06   63104             expand_assignment <cycle 3> [130]
                0.03    0.04   73195             memory_address <cycle 3> [148]
                0.03    0.05   25811+2040        emit_cmp_insn <cycle 3> [149]
                0.03    0.02   25793+2040        do_jump <cycle 3> [181]
                0.01    0.03   25811             alpha_emit_conditional_branch <cycle 3> [188]
                0.01    0.02   29226             copy_to_mode_reg <cycle 3> [209]
                0.01    0.02   28480             force_reg <cycle 3> [234]
                0.01    0.01   32680             change_address <cycle 3> [240]
                0.00    0.02   10850             gen_ble <cycle 3> [246]
                0.01    0.01   25803             compare_from_rtx <cycle 3> [264]
                0.00    0.01    5233             gen_bgt <cycle 3> [300]
                0.01    0.00   23795             compare <cycle 3> [305]
                0.00    0.01    4997             gen_bne <cycle 3> [329]
                0.01    0.00   10500+37504       force_operand <cycle 3> [349]
                0.01    0.00    9808             expand_shift <cycle 3> [366]
                0.00    0.00    2008             gen_bgtu <cycle 3> [392]
                0.00    0.00    2072             emit_unop_insn <cycle 3> [394]
                0.00    0.00    1973             gen_beq <cycle 3> [407]
                0.00    0.00    4088             convert_modes <cycle 3> [409]
                0.00    0.00    2080             convert_move <cycle 3> [426]
                0.00    0.00     698             gen_blt <cycle 3> [431]
                0.00    0.00     140             expand_divmod <cycle 3> [456]
                0.00    0.00      32             expand_call <cycle 3> [459]
                0.00    0.00     662             expand_mult <cycle 3> [525]
                0.00    0.00      52             gen_bge <cycle 3> [571]
                0.00    0.00      88             copy_to_reg <cycle 3> [584]
                0.00    0.00      16             emit_libcall_block <cycle 3> [595]
                0.00    0.00      32             load_register_parameters <cycle 3> [623]
                0.00    0.00      32             precompute_arguments <cycle 3> [628]
                0.00    0.00      32             precompute_register_parameters <cycle 3> [639]
                0.00    0.00    1058             jumpifnot <cycle 3> [721]
-----------------------------------------------
                0.45    0.00 8208660/42370698     count_pseudo [23]
                0.59    0.00 10827360/42370698     order_regs_for_reload [8]
                0.63    0.00 11549184/42370698     choose_reload_regs [30]
                0.64    0.00 11785494/42370698     finish_spills [25]
[13]     6.5    2.31    0.00 42370698         bitmap_bit_p [13]
-----------------------------------------------
                0.21    2.03      24/24          rest_of_compilation [5]
[14]     6.3    0.21    2.03      24         final [14]
                0.22    1.81  497630/497630      final_scan_insn [17]
                0.00    0.00      24/5328        oballoc [485]
                0.00    0.00      24/48          check_exception_handler_labels [816]
                0.00    0.00      24/24          init_insn_eh_region [861]
                0.00    0.00      24/104         init_recog [804]
                0.00    0.00      24/24          free_insn_eh_region [856]
-----------------------------------------------
                0.26    1.83   90228/90228       reload_as_needed [9]
[15]     5.9    0.26    1.83   90228         emit_reload_insns [15]
                0.05    1.49  121568/121568      gen_reload [22]
                0.11    0.02 2218446/2218446     emit_insns_before [114]
                0.04    0.00  227736/519956      rtx_equal_p [128]
                0.01    0.02   59574/59574       reg_set_p [241]
                0.02    0.00  102417/102417      reload_reg_reaches_end_p [250]
                0.01    0.01  102409/102409      push_to_sequence [265]
                0.02    0.00   35821/35957       reg_mentioned_p [270]
                0.01    0.01   35821/864560      note_stores [74]
                0.01    0.00  121568/541814      end_sequence [195]
                0.01    0.00   71642/1897960     single_set [109]
                0.00    0.00   85747/363523      get_last_insn [271]
                0.00    0.00  157389/279445      get_insns [371]
                0.00    0.00   19159/541814      start_sequence [156]
                0.00    0.00   35829/422200      find_reg_note [238]
                0.00    0.00   19159/19311       emit_insns [482]
                0.00    0.00    1262/1262        delete_output_reload [539]
-----------------------------------------------
[16]     5.8    0.39    1.66  417809+698063  <cycle 7 as a whole> [16]
                0.20    1.16  317912             build_binary_op <cycle 7> [29]
                0.15    0.00  707786             default_conversion <cycle 7> [101]
                0.01    0.06   25861             build_unary_op <cycle 7> [155]
                0.01    0.00   35813             truthvalue_conversion <cycle 7> [310]
-----------------------------------------------
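In these call-graph entries, each "calls/total" pair gives the calls along that arc over the total calls to the function. Dividing these numbers out shows the fan-out. In the C sketch below, the struct and helper names are hypothetical. Each of the 90228 calls to order_regs_for_reload makes exactly 120 calls to bitmap_bit_p, exactly 61 calls to bitmap_clear, and about 91 calls to count_pseudo. The four callers listed for bitmap_bit_p account for all 42370698 of its calls.

```c
#include <assert.h>
#include <stddef.h>

/* One parent line from a gprof call-graph entry: calls along that arc. */
struct caller_line {
  const char *name;
  long calls;
};

/* Average number of child calls per invocation of the parent. */
static double calls_per_invocation(long child_calls, long parent_calls)
{
  assert(parent_calls > 0);
  return (double) child_calls / (double) parent_calls;
}

/* Sum the per-parent call counts of one entry; for a function that is
   not called recursively this should match the total after the slash. */
static long sum_caller_calls(const struct caller_line *lines, size_t n)
{
  long sum = 0;

  for (size_t i = 0; i < n; i++)
    sum += lines[i].calls;
  return sum;
}
```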

Brad Lucier    lucier@math.purdue.edu
