This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
- To: gcc@gcc.gnu.org, gcc-bugs@gcc.gnu.org, lucier@math.purdue.edu
- Subject: Re: big slowdown in egcs-1.1.2->gcc-2.95 on alpha
- From: Brad Lucier <lucier@math.purdue.edu>
- Date: Fri, 6 Aug 1999 15:16:21 -0500 (EST)
- Cc: staff@math.purdue.edu, hosking@cs.purdue.edu, wilker@math.purdue.edu
To follow up on my comments that gcc-2.95 is much slower than egcs-1.1.2
on the alpha-ev6 for some files, here are some timing data for a profiled
cc1 on alphaev6-unknown-linux-gnu with glibc-2.1.1, binutils-2.9.5.0.4,
and kernel 2.2.10.
Because the various timings on this machine are screwed up, I don't know
how to interpret the information precisely. But it can give you an idea
of the relative times that various parts of the process took.
gcc was called with
gcc -fPIC -save-temps -O1
on eight relatively large files (basically, each of these files
contains a 25,000+ line procedure to be compiled, with a total
of 18 local variables and one argument).
The output from cc1 on the first of these files (which is typical) is:
__copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_g0_2d_1 ___init_proc ____20_g0_2d_1
time in parse: 21.472976
time in integration: 0.000000
time in jump: 1.503040
time in cse: 0.000000
time in gcse: 0.000000
time in loop: 0.000000
time in cse2: 0.000000
time in branch-prob: 0.000000
time in flow: 5.866736
time in combine: 0.000000
time in regmove: 0.000000
time in sched: 0.000000
time in local-alloc: 0.000000
time in global-alloc: 44.732032
time in flow2: 0.000000
time in sched2: 0.000000
time in shorten-branch: 1.636752
time in stack-reg: 0.000000
time in final: 7.841184
time in varconst: 0.031232
time in symout: 0.000000
time in dump: 0.000000
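Since only the relative times are meaningful here, it may help to turn the report above into shares of the total. The following is a minimal sketch in Python, not part of the original measurements; the function name pass_shares and the variable names are made up for illustration, and the input is just the cc1 report copied from above.

```python
import re

# The "time in <pass>: <seconds>" lines exactly as cc1 printed them above.
CC1_REPORT = """\
time in parse: 21.472976
time in integration: 0.000000
time in jump: 1.503040
time in cse: 0.000000
time in gcse: 0.000000
time in loop: 0.000000
time in cse2: 0.000000
time in branch-prob: 0.000000
time in flow: 5.866736
time in combine: 0.000000
time in regmove: 0.000000
time in sched: 0.000000
time in local-alloc: 0.000000
time in global-alloc: 44.732032
time in flow2: 0.000000
time in sched2: 0.000000
time in shorten-branch: 1.636752
time in stack-reg: 0.000000
time in final: 7.841184
time in varconst: 0.031232
time in symout: 0.000000
time in dump: 0.000000
"""


def pass_shares(report):
    """Return (total_seconds, rows); rows are (pass, seconds, fraction), largest first."""
    times = {}
    for match in re.finditer(r"^time in ([\w-]+): ([0-9.]+)$", report, re.MULTILINE):
        times[match.group(1)] = float(match.group(2))
    total = sum(times.values())
    rows = sorted(
        ((name, secs, secs / total) for name, secs in times.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return total, rows


total, rows = pass_shares(CC1_REPORT)
for name, secs, frac in rows:
    if secs > 0:
        # Print only the passes that reported any time at all.
        print(f"{name:15s} {secs:10.6f}  {100 * frac:5.1f}%")
```

On these numbers the global-alloc line is a little over half of the reported total, with parse at roughly a quarter.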
If I read this correctly, it's spending a lot of time in reload
(global_alloc isn't in the call graph, and reload is; in toplev.c, you
see that one or the other is called, but the time is reported above
under global-alloc either way). Perhaps there's a problem in reload.
The gprof output file can be found at
http://www.math.purdue.edu/~lucier/gmon.summary.gz
The summary information from gprof for cc1 begins:
Flat profile:
Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  7.74      2.74     2.74    90228     0.03     0.06  order_regs_for_reload
  6.94      5.21     2.46   358919     0.01     0.01  find_reloads
  6.82      7.62     2.42        8   302.49  4312.47  yyparse
  6.51      9.93     2.31 42370698     0.00     0.00  bitmap_bit_p
  3.64     11.22     1.29  2455661     0.00     0.00  yylex
  2.60     12.15     0.92   302171     0.00     0.00  record_reg_classes
  2.50     13.03     0.89                             hard_reg_use_compare
  2.32     13.85     0.82       24    34.26    58.61  stupid_life_analysis
  1.89     14.52     0.67   512830     0.00     0.00  for_each_rtx
  1.65     15.11     0.59  8208660     0.00     0.00  count_pseudo
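For anyone who wants to work with the flat profile rather than read it by eye, here is a rough Python sketch that parses the ten rows above. It is only an illustration: the names FlatRow, parse_flat and UNDER_RELOAD are invented here, and the choice of which functions to count as reload-related is an assumption based on where they sit beneath reload-named parents in the call graph excerpts in this message. Note that the hard_reg_use_compare row has no call count, so the parser accepts rows with either four or seven fields.

```python
from typing import NamedTuple, Optional

# The data rows of the flat profile above (header lines left out).
FLAT_ROWS = """\
7.74 2.74 2.74 90228 0.03 0.06 order_regs_for_reload
6.94 5.21 2.46 358919 0.01 0.01 find_reloads
6.82 7.62 2.42 8 302.49 4312.47 yyparse
6.51 9.93 2.31 42370698 0.00 0.00 bitmap_bit_p
3.64 11.22 1.29 2455661 0.00 0.00 yylex
2.60 12.15 0.92 302171 0.00 0.00 record_reg_classes
2.50 13.03 0.89 hard_reg_use_compare
2.32 13.85 0.82 24 34.26 58.61 stupid_life_analysis
1.89 14.52 0.67 512830 0.00 0.00 for_each_rtx
1.65 15.11 0.59 8208660 0.00 0.00 count_pseudo
"""


class FlatRow(NamedTuple):
    pct: float            # "% time" column
    cumulative: float     # "cumulative seconds" column
    self_secs: float      # "self seconds" column
    calls: Optional[int]  # None when gprof printed no call count
    name: str


def parse_flat(text):
    """Parse flat-profile data rows; rows without call counts have 4 fields."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) not in (4, 7):
            continue  # not a data row in either of the two shapes
        calls = int(fields[3]) if len(fields) == 7 else None
        rows.append(FlatRow(float(fields[0]), float(fields[1]),
                            float(fields[2]), calls, fields[-1]))
    return rows


flat = parse_flat(FLAT_ROWS)

# Share of all samples covered by the ten rows quoted above.
top_ten_pct = sum(row.pct for row in flat)

# ASSUMPTION: these three are counted as reload-related because the call
# graph excerpts show them beneath reload-named parents.
UNDER_RELOAD = {"order_regs_for_reload", "find_reloads", "count_pseudo"}
reload_self_secs = sum(row.self_secs for row in flat if row.name in UNDER_RELOAD)

print(f"top ten rows: {top_ten_pct:.2f}% of samples")
print(f"self time in the assumed reload group: {reload_self_secs:.2f} s")
```

This counts self time only; time spent in shared helpers such as bitmap_bit_p on behalf of those functions is not included in the group total.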
Some selected information from the call graph about reload, which seems
to take a long time:
-----------------------------------------------
2.74 2.27 90228/90228 find_reload_regs [7]
[8] 14.1 2.74 2.27 90228 order_regs_for_reload [8]
0.59 0.91 8208660/8208660 count_pseudo [23]
0.59 0.00 10827360/42370698 bitmap_bit_p [13]
0.18 0.00 5503908/5503956 bitmap_clear [93]
0.00 0.00 90228/704750 bitmap_initialize [259]
-----------------------------------------------
0.17 4.56 16/16 reload [6]
[9] 13.3 0.17 4.56 16 reload_as_needed [9]
0.26 1.83 90228/90228 emit_reload_insns [15]
0.38 0.95 90228/90228 choose_reload_regs [30]
0.70 0.29 102504/358919 find_reloads [11]
0.04 0.04 256335/864560 note_stores [74]
0.02 0.00 90228/90228 subst_reloads [247]
0.00 0.02 12276/268691 eliminate_regs_in_insn [52]
0.01 0.00 50798/84370 set_offsets_for_label [312]
0.01 0.00 90228/2814858 asm_noperands [96]
0.00 0.00 12276/268691 update_eliminable_offsets [216]
0.00 0.00 16/40 set_initial_elim_offsets [475]
-----------------------------------------------
0.21 3.32 24/24 reload [6]
[10] 10.0 0.21 3.32 24 calculate_needs_all_insns [10]
1.76 0.74 256415/358919 find_reloads [11]
0.06 0.42 256415/268691 eliminate_regs_in_insn [52]
0.21 0.00 90228/90228 calculate_needs [83]
0.07 0.01 122572/122572 set_label_offsets [146]
0.03 0.00 256415/268691 update_eliminable_offsets [216]
0.02 0.00 256415/1897960 single_set [109]
-----------------------------------------------
0.70 0.29 102504/358919 reload_as_needed [9]
1.76 0.74 256415/358919 calculate_needs_all_insns [10]
[11] 9.8 2.46 1.03 358919 find_reloads [11]
0.19 0.14 219242/243152 push_reload [57]
0.09 0.14 358911/1750922 extract_insn [33]
0.13 0.00 1993307/3091999 reg_fits_class_p [88]
0.08 0.02 310922/310922 combine_reloads [123]
0.03 0.08 92668/92668 find_reloads_address [125]
0.08 0.00 1692578/1910746 reg_class_subset_p [136]
0.03 0.00 358919/1897960 single_set [109]
0.02 0.00 237120/237388 reg_alternate_class [281]
0.01 0.00 237120/355215 reg_preferred_class [311]
0.01 0.00 97392/304810 normal_memory_operand [268]
0.00 0.00 676/1014 zap_mask [724]
-----------------------------------------------
[12] 6.9 0.92 1.51 218511+1228287 <cycle 3 as a whole> [12]
0.32 0.42 267378+208665 expand_expr <cycle 3> [40]
0.10 0.27 246016 gen_movdi <cycle 3> [58]
0.09 0.14 56286 expand_binop <cycle 3> [80]
0.06 0.10 246688 emit_move_insn_1 <cycle 3> [99]
0.02 0.13 25803 do_jump_for_compare <cycle 3> [106]
0.08 0.05 65220 store_expr <cycle 3> [113]
0.04 0.07 125120 emit_move_insn <cycle 3> [120]
0.03 0.06 63104 expand_assignment <cycle 3> [130]
0.03 0.04 73195 memory_address <cycle 3> [148]
0.03 0.05 25811+2040 emit_cmp_insn <cycle 3> [149]
0.03 0.02 25793+2040 do_jump <cycle 3> [181]
0.01 0.03 25811 alpha_emit_conditional_branch <cycle 3> [188]
0.01 0.02 29226 copy_to_mode_reg <cycle 3> [209]
0.01 0.02 28480 force_reg <cycle 3> [234]
0.01 0.01 32680 change_address <cycle 3> [240]
0.00 0.02 10850 gen_ble <cycle 3> [246]
0.01 0.01 25803 compare_from_rtx <cycle 3> [264]
0.00 0.01 5233 gen_bgt <cycle 3> [300]
0.01 0.00 23795 compare <cycle 3> [305]
0.00 0.01 4997 gen_bne <cycle 3> [329]
0.01 0.00 10500+37504 force_operand <cycle 3> [349]
0.01 0.00 9808 expand_shift <cycle 3> [366]
0.00 0.00 2008 gen_bgtu <cycle 3> [392]
0.00 0.00 2072 emit_unop_insn <cycle 3> [394]
0.00 0.00 1973 gen_beq <cycle 3> [407]
0.00 0.00 4088 convert_modes <cycle 3> [409]
0.00 0.00 2080 convert_move <cycle 3> [426]
0.00 0.00 698 gen_blt <cycle 3> [431]
0.00 0.00 140 expand_divmod <cycle 3> [456]
0.00 0.00 32 expand_call <cycle 3> [459]
0.00 0.00 662 expand_mult <cycle 3> [525]
0.00 0.00 52 gen_bge <cycle 3> [571]
0.00 0.00 88 copy_to_reg <cycle 3> [584]
0.00 0.00 16 emit_libcall_block <cycle 3> [595]
0.00 0.00 32 load_register_parameters <cycle 3> [623]
0.00 0.00 32 precompute_arguments <cycle 3> [628]
0.00 0.00 32 precompute_register_parameters <cycle 3> [639]
0.00 0.00 1058 jumpifnot <cycle 3> [721]
-----------------------------------------------
0.45 0.00 8208660/42370698 count_pseudo [23]
0.59 0.00 10827360/42370698 order_regs_for_reload [8]
0.63 0.00 11549184/42370698 choose_reload_regs [30]
0.64 0.00 11785494/42370698 finish_spills [25]
[13] 6.5 2.31 0.00 42370698 bitmap_bit_p [13]
-----------------------------------------------
0.21 2.03 24/24 rest_of_compilation [5]
[14] 6.3 0.21 2.03 24 final [14]
0.22 1.81 497630/497630 final_scan_insn [17]
0.00 0.00 24/5328 oballoc [485]
0.00 0.00 24/48 check_exception_handler_labels [816]
0.00 0.00 24/24 init_insn_eh_region [861]
0.00 0.00 24/104 init_recog [804]
0.00 0.00 24/24 free_insn_eh_region [856]
-----------------------------------------------
0.26 1.83 90228/90228 reload_as_needed [9]
[15] 5.9 0.26 1.83 90228 emit_reload_insns [15]
0.05 1.49 121568/121568 gen_reload [22]
0.11 0.02 2218446/2218446 emit_insns_before [114]
0.04 0.00 227736/519956 rtx_equal_p [128]
0.01 0.02 59574/59574 reg_set_p [241]
0.02 0.00 102417/102417 reload_reg_reaches_end_p [250]
0.01 0.01 102409/102409 push_to_sequence [265]
0.02 0.00 35821/35957 reg_mentioned_p [270]
0.01 0.01 35821/864560 note_stores [74]
0.01 0.00 121568/541814 end_sequence [195]
0.01 0.00 71642/1897960 single_set [109]
0.00 0.00 85747/363523 get_last_insn [271]
0.00 0.00 157389/279445 get_insns [371]
0.00 0.00 19159/541814 start_sequence [156]
0.00 0.00 35829/422200 find_reg_note [238]
0.00 0.00 19159/19311 emit_insns [482]
0.00 0.00 1262/1262 delete_output_reload [539]
-----------------------------------------------
[16] 5.8 0.39 1.66 417809+698063 <cycle 7 as a whole> [16]
0.20 1.16 317912 build_binary_op <cycle 7> [29]
0.15 0.00 707786 default_conversion <cycle 7> [101]
0.01 0.06 25861 build_unary_op <cycle 7> [155]
0.01 0.00 35813 truthvalue_conversion <cycle 7> [310]
-----------------------------------------------
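The call counts in the entries above can be cross-checked with a little arithmetic. The sketch below, in Python, uses only numbers copied from the order_regs_for_reload [8], find_reloads [11] and bitmap_bit_p [13] entries; the variable names are made up for illustration. It checks that the listed callers account for every call, and works out how many child calls each call of order_regs_for_reload makes on average.

```python
# Callers of bitmap_bit_p, from entry [13]: calls made by each caller.
BITMAP_BIT_P_TOTAL = 42370698
bitmap_bit_p_callers = {
    "count_pseudo": 8208660,
    "order_regs_for_reload": 10827360,
    "choose_reload_regs": 11549184,
    "finish_spills": 11785494,
}

# Callers of find_reloads, from entry [11].
FIND_RELOADS_TOTAL = 358919
find_reloads_callers = {
    "reload_as_needed": 102504,
    "calculate_needs_all_insns": 256415,
}

# Children of order_regs_for_reload, from entry [8], and its own call count.
ORDER_REGS_CALLS = 90228
order_regs_children = {
    "count_pseudo": 8208660,
    "bitmap_bit_p": 10827360,
    "bitmap_clear": 5503908,
}

# Do the listed callers account for every recorded call?
bitmap_covered = sum(bitmap_bit_p_callers.values()) == BITMAP_BIT_P_TOTAL
find_reloads_covered = sum(find_reloads_callers.values()) == FIND_RELOADS_TOTAL

# Average number of child calls per call of order_regs_for_reload.
per_call = {name: n / ORDER_REGS_CALLS for name, n in order_regs_children.items()}

print("bitmap_bit_p fully accounted for:", bitmap_covered)
print("find_reloads fully accounted for:", find_reloads_covered)
for name, avg in per_call.items():
    print(f"{name:15s} {avg:8.2f} calls per order_regs_for_reload call")
```

On these figures the four listed callers account for all 42370698 calls to bitmap_bit_p, and each call of order_regs_for_reload makes exactly 120 bitmap_bit_p calls and 61 bitmap_clear calls, plus about 91 count_pseudo calls on average.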
Brad Lucier lucier@math.purdue.edu