On this file

  http://www.math.purdue.edu/~lucier/pi.i.gz

compilation takes an inordinately long time with -O2:

popov-684% /export/u10/egcs-test/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.97/cc1 -fPIC -fno-math-errno -O2 -mcpu=ev6 pi.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_pi {GC 5327k -> 1566k} {GC 27247k -> 13569k} {GC 67368k -> 12960k} {GC 23978k -> 17460k} {GC 40715k -> 20602k} {GC 50610k -> 27280k} ___init_proc ____20_pi
Execution times (seconds)
 garbage collection    :   1.83 ( 1%) usr   0.00 ( 1%) sys   1.84 ( 1%) wall
 preprocessing         :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.05 ( 0%) wall
 lexical analysis      :   0.10 ( 0%) usr   0.03 ( 5%) sys   0.13 ( 0%) wall
 parser                :   0.26 ( 0%) usr   0.02 ( 3%) sys   0.28 ( 0%) wall
 varconst              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 integration           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 jump                  :   2.17 ( 1%) usr   0.03 ( 4%) sys   2.21 ( 1%) wall
 CSE                   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 global CSE            :   3.71 ( 2%) usr   0.18 (24%) sys   3.89 ( 2%) wall
 loop analysis         :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall
 CSE 2                 :   5.14 ( 3%) usr   0.00 ( 0%) sys   5.14 ( 3%) wall
 flow analysis         :   2.31 ( 1%) usr   0.00 ( 0%) sys   2.31 ( 1%) wall
 combiner              :   1.22 ( 1%) usr   0.00 ( 0%) sys   1.23 ( 1%) wall
 if-conversion         :   0.39 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall
 regmove               :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%) wall
 scheduling            :   4.14 ( 2%) usr   0.17 (22%) sys   4.32 ( 2%) wall
 local alloc           :   1.23 ( 1%) usr   0.00 ( 0%) sys   1.23 ( 1%) wall
 global alloc          :   3.48 ( 2%) usr   0.04 ( 6%) sys   3.53 ( 2%) wall
 reload CSE regs       : 158.30 (81%) usr   0.10 (13%) sys 158.36 (81%) wall
 flow 2                :   3.23 ( 2%) usr   0.02 ( 3%) sys   3.25 ( 2%) wall
 if-conversion 2       :   0.07 ( 0%) usr   0.00 ( 1%) sys   0.08 ( 0%) wall
 peephole 2            :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall
 scheduling 2          :   4.01 ( 2%) usr   0.09 (12%) sys   4.10 ( 2%) wall
 reorder blocks        :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall
 shorten branches      :   0.26 ( 0%) usr   0.01 ( 1%) sys   0.27 ( 0%) wall
 final                 :   2.17 ( 1%) usr   0.02 ( 3%) sys   2.19 ( 1%) wall
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 rest of compilation   :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall
 TOTAL                 : 195.64             0.79           196.38

And the .s file is quite large:

popov-685% wc pi.s
 108819  217242 1881754 pi.s

This doesn't happen with -O2 -fno-gcse:

popov-686% /export/u10/egcs-test/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.97/cc1 -fPIC -fno-math-errno -O2 -fno-gcse -mcpu=ev6 pi.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20_pi {GC 5327k -> 1566k} {GC 5633k -> 2021k} ___init_proc ____20_pi
Execution times (seconds)
 garbage collection    :   0.05 ( 2%) usr   0.00 ( 1%) sys   0.06 ( 2%) wall
 preprocessing         :   0.06 ( 2%) usr   0.01 (12%) sys   0.08 ( 2%) wall
 lexical analysis      :   0.08 ( 3%) usr   0.03 (24%) sys   0.12 ( 4%) wall
 parser                :   0.25 ( 8%) usr   0.03 (20%) sys   0.28 ( 8%) wall
 varconst              :   0.01 ( 0%) usr   0.00 ( 1%) sys   0.01 ( 0%) wall
 integration           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 jump                  :   0.66 (21%) usr   0.03 (24%) sys   0.70 (21%) wall
 CSE                   :   0.10 ( 3%) usr   0.00 ( 1%) sys   0.10 ( 3%) wall
 loop analysis         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 CSE 2                 :   0.08 ( 3%) usr   0.00 ( 0%) sys   0.08 ( 3%) wall
 flow analysis         :   0.42 (13%) usr   0.00 ( 0%) sys   0.42 (13%) wall
 combiner              :   0.09 ( 3%) usr   0.00 ( 2%) sys   0.09 ( 3%) wall
 if-conversion         :   0.15 ( 5%) usr   0.00 ( 0%) sys   0.15 ( 4%) wall
 regmove               :   0.03 ( 1%) usr   0.00 ( 0%) sys   0.03 ( 1%) wall
 scheduling            :   0.13 ( 4%) usr   0.00 ( 2%) sys   0.13 ( 4%) wall
 local alloc           :   0.08 ( 3%) usr   0.00 ( 0%) sys   0.08 ( 3%) wall
 global alloc          :   0.10 ( 3%) usr   0.00 ( 3%) sys   0.11 ( 3%) wall
 reload CSE regs       :   0.17 ( 5%) usr   0.00 ( 4%) sys   0.17 ( 5%) wall
 flow 2                :   0.31 (10%) usr   0.00 ( 0%) sys   0.31 ( 9%) wall
 if-conversion 2       :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 1%) wall
 peephole 2            :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 scheduling 2          :   0.10 ( 3%) usr   0.00 ( 3%) sys   0.11 ( 3%) wall
 reorder blocks        :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 1%) wall
 shorten branches      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 final                 :   0.07 ( 2%) usr   0.00 ( 1%) sys   0.07 ( 2%) wall
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 rest of compilation   :   0.05 ( 2%) usr   0.00 ( 0%) sys   0.05 ( 2%) wall
 TOTAL                 :   3.20             0.15             3.35

popov-687% wc pi.s
   6854   13327  100879 pi.s

A profiled version of cc1 shows the following top routines:

Flat profile:

Each sample counts as 0.000976562 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 47.94     73.50    73.50     80710     0.91     1.36  htab_traverse
  9.11     87.47    13.97 117220084     0.00     0.00  canon_rtx
  5.00     95.14     7.67  19006280     0.00     0.00  find_base_term
  2.90     99.58     4.44  25786729     0.00     0.00  rtx_equal_for_memref_p
  2.81    103.89     4.31   6847873     0.00     0.00  exp_equiv_p
  1.49    106.17     2.28  13642612     0.00     0.00  cselib_invalidate_mem_1
  1.41    108.34     2.17  18915614     0.00     0.00  get_addr
  1.30    110.34     2.00   5934642     0.00     0.01  cselib_mem_conflict_p
  1.21    112.20     1.86   6448431     0.00     0.00  memrefs_conflict_p

With the following entries in the detailed report

-----------------------------------------------
               0.02   109.36         80692/80692     cselib_invalidate_rtx [9]
[11]   71.3    0.02   109.36               80692 cselib_invalidate_mem [11]
              73.49    35.87         80692/80710     htab_traverse [10]
-----------------------------------------------
               0.01     2.32                1/28     mark_constant_function [59]
               0.03     6.97                3/28     update_equiv_regs [30]
               0.07    13.95                6/28     cse_main [18]
               0.07    13.95                6/28     life_analysis [20]
               0.07    13.95                6/28     reload_cse_regs_1 [17]
               0.07    13.95                6/28     sched_init [23]
[12]   42.7    0.31    65.08                  28 init_alias_analysis [12]
               0.20    64.54     2030264/3481856     note_stores [8]
               0.04     0.26       849476/849476     prologue_epilogue_contains [148]
               0.03     0.00      642902/3044992     find_reg_note [212]
               0.01     0.00         16946/21832     rtx_varies_p [553]
               0.00     0.00        12708/324874     reg_overlap_mentioned_p [401]
               0.00     0.00          780/217683     gen_raw_REG [194]
               0.00     0.00         1040/114328     gen_rtx_fmt_e [296]
               0.00     0.00        3723/7631907     rtx_equal_p [114]
               0.00     0.00          780/215521     gen_rtx_REG [315]
               0.00     0.00             96/7968     plus_constant_wide [458]
               0.00     0.00           84/545563     xcalloc [359]
               0.00     0.00           56/387386     xmalloc [407]
               0.00     0.00           65/254720     get_insns [491]
               0.00     0.00               28/78     ggc_add_root [1132]
               0.00     0.00            28/79188     max_reg_num [650]
               0.00     0.00              44/203     single_set_2 [1232]
               0.00     0.00              96/307     plus_constant_for_output_wide [1401]
               0.00     0.00               28/49     ggc_add_rtx_root [1484]
-----------------------------------------------
               2.28    33.43   13642612/13642612     htab_traverse [10]
[13]   23.3    2.28    33.43            13642612 cselib_invalidate_mem_1 [13]
               2.00    31.42     5934642/5934642     cselib_mem_conflict_p [15]
               0.01     0.00        5796/1253896     cselib_lookup <cycle 16> [102]
               0.00     0.00         5796/105304     unchain_one_elt_loc_list [450]
               0.00     0.00         5796/105304     unchain_one_elt_list [471]
-----------------------------------------------
               0.00     0.00         835/6520637     mark_used_regs [111]
               0.00     0.01        1003/6520637     invalidate_mems_from_set [291]
               0.10     2.98      584157/6520637     sched_analyze_1 [47]
               1.04    30.32     5934642/6520637     cselib_mem_conflict_p [15]
[14]   22.5    1.14    33.31             6520637 write_dependence_p [14]
               1.86    18.88     6442497/6448431     memrefs_conflict_p [19]
               0.53     5.72     6517815/6527929     base_alias_check [37]
               2.40     0.00    5940308/19006280     find_base_term [34]
               1.54     0.00  12884994/117220084     canon_rtx [24]
               1.49     0.00   13035630/18915614     get_addr [61]
               0.39     0.50     6520637/6533770     mems_in_disjoint_alias_sets_p [89]
               0.00     0.00         12118/15918     fixed_scalar_and_varying_struct_p [594]
-----------------------------------------------
                                         5928846     cselib_mem_conflict_p [15]
               2.00    31.42     5934642/5934642     cselib_invalidate_mem_1 [13]
[15]   21.8    2.00    31.42     5934642+5928846 cselib_mem_conflict_p [15]
               1.04    30.32     5934642/6520637     write_dependence_p [14]
               0.07     0.00     5934642/5942137     anti_dependence [295]
                                         5928846     cselib_mem_conflict_p [15]
-----------------------------------------------
               0.00    26.94                 6/6     rest_of_compilation [7]
[16]   17.6    0.00    26.94                   6 schedule_insns [16]
               0.12    14.24                 6/6     sched_init [23]
               0.00     9.17           1292/1292     schedule_region [29]
               0.00     3.27           1292/1299     update_life_info [54]
               0.00     0.05                 6/6     init_regions [324]
               0.04     0.01           1292/2588     count_or_remove_death_notes [253]
               0.04     0.00                6/45     compute_bb_for_insn [147]
               0.00     0.00                6/12     allocate_reg_life_data [790]
               0.00     0.00          1292/94510     sbitmap_zero [649]
               0.00     0.00                 3/3     reposition_prologue_and_epilogue_notes [1067]
               0.00     0.00               6/753     get_max_uid [848]
               0.00     0.00              12/290     sbitmap_alloc [1077]
               0.00     0.00             6/80279     sbitmap_ones [614]
               0.00     0.00            3/254720     get_insns [491]
               0.00     0.00                 6/6     sched_finish [1651]
-----------------------------------------------
               0.04    12.62                 3/6     rest_of_compilation [7]
               0.04    12.62                 3/6     reload_cse_regs [21]
[17]   16.5    0.08    25.24                   6 reload_cse_regs_1 [17]
               0.07    13.95                6/28     init_alias_analysis [12]
               0.10     7.77       304474/304474     cselib_process_insn [32]
               0.18     3.16       212310/212310     reload_cse_simplify [53]
               0.01     0.00             672/678     clear_table [574]
               0.00     0.00                 6/6     cselib_finish [1028]
               0.00     0.00                 6/6     cselib_init [1156]
               0.00     0.00                 8/8     reload_cse_delete_noop_set [1628]
               0.00     0.00                6/28     end_alias_analysis [1518]
-----------------------------------------------
               0.02    20.92                 6/6     rest_of_compilation [7]
[18]   13.7    0.02    20.92                   6 cse_main [18]
               0.07    13.95                6/28     init_alias_analysis [12]
               0.03     6.49             731/731     cse_basic_block [35]
               0.00     0.34             277/821     ggc_collect [83]
               0.03     0.00             733/733     cse_end_of_basic_block [363]
               0.01     0.00                 6/6     ggc_pop_context [563]
               0.00     0.00              12/753     get_max_uid [848]
               0.00     0.00            6/114328     gen_rtx_fmt_e [296]
               0.00     0.00            6/545563     xcalloc [359]
               0.00     0.00            6/387386     xmalloc [407]
               0.00     0.00                6/22     init_recog [1539]
               0.00     0.00                 6/6     ggc_push_context [1636]
               0.00     0.00                6/28     end_alias_analysis [1518]
-----------------------------------------------
               0.00     0.02        5934/6448431     true_dependence [325]
               1.86    18.88     6442497/6448431     write_dependence_p [14]
[19]   13.5    1.86    18.90             6448431 memrefs_conflict_p [19]
               4.44     9.36   25786729/25786729     rtx_equal_for_memref_p [25]
               3.07     0.00  25791747/117220084     canon_rtx [24]
               1.35     0.00   25791664/25791664     addr_side_effect_eval [73]
               0.67     0.00    5859756/18915614     get_addr [61]
-----------------------------------------------
               0.00    17.15                 6/6     rest_of_compilation [7]
[20]   11.2    0.00    17.15                   6 life_analysis [20]
               0.07    13.95                6/28     init_alias_analysis [12]
               0.00     3.06                 3/3     notice_stack_pointer_modification [56]
               0.01     0.04                 6/6     delete_noop_moves [305]
               0.00     0.02              6/1299     update_life_info [54]
               0.00     0.00                6/43     free_basic_block_vars [462]
               0.00     0.00                6/12     allocate_reg_life_data [790]
               0.00     0.00                 6/6     allocate_bb_life_data [987]
               0.00     0.00                 6/6     mark_regs_live_at_end [1098]
               0.00     0.00                6/28     end_alias_analysis [1518]
-----------------------------------------------
               0.00    16.09                 3/3     rest_of_compilation [7]
[21]   10.5    0.00    16.09                   3 reload_cse_regs [21]
               0.04    12.62                 3/6     reload_cse_regs_1 [17]
               0.05     3.38                 3/3     reload_cse_move2add [52]
               0.00     0.00                 3/3     reload_combine [1705]
-----------------------------------------------

Since compile-time performance is a release criterion for gcc 3.0, I consider this serious.

Release: gcc-3_0-branch 20010212
Environment: alphaev6-unknown-linux-gnu
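A rough way to read the call counts in the profile: cselib_invalidate_mem is entered 80,692 times, and the htab_traverse calls it makes reach cselib_invalidate_mem_1 13,642,612 times, so each memory invalidation visits about 169 table entries on average. The toy model below, in C, is not GCC code; the function name and the assumption that every store adds exactly one entry and removes none are hypothetical. It only illustrates how one full-table walk per store adds up to quadratic work as the table grows.

```c
#include <stdio.h>

/* Toy model, not GCC code: every store walks the whole table of
   remembered memory values, then records one new value.
   Assumption (hypothetical): no entry is ever removed. */
static unsigned long visits_for_stores(unsigned long stores)
{
    unsigned long table_size = 0;  /* entries currently remembered */
    unsigned long visits = 0;      /* entries examined across all walks */

    for (unsigned long i = 0; i < stores; i++) {
        visits += table_size;  /* one full traversal per store */
        table_size++;          /* the store adds one entry */
    }
    return visits;             /* equals stores * (stores - 1) / 2 */
}
```

Under these assumptions, ten times as many stores in one extended block cost roughly a hundred times as many entry visits, which is the kind of growth the profile hints at.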
Fix: The root problem is that gcse kills all pseudos in every basic block that is the target of a computed goto. There are many such blocks in this code, so many pseudos are reloaded at the beginning of each such block; implementing Ruething's variant of LCM that can handle abnormal edges would likely fix this. But until that is done, reload should handle this problem more gracefully. I don't know what the fix is.
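For readers unfamiliar with the construct, the following is a minimal sketch of computed-goto dispatch in GNU C (the labels-as-values extension). It is not taken from pi.i; the opcode names and the run function are hypothetical. Every handler ends in an indirect jump, so every handler label is a potential target of every such jump, which is the situation described above where many blocks are reached through abnormal edges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine (illustration only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs `code` and returns the value on top of the stack at OP_HALT.
   Uses the GNU C "labels as values" extension (&&label, goto *ptr). */
static long run(const long *code)
{
    static void *const dispatch[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[16];
    size_t sp = 0;

    /* Each handler ends with a computed goto, so each label below can be
       reached from every handler through an indirect jump. */
#define NEXT() goto *dispatch[*code++]
    NEXT();

do_push:
    assert(sp < 16);
    stack[sp++] = *code++;      /* operand follows the opcode */
    NEXT();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    NEXT();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];
#undef NEXT
}
```

With four handlers this is harmless; the report concerns code with many such targets, where anything live across the indirect jumps has to be re-established at every one of them.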
From: Brad Lucier <lucier@math.purdue.edu>
To: gcc-gnats@gcc.gnu.org, nobody@gcc.gnu.org
Cc: lucier@math.purdue.edu
Subject: Re: optimization/2001: Inordinately long compile times in reload CSE regs
Date: Thu, 15 Feb 2001 15:46:09 -0500 (EST)

Just to note, this is a regression versus gcc-2.95.2, which takes 3.32 seconds to compile this file, instead of 195.64 with the current release branch.

Brad
State-Changed-From-To: open->feedback
State-Changed-Why: Brad, can you confirm whether this problem still applies? If it is a regression w.r.t. 2.95, then we may want to raise its priority.

Thanks
  Wolfgang
State-Changed-From-To: feedback->analyzed
State-Changed-Why: Brad says this still happens. Since it is a regression w.r.t. 2.95, I raise its priority.
From: Wolfgang Bangerth <bangerth@ticam.utexas.edu>
To: gcc-gnats@gcc.gnu.org
Cc:
Subject: Re: optimization/2001: Inordinately long compile times in reload CSE regs
Date: Tue, 19 Nov 2002 08:58:33 -0600 (CST)

---------- Forwarded message ----------
Date: Mon, 18 Nov 2002 23:37:37 -0500 (EST)
From: Brad Lucier <lucier@math.purdue.edu>
To: bangerth@dealii.org
Cc: gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, lucier@math.purdue.edu, nobody@gcc.gnu.org
Subject: Re: optimization/2001: Inordinately long compile times in reload CSE regs

>
> Synopsis: Inordinately long compile times in reload CSE regs
>
> State-Changed-From-To: open->feedback
> State-Changed-By: bangerth
> State-Changed-When: Mon Nov 18 15:22:46 2002
> State-Changed-Why:
>     Brad, can you confirm whether this problem still applies? If
>     it is a regression w.r.t. 2.95, then we may want to raise
>     its priority.
>
>     Thanks
>       Wolfgang
>
> http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=2001
>

It still happens with today's CVS mainline. The timings for cc1 (checking disabled, no profiling) are below.
The input file is at

  http://www.math.purdue.edu/~lucier/GNATS/GNATS-6/pi.i.gz

popov-222% /export/home/lucier/local/lib/gcc-lib/alphaev6-unknown-linux-gnu/3.3/cc1 -fPIC -fno-math-errno -O2 -mcpu=ev6 pi.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor ___H__20_pi {GC 6136k -> 2125k} {GC 27774k -> 14090k} {GC 20667k -> 10672k} {GC 20718k -> 17582k} {GC 48525k -> 18420k} {GC 33107k -> 24477k} ___init_proc ____20_pi
Execution times (seconds)
 garbage collection    :   1.24 ( 0%) usr   0.01 ( 2%) sys   1.00 ( 0%) wall
 cfg construction      :   0.26 ( 0%) usr   0.01 ( 2%) sys   0.50 ( 0%) wall
 cfg cleanup           :   0.94 ( 0%) usr   0.00 ( 0%) sys   1.00 ( 0%) wall
 trivially dead code   :   1.21 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 life analysis         :  12.97 ( 4%) usr   0.00 ( 1%) sys  13.00 ( 4%) wall
 life info update      :   1.97 ( 1%) usr   0.00 ( 0%) sys   2.50 ( 1%) wall
 preprocessing         :   0.06 ( 0%) usr   0.02 ( 4%) sys   0.00 ( 0%) wall
 lexical analysis      :   0.09 ( 0%) usr   0.05 ( 9%) sys   0.00 ( 0%) wall
 parser                :   0.26 ( 0%) usr   0.04 ( 6%) sys   0.50 ( 0%) wall
 expand                :   0.08 ( 0%) usr   0.00 ( 1%) sys   0.00 ( 0%) wall
 varconst              :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 integration           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 jump                  :   0.61 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 CSE                   :   7.91 ( 2%) usr   0.01 ( 1%) sys   8.00 ( 2%) wall
 global CSE            :   4.73 ( 1%) usr   0.21 (35%) sys   5.00 ( 1%) wall
 loop analysis         :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 CSE 2                 :   3.79 ( 1%) usr   0.00 ( 0%) sys   4.00 ( 1%) wall
 branch prediction     :   3.85 ( 1%) usr   0.00 ( 1%) sys   4.00 ( 1%) wall
 flow analysis         :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 combiner              :   0.65 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 if-conversion         :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 regmove               :   0.58 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 scheduling            : 159.40 (44%) usr   0.04 ( 6%) sys 159.50 (44%) wall
 local alloc           :   1.27 ( 0%) usr   0.00 ( 0%) sys   1.50 ( 0%) wall
 global alloc          :   4.20 ( 1%) usr   0.05 ( 8%) sys   4.00 ( 1%) wall
 reload CSE regs       : 109.40 (30%) usr   0.07 (12%) sys 109.50 (30%) wall
 flow 2                :   0.60 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 if-conversion 2       :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 peephole 2            :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 rename registers      :   1.71 ( 0%) usr   0.00 ( 0%) sys   2.00 ( 1%) wall
 scheduling 2          :  44.54 (12%) usr   0.02 ( 3%) sys  44.50 (12%) wall
 reorder blocks        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 shorten branches      :   0.38 ( 0%) usr   0.01 ( 1%) sys   0.50 ( 0%) wall
 final                 :   1.39 ( 0%) usr   0.02 ( 4%) sys   1.50 ( 0%) wall
 rest of compilation   :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 TOTAL                 : 365.42             0.59           366.00

popov-223% ll pi.s
-rw-r--r--    1 lucier   users     1829841 Nov 18 23:30 pi.s

popov-224% /export/home/lucier/local/lib/gcc-lib/alphaev6-unknown-linux-gnu/3.3/cc1 -fPIC -fno-math-errno -O2 -fno-gcse -mcpu=ev6 pi.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor ___H__20_pi {GC 6136k -> 2125k} {GC 6575k -> 2138k} ___init_proc ____20_pi
Execution times (seconds)
 garbage collection    :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 cfg construction      :   0.22 ( 2%) usr   0.01 ( 8%) sys   0.50 ( 5%) wall
 cfg cleanup           :   0.74 ( 8%) usr   0.00 ( 0%) sys   0.50 ( 5%) wall
 trivially dead code   :   0.06 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 life analysis         :   0.57 ( 6%) usr   0.00 ( 0%) sys   0.50 ( 5%) wall
 life info update      :   0.17 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 preprocessing         :   0.08 ( 1%) usr   0.02 (12%) sys   0.00 ( 0%) wall
 lexical analysis      :   0.08 ( 1%) usr   0.03 (21%) sys   0.00 ( 0%) wall
 parser                :   0.29 ( 3%) usr   0.03 (25%) sys   0.50 ( 5%) wall
 expand                :   0.09 ( 1%) usr   0.00 ( 1%) sys   0.00 ( 0%) wall
 varconst              :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 integration           :   0.02 ( 0%) usr   0.00 ( 1%) sys   0.00 ( 0%) wall
 jump                  :   0.46 ( 5%) usr   0.01 ( 4%) sys   0.50 ( 5%) wall
 CSE                   :   0.14 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 loop analysis         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 CSE 2                 :   0.13 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 branch prediction     :   3.79 (42%) usr   0.01 ( 8%) sys   4.00 (42%) wall
 flow analysis         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 combiner              :   0.16 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 if-conversion         :   0.08 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 5%) wall
 regmove               :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 scheduling            :   0.66 ( 7%) usr   0.00 ( 1%) sys   1.00 (11%) wall
 local alloc           :   0.15 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 global alloc          :   0.18 ( 2%) usr   0.01 ( 9%) sys   0.00 ( 0%) wall
 reload CSE regs       :   0.21 ( 2%) usr   0.00 ( 3%) sys   0.50 ( 5%) wall
 flow 2                :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 if-conversion 2       :   0.06 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 peephole 2            :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 rename registers      :   0.09 ( 1%) usr   0.00 ( 1%) sys   0.00 ( 0%) wall
 scheduling 2          :   0.16 ( 2%) usr   0.00 ( 1%) sys   0.50 ( 5%) wall
 reorder blocks        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 shorten branches      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 final                 :   0.08 ( 1%) usr   0.00 ( 1%) sys   0.00 ( 0%) wall
 rest of compilation   :   0.13 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 5%) wall
 TOTAL                 :   9.08             0.13             9.50

popov-225% ll pi.s
-rw-r--r--    1 lucier   users      113392 Nov 18 23:31 pi.s
From: Brad Lucier <lucier@math.purdue.edu>
To: s.bosscher@student.tudelft.nl (Steven Bosscher)
Cc: gcc-gnats@gcc.gnu.org, gcc-bugs@gcc.gnu.org, lucier@math.purdue.edu, nobody@gcc.gnu.org
Subject: Re: optimization/2001: [3.2/3.3 regression] Inordinately long compile
Date: Tue, 11 Mar 2003 21:36:08 -0500 (EST)

>
> http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=2001
>
> Brad, the last time you confirmed this ugly bug is 5 months ago, do you
> still see this? If so, maybe this is an Alpha-specific problem? On my
> ol' slow K6-2, I get a very reasonable compile time (with -march=i586
> -fPIC -O2 -fno-math-errno):
>
> Execution times (seconds)
...
>  TOTAL                 :   9.66             0.34            11.80

Perhaps you were testing 3.4, where this is fixed? Or perhaps it requires a large number of registers before gcc screws up. Here are the times I now get, first for the 3.3 branch, then for 3.4:

popov-1734% gcc/cc1 -fPIC -fno-math-errno -O2 -mcpu=ev6 pi.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor ___H__20_pi ___init_proc ____20_pi
Execution times (seconds)
 cfg construction      :   0.24 ( 0%) usr   0.02 ( 3%) sys   0.50 ( 0%) wall
 cfg cleanup           :   0.94 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 trivially dead code   :   1.23 ( 0%) usr   0.00 ( 0%) sys   2.50 ( 1%) wall
 life analysis         :  13.07 ( 4%) usr   0.00 ( 0%) sys  12.50 ( 4%) wall
 life info update      :   1.95 ( 1%) usr   0.00 ( 0%) sys   3.00 ( 1%) wall
 preprocessing         :   0.06 ( 0%) usr   0.02 ( 4%) sys   0.00 ( 0%) wall
 lexical analysis      :   0.11 ( 0%) usr   0.04 ( 5%) sys   0.00 ( 0%) wall
 parser                :   0.30 ( 0%) usr   0.03 ( 5%) sys   0.50 ( 0%) wall
 expand                :   0.08 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall
 varconst              :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 integration           :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 jump                  :   0.60 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 CSE                   :   7.86 ( 2%) usr   0.00 ( 1%) sys   7.50 ( 2%) wall
 global CSE            :   4.63 ( 1%) usr   0.19 (29%) sys   5.00 ( 2%) wall
 loop analysis         :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 CSE 2                 :   3.78 ( 1%) usr   0.00 ( 0%) sys   3.50 ( 1%) wall
 branch prediction     :   1.52 ( 0%) usr   0.00 ( 1%) sys   1.50 ( 0%) wall
 flow analysis         :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 combiner              :   0.67 ( 0%) usr   0.00 ( 1%) sys   1.00 ( 0%) wall
 if-conversion         :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 regmove               :   0.59 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 scheduling            : 125.24 (39%) usr   0.03 ( 4%) sys 125.00 (39%) wall
 local alloc           :   1.24 ( 0%) usr   0.00 ( 0%) sys   1.50 ( 0%) wall
 global alloc          :   4.76 ( 1%) usr   0.09 (14%) sys   4.50 ( 1%) wall
 reload CSE regs       : 110.46 (34%) usr   0.11 (16%) sys 110.50 (34%) wall
 flow 2                :   0.74 ( 0%) usr   0.00 ( 0%) sys   1.00 ( 0%) wall
 if-conversion 2       :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 peephole 2            :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 rename registers      :   1.56 ( 0%) usr   0.00 ( 0%) sys   1.50 ( 0%) wall
 scheduling 2          :  35.85 (11%) usr   0.08 (12%) sys  35.50 (11%) wall
 reorder blocks        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 shorten branches      :   0.39 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 final                 :   1.37 ( 0%) usr   0.03 ( 4%) sys   1.50 ( 0%) wall
 rest of compilation   :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 TOTAL                 : 320.50             0.67           321.00

gcc/cc1 -fPIC -fno-math-errno -O2 -mcpu=ev6 pi.i
 __copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor ___H__20_pi ___init_proc ____20_pi
Execution times (seconds)
 cfg construction      :   0.03 ( 1%) usr   0.00 ( 1%) sys   0.03 ( 1%) wall
 cfg cleanup           :   0.10 ( 2%) usr   0.00 ( 0%) sys   0.10 ( 2%) wall
 trivially dead code   :   0.10 ( 2%) usr   0.00 ( 0%) sys   0.10 ( 2%) wall
 life analysis         :   0.19 ( 5%) usr   0.00 ( 0%) sys   0.19 ( 4%) wall
 life info update      :   0.09 ( 2%) usr   0.00 ( 0%) sys   0.09 ( 2%) wall
 alias analysis        :   0.09 ( 2%) usr   0.00 ( 1%) sys   0.09 ( 2%) wall
 register scan         :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 1%) wall
 rebuild jump labels   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 preprocessing         :   0.07 ( 2%) usr   0.02 (12%) sys   0.10 ( 2%) wall
 lexical analysis      :   0.08 ( 2%) usr   0.04 (24%) sys   0.12 ( 3%) wall
 parser                :   0.31 ( 7%) usr   0.03 (19%) sys   0.34 ( 8%) wall
 expand                :   0.08 ( 2%) usr   0.00 ( 3%) sys   0.08 ( 2%) wall
 varconst              :   0.02 ( 0%) usr   0.00 ( 2%) sys   0.02 ( 0%) wall
 integration           :   0.02 ( 1%) usr   0.00 ( 1%) sys   0.02 ( 1%) wall
 jump                  :   0.02 ( 1%) usr   0.00 ( 2%) sys   0.03 ( 1%) wall
 CSE                   :   0.32 ( 8%) usr   0.00 ( 0%) sys   0.32 ( 7%) wall
 global CSE            :   0.59 (14%) usr   0.01 ( 9%) sys   0.61 (14%) wall
 bypass jumps          :   0.12 ( 3%) usr   0.01 ( 4%) sys   0.13 ( 3%) wall
 CSE 2                 :   0.12 ( 3%) usr   0.00 ( 0%) sys   0.12 ( 3%) wall
 branch prediction     :   0.05 ( 1%) usr   0.00 ( 1%) sys   0.05 ( 1%) wall
 combiner              :   0.13 ( 3%) usr   0.00 ( 1%) sys   0.14 ( 3%) wall
 if-conversion         :   0.02 ( 1%) usr   0.00 ( 1%) sys   0.02 ( 1%) wall
 regmove               :   0.05 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 1%) wall
 scheduling            :   0.41 (10%) usr   0.00 ( 2%) sys   0.41 ( 9%) wall
 local alloc           :   0.11 ( 3%) usr   0.00 ( 0%) sys   0.11 ( 3%) wall
 global alloc          :   0.17 ( 4%) usr   0.01 ( 7%) sys   0.18 ( 4%) wall
 reload CSE regs       :   0.21 ( 5%) usr   0.00 ( 3%) sys   0.21 ( 5%) wall
 flow 2                :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 if-conversion 2       :   0.05 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 1%) wall
 peephole 2            :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 rename registers      :   0.08 ( 2%) usr   0.00 ( 0%) sys   0.08 ( 2%) wall
 scheduling 2          :   0.15 ( 3%) usr   0.00 ( 1%) sys   0.15 ( 3%) wall
 reorder blocks        :   0.10 ( 2%) usr   0.01 ( 7%) sys   0.12 ( 3%) wall
 shorten branches      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 final                 :   0.08 ( 2%) usr   0.00 ( 1%) sys   0.08 ( 2%) wall
 rest of compilation   :   0.10 ( 2%) usr   0.00 ( 0%) sys   0.10 ( 2%) wall
 TOTAL                 :   4.20             0.17             4.37

And the code generated by 3.3 is absolutely horrendous; I put the .s files at

  http://www.math.purdue.edu/~lucier/GNATS/GNATS-6/pi-3.3.s.gz

and

  http://www.math.purdue.edu/~lucier/GNATS/GNATS-6/pi-3.4.s.gz

The 3.4-generated code is a model of good behavior and decorum ;-).

The patch that fixed this for 3.4 was

  http://gcc.gnu.org/ml/gcc-cvs/2003-02/msg00742.html

Perhaps it's in the RedHat 3.2 branch, too.

Brad
From: Steven Bosscher <s.bosscher@student.tudelft.nl>
To: gcc-gnats@gcc.gnu.org, gcc-bugs@gcc.gnu.org, lucier@math.purdue.edu, nobody@gcc.gnu.org
Cc:
Subject: Re: optimization/2001: [3.2/3.3 regression] Inordinately long compile times in reload CSE regs
Date: Wed, 12 Mar 2003 00:48:54 +0100

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=2001

Brad, the last time you confirmed this ugly bug is 5 months ago, do you still see this? If so, maybe this is an Alpha-specific problem? On my ol' slow K6-2, I get a very reasonable compile time (with -march=i586 -fPIC -O2 -fno-math-errno):

Execution times (seconds)
 cfg construction      :   0.09 ( 1%) usr   0.01 ( 3%) sys   0.10 ( 1%) wall
 cfg cleanup           :   0.33 ( 3%) usr   0.00 ( 0%) sys   0.33 ( 3%) wall
 trivially dead code   :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall
 life analysis         :   0.39 ( 4%) usr   0.00 ( 0%) sys   0.39 ( 3%) wall
 life info update      :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 2%) wall
 alias analysis        :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall
 register scan         :   0.06 ( 1%) usr   0.00 ( 0%) sys   0.06 ( 1%) wall
 rebuild jump labels   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 preprocessing         :   0.10 ( 1%) usr   0.03 ( 9%) sys   0.13 ( 1%) wall
 lexical analysis      :   0.15 ( 2%) usr   0.07 (21%) sys   0.24 ( 2%) wall
 parser                :   0.64 ( 7%) usr   0.06 (18%) sys   0.90 ( 8%) wall
 expand                :   0.21 ( 2%) usr   0.00 ( 0%) sys   0.32 ( 3%) wall
 varconst              :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 integration           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 1%) wall
 jump                  :   0.09 ( 1%) usr   0.01 ( 3%) sys   0.10 ( 1%) wall
 CSE                   :   0.58 ( 6%) usr   0.01 ( 3%) sys   0.60 ( 5%) wall
 global CSE            :   1.54 (16%) usr   0.05 (15%) sys   2.14 (18%) wall
 loop analysis         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 bypass jumps          :   0.24 ( 2%) usr   0.01 ( 3%) sys   0.25 ( 2%) wall
 CSE 2                 :   0.20 ( 2%) usr   0.00 ( 0%) sys   0.20 ( 2%) wall
 branch prediction     :   0.11 ( 1%) usr   0.00 ( 0%) sys   0.11 ( 1%) wall
 flow analysis         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 combiner              :   0.24 ( 2%) usr   0.00 ( 0%) sys   0.24 ( 2%) wall
 if-conversion         :   0.07 ( 1%) usr   0.00 ( 0%) sys   0.22 ( 2%) wall
 regmove               :   0.06 ( 1%) usr   0.00 ( 0%) sys   0.06 ( 1%) wall
 local alloc           :   0.20 ( 2%) usr   0.00 ( 0%) sys   0.20 ( 2%) wall
 global alloc          :   1.35 (14%) usr   0.04 (12%) sys   1.70 (14%) wall
 reload CSE regs       :   0.83 ( 9%) usr   0.02 ( 6%) sys   0.92 ( 8%) wall
 flow 2                :   0.06 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall
 if-conversion 2       :   0.13 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall
 peephole 2            :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 1%) wall
 rename registers      :   0.23 ( 2%) usr   0.00 ( 0%) sys   0.26 ( 2%) wall
 scheduling 2          :   0.55 ( 6%) usr   0.00 ( 0%) sys   0.58 ( 5%) wall
 reorder blocks        :   0.30 ( 3%) usr   0.01 ( 3%) sys   0.34 ( 3%) wall
 shorten branches      :   0.05 ( 1%) usr   0.00 ( 0%) sys   0.06 ( 1%) wall
 final                 :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 rest of compilation   :   0.16 ( 2%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 TOTAL                 :   9.66             0.34            11.80

Greetz
Steven
From: Steven Bosscher <s.bosscher@student.tudelft.nl>
To: gcc-gnats@gcc.gnu.org, gcc-bugs@gcc.gnu.org, lucier@math.purdue.edu, nobody@gcc.gnu.org
Cc:
Subject: Re: optimization/2001: [3.2/3.3 regression] Inordinately long compile times in reload CSE regs
Date: Wed, 12 Mar 2003 20:19:45 +0100

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=2001

Brad Lucier wrote:

> Perhaps you were testing 3.4, where this is fixed? Or perhaps it requires
> a large number of registers before gcc screws up. Here are the times I
> now get, first for the 3.3 branch, then for 3.4:

Uhm, yes I used 3.4. I got so many gcc versions around now, picked the wrong one. For 3.3, I get:

  at -O0:  TOTAL :   4.66     0.24     6.14
  at -O2:  TOTAL : 316.67     1.71   329.12

Ouch.

> The patch that fixed this for 3.4 was
>
> http://gcc.gnu.org/ml/gcc-cvs/2003-02/msg00742.html
>
> Perhaps it's in the RedHat 3.2 branch, too.

Is that a combination of these two patches?

  http://gcc.gnu.org./ml/gcc-patches/2003-02/msg00858.html
  http://gcc.gnu.org./ml/gcc-patches/2003-02/msg01254.html

rth mentioned 3 patches, but I can only find these two, and one other of which you said it did not apply to your sources.

Any clue why this wasn't backported to 3.3?

Greetz
Steven
From: Steven Bosscher <s.bosscher@student.tudelft.nl>
To: gcc-gnats@gcc.gnu.org, gcc-bugs@gcc.gnu.org, lucier@math.purdue.edu, nobody@gcc.gnu.org, gcc-prs@gcc.gnu.org, rth@redhat.com
Cc:
Subject: Re: optimization/2001: [3.2/3.3 regression] Inordinately long compile times in reload CSE regs
Date: Sun, 16 Mar 2003 11:03:09 +0100

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=2001

Richard's patch really improves things a lot for me:

  http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01478.html

If somebody tries this: the patch doesn't apply cleanly; the first hunk of bb-reorder.c needs to be applied manually.

The numbers:

GCC 3.3, March 12 sources:
  at -O0:  TOTAL :   4.66     0.24     6.14
  at -O2:  TOTAL : 316.67     1.71   329.12

GCC 3.3, March 16 sources + Richard's patch:
  at -O0:  TOTAL :   3.17     0.15     3.50
  at -O2:  TOTAL :   5.60     0.17     5.91

The latter is the average of three runs because I could hardly believe these numbers.

Brad, this is worth a try, don't you think? :-)

Greetz
Steven
From: Mark Mitchell <mark@codesourcery.com>
To: Steven Bosscher <s.bosscher@student.tudelft.nl>
Cc: gcc-gnats@gcc.gnu.org, gcc-bugs@gcc.gnu.org, lucier@math.purdue.edu, nobody@gcc.gnu.org, gcc-prs@gcc.gnu.org, rth@redhat.com
Subject: Re: optimization/2001: [3.2/3.3 regression] Inordinately long compile times in reload CSE regs
Date: 21 Mar 2003 15:17:36 -0800

On Fri, 2003-03-21 at 14:58, Steven Bosscher wrote:
> http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=2001
>
> Richard Henderson posted a patch for this a week ago:
> http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01478.html
>
> I have successfully bootstrapped and regtested the patch on
> i586-pc-linux-gnu.
> Richard, I suppose you did the same on alpha?
>
> If so, can the patch be committed and the PR be closed?

It looks fine to me.

Richard, would you commit the patch if you do not have further reservations?

Thanks,

--
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com
From: Steven Bosscher <s.bosscher@student.tudelft.nl>
To: gcc-gnats@gcc.gnu.org, gcc-bugs@gcc.gnu.org, lucier@math.purdue.edu, nobody@gcc.gnu.org, gcc-prs@gcc.gnu.org, mark@codesourcery.com, rth@redhat.com
Cc:
Subject: Re: optimization/2001: [3.2/3.3 regression] Inordinately long compile times in reload CSE regs
Date: Fri, 21 Mar 2003 23:58:10 +0100

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=2001

Richard Henderson posted a patch for this a week ago:

  http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01478.html

I have successfully bootstrapped and regtested the patch on i586-pc-linux-gnu. Richard, I suppose you did the same on alpha?

If so, can the patch be committed and the PR be closed?

Greetz
Steven
State-Changed-From-To: analyzed->closed
State-Changed-Why: Fixed by rth with:
  http://gcc.gnu.org/ml/gcc-cvs/2003-03/msg01133.html