This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [C++] another testcase for long compile times
- To: Peter Bienstman <Peter dot Bienstman at rug dot ac dot be>
- Subject: Re: [C++] another testcase for long compile times
- From: "Zack Weinberg" <zackw at stanford dot edu>
- Date: Fri, 5 Jan 2001 11:19:41 -0800
- Cc: gcc at gcc dot gnu dot org
- References: <002401c076f8$8a72f310$0454c19d@intec.rug.ac.be>
On Fri, Jan 05, 2001 at 10:19:01AM +0100, Peter Bienstman wrote:
> Hi all,
>
> Over the last weeks, compile time perfomance has improved quite a lot, but
> for some cases, it still takes 10 times as long to compile as compared to
> 2.95.2.
>
> More specifically, these are the timings for the file you can find at
> http://allserv.rug.ac.be/~pbienst/gcc/S_scheme.ii.gz:
I retrieved this file and ran it through a profiled cc1plus. Here are
the results.
$ time ./cc1plus -quiet -version -O3 -march=pentiumpro -ftemplate-depth-30 \
-ffast-math -funroll-loops -malign-double -fstrict-aliasing \
S_scheme.ii -o S_scheme.s
GNU C++ version 2.97 20010105 (experimental) (i686-pc-linux-gnu)
compiled by GNU C version 2.95.3 20001229 (prerelease).
real 17m21.908s
user 17m15.310s
sys 0m6.200s
Flat profile: Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
31.59 252.41 252.41 350236618 0.00 0.00 fixup_var_refs_1
27.52 472.28 219.87 79759 2.76 6.59 fixup_var_refs_insns
13.96 583.86 111.58 65184 1.71 1.71 push_to_sequence
4.55 620.18 36.32 85438845 0.00 0.00 reg_mentioned_p
2.98 644.00 23.82 8606 2.77 15.23 fixup_var_refs
1.47 655.72 11.72 88001798 0.00 0.00 rtx_equal_p
1.41 667.00 11.28 19452836 0.00 0.00 memset
1.18 676.41 9.41 63 149.37 150.53 compute_hash_table
Congratulations, you've found a brand new bottleneck.
The call graph profile has got an enormous cycle in it covering most
of the back end, which makes interpretation confusing. (Recursive
calls to rest_of_compilation... is that safe?) But here's the
highlights.
79759 fixup_var_refs <cycle 1> [11]
[7] 65.8 219.87 305.97 79759 fixup_var_refs_insns <cycle 1> [7]
252.41 47.54 350236618/350236618 fixup_var_refs_1 [8]
4.25 0.00 22299354/22299354 walk_fixup_memory_subreg [55]
1.72 0.00 5086638/6385044 single_set_2 [78]
0.04 0.01 86502/85438845 reg_mentioned_p [14]
0.00 0.00 42/1483517 cfree [119]
0.00 0.00 4/88001798 rtx_equal_p [24]
0.00 0.00 2/2162690 find_reg_note [357]
0.00 0.00 2/59261 prev_nonnote_insn [2617]
42 hash_lookup <cycle 1> [936]
-----------------------------------------------
1384492808 fixup_var_refs_1 [8]
252.41 47.54 350236618/350236618 fixup_var_refs_insns <cycle 1> [7]
[8] 37.5 252.41 47.54 350236618+1384492808 fixup_var_refs_1 [8]
36.22 11.32 85210660/85438845 reg_mentioned_p [14]
0.00 0.00 118/314646 copy_rtx [334]
0.00 0.00 42/42 copy_most_rtx [2133]
0.00 0.00 118/1062710 validate_change <cycle 9> [577]
0.00 0.00 42/42 find_fixup_replacement [2247]
0.00 0.00 4/2097 validate_replace_rtx [1454]
0.00 0.00 6/188317 recog_memoized_1 [209]
0.00 0.00 4/200452 gen_reg_rtx [284]
0.00 0.00 4/553812 emit_insn [181]
0.00 0.00 4/905588 gen_rtx_fmt_ee [126]
0.00 0.00 4/485623 start_sequence [247]
0.00 0.00 4/2928 emit_insn_before [1649]
0.00 0.00 4/485623 end_sequence [254]
0.00 0.00 4/285487 gen_sequence [452]
0.00 0.00 2/2 fixup_stack_1 [2545]
1384492808 fixup_var_refs_1 [8]
-----------------------------------------------
1691 gen_mem_addressof <cycle 1> [1558]
6915 schedule_fixup_var_refs <cycle 1> [2792]
[11] 16.4 23.82 107.27 8606 fixup_var_refs <cycle 1> [11]
107.14 0.05 62589/65184 push_to_sequence [12]
0.01 0.04 62589/485623 end_sequence [254]
0.01 0.01 8564/10671 push_to_full_sequence [1146]
0.01 0.00 62583/334819 get_last_insn [827]
0.00 0.01 8564/10671 end_full_sequence [1463]
0.00 0.00 8612/111321 get_insns [1318]
79759 fixup_var_refs_insns <cycle 1> [7]
-----------------------------------------------
42 put_reg_into_stack <cycle 1> [916]
6873 put_var_into_stack <cycle 1> [1169]
[2792] 0.0 0.00 0.00 6915 schedule_fixup_var_refs <cycle 1> [2792]
6915 fixup_var_refs <cycle 1> [11]
Which basically tells us that put_var_into_stack/put_reg_into_stack
are being called a reasonable number of times, but do far too much
work. And that the excess work is being done inside fixup_var_refs
(which has lots of subroutines; most in function.c).
This is old old code, and I don't properly understand its algorithm.
But my guess would be that your code generates a small number of huge
functions and fixup_var_refs_insns has to walk over the entire insn
list looking for references to variables, which may only occur in a
few places.
push_to_sequence, incidentally, is slow because it has to scan the
entire sequence to get to its end, to set last_insns. I think most if
not all of the calls from fixup_var_refs can be replaced by
push_to_full_sequence (which is given first and last pointers).
0.01 0.00 3/65184 insert_insn_on_edge
0.48 0.00 283/65184 emit_output_reload_insns
3.95 0.00 2309/65184 emit_input_reload_insns
107.14 0.05 62589/65184 fixup_var_refs
[12] 14.0 111.58 0.05 65184 push_to_sequence
0.01 0.05 65184/485623 start_sequence
Full profiling report available on request.
zw