This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [C++] another testcase for long compile times


On Fri, Jan 05, 2001 at 10:19:01AM +0100, Peter Bienstman wrote:
> Hi all,
> 
> Over the last weeks, compile time perfomance has improved quite a lot, but
> for some cases, it still takes 10 times as long to compile as compared to
> 2.95.2.
> 
> More specifically, these are the timings for the file you can find at
> http://allserv.rug.ac.be/~pbienst/gcc/S_scheme.ii.gz:

I retrieved this file and ran it through a profiled cc1plus.  Here are
the results.

$ time ./cc1plus -quiet -version -O3 -march=pentiumpro -ftemplate-depth-30 \
  -ffast-math -funroll-loops -malign-double -fstrict-aliasing \
  S_scheme.ii -o S_scheme.s
GNU C++ version 2.97 20010105 (experimental) (i686-pc-linux-gnu) 
    compiled by GNU C version 2.95.3 20001229 (prerelease).
real    17m21.908s
user    17m15.310s
sys     0m6.200s

Flat profile: Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 31.59    252.41   252.41 350236618    0.00     0.00  fixup_var_refs_1
 27.52    472.28   219.87    79759     2.76     6.59  fixup_var_refs_insns
 13.96    583.86   111.58    65184     1.71     1.71  push_to_sequence
  4.55    620.18    36.32 85438845     0.00     0.00  reg_mentioned_p
  2.98    644.00    23.82     8606     2.77    15.23  fixup_var_refs
  1.47    655.72    11.72 88001798     0.00     0.00  rtx_equal_p
  1.41    667.00    11.28 19452836     0.00     0.00  memset
  1.18    676.41     9.41       63   149.37   150.53  compute_hash_table

Congratulations, you've found a brand new bottleneck.

The call graph profile has got an enormous cycle in it covering most
of the back end, which makes interpretation confusing.  (Recursive
calls to rest_of_compilation... is that safe?)  But here's the
highlights.

                            79759             fixup_var_refs <cycle 1> [11]
[7]  65.8  219.87  305.97   79759             fixup_var_refs_insns <cycle 1> [7]
           252.41   47.54 350236618/350236618     fixup_var_refs_1 [8]
             4.25    0.00 22299354/22299354   walk_fixup_memory_subreg [55]
             1.72    0.00 5086638/6385044     single_set_2 [78]
             0.04    0.01   86502/85438845    reg_mentioned_p [14]
             0.00    0.00      42/1483517     cfree [119]
             0.00    0.00       4/88001798    rtx_equal_p [24]
             0.00    0.00       2/2162690     find_reg_note [357]
             0.00    0.00       2/59261       prev_nonnote_insn [2617]
                               42             hash_lookup <cycle 1> [936]
-----------------------------------------------
                          1384492808          fixup_var_refs_1 [8]
           252.41   47.54 350236618/350236618 fixup_var_refs_insns <cycle 1> [7]
[8]  37.5  252.41   47.54 350236618+1384492808    fixup_var_refs_1 [8]
            36.22   11.32 85210660/85438845   reg_mentioned_p [14]
             0.00    0.00     118/314646      copy_rtx [334]
             0.00    0.00      42/42          copy_most_rtx [2133]
             0.00    0.00     118/1062710     validate_change <cycle 9> [577]
             0.00    0.00      42/42          find_fixup_replacement [2247]
             0.00    0.00       4/2097        validate_replace_rtx [1454]
             0.00    0.00       6/188317      recog_memoized_1 [209]
             0.00    0.00       4/200452      gen_reg_rtx [284]
             0.00    0.00       4/553812      emit_insn [181]
             0.00    0.00       4/905588      gen_rtx_fmt_ee [126]
             0.00    0.00       4/485623      start_sequence [247]
             0.00    0.00       4/2928        emit_insn_before [1649]
             0.00    0.00       4/485623      end_sequence [254]
             0.00    0.00       4/285487      gen_sequence [452]
             0.00    0.00       2/2           fixup_stack_1 [2545]
                          1384492808             fixup_var_refs_1 [8]
-----------------------------------------------
                             1691       gen_mem_addressof <cycle 1> [1558]
                             6915       schedule_fixup_var_refs <cycle 1> [2792]
[11] 16.4   23.82  107.27    8606          fixup_var_refs <cycle 1> [11]
           107.14    0.05   62589/65184       push_to_sequence [12]
             0.01    0.04   62589/485623      end_sequence [254]
             0.01    0.01    8564/10671       push_to_full_sequence [1146]
             0.01    0.00   62583/334819      get_last_insn [827]
             0.00    0.01    8564/10671       end_full_sequence [1463]
             0.00    0.00    8612/111321      get_insns [1318]
                            79759             fixup_var_refs_insns <cycle 1> [7]
-----------------------------------------------
                                  42        put_reg_into_stack <cycle 1> [916]
                                6873        put_var_into_stack <cycle 1> [1169]
[2792]   0.0    0.00    0.00    6915    schedule_fixup_var_refs <cycle 1> [2792]
                                6915        fixup_var_refs <cycle 1> [11]

Which basically tells us that put_var_into_stack/put_reg_into_stack
are being called a reasonable number of times, but do far too much
work.  And that the excess work is being done inside fixup_var_refs
(which has lots of subroutines; most in function.c).

This is old old code, and I don't properly understand its algorithm.
But my guess would be that your code generates a small number of huge
functions and fixup_var_refs_insns has to walk over the entire insn
list looking for references to variables, which may only occur in a
few places.

push_to_sequence, incidentally, is slow because it has to scan the
entire sequence to get to its end, to set last_insns.  I think most if
not all of the calls from fixup_var_refs can be replaced by
push_to_full_sequence (which is given first and last pointers).

                0.01    0.00       3/65184       insert_insn_on_edge
                0.48    0.00     283/65184       emit_output_reload_insns
                3.95    0.00    2309/65184       emit_input_reload_insns
              107.14    0.05   62589/65184       fixup_var_refs
[12]    14.0  111.58    0.05   65184         push_to_sequence
                0.01    0.05   65184/485623      start_sequence

Full profiling report available on request.

zw

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]