This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Faster compilation speed


> On Sat, 10 Aug 2002, Noel Yap spake:
>>  parser                :   6.12 (65%) usr   0.75 (53%) sys  10.85 (63%) wall
>> ...
>>  parser                :   6.46 (65%) usr   0.63 (53%) sys   9.98 (62%) wall
>> ...
> Thanks,
> Noel

I have trouble believing that bison is taking that amount of time. There are a
lot of calls from the parser that are counted as PARSE. And flag_syntax_only
doesn't turn off as much as you might think. 

In my COBOL front end, all I do in the parse file is build a 'tree'. Many
people told me bison would be too slow, let alone flex, but profiling shows
both to be a non-issue. The problem is the code generation.

According to a gprof run on the largest gcc module (insn-recog.c), the parser
(yyparse self time) is only 0.43% of the total run time. On the other hand,
the GC figures very prominently in the top 100 functions. This is of course
without taking into account the additional effect on cache hit rates of the
larger working set that results from using GC. On my system, this file takes
about 90 seconds to compile, but preprocessing takes less than one second. The
RTL time is very large.

The largest hand-coded gcc module (combine.c) shows broadly similar results.
The parser remains negligible. The GC share is somewhat lower, presumably due
to the smaller size of the program, but GC remains significant, even apart
from working set/cache effects.

Compiling combine.c takes 7 seconds with -O0, 15 seconds with -O1 and 25
seconds with -O2. Nearly everyone uses -O2 so it is clear where the time is
being spent in most cases - doing optimisation. Even in -O0 a fair bit of
time, maybe 2-3 seconds, is spent optimising. 
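
The arithmetic behind that claim can be sketched as follows. This is only a
rough illustration using the rounded combine.c timings quoted above; the
variable and function names are made up for the example, and treating
"everything beyond the -O0 time" as optimisation cost is a simplifying
assumption.

```python
# Rounded combine.c compile times quoted in this message, in seconds.
times = {"-O0": 7.0, "-O1": 15.0, "-O2": 25.0}


def share_beyond_baseline(times, baseline="-O0"):
    """Fraction of each compile's time spent beyond the baseline compile.

    Assumption: the baseline time is the non-optimising cost, so anything
    above it is attributed to optimisation.
    """
    base = times[baseline]
    return {flag: (t - base) / t for flag, t in times.items()}


shares = share_beyond_baseline(times)
for flag, share in shares.items():
    print(f"{flag}: {share:.0%} of the time is beyond the -O0 baseline")
```

On these figures, 72% of the -O2 time is beyond the -O0 baseline. Since -O0
itself is estimated to spend 2-3 seconds optimising, that is a lower bound:
counting those seconds as well puts the -O2 share at roughly 80-84%.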

Conclusion:

1. The fault, dear Bison, is in ourselves, not in you.

2. Same for the preprocessor, except maybe for C++ where many headers are
included. This is one of many design problems with the C++ language IMHO but
maybe something can be done to help.

3. GC chews up a substantial amount of time, especially in non-optimised
compiles. GC needs to be improved, but any further changes to GC should be
evidence based and subject to peer review. This would have two beneficial
effects: firstly reduced thrashing of front end developers keeping up with
significant changes of unknown benefit; and secondly we could be confident
that changes represent significant progress.

4. We do need some good numbers on how much GCC is affected by cache misses.
This would give us an idea how much effort should be devoted to improving
working set size and locality. There are lots of ways to improve locality and
reduce working sets. But let's find out if it is needed before we start
coding.

5. Most of the time in GCC compiles is spent in optimisation, so the focus
should be there. The RTL phase of GCC is not well understood by anyone. Code
that is not well understood and that people are afraid to touch is invariably
inefficient.

Two gprof outputs follow.

Tim Josling

insn-recog.c:
 %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  4.84      2.14     2.14 23088715     0.00     0.00  ggc_set_mark
  3.87      3.85     1.71  2153853     0.00     0.00  ggc_mark_rtx_children_1
  3.40      5.35     1.50  1070849     0.00     0.01  cse_insn
  3.28      6.80     1.45     1169     1.24     1.47  verify_flow_info
  2.83      8.05     1.25  5252491     0.00     0.00  for_each_rtx
  2.67      9.23     1.18                             htab_traverse
  1.90     10.07     0.84      456     1.84     2.99  init_alias_analysis
  1.81     10.87     0.80 10643243     0.00     0.00  find_reg_note
  1.72     11.63     0.76  6645463     0.00     0.00  side_effects_p
  1.68     12.37     0.74  2561968     0.00     0.00  fold_rtx
  1.52     13.04     0.67  6523785     0.00     0.00  ggc_alloc
  1.47     13.69     0.65   799692     0.00     0.00  gt_ggc_mx_lang_tree_node
  1.45     14.33     0.64  2176585     0.00     0.00  canon_reg
  1.38     14.94     0.61  2676602     0.00     0.00  rtx_cost
  1.15     15.45     0.51  1967977     0.00     0.00  ggc_mark_rtx_children
  1.06     15.92     0.47  1747317     0.00     0.00  insert
  1.00     16.36     0.44  4171744     0.00     0.00  canon_hash
  1.00     16.80     0.44  1526861     0.00     0.00  exp_equiv_p
  0.95     17.22     0.42  1356669     0.00     0.00  propagate_one_insn
  0.91     17.62     0.40  7510741     0.00     0.00  canon_rtx
  0.88     18.01     0.39   554639     0.00     0.00  count_reg_usage
  0.86     18.39     0.38  3718565     0.00     0.00  note_stores
  0.82     18.75     0.36    88538     0.00     0.01  find_reloads
  0.77     19.09     0.34       43     7.91     7.91  poison_pages
  0.72     19.41     0.32    49907     0.01     0.01  preprocess_constraints
  0.70     19.72     0.31  1011445     0.00     0.00  invalidate
  0.70     20.03     0.31   558511     0.00     0.00  reg_scan_mark_refs
  0.66     20.32     0.29   650208     0.00     0.00  constrain_operands
  0.63     20.60     0.28   774372     0.00     0.00  mark_used_regs
  0.63     20.88     0.28    24927     0.01     0.01  count_or_remove_death_notes
  0.61     21.15     0.27  2880996     0.00     0.00  mark_set_1
  0.59     21.41     0.26  7613625     0.00     0.00  approx_reg_cost_1
  0.57     21.66     0.25  1516590     0.00     0.00  simplify_binary_operation
  0.57     21.91     0.25  1014709     0.00     0.00  mention_regs
  0.57     22.16     0.25   177560     0.00     0.00  validate_value_data
  0.57     22.41     0.25     7063     0.04     0.10  compute_transp
  0.54     22.65     0.24   539000     0.00     0.00  copy_rtx
  0.52     22.88     0.23  1172954     0.00     0.00  insn_extract
  0.52     23.11     0.23   109544     0.00     0.00  record_reg_classes
  0.50     23.33     0.22  1495174     0.00     0.00  reg_mentioned_p
  0.48     23.54     0.21    51125     0.00     0.23  cse_basic_block
  0.48     23.75     0.21    51125     0.00     0.00  cse_end_of_basic_block
  0.45     23.95     0.20  1886796     0.00     0.00  legitimate_address_p
  0.45     24.15     0.20   501459     0.00     0.00  find_best_addr
  0.45     24.35     0.20   354991     0.00     0.00  mark_jump_label
  0.43     24.54     0.19  6597365     0.00     0.00  get_cse_reg_info
  0.43     24.73     0.19   598028     0.00     0.00  copy_rtx_if_shared
  0.43     24.92     0.19        1   190.00 40549.99  yyparse
  0.41     25.10     0.18   279766     0.00     0.00  simplify_plus_minus
...

combine.c:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  2.63      0.29     0.29   146391     0.00     0.01  cse_insn
  2.45      0.56     0.27  2791299     0.00     0.00  find_reg_note
  2.45      0.83     0.27   872878     0.00     0.00  for_each_rtx
  2.18      1.07     0.24  1844867     0.00     0.00  side_effects_p
  2.00      1.29     0.22  2403378     0.00     0.00  ggc_set_mark
  2.00      1.51     0.22     1779     0.12     0.18  verify_flow_info
  1.72      1.70     0.19  2090080     0.00     0.00  ggc_alloc
  1.63      1.88     0.18                             htab_traverse
  1.45      2.04     0.16   196228     0.00     0.00  gt_ggc_mx_lang_tree_node
  1.36      2.19     0.15  2787969     0.00     0.00  bitmap_bit_p
  1.36      2.34     0.15    21830     0.01     0.01  preprocess_constraints
  1.27      2.48     0.14  2235546     0.00     0.00  canon_rtx
  1.27      2.62     0.14    42175     0.00     0.01  find_reloads
  1.18      2.75     0.13  1707121     0.00     0.00  mark_set_1
  1.18      2.88     0.13   328751     0.00     0.00  fold_rtx
  1.09      3.00     0.12   624046     0.00     0.00  propagate_one_insn
  1.09      3.12     0.12   288895     0.00     0.00  count_reg_usage
  1.00      3.23     0.11   276995     0.00     0.00  constrain_operands
  1.00      3.34     0.11   128667     0.00     0.00  ggc_mark_rtx_children_1
  1.00      3.45     0.11    77278     0.00     0.00  validate_value_data
  1.00      3.56     0.11      786     0.14     0.43  init_alias_analysis
  0.82      3.65     0.09  1502223     0.00     0.00  note_stores
  0.82      3.74     0.09  1093219     0.00     0.00  get_cse_reg_info
  0.82      3.83     0.09   291031     0.00     0.00  m16m
  0.82      3.92     0.09   157999     0.00     0.00  mark_jump_label
  0.82      4.01     0.09    43257     0.00     0.00  reload_cse_simplify_operands
  0.82      4.10     0.09    42404     0.00     0.00  record_reg_classes
  0.73      4.18     0.08   513121     0.00     0.00  find_base_term
  0.73      4.26     0.08   497589     0.00     0.00  insn_extract
  0.73      4.34     0.08   256522     0.00     0.00  reg_scan_mark_refs
  0.64      4.41     0.07  1126581     0.00     0.00  returnjump_p_1
  0.64      4.48     0.07   450028     0.00     0.00  mark_used_reg
  0.64      4.55     0.07   417871     0.00     0.00  mark_used_regs
  0.64      4.62     0.07   386598     0.00     0.00  loc_mentioned_in_p
  0.64      4.69     0.07   299594     0.00     0.00  bitmap_operation
  0.64      4.76     0.07   298619     0.00     0.00  canon_reg
  0.64      4.83     0.07   227797     0.00     0.00  copy_rtx_if_shared
  0.64      4.90     0.07   150028     0.00     0.00  ggc_mark_rtx_children
  0.64      4.97     0.07                             htab_find_slot_with_hash
  0.54      5.03     0.06  1747388     0.00     0.00  bitmap_set_bit
  0.54      5.09     0.06   794615     0.00     0.00  record_set
  0.54      5.15     0.06   538361     0.00     0.00  canon_hash
  0.54      5.21     0.06   497904     0.00     0.00  extract_insn
  0.54      5.27     0.06   322376     0.00     0.00  rtx_cost
  0.54      5.33     0.06   139171     0.00     0.00  try_forward_edges
  0.54      5.39     0.06   111797     0.00     0.00  cselib_subst_to_values
  0.54      5.45     0.06    61617     0.00     0.00  fold
  0.54      5.51     0.06    12204     0.00     0.01  cse_end_of_basic_block
  0.54      5.57     0.06     9853     0.01     0.01  count_or_remove_death_notes
  0.54      5.63     0.06        1    60.00 10319.57  yyparse
  0.45      5.68     0.05  1149752     0.00     0.00  rtx_equal_p
  0.45      5.73     0.05   544172     0.00     0.00  ix86_decompose_address
  0.45      5.78     0.05   405992     0.00     0.00  insns_for_mem_walk
  0.45      5.83     0.05   308718     0.00     0.00  volatile_refs_p
...
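
For anyone who wants to repeat the GC-versus-parser comparison on their own
profiles, here is a minimal sketch that sums the self-time percentages of
GC-related rows in a gprof flat profile. The rows are copied from the
insn-recog.c listing above. Classifying a function as "GC" by the name
prefixes ggc_ and gt_ggc_ is an assumption of this sketch, not something
gprof reports, and the helper names are made up for the example.

```python
# A few rows copied from the insn-recog.c flat profile in this message.
PROFILE = """\
  4.84      2.14     2.14 23088715     0.00     0.00  ggc_set_mark
  3.87      3.85     1.71  2153853     0.00     0.00  ggc_mark_rtx_children_1
  3.40      5.35     1.50  1070849     0.00     0.01  cse_insn
  2.67      9.23     1.18                             htab_traverse
  1.52     13.04     0.67  6523785     0.00     0.00  ggc_alloc
  1.47     13.69     0.65   799692     0.00     0.00  gt_ggc_mx_lang_tree_node
  1.15     15.45     0.51  1967977     0.00     0.00  ggc_mark_rtx_children
  0.43     24.92     0.19        1   190.00 40549.99  yyparse
"""

# Assumption: a function belongs to the collector if its name starts with
# one of these prefixes. Adjust the tuple for other naming schemes.
GC_PREFIXES = ("ggc_", "gt_ggc_")


def self_percent_by_name(text):
    """Map function name -> self-time percentage from flat-profile rows."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank line or something that is not a profile row
        try:
            pct = float(fields[0])  # first column is the % time
        except ValueError:
            continue  # header line
        name = fields[-1]  # last column is the function name
        result[name] = result.get(name, 0.0) + pct
    return result


rows = self_percent_by_name(PROFILE)
gc_share = sum(p for name, p in rows.items() if name.startswith(GC_PREFIXES))
print(f"GC self time: {gc_share:.2f}%  yyparse self time: {rows['yyparse']:.2f}%")
```

On the rows shown, the ggc_/gt_ggc_ entries add up to 12.85% self time
against 0.43% for yyparse. This counts self time only and only the rows
pasted in, so it understates the collector's total cost.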

