This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Speed/profile of gcc3.4


Hi!

To add some data to the g++ speed discussion, I built a profiling
compiler and ran it over the tramp3d.cpp testcase
(http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz). The
top entries of the (flat) profile are:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  7.85     24.64    24.64 71506294     0.00     0.00  ggc_alloc
  3.37     35.23    10.59 75978109     0.00     0.00  htab_find_slot_with_hash
  3.17     45.20     9.97 15526171     0.00     0.00  walk_tree
  3.06     54.80     9.60  3895596     0.00     0.00  gt_ggc_mx_lang_tree_node
  2.10     61.40     6.60    43652     0.00     0.00  fixup_var_refs_insns
  2.07     67.91     6.51 116225166     0.00     0.00  ggc_set_mark
  2.05     74.35     6.44    17854     0.00     0.00  init_alias_analysis
  1.51     79.10     4.75   221721     0.00     0.00  htab_expand
  1.47     83.71     4.61     1044     0.00     0.02  store_motion
  1.17     87.37     3.66     8618     0.00     0.00  loop_regs_scan
  1.14     90.96     3.59 13512961     0.00     0.00  fixup_var_refs_1
  1.03     94.19     3.23   238610     0.00     0.00  compute_transp
  1.03     97.41     3.22  2216398     0.00     0.00  emit_insn
  0.94    100.37     2.96 27841267     0.00     0.00  note_stores
  0.94    103.31     2.94 20724190     0.00     0.00  splay_tree_splay_helper
  0.89    106.09     2.78 11630295     0.00     0.00  for_each_rtx
  0.88    108.85     2.76  2243042     0.00     0.00  cse_insn
  0.88    111.60     2.75  7037958     0.00     0.00  reg_scan_mark_refs
  0.79    114.08     2.48 16796134     0.00     0.00  find_loads
  0.71    116.31     2.23 129249560     0.00     0.00  bitmap_set_bit
  0.68    118.45     2.14  6755763     0.00     0.00  count_reg_usage
  0.67    120.55     2.10 58964152     0.00     0.00  find_reg_note
  0.61    122.47     1.92  3829519     0.00     0.00  constrain_operands
  0.60    124.35     1.88 13513006     0.00     0.00  fixup_var_refs_insn
  0.59    126.19     1.84 22372827     0.00     0.00  mark_set_1

Ugh.

The htab_find_slot_with_hash stuff should maybe be split up, because
it seems heavily overloaded. Also, do we use only power-of-two hashtab
sizes? If so, we could save the costly division/modulo calculations.
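
As a minimal sketch of the power-of-two idea (this is not GCC's actual
hashtab code; the helper names is_power_of_two, slot_by_modulo and
slot_by_mask are made up for illustration): when the table size is a power
of two, the slot index can be taken with a single AND instead of a modulo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if size is a nonzero power of two. */
static int is_power_of_two(size_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}

/* Generic slot computation: works for any nonzero size, but needs a
   division instruction. */
static size_t slot_by_modulo(unsigned int hash, size_t size)
{
    assert(size != 0);
    return hash % size;
}

/* Power-of-two slot computation: the low bits of the hash select the
   slot, so one AND replaces the division. */
static size_t slot_by_mask(unsigned int hash, size_t size)
{
    assert(is_power_of_two(size));
    return hash & (size - 1);
}
```

Both functions return the same slot whenever size is a power of two. The
catch is that masking uses only the low bits of the hash, so it relies on
those bits being well mixed.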

ggc_alloc - err - which collector do we use by default? The page or the
zone collector? How do I select a different collector?

For the page collector, ggc_alloc should use __builtin_expect() for the
entry == NULL || entry->num_free_objects == 0 check. A wrapper around
ggc_alloc() that uses __builtin_constant_p() could also speed up the
order calculation. push_depth/push_by_depth could likewise make use of
__builtin_expect() and move the realloc out of line.  Was the use of
prefetch in ggc_pop_context benchmarked?
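
To make the two builtin suggestions concrete, here is a rough sketch on a
toy allocator. It is not the ggc-page code: struct toy_page, the toy_*
functions and the TOY_* macros are all hypothetical, and only the two
builtins and the noinline attribute are real GCC features. The rare
"no free object" case is marked unlikely and pushed into an out-of-line
function, and the size-to-order lookup folds to a constant when the size
is known at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hint: tell the compiler the condition is rarely true. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for a page entry; not GCC's real structure. */
struct toy_page {
    char *next_free;          /* next unused byte in the page */
    size_t num_free_objects;  /* objects still available */
};

/* Runtime order calculation: smallest order with (1 << order) >= size. */
static unsigned toy_size_to_order(size_t size)
{
    unsigned order = 0;
    while (((size_t)1 << order) < size)
        order++;
    return order;
}

/* For small sizes this is a pure constant expression, so the compiler
   can fold it; larger sizes fall back to the runtime loop. */
#define TOY_CONST_ORDER(size)                                   \
    ((size) <= 1 ? 0u : (size) <= 2 ? 1u : (size) <= 4 ? 2u :   \
     (size) <= 8 ? 3u : (size) <= 16 ? 4u : (size) <= 32 ? 5u : \
     (size) <= 64 ? 6u : toy_size_to_order(size))

/* Wrapper: use the foldable form only when size is a compile-time
   constant, otherwise call the runtime function directly. */
#define TOY_ORDER(size)                                  \
    (__builtin_constant_p(size) ? TOY_CONST_ORDER(size)  \
                                : toy_size_to_order(size))

/* Counts slow-path entries so the behaviour can be observed. */
static unsigned toy_slow_path_calls;

/* Slow path kept out of line so the hot path stays small. A real
   allocator would fetch a fresh page here; this toy just returns NULL. */
static __attribute__((noinline)) void *toy_alloc_slow(unsigned order)
{
    (void)order;
    toy_slow_path_calls++;
    return NULL;
}

/* Hot path: bump-allocate one object of (1 << order) bytes. */
static void *toy_alloc_order(struct toy_page *entry, unsigned order)
{
    void *result;

    if (UNLIKELY(entry == NULL || entry->num_free_objects == 0))
        return toy_alloc_slow(order);

    result = entry->next_free;
    entry->next_free += (size_t)1 << order;
    entry->num_free_objects--;
    return result;
}
```

A call such as toy_alloc_order(&page, TOY_ORDER(16)) then needs no order
loop at run time. Whether any of this pays off in the real allocator would
of course have to be measured.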

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

