This is the mail archive of the mailing list for the GCC project.

Faster compilation speed: cache behavior

FYI, here are the results of a fairly crude test that I did
using one of the Apple performance tools.

These tables show where the L3 cache misses are coming from.
Our performance tool reports which instruction caused each cache
miss; I then mapped each instruction address to the function that
contains it, so a function can appear more than once.

Using cc1plus: 7257 samples
 Share  Address         Function
 10.5%  0x0003eb8c      cp_tree_node_structure
  4.3%  0x00016fec      walk_namespaces_r
  3.5%  0x00016e88      vtable_decl_p
  3.2%  0x90074224      memset
  2.7%  0x0024a15c      ht_lookup
  2.7%  0x0014403c      list_length
  2.2%  0x0003fd34      gt_ggc_mx_lang_tree_node
  1.4%  0x0024a150      ht_lookup
  1.2%  0x0015e7e4      wrapup_global_declarations
  1.2%  0x0015e820      wrapup_global_declarations

Using cc1: 3814 samples
 Share  Address         Function
  8.9%  0x00017164      lookup_tag
  6.9%  0x00023310      gt_ggc_mx_lang_tree_node
  3.6%  0x000699d4      ht_lookup
  3.5%  0x90074224      memset
  2.5%  0x000699c8      ht_lookup
  2.3%  0x000239ec      gt_ggc_mx_lang_tree_node
  1.7%  0x000af9a8      check_global_declarations
  1.7%  0x0008b500      list_length
  1.7%  0x000afe44      compile_file
  1.6%  0x00069bc0      ht_expand
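
Because each row is one instruction address, some functions
(ht_lookup, wrapup_global_declarations, gt_ggc_mx_lang_tree_node)
are split across rows. A minimal sketch of summing the rows per
function follows. The data is copied from the two tables above;
the helper name totals_by_function is hypothetical, and since only
the top ten rows are listed, the totals are lower bounds.

```python
from collections import defaultdict
from decimal import Decimal

# Rows copied verbatim from the tables in this message.
CC1PLUS_ROWS = """\
10.5%  0x0003eb8c      cp_tree_node_structure
 4.3%  0x00016fec      walk_namespaces_r
 3.5%  0x00016e88      vtable_decl_p
 3.2%  0x90074224      memset
 2.7%  0x0024a15c      ht_lookup
 2.7%  0x0014403c      list_length
 2.2%  0x0003fd34      gt_ggc_mx_lang_tree_node
 1.4%  0x0024a150      ht_lookup
 1.2%  0x0015e7e4      wrapup_global_declarations
 1.2%  0x0015e820      wrapup_global_declarations
"""

CC1_ROWS = """\
 8.9%  0x00017164      lookup_tag
 6.9%  0x00023310      gt_ggc_mx_lang_tree_node
 3.6%  0x000699d4      ht_lookup
 3.5%  0x90074224      memset
 2.5%  0x000699c8      ht_lookup
 2.3%  0x000239ec      gt_ggc_mx_lang_tree_node
 1.7%  0x000af9a8      check_global_declarations
 1.7%  0x0008b500      list_length
 1.7%  0x000afe44      compile_file
 1.6%  0x00069bc0      ht_expand
"""


def totals_by_function(table_text):
    """Sum the percentage column per function name (hypothetical helper).

    Returns (function, total) pairs, largest total first.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for line in table_text.splitlines():
        if not line.strip():
            continue
        percent, _address, function = line.split()
        totals[function] += Decimal(percent.rstrip("%"))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, rows in (("cc1plus", CC1PLUS_ROWS), ("cc1", CC1_ROWS)):
        print(name)
        for function, total in totals_by_function(rows):
            print(f"{str(total).rjust(5)}%  {function}")
```

Summed this way, ht_lookup accounts for 4.1% of the cc1plus samples
and 6.1% of the cc1 samples, and gt_ggc_mx_lang_tree_node reaches
9.2% for cc1, slightly ahead of lookup_tag.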

As the sample counts suggest (7257 for cc1plus against 3814 for
cc1, about 1.9 times as many), using cc1plus takes much longer
than using cc1.

That list_length, ht_lookup, and cp_tree_node_structure rank so
high suggests that we've got poor locality in tree node
allocation.  That cp_tree_node_structure in particular is so high
suggests that we're probably getting a lot of cache misses
during garbage collection.
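
To illustrate why locality matters for something like list_length,
here is a toy model, not a measurement: it counts how many distinct
cache lines a walk over a chain of nodes touches when the nodes are
packed together versus spread out. The node size, line size,
spacing, and node count are all assumed values chosen for
illustration, and cache_lines_touched is a hypothetical helper.

```python
def cache_lines_touched(node_addresses, line_size=64):
    """Count distinct cache lines hit when visiting each node once.

    line_size is an assumed value, not that of the machine profiled.
    """
    return len({address // line_size for address in node_addresses})


NODE_SIZE = 16     # assumed size in bytes of one list node
NODE_COUNT = 1000  # assumed list length
SPACING = 256      # assumed gap when other objects sit between nodes

# Nodes allocated back to back: four 16-byte nodes share a 64-byte line.
packed = [i * NODE_SIZE for i in range(NODE_COUNT)]

# Nodes interleaved with unrelated allocations: one node per line.
scattered = [i * SPACING for i in range(NODE_COUNT)]

if __name__ == "__main__":
    print("packed:   ", cache_lines_touched(packed))
    print("scattered:", cache_lines_touched(scattered))
```

Under these assumptions the packed chain touches 250 lines and the
scattered chain touches 1000, so the same walk can cost up to four
times as many misses when its nodes are not allocated together.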

