This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Faster compilation speed: cache behavior


Matt Austern <austern@apple.com> writes:

> FYI, here are the results of a fairly crude test that I did
> using one of the Apple performance tools.
> 
> This table shows where the L3 cache misses are coming from.
> Our performance tool shows which instruction causes a cache
> miss, and then I found which function each of those
> instructions came from.
> 
>   Using cc1plus: 7257 samples
>   10.5%  0x0003eb8c      cp_tree_node_structure
>    4.3%  0x00016fec      walk_namespaces_r
>    3.5%  0x00016e88      vtable_decl_p
>    3.2%  0x90074224      memset

Does Apple's memset use dcbz?  I'm curious why memset should cause any
cache misses at all.

>    2.7%  0x0024a15c      ht_lookup
>    2.7%  0x0014403c      list_length
>    2.2%  0x0003fd34      gt_ggc_mx_lang_tree_node
>    1.4%  0x0024a150      ht_lookup
>    1.2%  0x0015e7e4      wrapup_global_declarations
>    1.2%  0x0015e820      wrapup_global_declarations
>    ...
> 
> Using cc1: 3814 samples
>    8.9%  0x00017164      lookup_tag
>    6.9%  0x00023310      gt_ggc_mx_lang_tree_node
>    3.6%  0x000699d4      ht_lookup
>    3.5%  0x90074224      memset
>    2.5%  0x000699c8      ht_lookup
>    2.3%  0x000239ec      gt_ggc_mx_lang_tree_node
>    1.7%  0x000af9a8      check_global_declarations
>    1.7%  0x0008b500      list_length
>    1.7%  0x000afe44      compile_file
>    1.6%  0x00069bc0      ht_expand
> 
> As these numbers suggest, using cc1plus takes much longer than
> using cc1.
> 
> The fact that list_length and ht_lookup and cp_tree_node_structure
> are so high suggests that we've got poor locality in tree node
> allocation.  The fact that cp_tree_node_structure is so high
> suggests that we're probably getting a lot of cache misses
> during garbage collection.
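To make the locality argument concrete, here is a minimal model, not GCC code. It assumes 64-byte cache lines, 32-byte list nodes and a tiny fully associative LRU cache (all three numbers are illustrative assumptions), and it walks a chained list the way list_length follows TREE_CHAIN. The only thing that changes between the two runs is where the nodes were allocated.

```python
from collections import OrderedDict

LINE_SIZE = 64   # bytes per cache line (assumed)
NODE_SIZE = 32   # bytes per list node (assumed, not GCC's real tree size)


class LRUCache:
    """Tiny fully associative LRU cache that counts misses per cache line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        line = addr // LINE_SIZE
        if line in self.lines:
            self.lines.move_to_end(line)        # hit: now most recently used
        else:
            self.misses += 1
            self.lines[line] = True
            if len(self.lines) > self.num_lines:
                self.lines.popitem(last=False)  # evict least recently used


def list_length(addrs, cache):
    """Count nodes of a chained list whose i-th node lives at addrs[i]."""
    length = 0
    for addr in addrs:      # models following the chain pointer to the next node
        cache.access(addr)  # each node must be loaded to find its successor
        length += 1
    return length


def misses_for(addrs, num_lines=4):
    cache = LRUCache(num_lines)
    list_length(addrs, cache)
    return cache.misses


n = 8
contiguous = [i * NODE_SIZE for i in range(n)]  # nodes packed back to back
scattered = [i * 4096 for i in range(n)]        # each node on its own page
print(misses_for(contiguous), misses_for(scattered))  # prints "4 8"
```

With nodes packed two per line, the walk misses once per line; with nodes scattered, every step is a miss. That is the pattern one would expect if high list_length and ht_lookup counts reflect poor allocation locality.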

Yes, cp_tree_node_structure is where GC first looks at each tree.
It's not surprising that GC will have a lot of cache misses.
GC looks at every tree (and every pointer) exactly once, so for GC,
locality is irrelevant: there is no re-use.
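The "no re-use" point can be illustrated with a toy mark phase. This is a sketch under stated assumptions, not GCC's collector: the heap graph is hypothetical, each object is assumed to occupy its own cache line, and mark bits are kept in a separate set standing in for a side bitmap, so an already-marked object is not touched again. Because every object's memory is scanned exactly once, the miss count equals the number of objects whether the cache holds 2 lines or 1024.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed)


def count_misses(addresses, num_lines):
    """Misses for an address trace under a fully associative LRU cache."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)
    return misses


def mark(roots, children, addr_of):
    """Toy mark phase: scan each reachable object's memory exactly once.

    Mark bits live in a separate set, so re-encountering a marked object
    does not touch its memory. Returns the trace of scanned addresses.
    """
    marked = set()
    trace = []
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        trace.append(addr_of[obj])           # first and only touch of obj
        stack.extend(children.get(obj, ()))  # push every outgoing pointer
    return trace


# Hypothetical heap: six objects, one per cache line, with shared children
# and a cycle (e -> a).
children = {"a": ["b", "c"], "b": ["d", "e"], "c": ["d", "f"], "e": ["a"]}
addr_of = {name: i * LINE_SIZE for i, name in enumerate("abcdef")}
trace = mark(["a"], children, addr_of)
print(len(trace), count_misses(trace, 2), count_misses(trace, 1024))  # prints "6 6 6"
```

A bigger cache buys nothing here because no line is ever revisited during the pass; only the first touch of each object can miss.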

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

