Faster compilation speed: cache behavior
- From: Matt Austern <austern at apple dot com>
- To: gcc at gcc dot gnu dot org
- Date: Tue, 20 Aug 2002 14:25:10 -0700
FYI, here are the results of a fairly crude test that I did
using one of the Apple performance tools.
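The tables below show where the L3 cache misses come from, first
for cc1plus and then for cc1. Our performance tool reports which
instruction caused each cache miss, and I then worked out which
function each of those instructions belongs to. Each row gives the
percentage of samples, the instruction address, and the function
containing that instruction.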
Using cc1plus: 7257 samples
10.5% 0x0003eb8c cp_tree_node_structure
4.3% 0x00016fec walk_namespaces_r
3.5% 0x00016e88 vtable_decl_p
3.2% 0x90074224 memset
2.7% 0x0024a15c ht_lookup
2.7% 0x0014403c list_length
2.2% 0x0003fd34 gt_ggc_mx_lang_tree_node
1.4% 0x0024a150 ht_lookup
1.2% 0x0015e7e4 wrapup_global_declarations
1.2% 0x0015e820 wrapup_global_declarations
Using cc1: 3814 samples
8.9% 0x00017164 lookup_tag
6.9% 0x00023310 gt_ggc_mx_lang_tree_node
3.6% 0x000699d4 ht_lookup
3.5% 0x90074224 memset
2.5% 0x000699c8 ht_lookup
2.3% 0x000239ec gt_ggc_mx_lang_tree_node
1.7% 0x000af9a8 check_global_declarations
1.7% 0x0008b500 list_length
1.7% 0x000afe44 compile_file
1.6% 0x00069bc0 ht_expand
As the sample counts suggest, compiling with cc1plus takes much
longer than compiling with cc1.
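The fact that list_length, ht_lookup, and cp_tree_node_structure
rank so high suggests that we have poor locality in tree node
allocation. The fact that cp_tree_node_structure is so high also
suggests that we're probably getting a lot of cache misses during
garbage collection: the generated GC marking code calls it to decide
how to walk each C++ tree node, and gt_ggc_mx_lang_tree_node, the
collector's marking routine for tree nodes, appears in both profiles.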