This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: GNU C++ 4.0.1/4.1.0 cache misses on MICO sources.
On Tue, 17 May 2005, Mike Stump wrote:
> On May 17, 2005, at 3:16 PM, Karel Gardas wrote:
> > 1) the most expensive seems to be comptypes, at least from the point of
> > view of L2 data refills (~17%)
> > 2) comptypes is also the most CPU-intensive operation, since most of
> > the time is spent there
>
> I think comptypes can be sped up by canonicalizing types better, and also
> by adding a conservative hash and checking it first.
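Here is a minimal C sketch of Mike's suggestion on a toy type representation. struct type, compute_hash() and types_equal() are hypothetical names; GCC's real tree nodes and comptypes() are far more involved. The hash is conservative because it is built only from what the full comparison checks, so structurally equal types always get equal hashes. A mismatch therefore proves the types differ, while a match still falls back to the full recursive walk. The pointer comparison is the fast path that better canonicalization would hit more often; with fully canonical types it alone would decide equality.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy type node (hypothetical; not GCC's tree representation). */
enum type_kind { TK_INT, TK_POINTER, TK_FUNCTION };

struct type {
    enum type_kind kind;
    const struct type *target;        /* pointee or return type, else NULL */
    const struct type *const *params; /* parameter types (functions only) */
    size_t nparams;
    uint32_t hash;                    /* cached conservative hash */
};

/* FNV-1a-like mixing step over 32-bit words. */
static uint32_t mix(uint32_t h, uint32_t v) {
    return (h ^ v) * 16777619u;
}

/* Hash only what structural_equal() compares, so equal types always get
   equal hashes. Children must already have their hash computed. */
void compute_hash(struct type *t) {
    uint32_t h = 2166136261u;
    h = mix(h, (uint32_t)t->kind);
    if (t->target != NULL)
        h = mix(h, t->target->hash);
    for (size_t i = 0; i < t->nparams; i++)
        h = mix(h, t->params[i]->hash);
    t->hash = h;
}

bool types_equal(const struct type *a, const struct type *b);

/* The expensive part: a full recursive structural walk. */
static bool structural_equal(const struct type *a, const struct type *b) {
    if (a->kind != b->kind || a->nparams != b->nparams)
        return false;
    if ((a->target == NULL) != (b->target == NULL))
        return false;
    if (a->target != NULL && !types_equal(a->target, b->target))
        return false;
    for (size_t i = 0; i < a->nparams; i++)
        if (!types_equal(a->params[i], b->params[i]))
            return false;
    return true;
}

bool types_equal(const struct type *a, const struct type *b) {
    if (a == b)
        return true;               /* same (canonical) node: cheapest case */
    if (a->hash != b->hash)
        return false;              /* conservative hash: mismatch proves inequality */
    return structural_equal(a, b); /* a match is only a hint; confirm fully */
}
```

The hash is meant to be computed once, when a type node is built (children before parents, as compute_hash() assumes). Rejecting two different types then costs a single integer comparison instead of a walk over both.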
Perhaps. Anyway, this is a box with 1 GB of RAM. Now, just for fun, I've used these settings:
0) compiler params:
-I../include --param ggc-min-expand=30 --param ggc-min-heapsize=4096
-Wall -D_REENTRANT -D_GNU_SOURCE -DPIC -fPIC -c
and the picture, at least for 4.1.0, is completely different (see below).
This means that on a machine with little memory GCC misses the L2 cache
much more often, about one miss per 529 clock cycles. Also, the biggest
source of cache misses now seems to be the GC (gt_ggc_mx_lang_tree_node,
ggc_set_mark), with comptypes second.
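The two --param values above are the lower bounds of GCC's memory-dependent GC defaults, which is presumably why they behave like a machine with little RAM. The following C sketch models when the next collection may run. The function names are hypothetical; the default heuristics follow the GCC manual's description of ggc-min-expand and ggc-min-heapsize with rlimits ignored, and the threshold is based on a reading of ggc_collect() in ggc-page.c, not a copy of it.

```c
/* Hypothetical model of GCC's GC trigger; all sizes in kilobytes.
   rlimits, which GCC also takes into account, are ignored here. */

/* Documented default for ggc-min-expand: 30% + 70% * (RAM / 1 GB),
   capped at 100%. */
double default_min_expand(double ram_kb) {
    double extra = 70.0 * ram_kb / (1024.0 * 1024.0);
    if (extra > 70.0)
        extra = 70.0;
    return 30.0 + extra;
}

/* Documented default for ggc-min-heapsize: RAM / 8, clamped to
   [4096, 131072] kB (4 MB to 128 MB). */
double default_min_heapsize_kb(double ram_kb) {
    double kb = ram_kb / 8.0;
    if (kb < 4096.0)
        kb = 4096.0;
    if (kb > 131072.0)
        kb = 131072.0;
    return kb;
}

/* Heap size at which the next collection may run: start from the larger
   of the heap left after the previous collection and ggc-min-heapsize,
   then allow it to grow by ggc-min-expand percent. */
double next_gc_threshold_kb(double heap_after_last_gc_kb,
                            double min_heapsize_kb,
                            double min_expand_pct) {
    double base = heap_after_last_gc_kb > min_heapsize_kb
                      ? heap_after_last_gc_kb
                      : min_heapsize_kb;
    return base * (1.0 + min_expand_pct / 100.0);
}
```

For a 1 GB box (1048576 kB) the defaults come out at 100% and 131072 kB (the kernel may report slightly less memory, making them a bit lower). With a hypothetical 20480 kB of live data after a collection, next_gc_threshold_kb(20480, 131072, 100) is 262144 kB, while next_gc_threshold_kb(20480, 4096, 30) is about 26624 kB. With these parameters the collector therefore runs far more often and re-marks all reachable data each time, which is where marking routines such as gt_ggc_mx_lang_tree_node and ggc_set_mark show up.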
Cheers,
Karel
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system) with a unit mask of 0x1f (All cache states) count 1000
CPU_CLK_UNHALT...|DATA_CACHE_MIS...|ICACHE_MISSES:...|DATA_CACHE_REF...|
  samples|      %|  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------------------------
  5795921 100.000   3695597 100.000   2946594 100.000   1095111 100.000 cc1plus
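The figure of roughly 529 clock cycles per miss mentioned above follows from these cc1plus totals. In opreport output, "count N" is the sampling period, so each sample stands for about N events. A small C sketch of the arithmetic (the helper names are hypothetical):

```c
/* Each opreport sample represents roughly `period` events
   (the "count N" in the report header). */
double estimated_events(unsigned long long samples, unsigned long long period) {
    return (double)samples * (double)period;
}

/* Average unhalted cycles per data cache refill from system: total cycles
   divided by total refills, not the latency of an individual miss. */
double cycles_per_refill(unsigned long long clk_samples,
                         unsigned long long clk_period,
                         unsigned long long refill_samples,
                         unsigned long long refill_period) {
    return estimated_events(clk_samples, clk_period) /
           estimated_events(refill_samples, refill_period);
}
```

cycles_per_refill(5795921, 100000, 1095111, 1000) is about 529.25. Applying the same ratio to individual symbols in the listing below gives roughly 210 cycles per refill for gt_ggc_mx_lang_tree_node and 235 for comptypes, so both go to memory noticeably more often than cc1plus as a whole. These are sampling estimates, and the ratio is an average over the run, not the latency of a single miss.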
CPU_CLK_UNHALTED  DATA_CACHE_MISSES ICACHE_MISSES     DATA_CACHE_REFILLS_FROM_SYSTEM
samples  %        samples  %        samples  %        samples  %        symbol name
442873   7.6411   277095   7.4980   406      0.0138   210537   19.2252  gt_ggc_mx_lang_tree_node
357714   6.1718   297393   8.0472   341      0.0116   92100    8.4101   ggc_set_mark
208484   3.5971   364311   9.8580   48844    1.6576   88551    8.0860   comptypes
176284   3.0415   96291    2.6056   66753    2.2654   27903    2.5480   ggc_alloc_stat
158048   2.7269   188948   5.1128   26549    0.9010   13119    1.1980   lookup_fnfields_1
120791   2.0841   17681    0.4784   12771    0.4334   1178     0.1076   dfs_walk_all
101900   1.7581   8530     0.2308   4541     0.1541   1293     0.1181   record_reg_classes
97854    1.6883   28305    0.7659   9740     0.3306   5843     0.5336   walk_tree
80856    1.3951   6314     0.1709   33168    1.1256   990      0.0904   find_reloads
79626    1.3738   4311     0.1167   743      0.0252   640      0.0584   _cpp_lex_direct
75468    1.3021   64101    1.7345   22       7.5e-04  20321    1.8556   cp_tree_node_structure
60301    1.0404   7343     0.1987   6487     0.2202   2986     0.2727   splay_tree_splay_helper
57714    0.9958   41027    1.1102   4436     0.1505   16364    1.4943   ht_lookup_with_hash
56687    0.9780   7502     0.2030   313      0.0106   422      0.0385   _cpp_clean_line
51682    0.8917   71809    1.9431   1513     0.0513   21801    1.9908   compparms
51528    0.8890   65441    1.7708   10699    0.3631   4356     0.3978   lookup_field_1
51470    0.8880   41211    1.1151   20647    0.7007   17549    1.6025   tsubst
50100    0.8644   43384    1.1739   19750    0.6703   18065    1.6496   htab_find_slot_with_hash
49868    0.8604   91428    2.4740   2472     0.0839   41355    3.7763   push_to_top_level
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com