This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
GNU C++ 4.0.1/4.1.0 cache misses on MICO sources.
- From: Karel Gardas <kgardas at objectsecurity dot com>
- To: GCC Mailing List <gcc at gcc dot gnu dot org>
- Date: Tue, 17 May 2005 22:36:29 +0200 (CEST)
- Subject: GNU C++ 4.0.1/4.1.0 cache misses on MICO sources.
Hello,
I've tried to measure cache misses of the 4.0.1 and 4.1.0 C++
compilers using oprofile on an amd64 box while compiling the MICO
sources, and found the following:
0) compiler options used were:
-I../include -Wall -D_REENTRANT -D_GNU_SOURCE -DPIC -fPIC -c
1) comptypes seems to be the most expensive function, at least from
the point of view of L1 and L2 DTLB misses (~13%)
2) comptypes is also the most CPU-intensive function, since it takes
the largest share of cycles (~4.5% in 4.0.1, ~4.2% in 4.1.0)
3) other functions that are expensive in terms of L1 and L2 DTLB
misses seem to be:
push_to_top_level(~5%), htab_find_slot_with_hash(~5%),
ht_lookup_with_hash(~4%), lookup_fnfields_1(~4%)
4) for 4.0.1, one L1 and L2 DTLB miss happens on average every 2275
CPU_CLK_UNHALTED events (cycles)
5) for 4.1.0, one L1 and L2 DTLB miss happens on average every 2332
CPU_CLK_UNHALTED events (cycles)
6) 4.1.0 is a _bit_ faster than 4.0.1
7) tables were produced after three cycles of "make; find . -name '*.o'
-exec rm \{} \;"
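The figures in items 4) and 5) can be reproduced from the cc1plus summary rows. The following is only a small sketch of that arithmetic: it assumes each oprofile sample stands for the sampling count reported in the "Counted ..." lines (100000 for CPU_CLK_UNHALTED, 1000 for L1_AND_L2_DTLB_MISSES), and the function name is made up for illustration. The results are sampled estimates, not exact counts.

```python
# Sampling counts taken from the "Counted ..." header lines of the report.
CLK_COUNT = 100_000   # CPU_CLK_UNHALTED events per sample
DTLB_COUNT = 1_000    # L1_AND_L2_DTLB_MISSES events per sample

# Sample totals copied from the cc1plus summary rows.
totals = {
    "4.0.1": {"clk_samples": 4498408, "dtlb_samples": 197695},
    "4.1.0": {"clk_samples": 4505282, "dtlb_samples": 193179},
}


def cycles_per_dtlb_miss(clk_samples, dtlb_samples):
    """Estimate the average number of unhalted cycles per L1+L2 DTLB miss."""
    cycles = clk_samples * CLK_COUNT
    misses = dtlb_samples * DTLB_COUNT
    return cycles / misses


for version, t in totals.items():
    print(version, round(cycles_per_dtlb_miss(**t)))
```

A larger number means DTLB misses are rarer relative to the work done, so by this estimate 4.1.0 misses slightly less often than 4.0.1.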
I assumed that L1 and L2 DTLB misses matter most for overall
performance (or performance degradation). If that is wrong, please
correct me, since this is my first attempt to measure and interpret
such data.
The first few lines of the resulting tables are below. For each
compiler, one table covers the overall cc1plus run and one lists the
results per symbol.
Please let me know if you find this kind of data useful, in which case
I will continue to provide it from time to time, or if it is completely
useless, in which case I will try to help somewhere else.
Thanks!
Karel
GCC 4.0.1 20050514 (prerelease):
silence:~$ ~/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu/bin/c++ -v
Using built-in specs.
Target: amd64-linux-gnu
Configured with: ../gcc-4_0-branch/configure --prefix=/home/karel/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu --enable-shared --enable-threads --enable-languages=c++ --disable-checking --enable-__cxa_atexit --disable-multilib --enable-libstdcxx-allocator=mt amd64-linux-gnu
Thread model: posix
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_DTLB_MISSES_L2_DTLB_HITS events (L1 DTLB misses and L2 DTLB hits) with a unit mask of 0x00 (No unit mask) count 1000
CPU_CLK_UNHALT...|DATA_CACHE_MIS...|L1_AND_L2_DTLB...|L1_DTLB_MISSES...|
  samples|      %|  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------------------------
  4498408 100.000   2728674 100.000    197695 100.000   3734282 100.000 cc1plus
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_DTLB_MISSES_L2_DTLB_HITS events (L1 DTLB misses and L2 DTLB hits) with a unit mask of 0x00 (No unit mask) count 1000
samples  %        samples  %        samples  %        samples  %        symbol name
191205    4.5167  346985   13.4574  25558    13.8668  100870    2.8451  comptypes
134792    3.1841  84111     3.2621  5996      3.2532  287969    8.1223  ggc_alloc_stat
130635    3.0859  161496    6.2634  7606      4.1267  474363   13.3796  lookup_fnfields_1
100161    2.3660  5841      0.2265  153       0.0830  12492     0.3523  record_reg_classes
85299     2.0150  16765     0.6502  350       0.1899  36418     1.0272  dfs_walk_all
81984     1.9367  13907     0.5394  135       0.0732  39432     1.1122  find_reloads
78803     1.8615  18008     0.6984  586       0.3179  16583     0.4677  walk_tree
63327     1.4959  1979      0.0768  130       0.0705  24860     0.7012  _cpp_lex_direct
54152     1.2792  38433     1.4906  7770      4.2157  88230     2.4886  ht_lookup_with_hash
52226     1.2337  6949      0.2695  78        0.0423  2365      0.0667  _cpp_clean_line
47768     1.1284  40274     1.5620  8978      4.8711  65595     1.8501  htab_find_slot_with_hash
46236     1.0922  5905      0.2290  710       0.3852  32132     0.9063  splay_tree_splay_helper
45524     1.0754  55568     2.1551  1725      0.9359  73780     2.0810  lookup_field_1
44070     1.0410  33720     1.3078  1965      1.0661  47199     1.3313  tsubst
42073     0.9939  9121      0.3537  494       0.2680  20246     0.5710  grokdeclarator
41105     0.9710  19844     0.7696  581       0.3152  12929     0.3647  cp_walk_subtrees
37812     0.8932  61645     2.3908  10128     5.4951  6142      0.1732  push_to_top_level
GCC 4.1.0 20050514 (experimental):
silence:~$ ~/usr/local/gcc-main-20050514/bin/c++ -v
Using built-in specs.
Target: amd64-unknown-linux-gnu
Configured with: ../gcc-main/configure --prefix=/home/karel/usr/local/gcc-main-20050514 --enable-shared --enable-threads --enable-languages=c++ --disable-checking --enable-__cxa_atexit --disable-multilib amd64-unknown-linux-gnu
Thread model: posix
gcc version 4.1.0 20050514 (experimental)
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_DTLB_MISSES_L2_DTLB_HITS events (L1 DTLB misses and L2 DTLB hits) with a unit mask of 0x00 (No unit mask) count 1000
CPU_CLK_UNHALT...|DATA_CACHE_MIS...|L1_AND_L2_DTLB...|L1_DTLB_MISSES...|
  samples|      %|  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------------------------
  4505282 100.000   2641789 100.000    193179 100.000   3666902 100.000 cc1plus
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted L1_DTLB_MISSES_L2_DTLB_HITS events (L1 DTLB misses and L2 DTLB hits) with a unit mask of 0x00 (No unit mask) count 1000
samples  %        samples  %        samples  %        samples  %        symbol name
188907    4.2302  346968   13.1545  25652    13.3726  104639    2.8740  comptypes
155510    3.4823  86426     3.2766  6713      3.4995  263278    7.2311  ggc_alloc_stat
129618    2.9025  149269    5.6592  6987      3.6424  487011   13.3761  lookup_fnfields_1
104383    2.3374  6488      0.2460  169       0.0881  9317      0.2559  record_reg_classes
90854     2.0345  14472     0.5487  264       0.1376  33677     0.9250  dfs_walk_all
90136     2.0184  24639     0.9341  663       0.3456  23587     0.6478  walk_tree
81124     1.8166  6738      0.2555  63        0.0328  28316     0.7777  find_reloads
78124     1.7494  3998      0.1516  154       0.0803  30305     0.8324  _cpp_lex_direct
57288     1.2828  40331     1.5291  8237      4.2940  98403     2.7027  ht_lookup_with_hash
55880     1.2513  7466      0.2831  100       0.0521  1187      0.0326  _cpp_clean_line
49160     1.1008  59362     2.2506  1748      0.9112  79866     2.1936  lookup_field_1
48784     1.0924  70640     2.6781  2231      1.1630  26856     0.7376  compparms
48030     1.0755  42436     1.6089  9417      4.9092  61766     1.6965  htab_find_slot_with_hash
47940     1.0735  38711     1.4676  2053      1.0702  53454     1.4682  tsubst
47034     1.0532  6084      0.2307  671       0.3498  32065     0.8807  splay_tree_splay_helper
45679     1.0229  7168      0.2718  448       0.2335  21898     0.6014  grokdeclarator
44777     1.0027  18205     0.6902  529       0.2758  13609     0.3738  cp_walk_subtrees
39890     0.8933  65131     2.4693  10764     5.6114  6737      0.1850  push_to_top_level
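
For anyone who wants to rank symbols from listings like the ones above, here is a rough sketch. It is not part of oprofile: the function names are invented for illustration, and it assumes every row has exactly nine whitespace-separated fields in the order shown (samples and % for cycles, data cache misses, L1+L2 DTLB misses, L1 DTLB misses with L2 hits, then the symbol name). Only a few rows of the 4.1.0 table are embedded.

```python
# A few rows copied from the 4.1.0 per-symbol table above.
ROWS_4_1_0 = """\
188907 4.2302 346968 13.1545 25652 13.3726 104639 2.8740 comptypes
155510 3.4823 86426 3.2766 6713 3.4995 263278 7.2311 ggc_alloc_stat
129618 2.9025 149269 5.6592 6987 3.6424 487011 13.3761 lookup_fnfields_1
57288 1.2828 40331 1.5291 8237 4.2940 98403 2.7027 ht_lookup_with_hash
48030 1.0755 42436 1.6089 9417 4.9092 61766 1.6965 htab_find_slot_with_hash
39890 0.8933 65131 2.4693 10764 5.6114 6737 0.1850 push_to_top_level
"""


def parse_rows(text):
    """Map symbol name -> percentage columns, skipping malformed lines."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # header or wrapped line: ignore it
        rows[fields[8]] = {
            "clk_pct": float(fields[1]),
            "dcache_pct": float(fields[3]),
            "dtlb_pct": float(fields[5]),
        }
    return rows


def rank_by_dtlb(rows):
    """Symbols ordered by their share of L1+L2 DTLB misses, largest first."""
    return sorted(rows, key=lambda name: rows[name]["dtlb_pct"], reverse=True)


def rank_by_miss_density(rows):
    """Symbols ordered by DTLB-miss share relative to their cycle share."""
    return sorted(
        rows,
        key=lambda name: rows[name]["dtlb_pct"] / rows[name]["clk_pct"],
        reverse=True,
    )


rows = parse_rows(ROWS_4_1_0)
print(rank_by_dtlb(rows)[:3])
print(rank_by_miss_density(rows)[0])
```

The second ranking is one possible way to spot functions whose DTLB-miss share is out of proportion to the cycles they consume; whether that ratio is the right metric is an open question.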
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com