This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

cache misses in gcc 3.3


I ran a slightly older gcc 3.3 on linux in cachegrind 
(http://developer.kde.org/~sewardj/) on an Athlon with 256K cache.
cachegrind is a cache simulator and gives detailed reports about
the caching behaviour of a program.

The test case was one large file from the linux 2.4 kernel source
(tcp_input.c) compiled with -O2

The interesting columns are I2mr and D2mr (ignoring L1 cache for now)

The top cache missers are not surprising. Top three is comfortably
hold by the garbage collector  (ggc_set_mark, gt_ggc_mx_lang_tree_node,
memset). It was run with the too small 4MB heap default.

Then for_each_rtx - lots of optimization passes walking through 
the code all the time.

side_effects_p is a pig regarding the dcache.  It seems to walk through
a significant part of the RTL quite often. Perhaps some simple
prefetching could speed it up?

constrain_operands seems to have the same problem. Suprisingly it gets
quite a lot of icache misses too.  It is not that big a function
so that's a big surprising (perhaps -Os for it would help) 

One general observation is that the instruction cache is less stressed
than the data cache. This suggests that it is better to implement 
new algorithms with code than with state machines driven with tables.

One particularly interesting case for this I investigated nearer is the
bison parser (bison.simple).  In my test it took >10% and much longer
than lexical analysis, which is quite slow (on fast compilers lexical
analysis is slower than the parser). It shows quite a bit dcache misses
too.

I investigated this a bit further using vg_annotate. The hot parts
are data cache misses when the bison parser fetches a new state from
its state tables. The problem seems to be that the other passes in
rest_of_compilation blow away all cache state and when the parser runs
again it has to refetch all the LALR(1) tables from main memory, which
is costly (perhaps it needs a few prefetches as short time help?)

I suspect using an handwritten top-down parser would be somewhat faster
here because it encodes the grammer in code, not data and I would expect it 
to have a smaller working set and be less affected by data caches overflowing.

-Andi


--------------------------------------------------------------------------------
I1 cache:         65536 B, 64 B, 2-way associative
D1 cache:         65536 B, 64 B, 2-way associative
L2 cache:         262144 B, 64 B, 8-way associative
Command:          /pkg/gcc-3.3-030124/lib/gcc-lib/i686-pc-linux-gnu/3.3/cc1 -O2 tcp_input.i
Events recorded:  Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Events shown:     Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw


--------------------------------------------------------------------------------
                        Ir      I1mr      I2mr          Dr      D1mr      D2mr          Dw      D1mw    D2mw 
--------------------------------------------------------------------------------
18,446,744,072,045,728,445 5,923,770 1,150,972 805,699,845 7,724,585 2,865,482 522,750,208 1,233,655 809,434  PROGRAM TOTALS

--------------------------------------------------------------------------------
         Ir    I1mr   I2mr         Dr    D1mr    D2mr         Dw    D1mw    D2mw  file:function
--------------------------------------------------------------------------------
172,398,900      38     37 53,636,310 250,492 116,947 17,621,244   2,379     839  ggc-page.c:ggc_set_mark
100,861,678  22,609    911  6,094,224   5,070     161 79,872,781 423,396 331,747  ???:memset
 93,221,935     371    371 22,777,108 863,697 837,499 19,204,130   1,982     909  gt-c-decl.h:gt_ggc_mx_lang_tree_node
 80,421,115   3,047  1,419 22,739,407  21,348   1,210 13,869,790   2,535     136  rtlanal.c:for_each_rtx
 67,914,814  12,937  1,589 21,950,185  86,360  21,983 11,618,234  17,148   3,309  ggc-page.c:ggc_alloc
 45,466,105     442    442  9,796,518 314,064 267,408  6,490,270     122      52  gtype-desc.c:gt_ggc_mx_rtx_def 
 42,527,220 172,435 18,354 16,389,146  15,361   1,838  9,103,320   2,923      61  cse.c:cse_insn
 42,296,870  37,486  1,253 14,394,579 107,831  13,903  7,455,819   1,527      37  rtlanal.c:side_effects_p
 40,474,855 104,575  4,790  8,641,610  10,605   1,630  4,106,599   4,087     652  recog.c:constrain_operands
 39,103,794   3,800  3,675 11,887,615   6,397   3,136  3,202,125     999     599  regclass.c:record_reg_classes
 31,810,357  23,838  2,146 12,820,501 175,648  27,730  3,690,166   3,189     651  bison.simple:yyparse
 31,293,300   6,213  3,920 10,455,781   8,520   2,636  3,898,817     917     102  cse.c:rtx_cost
 30,689,036  15,007  2,731  8,847,622   9,206   3,580  7,307,109     605      56  tree-inline.c:walk_tree
 30,221,891   4,735  2,848  9,736,810  15,819   2,650  5,148,742     128      14  flow.c:mark_set_1 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]