This is the mail archive of the mailing list for the GCC project.

Re: Faster compilation speed

On Sat, Aug 10, 2002, at 06:20:51PM -0700, Linus Torvalds wrote:

> >The numbers I get on a p4 with cachegrind are *much* worse in all cases.
> >
> >The miss rates are all >2%, which is a far cry from 0.1% and 0.0%.
> One thing to look out for when looking at cache miss numbers is what
> they actually _mean_.
> That is particularly true when it comes to the percentages. Are the
> percentages relative to #instructions, or #memops, or #line fetches (the
> latter ends up being interesting especially for I$).

These are percentages relative to the number of accesses. L2 percentages are
also relative to the original number of accesses, not to the number of L1
misses.
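
To make the difference between the two denominators concrete, here is a
small sketch. The function name and the counts in the usage note are made up
for illustration; they are not cachegrind output.

```python
def miss_rates(accesses, l1_misses, l2_misses):
    """Return miss rates as fractions, using both possible L2 denominators.

    All three arguments are raw event counts (hypothetical inputs here).
    """
    return {
        # L1 misses per memory access
        "l1_rate": l1_misses / accesses,
        # L2 misses per original access (the convention described above)
        "l2_rate_vs_accesses": l2_misses / accesses,
        # L2 misses per L1 miss (the other way one could report it)
        "l2_rate_vs_l1_misses": l2_misses / l1_misses,
    }


rates = miss_rates(accesses=1_000_000, l1_misses=20_000, l2_misses=1_000)
```

With these invented counts the same 1,000 L2 misses come out as 0.1% of
accesses but 5% of L1 misses, so it matters which one a tool reports.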

> The _best_ number to get (and in the end, the only one that really
> matters) is "cycles spent waiting on cache" and "cycles spent doing
> useful work", but I don't think valgrind gives you that.  The P4
> counters should do it, though. 

Indeed, cachegrind won't tell you when there was a miss but the hardware was
smart enough to do something useful while it waits for the cache.
Despite this limitation, shouldn't 
	((number_of_L1_misses * N) + (number_of_L2_misses * M)) * cycle_len
[where N is roughly 10 and M roughly 200, or updated figures] be a ballpark
figure of the time lost waiting for RAM to catch up?
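
A rough sketch of that estimate, assuming N and M are penalties in cycles
and cycle_len is simply 1 / clock frequency. The function name, the default
penalties and the sample counts are illustrative assumptions, and the result
is only an upper bound since it ignores any work overlapped with the misses.

```python
def estimated_stall_seconds(l1_misses, l2_misses, clock_hz,
                            l1_penalty_cycles=10, l2_penalty_cycles=200):
    """Ballpark time lost to cache misses, in seconds.

    l1_penalty_cycles and l2_penalty_cycles play the role of N and M;
    the defaults are the rough figures from the message, not measurements.
    """
    stall_cycles = (l1_misses * l1_penalty_cycles
                    + l2_misses * l2_penalty_cycles)
    cycle_len = 1.0 / clock_hz  # seconds per cycle
    return stall_cycles * cycle_len


# Hypothetical counts on a hypothetical 2 GHz machine.
lost = estimated_stall_seconds(l1_misses=1_000_000, l2_misses=100_000,
                               clock_hz=2_000_000_000)
```

Here 1,000,000 L1 misses and 100,000 L2 misses add up to 30,000,000 stall
cycles, i.e. about 0.015 s at 2 GHz.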

> If you want to use the HW counters under Linux, get "oprofile" from
> (I don't think it does P4 events yet, though)
The site says it doesn't yet.

	-- Cyrille

