This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Faster compilation speed


> On Tue, 20 Aug 2002, Richard Earnshaw wrote:
> > > I had done that on alpha, but didn't initially report the figures.  Would
> > > a comparison to 2.95 also be useful?
> >
> > Certainly -- the numbers don't really mean anything unless we have
> > something to compare them against.
> 
> I figured so.  (Wow, I hadn't built a 2.95 toolchain in a long time.)
> 
> > > gcc version 3.3 20020802 (experimental)
> > >
> > > ---------------------------------------------------------------------------
> > > cc1 -O2 reload.i
> > >
> > > issues/cycles = 0.51  issues/dcache_miss = 26.93  issues/dtb_miss = 1214.36
> 
> gcc version 2.95.3 20010315 (release)
> 
> cc1 -O2 reload.i
> issues/cycles = 0.54  issues/dcache_miss = 26.31  issues/dtb_miss = 2488.
> 
> cc1 reload.i
> issues/cycles = 0.52  issues/dcache_miss = 26.30  issues/dtb_miss = 3306.
> 
> Now that's interesting.  No real change in L1 cache performance, but TLB
> misses nearly cut in half vs. 3.3.
> 
> Trying L3 misses (both with -O0):
> 
> 3.3: issues/bcache_miss = 370
> 2.95.3: issues/bcache_miss = 437
> 
> Wall-clock time is nearly 2/1 for these tests, as are TLB misses, while
> other stats are close.  Hmm.
> 
> > So if I understand these figures correctly, then
> >
> > dcache_miss/dtb_miss ~= 45
> >
> > That is, one in 45 dcache fetches also requires a tlb walk.
> 
> That's how I see it.

OK, now consider it this way.  Each cache line miss will cause N bytes to 
be fetched from memory -- I don't know the details, but let's assume that's 
32 bytes, a typical value.  Each tlb entry will address one page -- again, 
I don't know the details, but 4K is common on many machines.

So, with gcc 2.95.3 we have

-O2 dcache_miss/tlb_miss = 2488 / 26.31 ~= 95
-O0 dcache_miss/tlb_miss = 3306 / 26.30 ~= 126

Since each dcache miss represents 32 bytes of memory, we have 3040 (95 * 
32) and 4032 (126 * 32) bytes fetched per tlb miss.  With 4K pages, that 
is very nearly 75% and 100% of each page being accessed for each miss (it 
will be lower than this in practice, since some lines in a page will 
probably be fetched more than once and others not at all).

However, for gcc 3.3 we have 1440 (45 * 32) and 1920 bytes; that is, we 
*at best* access less than half the memory in each page we touch.
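
For anyone who wants to redo the arithmetic, here is a rough sketch in 
Python.  The 32-byte line and 4K page are the same guesses as above, not 
measured values, and the function name is just made up for illustration.  
The input figures are the issues/dcache_miss and issues/dtb_miss numbers 
quoted in this thread; the results come out slightly lower than the 
rounded ones in the text because the ratios are not rounded first.

```python
# Assumed hardware parameters (guesses from the text, not measured).
LINE_BYTES = 32      # bytes fetched per dcache miss
PAGE_BYTES = 4096    # bytes mapped by one TLB entry


def bytes_per_tlb_miss(issues_per_dcache_miss, issues_per_dtb_miss,
                       line_bytes=LINE_BYTES):
    """Upper bound on bytes fetched from memory per TLB miss.

    Both inputs are "issues per event", so dividing one by the other
    gives the number of dcache misses per TLB miss.
    """
    dcache_misses_per_tlb_miss = issues_per_dtb_miss / issues_per_dcache_miss
    return dcache_misses_per_tlb_miss * line_bytes


# (issues/dcache_miss, issues/dtb_miss) as quoted in this thread.
runs = {
    "2.95.3 -O2": (26.31, 2488.0),
    "2.95.3 -O0": (26.30, 3306.0),
    "3.3    -O2": (26.93, 1214.36),
}

for name, (per_dcache, per_dtb) in runs.items():
    fetched = bytes_per_tlb_miss(per_dcache, per_dtb)
    print(f"{name}: {fetched:.0f} bytes per TLB miss, "
          f"{fetched / PAGE_BYTES:.0%} of a page at best")
```

Changing LINE_BYTES or PAGE_BYTES to the real values for the machine 
would scale the percentages, but not the roughly 2:1 gap between the two 
compilers.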

> How expensive is a TLB miss, anyway?  I hadn't expected it would occur
> often enough in gcc to be significant.  Note the IPC ratio stays constant,
> but as I understand it, TLB is handled in software, so maybe those cycles
> are counted by iprobe?

A cache miss probably takes about twice as long if we also miss in the 
TLB, assuming tlb walking is done in hardware -- if you have a soft-loaded 
TLB, then it could take significantly longer.
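
To get a feel for how much that matters, here is a back-of-the-envelope 
model in Python.  It assumes every TLB miss coincides with a dcache miss 
and that such a miss costs some fixed multiple of a plain dcache miss.  
The factor of 2 is the hardware-walk guess above; the larger factors are 
purely hypothetical stand-ins for a soft-loaded TLB, and the function 
name is invented for this sketch.

```python
def relative_miss_cost(dcache_misses_per_tlb_miss, tlb_penalty_factor=2.0):
    """Average dcache-miss cost, relative to a miss that hits in the TLB.

    Assumes one in every `dcache_misses_per_tlb_miss` dcache misses also
    misses in the TLB and costs `tlb_penalty_factor` times as much.
    """
    p_tlb_miss = 1.0 / dcache_misses_per_tlb_miss
    return (1.0 - p_tlb_miss) + p_tlb_miss * tlb_penalty_factor


# 45 and 95 are the -O2 dcache_miss/tlb_miss ratios for 3.3 and 2.95.3.
# Factors other than 2.0 are hypothetical, for a software-handled TLB.
for factor in (2.0, 10.0, 50.0):
    print(f"penalty x{factor:g}: "
          f"3.3 -> {relative_miss_cost(45, factor):.3f}, "
          f"2.95.3 -> {relative_miss_cost(95, factor):.3f}")
```

With a factor of 2 the average miss cost differs by only about one 
percent between the two ratios, so a hardware walk alone would not 
explain much; the gap only becomes large if the penalty factor is large.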

R.
