This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Faster compilation speed
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Jeff Sturm <jsturm at one-point dot com>
- Cc: Richard dot Earnshaw at arm dot com, David Edelsohn <dje at watson dot ibm dot com>, Richard Henderson <rth at redhat dot com>, "David S. Miller" <davem at redhat dot com>, gcc at gcc dot gnu dot org
- Date: Thu, 22 Aug 2002 09:53:19 +0100
- Subject: Re: Faster compilation speed
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
> On Tue, 20 Aug 2002, Richard Earnshaw wrote:
> > > I had done that on alpha, but didn't initially report the figures. Would
> > > a comparison to 2.95 also be useful?
> >
> > Certainly -- the numbers don't really mean anything unless we have
> > something to compare them against.
>
> I figured so. (Wow, I hadn't built a 2.95 toolchain in a long time.)
>
> > > gcc version 3.3 20020802 (experimental)
> > >
> > > ---------------------------------------------------------------------------
> > > cc1 -O2 reload.i
> > >
> > > issues/cycles = 0.51 issues/dcache_miss = 26.93 issues/dtb_miss = 1214.36
>
> gcc version 2.95.3 20010315 (release)
>
> cc1 -O2 reload.i
> issues/cycles = 0.54 issues/dcache_miss = 26.31 issues/dtb_miss = 2488.
>
> cc1 reload.i
> issues/cycles = 0.52 issues/dcache_miss = 26.30 issues/dtb_miss = 3306.
>
> Now that's interesting. No real change in L1 cache performance, but TLB
> misses nearly cut in half vs. 3.3.
>
> Trying L3 misses (both with -O0):
>
> 3.3: issues/bcache_miss = 370
> 2.95.3: issues/bcache_miss = 437
>
> Wall-clock time is nearly 2/1 for these tests, as are TLB misses, while
> other stats are close. Hmm.
>
> > So if I understand these figures correctly, then
> >
> > dcache_miss/dtb_miss ~= 45
> >
> > That is, one in 45 dcache fetches also requires a tlb walk.
>
> That's how I see it.
OK, now consider it this way. Each cache-line miss causes N bytes to be
fetched from memory. I don't know the details, but let's assume that's
32 bytes, a typical value. Each TLB entry maps one page; again, I don't
know the details, but 4K is common on many machines.
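Because the counters are reported as issues per event, the ratio of two event counts is the inverse ratio of their rates (the issue count cancels). Here tlb_miss stands for the dtb_miss rate quoted above. With the 32-byte line and 4K page assumed above (assumptions, not measured values for this machine), the chain of arithmetic is:

```latex
\frac{\mathit{dcache\_miss}}{\mathit{tlb\_miss}}
  = \frac{\mathit{issues}/\mathit{tlb\_miss}}{\mathit{issues}/\mathit{dcache\_miss}},
\qquad
\text{bytes per TLB miss} = 32 \times \frac{\mathit{dcache\_miss}}{\mathit{tlb\_miss}},
\qquad
\text{page fraction} = \frac{\text{bytes per TLB miss}}{4096}
```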
So, with gcc 2.95.3 we have
-O2 dcache_miss/tlb_miss = 2488 / 26.31 ~= 95
-O0 dcache_miss/tlb_miss = 3306 / 26.30 ~= 126
Since each dcache miss represents 32 bytes of memory, that is 3040 (95 * 32)
and 4032 (126 * 32) bytes fetched per TLB miss, i.e. roughly 75% and very
nearly 100% of each page for each miss. The real figures will be lower
than this in practice, since some lines in a page will probably be fetched
more than once and others not at all.
However, for gcc 3.3 we have 1440 and 1920 bytes; that is, *at best* we
access less than half the memory in each page we touch.
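A minimal Python sketch of the arithmetic above. It assumes, as the text does, 32-byte cache lines and 4K pages; neither value has been checked against the Alpha used for the measurements. It takes the per-issue rates quoted earlier in the thread and derives dcache misses per TLB miss, bytes fetched per TLB miss, and the resulting fraction of a page. The helper names are hypothetical. Only the 3.3 -O2 rates are quoted here, so the 3.3 -O0 case is not reproduced. The sketch uses unrounded ratios, so its byte counts differ slightly from the rounded figures in the text.

```python
# Assumed, not measured: 32-byte cache lines and 4 KB pages.
LINE_BYTES = 32
PAGE_BYTES = 4096


def misses_per_tlb_miss(issues_per_dcache_miss, issues_per_dtb_miss):
    """dcache misses per TLB miss; the issue count cancels out."""
    return issues_per_dtb_miss / issues_per_dcache_miss


def bytes_per_tlb_miss(issues_per_dcache_miss, issues_per_dtb_miss):
    """Bytes fetched into the dcache per TLB miss."""
    ratio = misses_per_tlb_miss(issues_per_dcache_miss, issues_per_dtb_miss)
    return ratio * LINE_BYTES


def page_fraction(issues_per_dcache_miss, issues_per_dtb_miss):
    """Upper bound on the fraction of each touched page that is fetched.

    Repeated fetches of the same line make the true coverage lower.
    """
    fetched = bytes_per_tlb_miss(issues_per_dcache_miss, issues_per_dtb_miss)
    return fetched / PAGE_BYTES


# (issues/dcache_miss, issues/dtb_miss) as quoted in the thread.
runs = {
    "2.95.3 -O2": (26.31, 2488.0),
    "2.95.3 -O0": (26.30, 3306.0),
    "3.3 -O2": (26.93, 1214.36),
}

for name, (dc, dtb) in runs.items():
    print(f"{name}: {misses_per_tlb_miss(dc, dtb):6.1f} misses/TLB miss, "
          f"{bytes_per_tlb_miss(dc, dtb):6.0f} bytes, "
          f"{100 * page_fraction(dc, dtb):5.1f}% of a page")
```

The functions take the rates exactly as the counters report them, so other runs can be compared without first converting to raw event counts.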
> How expensive is a TLB miss, anyway? I hadn't expected it would occur
> often enough in gcc to be significant. Note the IPC ratio stays constant,
> but as I understand it, TLB is handled in software, so maybe those cycles
> are counted by iprobe?
A cache miss probably takes about twice as long if we also miss in the
TLB, assuming TLB walks are done in hardware; if you have a soft-loaded
TLB, it could take significantly longer.
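To see how that penalty feeds into the average cost of a dcache miss, here is a back-of-envelope Python model. It is hypothetical. It assumes every TLB miss coincides with a dcache miss, so that one dcache miss in 95 (2.95.3 -O2) or one in 45 (3.3 -O2) also misses in the TLB, and that such a miss costs `walk_factor` times a plain miss. A factor of about 2 matches the hardware-walk estimate above. The factor of 10 for a soft-loaded TLB is purely illustrative, not a measured figure.

```python
def relative_miss_cost(tlb_miss_fraction: float, walk_factor: float) -> float:
    """Average cost of a dcache miss, in units of a miss that hits in the TLB.

    Hypothetical model: a fraction `tlb_miss_fraction` of dcache misses also
    miss in the TLB, and each of those costs `walk_factor` times a plain miss.
    """
    p = tlb_miss_fraction
    return (1.0 - p) * 1.0 + p * walk_factor


# walk_factor 2: the hardware-walk estimate from the text.
# walk_factor 10: an arbitrary stand-in for a slower soft-loaded TLB.
for label, misses_per_tlb_miss in [("2.95.3 -O2", 95), ("3.3 -O2", 45)]:
    for walk_factor in (2, 10):
        cost = relative_miss_cost(1 / misses_per_tlb_miss, walk_factor)
        print(f"{label}, walk_factor={walk_factor}: {cost:.3f}x a plain miss")
```

The extra cost from the TLB term is `p * (walk_factor - 1)`, so with a hardware walk it adds roughly 1% (1 in 95) versus 2% (1 in 45), and the gap grows in proportion to how expensive the walk is.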
R.