This is the mail archive of the mailing list for the GCC project.

Re: gcc 3.1 is still very slow, compared to 2.95.3

> On Sat, May 18, 2002 at 01:12:23PM +0100, Richard Earnshaw wrote:
> > Memory use efficiency:  I suspect we have many partially used pages, since 
> > each page of memory is only used for objects of a single size we end up 
> > with many pages with just a few items in them; in particular persistent 
> > objects can now be scattered anywhere across that memory, rather than 
> > being gathered in a single block.  We now have to allocate far more pages 
> > for a small compilation than we did before; I rarely see compilation of a 
> > C file requiring less than 8M now, it used to be around 3-4M for a typical 
> > file in GCC.  To make matters worse we regularly touch most of those pages 
> > rather than just a subset of them, which means the OS can't usefully page 
> > any of them out.
> I haven't looked at the code closely, but if your suspicions are right,
> it's likely that rounding up object sizes would help... doing some stats
> on pages occupancy will help.  I'll check if I can instrument some things.
> Right now, my compile times may have to do with cache hits...  I have at
> least one killer box where paging cannot possibly be an issue.
> Paradoxically, investigating slowdowns is best run on real fast boxes.
> I'm going to try building m68k crosses when I can, and see whether there
> are similar slowdowns on m68k, considering Jan's remark about the frontends
> being vastly different.

Thinking about it, there may be yet another issue at play here for 
small(er) machines: the number of TLB walks required.  If most memory 
references are grouped, then cache line fetching may be a major effect; 
however, if they are truly scattered over memory, then each cache line 
fetch could also require a TLB walk before the line can be fetched.

Although TLB entries cover much larger areas of memory (normally a page), 
there are only a finite number of them... and fetching cache lines from 
ten separate pages will require ten TLB entries...
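
To make the counting argument concrete, here is a minimal sketch that counts how many distinct pages (and hence, at best, how many TLB entries) a set of addresses touches.  The 4096-byte page size and the function name distinct_pages are assumptions for illustration only; nothing here comes from GCC itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; real targets vary. */
#define PAGE_SIZE 4096u

/* Count the distinct pages touched by n addresses.  Each distinct page
   needs its own TLB entry.  The search is quadratic, which is fine for
   the handful of addresses in one RTL walk.  (Hypothetical helper.)  */
size_t distinct_pages(const uintptr_t *addrs, size_t n)
{
    size_t count = 0;

    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uintptr_t page = addrs[i] / PAGE_SIZE;
        int seen = 0;

        /* Has an earlier address already landed in this page?  */
        for (size_t j = 0; j < i; j++) {
            if (addrs[j] / PAGE_SIZE == page) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

Four addresses 16 bytes apart inside one page give a count of 1; four addresses each 64K apart give a count of 4, even though the same number of cache lines is fetched in both cases.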

With the obstack code the body of an insn would be laid out in a single 
chunk of memory, typically something like

(insn (set (reg:SI 55) (const_int 65512)))

would be in a single block of memory as

|reg 55 | const_int 65512 | set ... | insn ... |

so walking the RTL of this insn might involve a couple of cache fills and 
one TLB walk.

Now each entry might well be in a separate page (or have padding bytes 
after it), requiring additional cache fetches and additional TLB walks.

So, with ggc we either waste more memory on padding, or we waste pages on 
separation of types; the more types we have the more pages we need and the 
more TLB misses we are likely to have.
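
A toy model of that trade-off, as a sketch only: it compares the pages spanned when objects are laid end to end (obstack style) with the pages needed when every distinct object size gets a page of its own.  The page size, the function names and the object sizes in the note below are all made up for illustration; this is not how either allocator is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; real targets vary. */
#define PAGE_SIZE 4096u

/* Obstack-style model: objects are laid end to end starting at a page
   boundary, so the walk touches ceil(total / PAGE_SIZE) pages.  */
size_t pages_contiguous(const size_t *sizes, size_t n)
{
    size_t total = 0;

    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += sizes[i];
    return (total + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Size-segregated model: every distinct object size lives in its own
   page, so the walk touches one page per distinct size.  This assumes
   all objects of one size share a single page, which is the best case.  */
size_t pages_segregated(const size_t *sizes, size_t n)
{
    size_t count = 0;

    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        for (size_t j = 0; j < i; j++) {
            if (sizes[j] == sizes[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

With hypothetical sizes of 8, 8, 12 and 24 bytes for the reg, const_int, set and insn objects, the contiguous model touches 1 page and the segregated model touches 3, which is the extra TLB pressure in miniature.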

Do we have machines where we can profile things like cache and TLB 
activity in user code?

