This is the mail archive of the
mailing list for the GCC project.
Re: gcc 3.1 is still very slow, compared to 2.95.3
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: espie at nerim dot net
- Cc: Richard dot Earnshaw at arm dot com, Jan Hubicka <jh at suse dot cz>, gcc at gcc dot gnu dot org
- Date: Sat, 18 May 2002 14:05:24 +0100
- Subject: Re: gcc 3.1 is still very slow, compared to 2.95.3
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
> On Sat, May 18, 2002 at 01:12:23PM +0100, Richard Earnshaw wrote:
> > Memory use efficiency: I suspect we have many partially used pages, since
> > each page of memory is only used for objects of a single size we end up
> > with many pages with just a few items in them; in particular persistent
> > objects can now be scattered anywhere across that memory, rather than
> > being gathered in a single block. We now have to allocate far more pages
> > for a small compilation than we did before; I rarely see compilation of a
> > C file requiring less than 8M now, it used to be around 3-4M for a typical
> > file in GCC. To make matters worse we regularly touch most of those pages
> > rather than just a subset of them, which means the OS can't usefully page
> > any of them out.
> I haven't looked at the code closely, but if your suspicions are right,
> it's likely that rounding up object sizes would help... doing some stats
> on pages occupancy will help. I'll check if I can instrument some things.
> Right now, my compile times may have to do with cache hits... I have at
> least one killer box where paging cannot possibly be an issue.
> Paradoxically, investigating slowdowns is best run on real fast boxes.
> I'm going to try building m68k crosses when I can, and see whether there
> are similar slowdowns on m68k, considering Jan's remark about the frontends
> being vastly different.
Thinking about it, there may be yet another issue at play here for
small(er) machines: the number of TLB walks required. If most memory
references are grouped, then cache line fetching may be a major effect;
however, if they are truly scattered over memory, then a cache line fetch
may also require a TLB walk before the line can be fetched.
Although TLB entries cover much larger areas of memory (normally a page),
there are only a finite number of them... and fetching cache lines from
ten separate pages will require ten TLB entries...
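To put rough numbers on the point above, here is a minimal sketch that counts how many distinct cache lines and pages a set of addresses falls into. The function name, the 32-byte line and the 4096-byte page are assumptions for illustration only; they are not taken from GCC or from any particular machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count how many distinct aligned blocks of
   block_size bytes the given addresses fall into.  With block_size set
   to the cache line size this counts line fills; with block_size set to
   the page size it counts the TLB entries needed.  The search is O(n^2),
   which is fine for a handful of addresses. */
size_t count_distinct_blocks(const uintptr_t *addrs, size_t n,
                             uintptr_t block_size)
{
    assert(block_size > 0);
    size_t distinct = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t block = addrs[i] / block_size;
        int seen = 0;
        /* Has an earlier address already landed in this block? */
        for (size_t j = 0; j < i; j++) {
            if (addrs[j] / block_size == block) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```

Feeding it ten addresses spaced one line apart and then ten addresses spaced one page apart gives the same number of line fills in both cases, but one page for the first set and ten for the second.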
With the obstack code, the body of an insn would be laid out in a single
chunk of memory. Typically, something like
(insn (set (reg:SI 55) (const_int 65512)))
would be in a single block of memory as
|reg 55 | const_int 65512 | set ... | insn ... |
so walking the RTL of this insn might involve a couple of cache fills and
one TLB walk.
Now each entry might well be in a separate page (or have padding bytes
after it), requiring additional cache fetches and additional TLB walks.
So, with ggc we either waste more memory on padding, or we waste pages on
separation of types; the more types we have the more pages we need and the
more TLB misses we are likely to have.
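The trade-off above can be sketched as a small model. This is not the ggc allocator: every function name here is hypothetical, and the model only assumes that a page holds objects of a single size and that rounding sizes up to a power of two is one way of merging size classes.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two >= n (n must be nonzero). */
size_t round_up_pow2(size_t n)
{
    assert(n > 0);
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Obstack-like model: objects are packed back to back, so only the
   total byte count decides how many pages are touched. */
size_t pages_packed(const size_t *sizes, size_t n, size_t page_size)
{
    assert(page_size > 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sizes[i];
    return (total + page_size - 1) / page_size;
}

/* Size-class model: a page holds objects of one size only.  If
   round_sizes is nonzero, sizes are first rounded up to a power of two,
   which merges classes at the cost of padding. */
size_t pages_by_size_class(const size_t *sizes, size_t n,
                           size_t page_size, int round_sizes)
{
    size_t pages = 0;
    for (size_t i = 0; i < n; i++) {
        size_t cls = round_sizes ? round_up_pow2(sizes[i]) : sizes[i];
        assert(cls > 0 && cls <= page_size);
        /* Handle each class once, at its first occurrence. */
        int first = 1;
        size_t count = 0;
        for (size_t j = 0; j < n; j++) {
            size_t other = round_sizes ? round_up_pow2(sizes[j]) : sizes[j];
            if (other != cls)
                continue;
            if (j < i) {
                first = 0;
                break;
            }
            count++;
        }
        if (!first)
            continue;
        size_t per_page = page_size / cls;
        pages += (count + per_page - 1) / per_page;
    }
    return pages;
}

/* Bytes lost to padding when every size is rounded up to a power of two. */
size_t padding_bytes(const size_t *sizes, size_t n)
{
    size_t waste = 0;
    for (size_t i = 0; i < n; i++)
        waste += round_up_pow2(sizes[i]) - sizes[i];
    return waste;
}
```

With four made-up object sizes of 12, 16, 24 and 40 bytes and a 4096-byte page, the packed layout needs one page, exact size classes need four, and power-of-two rounding needs three pages at the cost of 36 bytes of padding. The sizes are invented for the example and do not correspond to real RTL node sizes.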
Do we have machines where we can profile things like cache and TLB
activity in user code?