This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: gcc 3.3 garbage collector defaults


> > That's not quite a test of how much memory gets allocated,
> > but, if we've turned off the collector, it shouldn't be so very
> > different.  If allocation patterns have changed, looks like the
> > change is something more subtle.
> 
> Could it be a cache thrashing due to different usage patterns? Should be 
> easy to check for using a tool like oprofile on linux, which can 
> report the hardware cachemiss counters.
> 
> Or you could run it in cachegrind.

We've been through this discussion before, but let me try and add some 
diagrams that show why I think the GC model is leading to poor performance.

When we had obstacks, the allocation model went something like this

Allocate a block

 +---+
 | a |
 +---+

Now allocate some more

 +---+-----+
 | a |  b  |
 +---+-----+

Ok, now b is dead, so unwind it, then allocate some more

 +---+----+
 | a |  c |
 +---+----+

And some more

 +---+----+----+
 | a |  c | d  |
 +---+----+----+

Now everything after a is dead, so unwind, then allocate some more

 +---+----+
 | a |  e |
 +---+----+

etc.  Note that as we decide memory is dead we reclaim it fairly rapidly.  
Exactly how often depended on the type of memory being allocated, but 
typically, it was not less often than once per function.  In some cases it 
could be several times per function.  Unwinding was a trivial thing that 
was achieved by resetting the tail pointer.

With GC we have the model

Allocate a block

 +---+
 | a |
 +---+

Now allocate some more

 +---+-----+
 | a |  b  |
 +---+-----+

Ok, now b is dead, so drop references to it, then allocate some more (note 
we do not reclaim b at this point).

 +---+-----+----+
 | a |  b  |  c |
 +---+-----+----+

And some more

 +---+-----+----+----+
 | a |  b  |  c |  d |
 +---+-----+----+----+

Now everything after a is dead, so drop references, then allocate some more

 +---+-----+----+----+----+
 | a |  b  |  c |  d |  e |
 +---+-----+----+----+----+

etc, eventually, we run a GC pass and recover the slots b, c and d which 
we can reuse.

 +---+-----+----+----+----+
 | a | XXX | XXX| XXX|  e |
 +---+-----+----+----+----+

Note that this has much higher transient use of memory, particularly if we 
allocate blocks of memory for trivial purposes, only to discard the result 
very quickly; this is often the case with RTL, where we allocate some RTL 
in order to try something, only to discover that it is not valid and throw 
it away.  Further, since B is now dead memory all the effort to bring it 
into the cache has been wasted and we must now bring new memory into the 
cache when we start to operate on C -- again, for very short lived objects 
the obstack approach was clearly better since the memory would likely 
still be in the cache.

Of course, neither of the above descriptions gives the full picture; we 
had to have multiple obstacks, because some objects had a different 
lifetime from others, and we had to predict accurately (or at least, 
conservatively) which obstack an object should be allocated on.  And with 
GC we have to put objects of different sizes on separate pages so that we 
can run the collector more efficiently and we can have tertiary effects 
because the pattern of memory usage can lead to extra TLB walks.

On a sufficiently large compilation (sufficient for the GC to kick in 
several times) the *total* memory used by the compiler will probably be 
about the same for both approaches, but the transient memory usage 
patterns will remain very different, since the GC approach is terrible for 
its short-term reuse of dead memory.

So in summary, if we had a machine with no cache at all (or an infinite 
cache), the GC model would probably perform as well as, maybe even 
slightly better than, the obstack model.  But as soon as we add caches 
that are a tiny fraction of the compiler's working set, the generalized GC 
model (IMO) just plain sucks.

R.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]