This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Faster compilation speed


In article <Pine.LNX.4.44.0208092227500.2273-100000@kevin-pc.atkinson.dhs.org> you write:
>On Fri, 9 Aug 2002, Linus Torvalds wrote:
>
>> And that, in turn, is probably impossible to fix as long as gcc uses
>> garbage collection for most of its internal memory management.  There
>> just aren't all that many worse ways to f*ck up your cache behaviour
>> than by using lots of allocations and lazy GC to manage your memory. 
>
>Excuse the interruption, but from what I read a good generational garbage 
>collector can be just as fast as manually managing memory?

All the papers I've seen on it are total jokes.  But maybe I've looked
at the wrong ones. 

One fundamental fact on modern hardware is that data cache locality is
good, and not being in the cache sucks.  This is not likely to change. 
In particular, this means that if you allocate stuff, you want to re-use
the stuff you just freed _as_soon_as_possible_ - preferably before the
previously dirty data has ever even been evicted from the cache, so that
you can re-use the thing to avoid reading it in, but also to avoid
writing out stale data. 

This implies that any lazy de-allocation is bad. When a piece of memory
is free, you want to de-allocate it _immediately_, so that the next
allocation gets to re-use it and gets the cache footprint "for free".

Generational garabage collectors tend to never re-use hot objects, and
often do the copying between generations making things even worse on the
cache.  Compaction helps subsequent use somewhat, but is in itself
inherently costly, and the indirection (or fixup) implied by it can
limit other optimization. 

Sure, by being lazy you can sometimes win in icache footprint (and in
instruction count - a lot of the "GC is fast" papers seem to rely on the
fact that you can do other optimizations if you're lazy), but you lose
big in dirty dcache footprint.  And since dcache is much more expensive
than instructions, you're better off doing explicit memory management
with refcounting (optionally helped by the programming language, of
course.  You can make exact refcounting be your "GC" with some language
support). 

However, there's another, more fundamental issue.  It's the _mindset_. 
The GC mindset tends to go hand-in-hand with pointer chasing, while
people who use explicit allocators tend to be happier with doing things
like "realloc()" and trying to use arrays and indexes instead of linked
lists and just generally trying to avoid allocating lots of small
things.  Which tends to be better on the cache. 

Yes, I generalize. Don't we all?

For example, if you have an _explicit_ refcounting system, then it is
quite natural to have operations like "copy-on-write", where if you
decide to change a tree node you do something like

	copy_on_write(node_t **np)
	{

		note_t *node = *np;
		if (node->count > 1)
			newnode = copy_alloc(node);
			*np = newnode;
			node->count--;
			node = newnode;
		}
		return node;
	}

and then before you change a tree node you do

	node = copy_on_write(&tree->node);
	.. we now know we are the exclusive owners of "node" ..

which tends to be very efficient - it allows sharing, even if sharing is
often not the common case (and doesn't do any extra allocations for the
common case of an access that was already exclusively owned).

(If you want to be thread-safe you need to be more careful yet, and have
thread-safe "get_node()/put_node()" actions etc.  Most applications
don't need to be that careful, but you'll see a _lot_ of this inside an
operating system). 

In contrast, in a GC system where you do _not_ have access to the
explicit refcounting, you tend to always copy the node, just because you
don't know if the original node might be shared through another tree or
not.  Even if sharing ends up not being the most common case.  So you do
a lot of extra work, and you end up with even more cache pressure. 

Are the GC systems that do refcounting internally _and_ expose the
information upwards to the user? I bet there are. But the fact is, the
rest of them (99.9%) give those few well-done GC's a bad name.

"So what about circular data structures? Refcounting doesn't work for
them".  Right.  Don't do them.  Or handle them very very carefully (ie
there can be a "head" that gets special handling and keeps the others
alive). Compilers certainly almost always end up working with DAG's, not
cyclic structures. Make it a rule.

Does it take more effort? Yes.  The advantage of GC is that it is
automatic.  But CG apologists should just admit that it causes bad
problems and often _encourages_ people to write code that performs
badly. 

I really think it's the mindset that is the biggest problem.  A GC
system with explicitly visible reference counts (and immediate freeing)
with language support to make it easier to get the refcounts right
(things like automatically incrementing the refcounts when passing the
object off to others) wouldn't necessarily be painful to use, and would
clearly offer all the advantages of just doing it all by hand. 

That's not the world we live in, though.

		Linus


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]