This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Faster compilation speed [zone allocation]

Michael Matz wrote:
This would lead to the idea of obstacks (without growing obstacks) per
data structure type, IOW to a zone allocator, which is not a bad thing.
Some kind of zone allocator seems to make a lot of sense.
I will discuss trees, as I understand their usage better than rtl,
but similar ideas can be a good idea for rtl too, I suspect.

The idea is do *never* to reference counting or gc on individual
tree nodes.  Instead, all trees are allocated in "zones", where a
zone is a logical group of nodes.  For example a zone may consist
of all the declaration in the same scope.  Or all the expressions
in a given function.  Zones are allocated/deallocated as a group.
A zone may depend on other zones, if it contains references to
those zones.  A zone cannot be deallocated until all zones that
depend on it are deallocated.   For this to work, the assumption
is that zones form a tree structure, or at least a directed
acyclic graph.  I'll comment on this further below.

We can collect unused zones using any of mark-sweep, reference-counting,
or manual freeing (like we used to do with obstacks).   Either way
the overhead is less than collecting individual tree nodes.  If we
use reference counting, which I suspect may work best, then we only
store a reference count per zone, not per none.

Of course we also want locality of reference, so ideally each zone is
a single contiguous chunk of menory.  This may be too restrictive,
so I suggest instead that a zone consists of a list of chunks,
each a continguous chunk of memory, appropriately aligned.  This
allows zones to grow when needed.  It also provides one way to handle
cycles:  If two zones need to mutually reference each other, one
solution is to combine the two zones into a single zone by just chaining
their chunks together.

Of course combining zones together risks putting too much into a single
zone, which may delaying deallocation too much.  One example is the
tree of declarations.  Assume we have a zone for top-level declarations.
Also assume that each function has one or more zones for local
declarations.  Depending on our code generation strategy, we may wish
to free the latter when we are done compiling the function.  There
are various solutions:  One is a mechanism to break up zones, using
sub-zones.  A cleaner solution:  If the function-local scopes are only
used while compiling that function, then there is no need for a pointer
from the function from the local scopes - just reference the local
scopes from a global root.

There may be one or more current "nursary zones", into which newly
allocated nodes go.  For locality these will be chunks that have
available space for adding new nodes.  For example the current scope
would use a nursary zone.  We only have a small number of nursary
zones at a time.  For example, once we have exit a scope, we "graduate"
the chunk, give back any unused space in the chunk, and we no longer
allow new nodes to be allocated in that chunk.

This model works best if we avoid re-arranging the structure of trees
after nodes are created.  We also want to avoid creating temporary
nodes that get thrown away.  For example 'fold' would need some work.
We create expressions in the current function inside the "expression
nursary".  If 'fold' does some constant folding, the we may have
garbage nodes that are now unused.  That suggests we do like copying
generational collectors.  Hence:

The nursary is a memory buffer (perhaps 16k), plus a bunch of "nursary
roots".  All pointers into the nursary have to be registered.
Each node in the nursary is assocated with a zone.  When the
nursary is full, we scan the nursary roots, plus all the nodes
that follow from them.  Each live node is copied to the appropriate
zone.  This is best done is two passes:  The first pass scans from
the nursary roots, and keeping a count of how much space each zone
needs.  We then allocate an appropriately-sized chunk for each zone.
The second pass can scan the nursary in address order, copying over
nodes.  The tricky part of course is fixing up addresses within
the nursary.  In most cases there will be a pointer from a more
recent node to an older one.  Fixing these up is easy:  Just add a
forwarding pointer in the older node after it has been copied.  We
can handle forward references (from older nodes to newer ones) by
creating a new nursary root on the fly.

I think this can be implemented without wholesale gcc changes.  It
should dratically improve locality of reference, *and* (I hope)
reduce gc time.
	--Per Bothner

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]