This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: GCC/JIT and precise garbage collection support?


On Fri, 2015-07-10 at 11:13 +0200, Armin Rigo wrote:
> Hi David, hi Basile,
> 
> On 10 July 2015 at 03:53, David Malcolm <dmalcolm@redhat.com> wrote:
> > FWIW PyPy (an implementation of Python) defaults to using true GC, and
> > could benefit from GC support in GCC; currently PyPy has a nasty hack
> > for locating on-stack GC roots, by compiling to assembler, then carving
> > up the assembler with regexes to build GC metadata.
> 
> A first note: write barriers, stack walking, and so on can all be
> implemented manually.  The only thing that cannot be implemented
> easily is stack maps.
> 
> Here's in more details how the PyPy hacks work, in case there is
> interest.  It might be possible to do it cleanly with minimal changes
> in GCC (hopefully?).
> 
> The goal: when a garbage collection occurs, we need to locate and
> possibly change the GC pointers in the stack.  (They may have been
> originally in callee-saved registers, saved by some callee.)  So this
> is about writing some "stack map" that describes where the values are
> around all calls in the stack.  To do that, we put in the C sources "v
> = pypy_asm_gcroot(v);" for all GC-pointer variables after each call
> (at least each call that can recursively end up collecting):
> 
> 
> /* The following pseudo-instruction is used by --gcrootfinder=asmgcc
>    just after a call to tell gcc to put a GCROOT mark on each gc-pointer
>    local variable.  All such local variables need to go through a "v =
>    pypy_asm_gcroot(v)".  The old value should not be used any more by
>    the C code; this prevents the following case from occurring: gcc
>    could make two copies of the local variable (e.g. one in the stack
>    and one in a register), pass one to GCROOT, and later use the other
>    one.  In practice the pypy_asm_gcroot() is often a no-op in the final
>    machine code and doesn't prevent most optimizations. */
> 
> /* With gcc, getting the asm() right was tricky, though.  The asm() is
>    not volatile so that gcc is free to delete it if the output variable
>    is not used at all.  We need to prevent gcc from moving the asm()
>    *before* the call that could cause a collection; this is the purpose
>    of the (unused) __gcnoreorderhack input argument.  Any memory input
>    argument would have this effect: as far as gcc knows the call
>    instruction can modify arbitrary memory, thus creating the order
>    dependency that we want. */
> 
> #define pypy_asm_gcroot(p) ({void*_r; \
>         asm ("/* GCROOT %0 */" : "=g" (_r) :       \
>          "0" (p), "m" (__gcnoreorderhack));    \
>         _r; })
> 
> 
> This puts a comment in the .s file, which we post-process.  The goal
> of this post-processing is to find the GCROOT comments, see what value
> they mention, and track where this value comes from at the preceding
> call.  This is the messy part, because the value can often move
> around, sometimes across jumps.
> 
> We also track if and where the callee-saved registers end up being saved.
> 
> At the end we generate some static data: a map from every CALL
> location to a list of GC pointers which are live across this call,
> written out as a list of callee-saved registers and stack locations.
> This static data is read by custom platform-specific code in the stack
> walker.
> 
> This works well enough because, from gcc's point of view, all GC
> pointers after a CALL are only used as arguments to "v2 =
> pypy_asm_gcroot(v)".  GCC is not allowed to do things like precompute
> offsets inside GC objects---because v2 != v (which is true if the GC
> moved the object) and v2 is only created by the pypy_asm_gcroot()
> after the call.
> 
> The drawback of this "asm" statement (besides being detached from the
> CALL) is that, even though we say "=g", a stack pointer will often be
> loaded into a register just before the "asm" and spilled again to a
> (likely different) stack location afterwards.  This creates some
> pointless data movements.  This seems to degrade performance by at
> most a few percents, so it's fine for us.
> 
> So how would a GCC-supported solution look like?  Maybe a single
> builtin that does a call and at the same time "marks" some local
> variables (for read/write).  It would be enough if a CALL emitted from
> this built-in is immediately followed by an assembler
> pseudo-instruction that describe the location of all the local
> variables listed (plus context information: the current stack frame's
> depth, and where callee-saved registers have been saved).  This would
> mean the user of this builtin still needs to come up with custom tools
> to post-process the assembler, but it is probably the simplest and
> most flexible solution.  I may be wrong about thinking any of this
> would be easy, though...

Presumably you'd want some kind of:

  -fgenerate-stack-maps

option (perhaps taking an argument describing the data format?)

Thinking aloud here, maybe a way to implement this is to have some kind
of annotation on each call describing the GC-vars that are live at the
call.

This somehow has to be created/maintained through the various IR formats
and lowerings.

I'm still relatively new to the GCC backend, so take the following with
a large pinch of salt...

AIUI, we lose precise type information when we expand from gimple to RTL
(do we?), so presumably to keep GC-precision we'd want some kind of
thing during the gimple-to-RTL expansion that annotates RTL expressions
with GC information.  Perhaps just a flag on RTL nodes to note those
nodes that are GC-holding pointers?

AIUI, we have CALL_INSN instructions all the way through the RTL phase
of the backend, so we can identify which locations in the generated code
are calls; presumably we'd need at each CALL_INSN to determine somehow
which RTL expressions tagged as being GC-aware are live (perhaps a
mixture of registers and fp-offset expressions?)

So presumably we could use that information (maybe in the final pass) to
write out some metadata describing for each %pc callsite the relevant GC
roots.

Armin: does this sound like what you need?
RTL experts: does this sound (at least) vaguely sane?

Dave


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]