GC incremental

Sun Sep 30 17:47:00 GMT 2001

Blocking reads would have to be handled along the lines that Corey suggests.
The wrapping code for read was intended as an example, and in retrospect is
clearly naive.  It should probably be changed so that it doesn't
accidentally get in the way.

The problem is that, in general, not only can system calls like read fail
unexpectedly, but there isn't necessarily a way to redo the call after it
fails.

In general, I wouldn't expect a lot of improvement in overall GC overhead
out of the incremental/generational mode.  Based on my experiments, it helps
or hurts depending on the application.  The real reason to use it is to
reduce GC latency.  If you don't currently have a problem with the GC
latency, I wouldn't go down this road.

The real way to avoid the system call issues is to put dirty bit maintenance
in the kernel.  AFAIK, only Solaris currently does that.  And (based on now
very old experiments) that had a very slow implementation, making it much
less useful.  Based on conversations with David Mosberger, it sounds like it
might be possible to do this right on some Linux platforms.  But it hasn't
been done, and I haven't heard from many potential users.

Hans

From: minyard@acm.org

Jeff Sturm <jsturm@one-point.com> writes:

> On Sun, 30 Sep 2001, Bryce McKinlay wrote:
> > The GC_begin_syscall() acquires the global allocation lock, effectively
> > making all IO single-threaded.
> 
> Eww.  That's not good.
> 
> > Also why is the wrapping necessary anyway? Is it just that kernel
> > writes to the heap don't get noticed by the GC, or is it more serious
> > than that?
> 
> As I understand it, kernel writes to read-only memory cannot occur, i.e.
> a read() to a heap address may fail with errno == EFAULT.
> 
> But that should be rare in libgcj, where most system calls act on
> stack addresses.  One exception is java::io::FileDescriptor::read, which
> writes directly to a byte array.  A byte array is pointer free, so it
> really doesn't need to be write protected.  I don't know if the GC
> makes this distinction however... it looks as though GC_protect_heap
> protects everything.

It's not rare enough to ignore, and for a blocking read, it will occur
and be very bad.  And locking can only be applied to whole pages, so
even if your data doesn't have any pointers, some data beside it
might.  And all GCJ objects have pointers, BTW, even arrays.

> 
> One alternative to the syscall wrappers is modifying FileDescriptor.read
> to pass a stack address and copy the buffer.  It's hard to say if that
> would be any better or worse than serializing I/O.

If you have a huge read, you can't do this (unless you can guarantee a
huge available stack space).

In my GC, I solved the problem by waiting for the file with select(),
then locking the memory writable in the GC and performing the read.
That way, you don't block any other threads (except for the GC if it
needs to scan those pages).  It also makes it easier to interrupt the
thread with a signal, since select() will always return when a signal
arrives.  Of course, you have to handle that properly, but that's pretty
easy.

-Corey