GC failure w/ THREAD_LOCAL_ALLOC ?
Boehm, Hans
hans_boehm@hp.com
Wed Mar 20 14:03:00 GMT 2002
I just tried Bryce's test on an Itanium here, since I had a prebuilt gcj
3.1. It uses the stock CVS garbage collector. I couldn't get it to fail.
I will try on X86, though that will take a bit longer.
Hans
> -----Original Message-----
> From: Michael Smith [mailto:msmith@spinnakernet.com]
> Sent: Wednesday, March 20, 2002 10:15 AM
> To: Bryce McKinlay
> Cc: java@gcc.gnu.org; Boehm, Hans
> Subject: Re: GC failure w/ THREAD_LOCAL_ALLOC ?
>
>
> Bryce McKinlay wrote:
> > While testing thread local allocation on PowerPC, I ran into a problem
> > which is also reproducible on x86. The attached stress-test-case
> > GCTest.java will lock up with ~100% reproducibility with
> > THREAD_LOCAL_ALLOC enabled. It runs fine without THREAD_LOCAL_ALLOC.
> >
> > What I am seeing in the debugger is most threads waiting in
> > GC_suspend_handler, but one thread segfaulting in GC_mark_read.
> > libjava's segv handler gets called and the collector is re-entered
> > during the stack trace, causing the freeze.
>
> I actually ran into this problem in my application 2 months ago (using
> gcc version 3.1 20010911 (experimental)), and reported it to Hans. I
> couldn't water down my application to create such a simple test case, so
> tracking it down was somewhat difficult.
>
> From the stack trace I provided back in January, Hans initially
> responded with:
>
> Hans Boehm wrote:
> > I'm not terribly worried about the SIGSEGV getting turned into a
> > deadlock. Such things seem to be largely unavoidable.
> >
> > I would like to understand where the SIGSEGV is coming from. Typically
> > a failure here is caused by a bogus object descriptor. This may
> > happen because something was overwritten by client code, or because
> > there's an undiscovered bug in the GC, or in the gcj generated
> > descriptor.
>
> With some further pointers, it turns out there _was_ a bogus object
> descriptor. At my last contact with Hans, he suspected the problem was
> related to THREAD_LOCAL_ALLOC, but was unable to find any likely
> problems when reviewing the code. Here's an excerpt:
>
> Hans Boehm wrote:
> > I spent a bit of time:
> >
> > - Staring at the thread-specific-storage implementation, and
> >
> > - adding some tests for thread-local allocation to gctest.
> >
> > The new tests failed to make the problem reproducible here.
> >
> > I cleaned up a few things. The only thing substantive I found was
> > that specific.c could fail if one of the thread stacks ended up at the
> > extreme high end of the address space, i.e. if 0xfffff000 is the
> > address of a valid stack page. Are you configuring your kernel in
> > some nonstandard way, e.g. to maximize virtual address space?
> > Otherwise this seems unlikely to account for the problem, since that's
> > normally kernel address space on Linux/X86, as I recall. (I vaguely
> > recall that Mandrake Linux might do something strange in this area.)
>
> Hans sent me new versions of specific.c and specific.h to fix the
> above-mentioned problem (thread stacks at the high end of the address
> space), but I never had the chance to try them out. I had a workaround
> that made the problem go away for me, and other work priorities are
> preventing me from continuing to dig into the issue.
>
> My workarounds were to increase the initial heap size of my application
> (reducing the number of garbage collections required), and to turn on
> GC_IGNORE_GCJ_INFO (which I had to add to gcj's version of the collector,
> since it was added after the version I am using). Neither really "fixes"
> the problem, though. They just make it much less likely that I'll hit
> it (I haven't since then).
>
> regards,
> michael
>
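
The GCTest.java attachment is not reproduced in this message. As a rough
illustration of the kind of multi-threaded allocation stress test discussed
in the quoted text, a minimal sketch might look like the following. It is not
Bryce's test case: the class name AllocStress, the method run, the ring size
and the object size are all hypothetical, and it uses modern Java syntax that
gcj 3.1 would not have accepted.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, NOT the GCTest.java attachment from this thread.
final class AllocStress {
    private static final int RING_SIZE = 64;   // assumed number of live objects per thread

    // Starts `threads` workers that each allocate `iterationsPerThread` small
    // arrays. Each worker keeps the most recent RING_SIZE arrays alive and
    // checks the contents of every array it evicts, so a corrupted or
    // prematurely collected object shows up as a failed check.
    // Returns the total number of arrays verified across all workers.
    static long run(int threads, int iterationsPerThread) {
        final AtomicLong verified = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                int[][] ring = new int[RING_SIZE][];
                long ok = 0;
                for (int i = 0; i < iterationsPerThread; i++) {
                    int[] obj = new int[8];
                    Arrays.fill(obj, id ^ i);            // tag with thread id and iteration
                    int slot = i % RING_SIZE;
                    int[] old = ring[slot];
                    if (old != null) {
                        int expected = id ^ (i - RING_SIZE);
                        for (int v : old) {
                            if (v != expected) {
                                throw new IllegalStateException("corrupted object in thread " + id);
                            }
                        }
                        ok++;
                    }
                    ring[slot] = obj;                    // the evicted array becomes garbage
                }
                verified.addAndGet(ok);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return verified.get();
    }
}
```

Calling AllocStress.run(4, 10000) should return 4 * (10000 - 64) when every
check passes. A lower count means a worker died on a failed check, and a hang
would correspond to the lock-up described in the thread.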