Small example of livelock regression in garbage collector for GCJ 3.3 under Win32

Ranjit Mathew rmathew@hotmail.com
Wed May 21 05:51:00 GMT 2003


>>You need -fuse-divide-subroutine and -fcheck-references.  Ranjit had a
>>look at using Windows' structured exception handling to cope with
>>these problems, but I don't think that it's done yet.

Andrew, if I understand this correctly, you are suggesting that
we use these options for the time being, until we can fix
the SEH stuff.

These will make things a bit slower, but keep them correct:

http://gcc.gnu.org/onlinedocs/gcj/Configure-time-Options.html

Right?

This weekend I tried my hand at it again, but it looks
much tougher than I thought it would be. :-(


> Ranjit, could you educate me more with this? I reread the related posts
> but am still not getting what the core issue is. And I still need to know
> what --enable-sjlj-exceptions means.

SJLJ (setjmp/longjmp) is an exception handling mechanism
based (roughly) on the idea of a "try" doing a setjmp( ) and
a "throw" doing a longjmp( ). It is expensive even in the
normal case of no exceptions being thrown and therefore
a *much* better and far more efficient mechanism was
devised based on table lookups using DWARF-2 debugging
information for each method.
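
To make the SJLJ idea concrete, here is a rough sketch in plain C of a
"try" doing a setjmp( ) and a "throw" doing a longjmp( ). It is only a
model of the idea, not GCC's actual SJLJ runtime: the names try_frame,
throw_code( ) and safe_divide( ) are made up for illustration. Note that
the frame is pushed and setjmp( ) is called on every entry to the "try",
even when nothing is thrown, which is where the cost in the normal case
comes from.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout: a model of the idea only. */

/* One record per active "try"; the records form a stack. */
struct try_frame {
    jmp_buf env;
    struct try_frame *prev;
};

static struct try_frame *top_frame = NULL;

/* "throw": jump back to the innermost active "try".
   code must be non-zero so that setjmp( ) reports it. */
static void throw_code(int code)
{
    struct try_frame *f = top_frame;
    if (f == NULL)
        abort();              /* no handler registered at all */
    top_frame = f->prev;      /* pop the frame before jumping */
    longjmp(f->env, code);
}

static int divide(int a, int b)
{
    if (b == 0)
        throw_code(1);
    return a / b;
}

/* Returns a / b, or -1 if the "exception" was thrown. */
int safe_divide(int a, int b)
{
    struct try_frame frame;
    int result;

    frame.prev = top_frame;
    top_frame = &frame;               /* paid on every "try" entry */
    if (setjmp(frame.env) == 0) {     /* "try" */
        result = divide(a, b);
        top_frame = frame.prev;       /* normal exit: pop the frame */
    } else {                          /* "catch" */
        result = -1;
    }
    return result;
}
```

Calling safe_divide(10, 2) takes the normal path, while
safe_divide(1, 0) comes back through the longjmp( ) into the "catch"
branch. A table-based scheme such as DW2 does no such bookkeeping on
the normal path.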

Great introductions to these mechanisms can be found in:

"C++ Exception Handling for IA-64" (Christophe de Dinechin):
http://www.usenix.org/events/osdi2000/wiess2000/full_papers/dinechin/dinechin_html/

"EH Newbies Howto" for GCC (Aldy Hernandez, et al):
http://gcc.gnu.org/ml/gcc/2002-07/msg00391.html

MinGW has complete (in the "mingw-local" patchset) support
for DW2 as well as SJLJ but has to go with SJLJ for
the sole reason that DW2 EH does not work across
callbacks while SJLJ does:

     http://gcc.gnu.org/ml/gcc/2003-05/msg00608.html

The FSF sources will build MinGW by default with
SJLJ while the "mingw-local" patchset will build it
by default with DW2 - therefore, it is necessary
*while building from a "mingw-local" patchset patched
source tree* to supply an explicit "--enable-sjlj-exceptions".

Since Java programs make such heavy use of try/catch,
it makes me VERY sad to see how much of a performance
impact this will have. :-(


> It appears that -fcheck-references and -fuse-divide-subroutine will fix
> NullPointerException and DivideByZeroException, but won't
> fix the root of the problem, right?

They'll probably avoid the problem (and make things even
slower ~ouch~) but will not fix the underlying problem.
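
As a rough picture of why these flags sidestep the issue, the sketch
below shows, in plain C, the kind of explicit checks they stand for: a
test for NULL before every access through a reference, and a division
routine that tests the divisor first. This is an assumption-laden model,
not the code GCJ actually emits; the names checked_get_x( ) and
checked_div( ) and the status codes are invented for illustration. With
checks like these the hardware fault never happens, so SEH is never
involved.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the Java exceptions. */
enum status { OK, NULL_POINTER, ARITHMETIC };

struct point { int x; };

/* Explicit test before the field access, in the spirit of
   -fcheck-references: no fault is ever raised for a NULL reference. */
enum status checked_get_x(const struct point *p, int *out)
{
    if (p == NULL)
        return NULL_POINTER;
    *out = p->x;
    return OK;
}

/* Division through a subroutine, in the spirit of
   -fuse-divide-subroutine: the divisor is tested before dividing. */
enum status checked_div(int a, int b, int *out)
{
    if (b == 0)
        return ARITHMETIC;
    if (a == INT_MIN && b == -1) {
        /* Java defines this case to wrap around rather than fail. */
        *out = INT_MIN;
        return OK;
    }
    *out = a / b;
    return OK;
}
```

The price is an extra compare and branch on every access and every
division, which is the slowdown mentioned above.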

A wonderful explanation of Win32 Structured Exception
Handling (SEH) can be found in:

http://www.microsoft.com/msj/0197/Exception/Exception.aspx

A Win32 application can register a "catch all" exception
handler by calling SetUnhandledExceptionFilter( ):

http://msdn.microsoft.com/library/en-us/debug/base/setunhandledexceptionfilter.asp

When a fault (like accessing a NULL reference or trying
to divide by zero) occurs and there are no intermediate
handlers, Win32 will ultimately call this handler routine.

In libgcj, there is such an exception handler in win32.cc
named win32_exception_handler( ) which throws a
NullPointerException or ArithmeticException as appropriate.

*However* there are two things seriously wrong with this:

1. Win32 expects this handler to return to it indicating
    what it should do (EXCEPTION_CONTINUE_SEARCH,
    EXCEPTION_CONTINUE_EXECUTION, etc.) after this. Since
    our handler throws instead of returning, two "real"
    NullPointerExceptions, for example, are enough for a
    program to hang on Win2K!

2. The MinGW runtime installs its own "catch all" filter
    named "_gnu_exception_handler" to implement signals
    (crt1.c in the mingw-runtime sources). We do not play
    nicely with this at all (and neither does it with us).
    In particular, SetUnhandledExceptionFilter( ) returns
    the address of the previous top-level handler, if any,
    else NULL, which would let each handler learn of the
    other and chain to it, yet neither makes use of this.
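
To illustrate what "playing nicely" could look like, here is a portable
model in plain C of two filters that cooperate. It is deliberately NOT
Win32 code: set_filter( ) merely imitates the "install the new handler,
return the previous one or NULL" contract of
SetUnhandledExceptionFilter( ), and every name and fault code in it is
hypothetical. The point is only that a filter always returns a verdict,
and passes on whatever it does not understand to the filter that was
installed before it.

```c
#include <stddef.h>

/* Portable model only: none of these names are Win32 or libgcj APIs. */
enum verdict { CONTINUE_SEARCH, HANDLED };
enum { FAULT_SIGNAL = 1, FAULT_NULL_ACCESS = 2 };

typedef enum verdict (*filter_fn)(int fault_code);

static filter_fn top_filter = NULL;
static filter_fn previous_filter = NULL;

/* Installs f as the top-level filter and returns the previous one,
   or NULL if there was none. */
filter_fn set_filter(filter_fn f)
{
    filter_fn prev = top_filter;
    top_filter = f;
    return prev;
}

/* Stands in for the C runtime's filter: it only knows about signals. */
enum verdict runtime_filter(int fault_code)
{
    return fault_code == FAULT_SIGNAL ? HANDLED : CONTINUE_SEARCH;
}

/* Stands in for the Java runtime's filter: it deals with its own
   faults and chains to the earlier filter for everything else. */
enum verdict java_filter(int fault_code)
{
    if (fault_code == FAULT_NULL_ACCESS)
        return HANDLED;
    if (previous_filter != NULL)
        return previous_filter(fault_code);
    return CONTINUE_SEARCH;
}

/* The C runtime installs its filter first, the Java runtime second,
   remembering what was there before. */
void install_filters(void)
{
    set_filter(runtime_filter);
    previous_filter = set_filter(java_filter);
}

/* Stands in for the OS calling the top-level filter on a fault. */
enum verdict dispatch(int fault_code)
{
    return top_filter != NULL ? top_filter(fault_code) : CONTINUE_SEARCH;
}
```

After install_filters( ), a null-access fault is handled by the Java
filter, a signal-type fault falls through to the runtime's filter, and
anything else ends in CONTINUE_SEARCH. Whether a real fix for libgcj
should take this shape is a separate question.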

The reason "throw new NullPointerException( )" does not
exhibit the same problem, as Oyvind notes, is that it
does not involve SEH at all, unlike a "real"
NullPointerException caused by a fault accessing an
invalid pointer.

Mohan, perhaps you can rebuild your GCJ with the
flags Andrew suggests for the time being so that we
can avoid this nasty situation. (BTW, your snapshot
shows a version number of "3.3.1 prerelease" instead
of "3.3", probably because you did a CVS checkout from
the 3.3 branch instead of building from the 3.3
tarballs.)

Ranjit.

-- 
Ranjit Mathew          Email: rmathew AT hotmail DOT com

Bangalore, INDIA.      Web: http://ranjitmathew.tripod.com/



