This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug libgcj/13708] [3.4/3.5 regression] java program crashes at startup, UTF-8 environment


------- Additional Comments From mec dot gnu at mindspring dot com  2004-01-19 00:30 -------
Subject: Re:  [3.4/3.5 regression] java program crashes at startup, UTF-8 environment

jmisc.current.pinskia.exe executes with no segfault on my system.

I figured out the difference between jmisc.current.pinskia.exe and
jmisc.current.static.exe.

===

Short version of the analysis:

  . _Jv_CreateJavaVM calls java.lang.ClassLoader.<clinit>
  . java.lang.ClassLoader.<clinit> calls java.lang.VMClassLoader.getSystemClassLoader
  . java.lang.VMClassLoader.getSystemClassLoader calls System.getProperty
  . java.lang.System.getProperty invokes java.lang.System.<clinit>
  . java.lang.System.<clinit> constructs some java.io.PrintStream objects
  . java.io.PrintStream.PrintStream calls gnu.gcj.convert.UnicodeToBytes.getDefaultEncoder
  . getDefaultEncoder loads a class whose name depends on the
    configuration options of libjava as well as the environment.
  . This class is "gnu.gcj.convert.Output_" + getProperty("file.encoding")
    . In jmisc.current.pinskia.exe:
      . getProperty("file.encoding") is "8859_1".
      . getDefaultEncoder tries to load "gnu.gcj.convert.Output_8859_1".
      . This class is already present in the static-linked executable.
      . Works fine.
    . In jmisc.current.static.exe:
      . getProperty("file.encoding") is "UTF-8".
      . getDefaultEncoder tries to load "gnu.gcj.convert.Output_UTF8".
      . This class is not already present in the static-linked executable.
      . So _Jv_FindClass uses the system class loader to go get it.
      . We are still in java.lang.ClassLoader.<clinit>, remember!
      . So the recursive call to ClassLoader.getSystemClassLoader returns NULL!
      . _Jv_FindClass segfaults on sys->loadClass because sys is NULL!

Here is the difference between Andrew's executable and my executable.
jmisc.current.pinskia.exe was built with a libgcj where the system
property "file.encoding" is statically set to "8859_1".
jmisc.current.static.exe was built with a libgcj where the system
property "file.encoding" is dynamically initialized with calls to
"setlocale" and "nl_langinfo", and my "file.encoding" is "UTF-8".  This
difference comes from a configuration check in natRuntime.cc for
DEFAULT_FILE_ENCODING.

From there, it turns out that gnu.gcj.convert.Output_8859_1 is
statically linked into the program, but gnu.gcj.convert.Output_UTF8 is not.
And the Unicode code is running inside ClassLoader.<clinit>,
so if it reaches for an output converter that is not already statically
linked, then it dies.

===

Long version of the analysis:

Start with this code in gcc/libjava/java/lang/natRuntime.cc:

  #if ! defined (DEFAULT_FILE_ENCODING) && defined (HAVE_ICONV) \
      && defined (HAVE_NL_LANGINFO)

  static char *
  file_encoding ()
  {
    setlocale (LC_CTYPE, "");
    char *e = nl_langinfo (CODESET);
    if (e == NULL || *e == '\0')
      e = "8859_1";
    return e;
  }

  #define DEFAULT_FILE_ENCODING file_encoding ()

  #endif

  #ifndef DEFAULT_FILE_ENCODING
  #define DEFAULT_FILE_ENCODING "8859_1"
  #endif

  static char *default_file_encoding = DEFAULT_FILE_ENCODING;

In my executable, jmisc.current.static.exe, the "file_encoding" function
is defined.  There is a global ctor for "default_file_encoding".
The global ctor runs at the right time and initializes
"default_file_encoding" to the string "UTF-8".

In your executable, jmisc.current.pinskia.exe, there is no
"file_encoding" function.  The string "default_file_encoding" is
statically initialized to "8859_1".

Note that this initialization depends on the values of HAVE_ICONV and
HAVE_NL_LANGINFO when gcj was built.

Next, java::lang::Runtime::insertSystemProperties does a simple:

  SET ("file.encoding", default_file_encoding);

Later on, gnu.gcj.convert.UnicodeToBytes gets called.  Breakpoint on
that and do a stack trace.  This is the killer stack trace!  Here is the
stack trace in both executables, jmisc.current.static.exe and
jmisc.current.pinskia.exe.

  #0  gnu.gcj.convert.UnicodeToBytes.getDefaultEncoder
  #1  java.io.PrintStream.PrintStream
  #2  java.lang.System.<clinit>
  #3  java::lang::Class::initializeClass
  #4  java.lang.System.getProperty
  #5  java.lang.VMClassLoader.getSystemClassLoader
  #6  java.lang.ClassLoader.<clinit>
  #7  java::lang::Class::initializeClass
  #8  _Jv_CreateJavaVM
  #9  _Jv_RunMain
  #10 JvRunMain
  #11 main

See, the runtime is still initializing ClassLoader.<clinit>.
ClassLoader.systemClassLoader is going to be NULL until this
initialization is finished.
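
The same effect can be reproduced on any JVM with ordinary classes.  The
following is a minimal sketch, not libgcj code: DemoLoader and DemoHelper
are hypothetical stand-ins for ClassLoader and for the code reached from
System.<clinit>.  A nested call made while a class's static initializer is
still running on the same thread does not block; it simply sees the
default value of the static field, which is null.

```java
// DemoLoader stands in for java.lang.ClassLoader (hypothetical name).
class DemoLoader {
    // Assigned only once create() has returned, i.e. at the end of <clinit>.
    static final Object systemLoader = create();

    // Records what the nested caller observed while <clinit> was running.
    // No initializer here, so the value set inside create() is preserved.
    static boolean sawNullDuringInit;

    static Object create() {
        // We are inside DemoLoader.<clinit>; DemoHelper re-enters DemoLoader.
        sawNullDuringInit = (DemoHelper.peek() == null);
        return new Object();
    }

    static Object getSystemLoader() {
        return systemLoader;
    }
}

// DemoHelper stands in for the code reached via System.<clinit> (hypothetical name).
class DemoHelper {
    static Object peek() {
        // The current thread is already initializing DemoLoader, so this
        // call neither blocks nor re-runs <clinit>; the field is still null.
        return DemoLoader.getSystemLoader();
    }
}

class ReentrantInitDemo {
    public static void main(String[] args) {
        System.out.println("loader set after init: "
                           + (DemoLoader.getSystemLoader() != null));
        System.out.println("nested call saw null:  "
                           + DemoLoader.sawNullDuringInit);
    }
}
```

Both lines print "true": the field is fine once initialization completes,
but the nested call made during initialization saw null.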

ClassLoader.<clinit> has dragged in a bunch of other run-time
initialization.  getSystemClassLoader called getProperty, which
triggers System.<clinit>.  System.<clinit> constructs the three
standard streams (standard input, standard output, standard error).
That invokes the Unicode encoder.

UnicodeToBytes.getDefaultEncoder says:

  if (defaultEncoding == null)
    {
      String encoding
	= canonicalize (System.getProperty("file.encoding",
					   "8859_1"));
      String className = "gnu.gcj.convert.Output_" + encoding;
      try
	{
	  Class defaultEncodingClass = Class.forName(className);
	  defaultEncoding = encoding;
	}

In jmisc.current.pinskia.exe, the property "file.encoding" has the
value "8859_1".  className is "gnu.gcj.convert.Output_8859_1".

In my executable, jmisc.current.static.exe, the property "file.encoding"
has the value "UTF-8".  className is "gnu.gcj.convert.Output_UTF8".

Next, getDefaultEncoder calls Class.forName(className).
This gets down to _Jv_FindClass.
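
To make the name derivation concrete, here is a small sketch.  The alias
table and this canonicalize are assumptions for illustration only; the
real canonicalize in libgcj is not shown in this message.  All the sketch
relies on is what is observed above: "UTF-8" ends up as "UTF8" and
"8859_1" stays "8859_1".

```java
import java.util.Map;

class EncoderNameSketch {
    // Hypothetical alias table, NOT the real libgcj data.
    static final Map<String, String> ALIASES = Map.of(
        "UTF-8", "UTF8",
        "ISO-8859-1", "8859_1");

    // Hypothetical stand-in for canonicalize(): map known aliases,
    // pass everything else through unchanged.
    static String canonicalize(String name) {
        return ALIASES.getOrDefault(name, name);
    }

    // Same concatenation as in getDefaultEncoder.
    static String encoderClassName(String fileEncoding) {
        return "gnu.gcj.convert.Output_" + canonicalize(fileEncoding);
    }

    public static void main(String[] args) {
        System.out.println(encoderClassName("8859_1"));
        System.out.println(encoderClassName("UTF-8"));
    }
}
```

The point is that the class to be loaded is chosen by a string computed
at run time, so the static linker never sees a reference to it.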

Look at _Jv_FindClass (in natClassLoader.cc):

  jclass
  _Jv_FindClass (_Jv_Utf8Const *name, java::lang::ClassLoader *loader)
  {
    jclass klass = _Jv_FindClassInCache (name, loader);

    if (! klass)
      {
	jstring sname = _Jv_NewStringUTF (name->data);

	java::lang::ClassLoader *sys
	  = java::lang::ClassLoader::getSystemClassLoader ();

	if (loader)
	  {
	    ...
	  }
	else
	  {
	    // Load using the bootstrap loader jvmspec 5.3.1.
	    klass = sys->loadClass (sname, false); 

	    // Register that we're an initiating loader.
	    if (klass)
	      _Jv_RegisterInitiatingLoader (klass, 0);
	  }

If klass is in the cache, then _Jv_FindClass is happy and just
returns.

If klass is not in the cache, and loader is NULL (which it is),
then this code attempts to use the system loader.  But the system
loader is NULL because we are still initializing it!  See the stack
trace above.  That causes a segfault on "sys->loadClass (...)".

So what determines whether a class is in the cache?
_Jv_FindClassInCache has two hash tables, "loaded_classes" and
"initiated_classes".  If you break on _Jv_RegisterClassHookDefault,
you can see this stack trace:

  _Jv_RegisterClassHookDefault
  _Jv_RegisterClasses
  frame_dummy
  completed.1

frame_dummy is in gcc/crtstuff.c.  It calls _Jv_RegisterClasses
on __JCR_LIST__ to register all the classes in __JCR_LIST__.
This is simply the list of classes that are linked into the executable.

I dumped all the classes in __JCR_LIST__ in jmisc.current.static.exe.
There are 664 classes.  The classes from gnu.gcj.convert are:

  gnu.gcj.convert.BytesToUnicode
  gnu.gcj.convert.Input_8859_1
  gnu.gcj.convert.Input_iconv
  gnu.gcj.convert.IOConverter
  gnu.gcj.convert.UnicodeToBytes
  gnu.gcj.convert.Output_8859_1
  gnu.gcj.convert.Output_iconv

My executable does not have gnu.gcj.convert.Output_UTF8 linked in.
So when getDefaultEncoder attempts to dynamically load it, it crashes,
because getSystemClassLoader is still being initialized.

Next, let me change the example program slightly:

  import gnu.gcj.convert.*;
  public class j2
  {
    public static void main (String[] args)
    {
      Output_UTF8 foo = new gnu.gcj.convert.Output_UTF8 ();
      return;
    }
  }

This program works!

The program will work if any of these conditions is true:

  The executable is linked with a shared libgcj.so.  I suspect that in a
  shared library, every class is in the __JCR_LIST__ for that library,
  whether it is used or not (how else could it work?).  That is
  jmisc.current.shared.exe.

  The executable is built with a libgcj that was configured such that
  default_file_encoding is always "8859_1" instead of dynamically fetched
  from the locale.  That is jmisc.current.pinskia.exe.

  The program contains an explicit reference to the output converter,
  such as gnu.gcj.convert.Output_UTF8, or whatever the user will select at
  runtime via the locale environment variables.

=== How To Reproduce This

Make sure that libgcj is built on a system where HAVE_ICONV and
HAVE_NL_LANGINFO are true.  Then build jmisc.exe with -static.
Set $LANG to "en_US.UTF-8" and run the test program.

It would be handy to change the test program to print the value
of java.lang.System.getProperty("file.encoding").
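
A minimal sketch of such a test program (the class name PrintFileEncoding
is made up).  Note that in the failing configuration the crash happens
during VM startup, before main runs, so this only prints something in the
configurations that work:

```java
class PrintFileEncoding {
    // Read the property that getDefaultEncoder bases its class name on.
    static String fileEncoding() {
        return System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println("file.encoding = " + fileEncoding());
    }
}
```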

=== How To Fix This

This needs some thought.  ClassLoader.<clinit> wants to run very early.
While this is running, it is restricted to classes that are linked into
the program, either statically or shared.  It can't class-load any
classes.

This is the cycle:

  ClassLoader.<clinit> calls VMClassLoader.getSystemClassLoader
  getSystemClassLoader calls System.getProperty
  System.getProperty invokes System.<clinit>
  System.<clinit> constructs some new PrintStream objects
  PrintStream.PrintStream drags in the Unicode stuff
  Unicode stuff needs the class loader

We could break this cycle at any of those links.

My first idea is to change VMClassLoader.getSystemClassLoader to use
some low-level function rather than System.getProperty, so that it does
not depend on all of System.<clinit>.  That's a lot of code!

My second idea is to re-organize System so that System.<clinit> is more
lightweight and safe to invoke before the class loader has been
initialized.
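
One way to picture this second idea is the initialization-on-demand
holder idiom.  This is only a sketch under assumptions: LightweightSystem,
StdStreams and out() are hypothetical names, and the real System exposes
its streams as public fields, so an accessor like this would not map onto
it directly.  It only shows that property lookup can be kept free of
stream (and therefore encoder) construction.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class LightweightSystem {
    // Counts stream constructions so the laziness is observable.
    static int streamsBuilt = 0;

    // Property access lives in the outer class, whose static
    // initializer does no stream work at all.
    static String getProperty(String key, String fallback) {
        String value = System.getProperty(key);
        return value != null ? value : fallback;
    }

    // The holder's <clinit> runs on first use of out(), not when
    // LightweightSystem itself is initialized.
    private static final class StdStreams {
        static final PrintStream OUT = build();

        static PrintStream build() {
            streamsBuilt++;
            // In-memory sink so the sketch has no side effects.
            return new PrintStream(new ByteArrayOutputStream());
        }
    }

    static PrintStream out() {
        return StdStreams.OUT;
    }

    public static void main(String[] args) {
        getProperty("file.encoding", "8859_1");
        System.out.println("after getProperty: " + streamsBuilt);
        out();
        System.out.println("after out():       " + streamsBuilt);
    }
}
```

With this shape, a getProperty call made during class loader
initialization would not pull in PrintStream or the encoder.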

Another idea is to change UnicodeToBytes.getDefaultEncoder so that it
uses if/else logic rather than calling Class.forName.  getDefaultEncoder
is invoked during class loader initialization, so it has to work
without using the class loader:

  String encoding = canonicalize (System.getProperty("file.encoding", "8859_1"));
  String className = "gnu.gcj.convert.Output_" + encoding;
  // Compare with equals(); == only tests reference identity.
       if (encoding.equals ("8859_1" )) { return new Output_8859_1  (); }
  else if (encoding.equals ("ASCII"  )) { return new Output_ASCII   (); }
  else if (encoding.equals ("EUCJIS" )) { return new Output_EUCJIS  (); }
  else if (encoding.equals ("JavaSrc")) { return new Output_JavaSrc (); }
  else if (encoding.equals ("SJIS"   )) { return new Output_SJIS    (); }
  else if (encoding.equals ("UTF8"   )) { return new Output_UTF8    (); }
  else if (encoding.equals ("iconv"  )) { return new Output_iconv   (); }
  else {
      throw new NoClassDefFoundError (
	"missing default encoding " + encoding + " (class " +
	className + " not found)");
  }

Yeah, it's ugly.  From my perspective, the ugly part is that so much
code is invoked before the class loader is initialized.

Alternatively, you could declare that "-static" does not work with
libgcj, and libgcj must be a shared library.  It would be nice to have a
better diagnostic than a core dump.  Specifically, if native C++ code is
about to use a NULL system class loader, then throw a diagnostic rather
than dereferencing a NULL pointer.
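
The real check would belong in the C++ of _Jv_FindClass; the following is
only a Java model of the idea, with made-up names (FindClassSketch,
SystemLoader, linkedIn) and IllegalStateException chosen arbitrarily as
the diagnostic.  The map stands in for the classes registered from
__JCR_LIST__, and a null systemLoader models "ClassLoader.<clinit> has not
finished yet".

```java
import java.util.HashMap;
import java.util.Map;

class FindClassSketch {
    // Hypothetical stand-in for the system class loader.
    interface SystemLoader {
        Class<?> loadClass(String name);
    }

    // Stand-in for the cache of classes linked into the executable.
    static final Map<String, Class<?>> linkedIn = new HashMap<>();

    // null models the state during ClassLoader.<clinit>.
    static SystemLoader systemLoader = null;

    static Class<?> findClass(String name) {
        Class<?> klass = linkedIn.get(name);
        if (klass != null) {
            return klass;               // already linked in: no loader needed
        }
        if (systemLoader == null) {
            // Fail with a readable message instead of dereferencing null.
            throw new IllegalStateException(
                "cannot load " + name
                + ": system class loader is not initialized yet");
        }
        return systemLoader.loadClass(name);
    }

    public static void main(String[] args) {
        // Object.class is just a placeholder value for a linked-in class.
        linkedIn.put("gnu.gcj.convert.Output_8859_1", Object.class);
        System.out.println(findClass("gnu.gcj.convert.Output_8859_1") != null);
        try {
            findClass("gnu.gcj.convert.Output_UTF8");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```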


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13708

