Bug 24616 - linking non-existing BC-compiled classes: NoClassDefFoundErrors should be deferred
Summary: linking non-existing BC-compiled classes: NoClassDefFoundErrors should be def...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libgcj (show other bugs)
Version: 4.0.0
: P3 normal
Target Milestone: 4.2.0
Assignee: Robert Schuster
URL:
Keywords:
Depends on:
Blocks: 24638
  Show dependency treegraph
 
Reported: 2005-11-01 16:20 UTC by Robert Schuster
Modified: 2006-02-01 13:45 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-11-01 19:38:21


Attachments
A test for the linking mechanism (2.10 KB, application/octet-stream)
2005-11-01 19:32 UTC, Robert Schuster
Details
lazy linker test setup (2.47 KB, application/octet-stream)
2005-11-02 11:09 UTC, Robert Schuster
Details
preliminary patch - just for review (1.61 KB, text/plain)
2005-11-05 15:59 UTC, Robert Schuster
Details
preliminary patch - just for review (1.88 KB, text/plain)
2005-11-07 10:34 UTC, Robert Schuster
Details
preliminary patch - just for review (4.69 KB, text/plain)
2005-11-09 15:18 UTC, Robert Schuster
Details
lazy linker test setup (2.71 KB, text/plain)
2005-11-09 15:22 UTC, Robert Schuster
Details
lazy linking - first part (3.97 KB, text/plain)
2005-11-16 16:35 UTC, Robert Schuster
Details
lazy linking - part 2 (2.11 KB, patch)
2005-11-16 16:51 UTC, Robert Schuster
Details | Diff
lazy linking - part 2 (2.41 KB, patch)
2005-11-17 12:39 UTC, Robert Schuster
Details | Diff
updated test setup (2.86 KB, application/octet-stream)
2005-12-05 11:52 UTC, Robert Schuster
Details
updated and fixed test setup (2.89 KB, application/octet-stream)
2006-01-31 16:45 UTC, Robert Schuster
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Schuster 2005-11-01 16:20:24 UTC
In various situations where the BC-compiled classes are linked and a NoClassDefFoundError is thrown, this error condition should be deferred to the time when the actual erroneous code location is executed.

The situation always involves that a class cannot be resolved (because the bytecode is not available). That means:

- invoking a static method of a missing class
- accessing a static member of a missing class
- creating an array from a missing class
- creating a multi array from a missing class

- accessing a valid static member whose type is missing
- accessing a valid non-static field whose type is missing

- invoking a non-static method of a missing class (true?)

There may be more cases. I have to investigate that.
Comment 1 Robert Schuster 2005-11-01 19:32:01 UTC
Created attachment 10103 [details]
A test for the linking mechanism

This is a slightly bigger test for the linking mechanism. Unpack the tar.bz2, put gcj4 in your path and run make.

How does it work:
- there is a single source file which contains 4 classes. After the compilation to bytecode the file test/linker/TestClass.class is removed.
- a jar is created from the remaining classes
- a shared object is compiled from the jar
- a gcjdb file is created
- the classes in the jar are registered in the db file

If everything is fine then run the shell script:
bc-run.sh

You will see that gij fails to load and execute the main class. On Sun (can be invoked by nf-run.sh if 'java' runs the proprietary VM) java the test will succeed. The test contains some expected failures as well as situations where no error happens although variables of the missing class test.linker.TestClass are involved.

As a bonus there is a setup for gij in interpreted mode. This is for another PR.
Comment 2 Robert Schuster 2005-11-01 19:38:21 UTC
More hints for the test:

running the start script as:
??-run.sh nothing

should succeed on every vm in every variant since the critical code locations are not touched in any way. However this only succeed for gij in interpreted mode when my patch for PR #17021 is applied.
Comment 3 Robert Schuster 2005-11-02 11:09:30 UTC
Created attachment 10113 [details]
lazy linker test setup

This is a newer version of the test for class linking. The change is that the actual tests are not run inside the methods of the LazyLinker class but in separate ones which are then loaded via reflection.

By doing so loading the LazyLinker class itself works on all test environments and one can inspect the steps until the test is run more detailed.
Comment 4 Robert Schuster 2005-11-05 14:57:07 UTC
By further inspecting the problem I got the impression that gcj's approach of linking is flawed. As I understand it gcj does linking-on-initialization but for the java language, being built-around the idea of late binding, it would be better if gcj does the linking step when a particular entity (field, method) is used for the first time.

That could mean that we should fill in the atables with the address to a certain linkMe() function which then replaces itself if this succeeds or throws an Error (NoClassDefFoundError, NoSuchMethodError or NoSuchFieldError).

Comments?
Comment 5 Robert Schuster 2005-11-05 15:59:57 UTC
Created attachment 10153 [details]
preliminary patch - just for review

Here is a first start for a patch that makes access to static methods lazy. The idea is to find out when the target class is missing and put a pointer to a generic method which throws a NoClassDefFoundError on execution into the atable slot.

I post this in the hope to receive comments about the approach. Please ignore debugging stuff in link.cc - this is not the final patch. :)

One thing however remains to be solved: The function throwing the NoClassDefFoundError cannot provide the information which class definition is missing. Is there a function from which I can get the class name the method belongs to?
Comment 6 Andrew Haley 2005-11-06 16:15:27 UTC
You can get the class by generating a special-purpose thunk using the libffi invocation interface and pointing the vector at the thunk.

Virtual methods are easy to do by adding a suitable "NoClassDefFoundError" method to class Object; this method will then be inherited by every class.

Instance variables are tricky, but can be done either by adding an inline check or changing the ABI so that virtual accessor methods are generated for all public instance variables.  There aren't very many such variables so this might not be much of a problem.

Comment 7 Robert Schuster 2005-11-07 10:34:03 UTC
Created attachment 10161 [details]
preliminary patch - just for review

As andrew suggested I tried preparing a closure that is stored in the atable and will be called when the static method of the missing class is invoked.

Unfortunately gcj crashes when entering ffi_call in link.cc:743. I assume that this has something to do with the way I am allocating memory for the ffi structures cif and closure.

I just learned about ffi and tried what I am doing in gcj in a tutorial app where it works fine. Can somebody help me what and tell me what I am doing wrong here?
Comment 8 Andrew Haley 2005-11-07 10:40:16 UTC
It's not possibe from this description to tell when this problem occurs, nor which kind of error it is.  What does "crash" mean?  When does it occur?
Comment 9 Robert Schuster 2005-11-07 12:51:37 UTC
Ok, more details:
With my patch applied I run the test application in lazylinker.tar.bz2. Set up the variable CLASSPATH to point to lib/java/lazylinker.jar, start gdb with gij, set a breakpoint at link.cc:743 and run the interpreter with "-Dgnu.gcj.precompiled.db.path=lazylinker.gcjdb test.linker.LazyLinker invokeStaticMethod".

Then step into ffi_call. When it reaches the first 'call' instruction in the assembler code (SysV.S:55) it crashes.

Here is a backtrace:

#0  0xb6d2f4dc in memcpy () from /lib/tls/libc.so.6
#1  0xb79d788c in ffi_prep_args (stack=0xbff41224 "\200x\016", ecif=0xbff41258)
    at ../../../gcc/libffi/src/x86/ffi.c:108
#2  0xb79d7921 in ffi_call_SYSV () at ../../../gcc/libffi/src/x86/sysv.S:55
#3  0x00028f00 in ?? ()
#4  0x00000000 in ?? ()
#5  0x00000001 in ?? ()
#6  0x00000001 in ?? ()
#7  0xb7f45fb4 in ?? () from /lib/ld-linux.so.2
#8  0x000006f9 in ?? ()
#9  0x08057320 in ?? ()
#10 0xbff41300 in ?? ()
#11 0xbff412a0 in ?? ()
#12 0x08059e50 in ?? ()
#13 0x08059ec8 in ?? ()
#14 0xb6e1874e in test::linker::LazyLinker::main ()
   from /home/rob/devel/test/lazy-linker/lib/bc/lazylinker.so
#15 0xbff41488 in ?? ()
#16 0xb6e1826a in test.linker.LazyLinker.main(java.lang.String[]) ()
   from /home/rob/devel/test/lazy-linker/lib/bc/lazylinker.so
Previous frame inner to this frame (corrupt stack?)
Comment 10 Anthony Green 2005-11-07 14:47:22 UTC
This patch seems overly complicated to me.  I don't think we need the trampoline function and the ffi_call.   Since we're just planning on throwing an exception, it seems like you should just be able to pass the class name in as a closure argument (cast as a ffi_cif?) and throw the exception directly, dispensing entirely with the ffi_cif and all the other interpreter->native call support. 

Let me know if I'm not describing this clearly.

Comment 11 Andrew Haley 2005-11-07 14:52:36 UTC
You're not describing this clearly.  :-)

We need to point the execution vector at a piece of code that throws an exception with the appropriate args.  Now, how should we do that?

Comment 12 Anthony Green 2005-11-07 15:06:30 UTC
(In reply to comment #11)
> You're not describing this clearly.  :-)
> 
> We need to point the execution vector at a piece of code that throws an
> exception with the appropriate args.  Now, how should we do that?

The closure mechanism was specifically designed for when you want to call interpreted code.  We don't want to do this here; we just want to throw an exception with the right argument (stored in the closure object).

The current patch uses the closure mechanism to call the trampoline, which in turn uses the ffi_call mechanism to call the exception throwing function.  But we don't need to use ffi_call here, we can just call the exception throwing function directly.  Then you'll realize that these functions don't need to be separate at all.  Then you'll realize that you don't need to bother setting up the ffi_cif - all you need is the exception argument.  

Does this help explain?






Comment 13 Robert Schuster 2005-11-07 20:03:54 UTC
anthony you're right. It should work without ffi_call invocation.

Thanks for the review. I try to find out whether this fixes my segfault too, tomorrow.
Comment 14 Robert Schuster 2005-11-08 10:15:48 UTC
> But we don't need to use ffi_call here, we can just call the exception 
> throwing function directly.
Right. That worked fine.

> Then you'll realize that these functions don't need to be separate at all.
Yep. I made the trampoline function the error throwing function now.

> Then you'll realize that you don't need to bother setting up
> the ffi_cif - all you need is the exception argument.
I doubt that this is right. The ffi_prep_closure() needs to know which arguments are given to it. AFAIK the caller can cast a ffi_closure pointer to any kind of  function pointer (OK, except varargs).
Comment 15 Anthony Green 2005-11-08 12:54:20 UTC
(In reply to comment #14)
> > Then you'll realize that you don't need to bother setting up
> > the ffi_cif - all you need is the exception argument.
> I doubt that this is right. The ffi_prep_closure() needs to know which
> arguments are given to it. 

Ah - you're right, of course.  But in this case it seems like we can simply define it as zero argument function, returning void.
 
Comment 16 Robert Schuster 2005-11-09 15:18:42 UTC
Created attachment 10187 [details]
preliminary patch - just for review

This is another preview of the patch. The patch begins to more and more depend on my verifier fixes (you can see that in the prims.cc diff). I hope I will be able to send that in soon.

If you are using the provided test setup then with this patch you will be able run:
invokeMethod (this one is probably not correct, as it call a constructor which is AFAIK a static method after all)
invokeStaticMethod

accessStaticFieldOfMissingType
accessFieldOfMissingType

in BC-compiled classes mode and it behaves like the JDK.

The fixes for the field access may look a bit untidy. Well I apologize. The linker is big machine for me and do not have a complete oversight about it. I had to carefully adjust it at several places. If you have suggestions how to make it more elegant please tell me so.

There is more to come and I expect the changes to get bigger. At this state I fixed 3 (or 4) of the mentioned problems in the PR and I would like to make the patch pretty and send it in. After that I will continue working on the remaining issues. Objections?
Comment 17 Robert Schuster 2005-11-09 15:22:27 UTC
Created attachment 10188 [details]
lazy linker test setup

This is a small update to the test setup. BC-compilation is now done with debugging symbols and there is a .gdbinit file which I use quiete often.
Comment 18 Andrew Haley 2005-11-09 15:28:40 UTC
It's probably not the best idea to solve everything in this bug in a single patch.

Better make several patches, for the different issues.

Also, if there are some verifier changes needed, let's get those committed first.
Comment 19 Robert Schuster 2005-11-16 16:35:20 UTC
Created attachment 10254 [details]
lazy linking - first part

This is the first part of the patch. See above for what it does.
Comment 20 Robert Schuster 2005-11-16 16:51:14 UTC
Created attachment 10255 [details]
lazy linking - part 2

This is the second part of the lazy linking patch. Applying both fixes most of the problems with BC-compiled classes.

How does it work:
At first I introduced a new class state for classes which somehow exist in the runtime but cannot be used to the full extent (instantiation, array creation, field access, ...). In the commentation I named such classes phantom classes and they are set to the JV_STATE_PHANTOM state when created (which is sometimes inevitable).

Whenever such a class reaches a _Jv-function that cannot make use of such class it throws a NoClassDefFoundError. You can see that in _Jv_AllocObjectNoFinalizer and _Jv_AllocObjectArray. By doing so gij behaves like the JDK.

There is only one flaw which I am going to tackle tomorrow: The accessStaticField test case throws a rather odd StringIndexOutOfBoundsException. I assume that I mixed something up while trying to make field resolution more lazy.

To my surprise this patch fixes all the problems with gij in interpreted more although this was not intended.
Comment 21 Andrew Haley 2005-11-16 17:20:20 UTC
Accesses to static fields should be fixed already when compiling BC.

We generate a call to _Jv_ResolvePoolEntry(Class, int) for every static field reference, which resolves the target class and returns a pointer to the field.  

If, for some reason, we're resolving the field too soon, then that is a new bug -- it has worked before now.
Comment 22 Robert Schuster 2005-11-17 12:39:25 UTC
Created attachment 10262 [details]
lazy linking - part 2

This is the same patch as above but corrects the behavior in case that the static field of a missing class is touched.

Anyway I missed setting the JV_CONSTANTS_Resolved flag after the creation of the phantom class and had to add special handling for the case that the owner of a field is such a phantom class (-> throw NoClassDefFoundError, as always ;) )
Comment 23 Andrew Haley 2005-11-17 13:46:05 UTC
mm, I was wrong about static fields.

The problem is that a JV_CONSTANT_Fieldref constant pool entry points to a JV_CONSTANT_Class, and at the present time we attempt to resolve that JV_CONSTANT_Class reference at link time rather than at runtime.

If we had used a JV_CONSTANT_String instead of a JV_CONSTANT_Class to represent a class in a JV_CONSTANT_Fieldref this would have worked.  Oh well.
Comment 24 Andrew Haley 2005-12-02 17:43:57 UTC
This is great, but an additional test case for dynamic method invocation is needed.

We need to be able to continue if a class T that contains a method that refers to a missing class M is initialized.  At that time, the otable slot for class T that refers to the missing method must be initialized to point to the offset for  
a method in java.lang.Object that throws NoClassDefFoundError.

This is the proper test case:

class invokeMethod implements Runnable
{
  void notCalledMethod()
  {
	TestClass t = new TestClass();
	t.fail();
  }

  public void run(){
  }
}
Comment 25 Robert Schuster 2005-12-05 11:34:12 UTC
aph, this would be your test case right?

class T {
  void fail(){
      System.out.println("fail-0");
      M m = new M();

      System.out.println("fail-1");
      m.test();
  }
}

// Bytecode removed
class M {
  void test(){
  }
}

Considered that someone calls fail on a T object a NoClassDefFoundError should be thrown immediately?

I think this is not right and that is why I put the println() invocations into the method. What I observed is that the interpreter runs the code until it reaches a situation where it cannot continue. In the above example this is the instantiation of the M class.

My patch already addresses this by making M a phantom class.
Comment 26 Robert Schuster 2005-12-05 11:52:11 UTC
Created attachment 10407 [details]
updated test setup

Added two more tests:
invokeMethodIndirect1
invokeMethodIndirect2

A new class HelperClassInvokeMethod contains a non-static method which refers to a missing class and one that doesnt.

In invokeMethodIndirect1 we call the method which does not touch the missing class. This is done in invokeMethodIndirect2.

Additionaly this method contains some println() statements before and after the statements that will cause a NoClassDefFoundError.

Andrew suggested that the whole method must fail with that error (when it is called) but I think the interpreter should execute code as far as possible. This is reinforced by the successful execution of the println() statements.

On the JDK (as well as gij in interpreted and BC-compiled mode with my linker patches applied) the invokeMethodIndirect2 test prints: 'fail-0' and then the NoClassDefFoundError is thrown.
Comment 27 Robert Schuster 2005-12-05 13:52:57 UTC
I changed the PR's title to reflect more clearly what it is about.

This is about *missing* classes not about methods and fields which have been removed, changed or whatever.
Comment 28 Robert Schuster 2006-01-31 16:45:42 UTC
Created attachment 10766 [details]
updated and fixed test setup
Comment 29 Robert Schuster 2006-02-01 13:45:35 UTC
Fixed by: http://gcc.gnu.org/ml/java-patches/2006-q1/msg00124.html