Binary compatibility and finding compiled classes

Mon Oct 11 11:11:00 GMT 2004

On Mon, 11.10.2004 at 11:49, Andrew Haley wrote:
> Jakob Praher writes:
>  > hi,
>  > 
>  > On Thu, 07.10.2004 at 19:58, Tom Tromey wrote:
>  > > One thing we'd like to be able to do with the new binary compatibility
>  > > code is compile a .jar to a .so, drop it somewhere, and then have the
>  > > application pick it up without excessive configuration.
>  > this would be a great idea.
>  > 
>  > > There have been a few different proposals, and implementations, of
>  > > this idea.  I think it would be useful to clear some of this up now,
>  > > before the next release, so we can at least be clear about what is
>  > > supported and what is experimental; maybe we can also remove anything
>  > > we know we don't want.
>  > > 
>  > 
>  > * as you said the location of any given Class file is denoted by the
>  > CodeSource of the Class file.
> 
> Um, maybe.  I've been running some enterprise-scale apps and the
> CodeSource is sometimes less than informative.  All you can really
> depend on is the byte array itself.

Hmm. The CodeSource reflects the exact location of the loaded jar file.
That is probably not informative for dynamically created class files,
for class file transformation, or when the jar/class file has been
replaced. Are there any other circumstances where it is not
informative?

Btw: Java 5 defines a bytecode transformer API, which probably means
that more apps are going to do some kind of bytecode transformation in
the future. On the other hand, gcj could provide an implementation
that tags transformed classes, so that the cache entry gets
invalidated or the class is run by the interpreter.

> 
> [ ... ]
> 
>  > so when a class gets loaded it is searched whether a cache entry
>  > exists.
> 
> This is almost entirely done already.  The problem happens when you
> have a bunch of bytes passed to defineClass and the corresponding
> compiled code is in a shared object somewhere -- you then have to map
> a checksum to a shared object.  For that you need a database of some
> kind.  

Great to hear that.

Hmm. I am trying to understand this. Is the problem then with custom
class loaders? If things go through the gcj class loaders, the
association between loadClass (the fqcn) and defineClass (the byte
array) is known, and given the set of class loaders and the fqcn, the
URL of the jar file can be determined (getResource).

Let's take the following example

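class MyCustomClassLoader extends URLClassLoader {

   MyCustomClassLoader( URL[] urls ) { super( urls ); }

   /* new class loader interface - override findClass */

   protected Class findClass( String fqcn ) throws ClassNotFoundException {

	byte[] b = ....   /* the class bytes, obtained however the loader likes */

	/* hand the name and the bytes to the runtime */
	return defineClass( fqcn, b, 0, b.length );
   }

}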

Given that defineClass is eventually implemented by gcj, it can determine
the CodeSource of the class loader that called defineClass, since the
dynamic type of the receiver is that custom class loader.

so given the "new" defineClass method (with the fqcn as a parameter) the
gcj class loader has the following information:

* code source of the custom class loader (by calling this.getClass(
).getProtectionDomain( ).getCodeSource( ) );

* fqcn

* byte array (class bytes)

Perhaps that is still not enough information to get to the .so file.
If getResource is implemented well (*1), the gcj class loader could also
call it, to get the URL where the custom class loader thinks the class
is located.

(*1): by this I mean that if a CustomClassLoader defines a findClass for
a fqcn name then it should also implement getResource(fqcn) to reflect
the location of where it would load that class (this is not mandatory I
think)
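
To make the list above concrete, here is a minimal sketch of how that
information could be collected from plain Java. LoaderInfo and its
method names are hypothetical, not gcj code. Note that getResource
takes a resource name (foo/Person.class), not the fqcn itself, and that
the code source obtained this way is that of the loader's own class.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

/* Hypothetical helper, not part of gcj: gathers the facts listed above. */
class LoaderInfo {

    /* Location the loader's own class was loaded from, or null if unknown. */
    static URL loaderCodeSource(ClassLoader loader) {
        ProtectionDomain pd = loader.getClass().getProtectionDomain();
        CodeSource cs = (pd == null) ? null : pd.getCodeSource();
        return (cs == null) ? null : cs.getLocation();
    }

    /* Resource name for a class, e.g. foo.Person -> foo/Person.class. */
    static String resourceName(String fqcn) {
        return fqcn.replace('.', '/') + ".class";
    }

    /* Where the loader says the class file lives, or null if it cannot say. */
    static URL classLocation(ClassLoader loader, String fqcn) {
        return loader.getResource(resourceName(fqcn));
    }
}
```

A null result from either lookup is exactly the "less than informative"
case, and then only the byte array is left to go on.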

> 
> If we use a scheme similar to that in the ELF format we can do that
> with very little runtime overhead.  The idea is to have a hash table
> that is entirely pointer free, so it can be mmap()ed directly.
> Lookups can be done using (on average) a single read operation.  It's
> very important that access be fast because there may be in excess of
> 10000 compiled classes in a system.

Hmm. I am a little bit clueless here. Isn't the problem that the mapping
can vary?
Let me see, perhaps I can understand it:

If you have a shared library whose loading was triggered by the loading
of a class called foo.Person, which was loaded by, let's say,
URLClassLoader from jar:file:/opt/myapp/share/foo.jar, what would that
mapping look like?

I mean, yes, the shared object would contain all the classes in
foo.jar. So perhaps you only want to speed up the lookup, so that,
for instance, if foo.Address is also loaded by the same application, it
would normally have to:

* compute a digest over the jar location
* see if that is already mapped to an so
* if yes initialize this class and go on
* if no load the so and update the cache
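
The steps above could be sketched roughly as follows. This is only an
illustration under assumptions: SoCache, the SHA-1 digest, and the
"lib-<digest>.so" naming are all made up here, and the actual loading
of the shared object is replaced by a counter.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/* Hypothetical cache from jar location digest to shared object name. */
class SoCache {
    private final Map<String, String> digestToSo = new HashMap<>();
    int loads = 0; /* how often a shared object had to be "loaded" */

    /* Hex SHA-1 digest over the jar location string. */
    static String digest(String jarLocation) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] d = md.digest(jarLocation.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte x : d) sb.append(String.format("%02x", x & 0xff));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /* Returns the shared object for a jar, updating the cache on a miss. */
    String lookup(String jarLocation) {
        String key = digest(jarLocation);
        String so = digestToSo.get(key);
        if (so == null) {
            so = "lib-" + key + ".so"; /* assumed naming scheme */
            loads++;                   /* stands in for loading the .so */
            digestToSo.put(key, so);
        }
        return so;
    }
}
```

With this, looking up foo.Address after foo.Person hits the cache as
long as both come from the same jar location.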

Ok. I think I see where this is going. You want to speed up the
findClass implementation so that it does not have to go through the jar
file to see whether the class is in there. Am I right?
So you want to cache all the classes in the .so in a pointer-free hash
table, to get a super fast lookup.
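
As I understand "pointer free", the table holds only keys and values at
fixed offsets, so the same bytes are valid wherever they end up in
memory. A minimal sketch, assuming 64-bit checksums as keys and a small
integer (say, an index into a list of shared objects) as value.
FlatTable is hypothetical, and it uses open addressing with linear
probing rather than the bucket/chain layout of the ELF hash section:

```java
import java.nio.ByteBuffer;

/* Hypothetical pointer-free table: fixed 12-byte slots inside a buffer. */
class FlatTable {
    private static final int SLOT = 12; /* 8-byte key + 4-byte value */
    private final ByteBuffer buf;
    private final int slots;

    /* The buffer may be heap memory or a mapped file; both work alike. */
    FlatTable(ByteBuffer buf) {
        this.buf = buf;
        this.slots = buf.capacity() / SLOT;
    }

    private int slotFor(long key) {
        return (int) Math.floorMod(key, (long) slots);
    }

    /* Key 0 marks an empty slot, so real keys must be non-zero. */
    boolean put(long key, int value) {
        if (key == 0) throw new IllegalArgumentException("key 0 is reserved");
        int i = slotFor(key);
        for (int n = 0; n < slots; n++) {
            int off = i * SLOT;
            long k = buf.getLong(off);
            if (k == 0 || k == key) {
                buf.putLong(off, key);
                buf.putInt(off + 8, value);
                return true;
            }
            i = (i + 1) % slots; /* linear probing */
        }
        return false; /* table is full */
    }

    /* Returns the stored value, or -1 if the key is absent. */
    int get(long key) {
        if (key == 0) return -1;
        int i = slotFor(key);
        for (int n = 0; n < slots; n++) {
            int off = i * SLOT;
            long k = buf.getLong(off);
            if (k == key) return buf.getInt(off + 8);
            if (k == 0) return -1;
            i = (i + 1) % slots;
        }
        return -1;
    }
}
```

Since nothing in the buffer is an address, a second FlatTable wrapped
around the same bytes (for example a file mapped by another process)
sees the same entries. With a sparsely filled table most lookups touch
only one slot.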

But in order to achieve that, you first have to look at all the already
loaded shared object files and their class file caches, since if you go
through the jars first the algorithm will be as costly as before. Another
thing that comes to my mind is that you still need to validate the jar
location at every class load (or perhaps in the background at idle
time), so that you know when the jar side has changed. Otherwise, if a
jar gets redeployed you never notice, because you will try to satisfy
the load through the .so first ...
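
One cheap way to notice a redeployed jar, sketched under the assumption
that modification time and size are good enough as a fingerprint (a
digest over the jar contents would be safer but slower). JarStamp is a
hypothetical name:

```java
import java.io.File;

/* Hypothetical fingerprint of a jar, recorded when the .so was built. */
class JarStamp {
    final long lastModified;
    final long length;

    JarStamp(long lastModified, long length) {
        this.lastModified = lastModified;
        this.length = length;
    }

    /* Takes the fingerprint of the jar as it is on disk right now. */
    static JarStamp of(File jar) {
        return new JarStamp(jar.lastModified(), jar.length());
    }

    /* True if the jar on disk still looks like the one we compiled. */
    boolean matches(File jar) {
        return jar.lastModified() == lastModified && jar.length() == length;
    }
}
```

If matches returns false, the cached .so would be ignored and the class
loaded from the jar as usual.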

> 
>  > Ahh and did you look at making a Inline Cache of indirection table
>  > lookups?
> 
> Not yet, no.  I suspect that if this turns out to be a significant
> performance impediment we'd do better to use hash tables in the
> objects, as is done by the ELF linker.
> 
Ok. So you mean something like the symbol hash table the ELF linker uses
for linking? I should get my hands on the Linkers and Loaders book ...

-- Jakob