This is the mail archive of the
java@gcc.gnu.org
mailing list for the Java project.
Binary compatibility and finding compiled classes
- From: Tom Tromey <tromey at redhat dot com>
- To: GCJ Hackers <java at gcc dot gnu dot org>
- Date: 07 Oct 2004 11:58:32 -0600
- Subject: Binary compatibility and finding compiled classes
- Reply-to: tromey at redhat dot com
One thing we'd like to be able to do with the new binary compatibility
code is compile a .jar to a .so, drop it somewhere, and then have the
application pick it up without excessive configuration.
There have been a few different proposals, and implementations, of
this idea. I think it would be useful to clear some of this up now,
before the next release, so we can at least be clear about what is
supported and what is experimental; maybe we can also remove anything
we know we don't want.
Current attempts:
- libgcj will compute a .so's name from the class name and try to
dlopen it. We've done this for a long time. It does cause
problems; for instance, we can load a .so compiled against a
different libgcj, thus pulling in that libgcj and wreaking havoc.
In my working BC tree, I re-enabled the "duplicate class
registration" abort just so I could detect this happening... in
fact I suspect we must keep that error for the time being, in order
to properly die when we accidentally load a non-BC library.
- "gcjlib" URLs. This is the second approach (neglecting
SharedLibLoader, but that must be used explicitly and doesn't really
fit into this discussion). It is nicer than implicit loading in a
couple of ways. First, you can have different packages in a single
.so. Second, it must be explicitly added to a URLClassLoader, so
classes wind up being loaded by the proper loader.
One problem with this approach is that, because it is explicit, you
must modify the application. Sometimes this is easy (Eclipse 2),
but sometimes it is hard (Eclipse 3 doesn't use URLClassLoader but
finds the bits itself -- an unfortunately common approach).
- The GCJLIBS directory. On the BC branch is some code in
URLClassLoader that will look alongside a .jar for a GCJLIBS
directory containing a .so named after the jar. This is convenient
since it requires no application changes. It suffers from the same
problems as the gcjlib URL scheme, though.
- gcj-jit. This is the most dynamic approach, compiling class files
as we see them. It is independent of the source of the class
files, a big strength. However, it has high latency for compilation,
and at present it does not scale well (the GC died after a
few hundred .so files were mapped -- and if not the GC I would
expect some other component to croak).
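To make the first item concrete, here is a rough sketch of what "compute a .so's name from the class name" could look like. The exact pattern (a "lib-" prefix, dots turned into dashes, then falling back to each enclosing package) is an assumption for illustration and is not taken from the libgcj sources; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

class SonameCandidates {
    // Assumed naming pattern, for illustration only:
    // "lib-" + name with dots replaced by dashes + ".so",
    // tried for the full class name first, then for each package prefix.
    static List<String> candidates(String className) {
        List<String> names = new ArrayList<>();
        String prefix = className;
        while (!prefix.isEmpty()) {
            names.add("lib-" + prefix.replace('.', '-') + ".so");
            int dot = prefix.lastIndexOf('.');
            // Strip the last component; stop once nothing is left.
            prefix = (dot < 0) ? "" : prefix.substring(0, dot);
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : candidates("gnu.pkg.SomeClass")) {
            System.out.println(name);
        }
    }
}
```

Note that nothing in such a lookup says which libgcj the .so was built against, which is exactly the failure mode described in the first item.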
Brainstorming:
- Andrew had the idea to use extended attributes on a .jar file to
point to the corresponding .so. However it turns out that these
attributes aren't enabled by default on a certain OS near and dear
to us. Also, I suspect this approach may suffer from the same
problems as the GCJLIBS approach.
- Hash map. One idea to avoid the scalability problem is to teach
gcj-jit that multiple .class signatures can map to a single .so.
Then we can compile each Eclipse 3 .jar to a single .so, build
maps, and somehow feed the info to libgcj.
- Factory class. One idea is to defer low-level decision-making.
The idea goes, we define an API for making these decisions. Then
at startup we consult a property; this property's value is the name
of a factory class which we instantiate. Decisions are delegated
to this instance. This gives us a lot of flexibility, and
hopefully lets us avoid being locked into a mistake.
One drawback of this approach is that it requires setting a
property, which means touching startup scripts or linking with the
proper -D option. (OTOH, more magical approaches can sometimes
suffer because they are magical...)
- CodeSource hackery. We could do something like the GCJLIBS
approach, but instead of changing URLClassLoader, we would change
VMClassLoader.defineClass to look at the class' CodeSource, and try
to find a .so "nearby". We'd probably need the hash map idea to
make this work well.
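The hash map and CodeSource ideas combine fairly naturally, and a small sketch may help. Everything here is hypothetical: the class name, the use of a SHA-1 digest as the "signature" of a class file, and the GCJLIBS/<jar name>.so layout are assumptions, not what the BC branch does. In a real defineClass hook the jar location would come from the class's CodeSource; turning that URL into a path is left out.

```java
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class CompiledClassIndex {
    // Hypothetical map: hex digest of a .class file's bytes -> the .so
    // that holds its compiled form. Many digests may share one .so.
    private final Map<String, Path> soByDigest = new HashMap<>();

    static String digest(byte[] classBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(classBytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void register(byte[] classBytes, Path so) {
        soByDigest.put(digest(classBytes), so);
    }

    // Returns null when no precompiled code is known for these bytes.
    Path lookup(byte[] classBytes) {
        return soByDigest.get(digest(classBytes));
    }

    // Assumed layout: /x/foo.jar -> /x/GCJLIBS/foo.jar.so
    static Path nearbySo(Path jar) {
        Path dir = jar.toAbsolutePath().getParent();
        return dir.resolve("GCJLIBS").resolve(jar.getFileName() + ".so");
    }
}
```

Keying on the class bytes rather than the class name would mean a stale .so is simply not found when the .jar changes, instead of being loaded by mistake.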
I think we will need at least the hash map and CodeSource ideas. I'm
also partial to the factory class approach, but I haven't tried to
write down what the interface might look like; there may be
undesirable consequences of this.
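Purely as a strawman for discussion, the factory class idea might look something like the following. None of these names exist in libgcj: the interface, the default implementation, and the property name "example.gcj.locator" are all invented here, and the real set of decisions to delegate is still an open question.

```java
import java.net.URL;

// Hypothetical decision API; not part of libgcj.
interface NativeCodeLocator {
    // Returns the path of a shared library holding compiled code for the
    // class, or null to fall back to whatever the runtime does by default.
    String findLibrary(String className, URL codeSourceLocation);
}

// Used when no property is set: never finds anything.
class DefaultLocator implements NativeCodeLocator {
    public String findLibrary(String className, URL codeSourceLocation) {
        return null;
    }
}

class LocatorFactory {
    // Hypothetical property name, consulted once at startup.
    static final String PROPERTY = "example.gcj.locator";

    static NativeCodeLocator create() {
        String name = System.getProperty(PROPERTY);
        if (name == null) {
            return new DefaultLocator();
        }
        try {
            // Instantiate the named class via its no-argument constructor.
            return (NativeCodeLocator)
                Class.forName(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("cannot instantiate " + name, e);
        }
    }
}
```

An application would then opt in with something like -Dexample.gcj.locator=com.example.MyLocator on the command line, which is the startup-script cost mentioned above.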
Some important considerations:
- Avoid excessive application changes. We went down this road with
Eclipse and, unless upstream is very friendly, it just isn't
maintainable.
- Be invisible. Real applications out there depend on all kinds of
things, like CodeSource pointing where they expect it to point.
- Performance. The point of this exercise is to make things perform
well.
- Versioning. We've been hurt a bit by early decisions that didn't
take into account compatibility needs of the future (e.g., the
class-to-soname loading idea causes crashes pretty regularly for
me). We need to be future-proof in some intelligent way. This is
at least partly taken care of by the BC code.
Currently I think we're considering the 4.0 BC code a preview.
We'll try to preserve binary compatibility with future releases
(assuming one of us actually gets around to adding a version number
to the BC output :-), but I don't think we're planning to promise
it until some later release.
Anyone have other ideas, comments, or constraints?
Arguments, inspiration, condemnations, praise, or kvetching?
I would like to do some experimenting in this area soon; now that I've
got Eclipse 3 running with the interpreter I'm itching to see it
running precompiled.
Finally let me say that the BC ABI has been a huge success so far. It
has already turned precompiling Eclipse 2 from a multi-month effort to
something that is essentially trivial. We should all shower praise on
Andrew and Bryce.
Tom