This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Binary compatibility and finding compiled classes


One thing we'd like to be able to do with the new binary compatibility
code is compile a .jar to a .so, drop it somewhere, and then have the
application pick it up without excessive configuration.

There have been a few different proposals, and implementations, of
this idea.  I think it would be useful to clear some of this up now,
before the next release, so we can at least be clear about what is
supported and what is experimental; maybe we can also remove anything
we know we don't want.

Current attempts:

- libgcj will compute a .so's name from the class name and try to
  dlopen it.  We've done this for a long time.  It does cause
  problems: for instance, we can load a .so compiled against a
  different libgcj, thus pulling in that libgcj and wreaking havoc.

  In my working BC tree, I re-enabled the "duplicate class
  registration" abort just so I could detect this happening... in
  fact I suspect we must keep that error for the time being, in order
  to properly die when we accidentally load a non-BC library.

- "gcjlib" URLs.  This is the second approach (neglecting
  SharedLibLoader, but that must be used explicitly and doesn't really
  fit into this discussion).  It is nicer than implicit loading in a
  couple of ways.  First, you can have different packages in a single
  .so.  Second, it must be explicitly added to a URLClassLoader, so
  classes wind up being loaded by the proper loader.

  One problem with this approach is that, because it is explicit, you
  must modify the application.  Sometimes this is easy (Eclipse 2),
  but sometimes it is hard (Eclipse 3 doesn't use URLClassLoader but
  finds the bits itself -- an unfortunately common approach).

- The GCJLIBS directory.  The BC branch has some code in
  URLClassLoader that looks alongside a .jar for a GCJLIBS
  directory containing a .so named after the jar.  This is convenient
  since it requires no application changes.  It suffers from the same
  problems as the gcjlib URL scheme, though.

- gcj-jit.  This is the most dynamic approach, compiling class files
  as we see them.  It is independent of the source of the class
  files, a big strength.  However, it has high latency for compilation,
  and at the present time it does not scale well (the GC died after a
  few hundred .so files were mapped -- and if not the GC I would
  expect some other component to croak).
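
To make the GCJLIBS lookup concrete, here is a minimal sketch of the
path computation only.  It is not the code on the BC branch.  The class
and method names are made up for illustration, and the exact file
naming (foo.jar mapping to GCJLIBS/foo.so) is my assumption about what
"named after the jar" means.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: computes where a precompiled .so might live
// for a given .jar, following the GCJLIBS idea described above.
class GcjLibsLookup {
    // Assumption: "plugins/core.jar" maps to "plugins/GCJLIBS/core.so".
    static Path candidateFor(Path jar) {
        String jarName = jar.getFileName().toString();
        String base = jarName.endsWith(".jar")
                ? jarName.substring(0, jarName.length() - ".jar".length())
                : jarName;
        // resolveSibling keeps us in the directory that holds the jar.
        return jar.resolveSibling("GCJLIBS").resolve(base + ".so");
    }

    public static void main(String[] args) {
        System.out.println(candidateFor(Paths.get("plugins", "core.jar")));
    }
}
```

A real implementation would still have to check that the file exists
and that it was built against the running libgcj before loading it.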


Brainstorming:

- Andrew had the idea to use extended attributes on a .jar file to
  point to the corresponding .so.  However it turns out that these
  attributes aren't enabled by default on a certain OS near and dear
  to us.  Also, I suspect this approach may suffer from the same
  problems as the GCJLIBS approach.

- Hash map.  One idea to avoid the scalability problem is to teach
  gcj-jit that multiple .class signatures can map to a single .so.
  Then we can compile each Eclipse 3 .jar to a single .so, build
  maps, and somehow feed the info to libgcj.

- Factory class.  One idea is to defer low-level decision-making.
  The idea goes, we define an API for making these decisions.  Then
  at startup we consult a property; this property's value is the name
  of a factory class which we instantiate.  Decisions are delegated
  to this instance.  This gives us a lot of flexibility, and
  hopefully lets us avoid being locked into a mistake.

  One drawback of this approach is that it requires setting a
  property, which means touching startup scripts or linking with the
  proper -D option.  (OTOH, more magical approaches can sometimes
  suffer because they are magical...)

- CodeSource hackery.  We could do something like the GCJLIBS
  approach, but instead of changing URLClassLoader, we would change
  VMClassLoader.defineClass to look at the class's CodeSource, and try
  to find a .so "nearby".  We'd probably need the hash map idea to
  make this work well.
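
As a rough illustration of the hash map idea, the sketch below maps a
signature of each class file's bytes to the path of the .so that
contains it, so many classes can share one library.  Everything here
is hypothetical: the class name, the use of SHA-256 as the signature,
and the in-memory map.  How the map would be built and fed to libgcj
is exactly the open question.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical map from class-file signatures to shared libraries.
class ClassToLibraryMap {
    private final Map<String, String> soBySignature = new HashMap<>();

    // Assumption: the signature is a hex SHA-256 digest of the bytes.
    static String signature(byte[] classBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(classBytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Many class files may be registered against the same .so path.
    void register(byte[] classBytes, String soPath) {
        soBySignature.put(signature(classBytes), soPath);
    }

    // Returns the .so path for these bytes, or null if unknown.
    String lookup(byte[] classBytes) {
        return soBySignature.get(signature(classBytes));
    }
}
```

Keying on the bytes rather than the class name means a changed class
file simply misses the map instead of picking up a stale .so.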

I think we will need at least the hash map and CodeSource ideas.  I'm
also partial to the factory class approach, but I haven't tried to
write down what the interface might look like; there may be
undesirable consequences of this.
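
Just to have something to shoot at, here is one possible shape for the
factory class approach.  None of these names exist anywhere: the
interface, the property name "example.library.locator", and the sample
implementations are all invented for this sketch, and the real
interface would surely need more than a class name as input.

```java
// Hypothetical decision-making API.
interface LibraryLocator {
    // Returns the path of a .so for the class, or null for "none".
    String locate(String className);
}

// Default when no property is set: never finds a library.
class NoLibraryLocator implements LibraryLocator {
    public String locate(String className) {
        return null;
    }
}

// Sample policy: one .so per class in a fixed (made-up) directory.
class FlatDirLocator implements LibraryLocator {
    public String locate(String className) {
        return "/opt/precompiled/" + className + ".so";
    }
}

class LocatorFactory {
    // Hypothetical property naming the locator class to instantiate.
    static final String PROPERTY = "example.library.locator";

    static LibraryLocator create() {
        String name = System.getProperty(PROPERTY);
        if (name == null) {
            return new NoLibraryLocator();
        }
        try {
            Class<?> cls = Class.forName(name);
            return (LibraryLocator) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot instantiate " + name, e);
        }
    }
}
```

With something like this, a startup script would pass
-Dexample.library.locator=FlatDirLocator and the runtime would call
LocatorFactory.create() once at startup.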

Some important considerations:

- Avoid excessive application changes.  We went down this road with
  Eclipse and, unless upstream is very friendly, it just isn't
  maintainable.

- Be invisible.  Real applications out there depend on all kinds of
  things, like CodeSource pointing where they expect it to point.

- Performance.  The point of this exercise is to make things perform
  well.

- Versioning.  We've been hurt a bit by early decisions that didn't
  take into account compatibility needs of the future (e.g., the
  class-to-soname loading idea causes crashes pretty regularly for
  me).  We need to be future-proof in some intelligent way.  This is
  at least partly taken care of by the BC code.

  Currently I think we're considering the 4.0 BC code a preview.
  We'll try to preserve binary compatibility with future releases
  (assuming one of us actually gets around to adding a version number
  to the BC output :-), but I don't think we're planning to promise
  it until some later release.


Anyone have other ideas, comments, or constraints?
Arguments, inspiration, condemnations, praise, or kvetching?

I would like to do some experimenting in this area soon; now that I've
got Eclipse 3 running with the interpreter I'm itching to see it
running precompiled.

Finally let me say that the BC ABI has been a huge success so far.  It
has already turned precompiling Eclipse 2 from a multi-month effort to
something that is essentially trivial.  We should all shower praise on
Andrew and Bryce.

Tom

