Introduction -- the GCJ Binary Compatibility ABI

A number of Java applications are written under the assumption that they will only ever be interpreted by a JVM, and not natively compiled by an ahead-of-time compiler like GCJ.

These Java applications might make assumptions about their runtime environment which make it very difficult to run them natively compiled without significant source code modifications. Specifically, these applications contain their own code to read bytecode streams and load them as classes using the ClassLoader.defineClass() method. The Eclipse IDE is a well-known example where virtually the entire application is loaded in this manner.
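If you haven't seen this pattern before, here is a minimal sketch of what such application code typically looks like. It is not taken from Eclipse or any other real application; `BytesClassLoader` and `defineFromStream` are hypothetical names. The point is simply that the class is built from raw bytes that the application read itself, which a natively compiled class cannot supply.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical example of an application-level class loader: it reads raw
 * bytecode itself and turns it into a class with ClassLoader.defineClass().
 */
public class BytesClassLoader extends ClassLoader {
    public BytesClassLoader(ClassLoader parent) {
        super(parent);
    }

    /** Reads every byte from a stream (e.g. a JAR entry or a plugin archive). */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Defines a class from a bytecode stream; this is the call a native runtime has to cope with. */
    public Class<?> defineFromStream(String name, InputStream in) throws IOException {
        byte[] bytecode = readAll(in);
        return defineClass(name, bytecode, 0, bytecode.length);
    }
}
```

In a real application the stream would usually come from a JAR entry; for experimenting, any class file on the class path will do.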

You can see how that would be a problem, because there *is* no bytecode stream for a natively compiled class. If you natively compile a program that uses defineClass(), you end up with one class trying to read a bytecode stream for another class that may not even exist as bytecode, and the house of cards falls down. Further, when a natively compiled class went looking to load other classes, the old default was to look for them only as natively compiled classes.

To get around this problem, GCJ 4.0 has a new compilation mode called indirect dispatch, which uses the new Binary Compatibility ABI (BC-ABI). You can think of the old compilation mode as "direct dispatch" if you like. For all the details, see ftp://gcc.gnu.org/pub/gcc/summit/2004/GCJ%20New%20ABI.pdf

In a nutshell, the scheme relies on gcj-dbtool (see gcj-dbtool(1)), which generates a ".db" file that libgcj consults later, at run time. The idea is that you first natively compile your .jar or .class files (the ones the application depends on) using the =-findirect-dispatch= option. Then you use gcj-dbtool to squirrel away information about those natively compiled classes in a .db database file.

Later, when the program runs and calls =defineClass()=, the bytecode passed in is matched (using the map contained in the .db file) against shared libraries containing the classes you've already natively compiled.

The mapping in the .db file maps the signature of a class (a cryptographic checksum of its bytecode) to the shared library holding its compiled form. You might think of it as using GCJ as a kind of caching JIT.
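To make the lookup concrete, here is a toy in-memory model of such a map. It is only an illustration: the real .db file is a persistent on-disk structure written by gcj-dbtool and read by libgcj, and the choice of MD5 as the checksum, along with the `NativeClassMap` class and its methods, are assumptions made for this sketch. A hit names the library holding the compiled class; a miss (null) corresponds to falling back on the interpreter.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a class-signature-to-library map (not libgcj's real API). */
public class NativeClassMap {
    private final Map<String, String> digestToLibrary = new HashMap<>();

    /** Hex-encoded checksum of the bytecode; MD5 is an assumption for illustration. */
    static String digest(byte[] bytecode) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(bytecode);
            StringBuilder sb = new StringBuilder();
            for (byte b : d) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Roughly what registering a library records: this bytecode was compiled into that library. */
    public void add(byte[] bytecode, String library) {
        digestToLibrary.put(digest(bytecode), library);
    }

    /** Returns the library holding a compiled form of this bytecode, or null (interpret instead). */
    public String lookup(byte[] bytecode) {
        return digestToLibrary.get(digest(bytecode));
    }
}
```

Keying on the bytecode's checksum rather than the class name means a stale or modified class file simply misses the map instead of picking up mismatched native code.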

There are other important details about the BC-ABI which deal with making sure that certain kinds of binary compatibility rules are followed by native code. The stuff here about having native code do the right thing when dealing with .class files is a nice side-effect. :)

Most typical Java applications that do not use custom class loaders don't need to worry about any of this. The BC-ABI can be enabled independently of =gcj-dbtool= by passing the =-findirect-dispatch= option to gcj for the code you're building; this generates a binary that should continue to work despite changes to libgcj and other dependent class libraries (see gcj(1)). On the other hand, if you compile an application using the old-style "C++ ABI" (that is, without the =-findirect-dispatch= option), it will break as soon as any changes are made to the public APIs of dependent class libraries, just as it does with C++. As you can imagine, this was a significant limitation in the past and an impediment to more widespread adoption of GCJ.

After the BC-enabled native-compilation build is complete, the original application still does what it used to do as far as the outside world is concerned: for example, it still loads bytecode as usual from, say, a .jar file that hasn't been natively compiled. However, its =defineClass()= calls now behave differently: they first try to find a natively compiled form of the class they're looking for (likely in a shared library), and fall back on interpreting the .class files (likely in a .jar file) if no natively compiled class can be found.

If you want to be able to call interpreted code from native code, you need to use =-findirect-dispatch= when building your native code. Without indirect dispatch, your code will not be able to load bytecode classes when necessary. You can sneak around that requirement if your code never writes =new ThatClassIWant()= directly, but instead loads classes manually and creates instances through factory methods; that would quickly become monotonous, though.
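Here is a sketch of that workaround, assuming a hypothetical `Plugin` interface shared by the native and interpreted code. The class is named only as a string, loaded through a `ClassLoader` at run time, and instantiated reflectively, so the calling code never contains a direct `new` of it. `PluginFactory` and `HelloPlugin` are illustrative names, not part of libgcj.

```java
/** Hypothetical factory that avoids naming concrete classes in calling code. */
public class PluginFactory {
    /** The interface both sides agree on; calling code only ever sees this type. */
    public interface Plugin {
        String name();
    }

    /** A sample implementation; in practice this might exist only as bytecode in a .jar. */
    public static class HelloPlugin implements Plugin {
        public String name() {
            return "hello";
        }
    }

    /**
     * Loads className through the given loader and creates it with its public
     * no-argument constructor. Throws ClassCastException if it is not a Plugin.
     */
    public static Plugin create(String className, ClassLoader loader)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className, true, loader);
        return cls.asSubclass(Plugin.class).getDeclaredConstructor().newInstance();
    }
}
```

Every class reached this way needs a public no-argument constructor and must implement the shared interface, and every creation site goes through the factory, which is why the approach gets tedious quickly.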

Calling native code from interpreted code works fine regardless of the ABI. Using indirect dispatching for interpreted code is, by definition, the default. Yes, using =-findirect-dispatch= for native compilation should be the default, but it isn't yet -- mostly because it's not quite finished. The major remaining roadblocks are:

Certainly, the goal is to make BC the default ABI as soon as the issues are resolved.

There is also a performance hit from the BC-ABI. The tests Bryce McKinlay has done show it to be relatively small (<= 10%) for most applications, but it would be good to have more data.

Compiling the JAR

*IMPORTANT*: While =-findirect-dispatch= sometimes works when compiling directly from Java source files to native code, it doesn't always work. That case has not been fully implemented yet (as of gcj 4.0.x) and is unsupported. The only supported way of compiling with =-findirect-dispatch= is the way described here.

The first step is to compile all the JAR files in your application. For instance:

gcj -shared -findirect-dispatch -Wl,-Bsymbolic -fjni -fPIC myapp.jar -o myapp.jar.so

This will compile all the contained classes. Note that no class path setting is required; with the BC-ABI, all classes are linked at run time. When using =-findirect-dispatch=, you must currently also use =-Wl,-Bsymbolic=.

Setting up the database

First create a database.

gcj-dbtool -n myapp.db

Now, add all the jar files you compiled to the database, e.g.:

gcj-dbtool -a myapp.db myapp.jar myapp.jar.so

Run it

Now you can run your application. Launch it with gij, just as if you were launching it with the java command. However, point gij at the .db you created:

gij -cp myapp.jar -Dgnu.gcj.precompiled.db.path=myapp.db org.package.ClassName [arguments...]

Note: The application's jar files must still be on the class path. The runtime needs their bytecode to look up the matching native code in the .db file.

Convenience

Have a bunch of JAR files and feel a bit too lazy to do all the BC compilation work manually? Here is a script that descends into every subdirectory of the current directory, compiles every JAR file it finds with GCJ, and adds each new library to the database file.

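#!/bin/sh
# ${1} is the name of the GCJ database file to create
DB="${1:?usage: $0 database-file}"
gcj-dbtool -n "${DB}"

# Compile every JAR below the current directory and register it in the database.
# Reading find's output line by line keeps paths containing spaces intact.
find . -iname '*.jar' | while IFS= read -r JAR_FILE
do
        echo "Compiling ${JAR_FILE} to native"
        # Only register the library if compilation succeeded
        gcj -shared -findirect-dispatch -Wl,-Bsymbolic -fjni -fPIC -o "${JAR_FILE}.so" "${JAR_FILE}" &&
        gcj-dbtool -a "${DB}" "${JAR_FILE}" "${JAR_FILE}.so"
done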

Invoke this script from the top-level directory containing your JAR files, passing the name of the database file as its only argument, and wait a while. If everything goes well, you will be left with a native library next to each JAR and a populated database that can now be used for running apps with GCJ.

Examples

Interested in some real life examples? Head over to the GNU Classpath Showcase page. There you can see how easy it is to build native Eclipse.

Troubleshooting

1. How can I tell whether code is running in interpreted mode or native mode?

Run gij with -verbose:class.

Alternatively, there is a quirk in gij 4.0 that you can take advantage of. Stack traces from gij will show "(Unknown Source)" for lines that are being interpreted, and library names for compiled code. *NOTE:* This "(Unknown Source)" quirk will go away in 4.1.
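If you don't want to wait for an exception, you can print a stack trace on demand at the point you're curious about. This is a minimal sketch; `StackProbe` is a hypothetical helper, and the "(Unknown Source)" versus library-name distinction applies only to gij 4.0 as described above. Other VMs, including later gij versions, print different location text.

```java
/** Hypothetical helper that prints the current call stack on demand. */
public class StackProbe {
    /** Returns one line per stack frame, formatted as printStackTrace() would show it. */
    public static String[] frames() {
        StackTraceElement[] trace = new Throwable().getStackTrace();
        String[] lines = new String[trace.length];
        for (int i = 0; i < trace.length; i++) {
            // toString() prints "(Unknown Source)" when the VM reports no file name
            lines[i] = trace[i].toString();
        }
        return lines;
    }

    /** Writes the frames to stderr; call this from the code you want to inspect. */
    public static void dump() {
        for (String line : frames()) {
            System.err.println("\tat " + line);
        }
    }
}
```

Dropping a call to `StackProbe.dump()` into a method and running under gij 4.0 lets you check, frame by frame, which parts are compiled.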

2. My code is running in interpreted mode, even though I followed the instructions above. Why?

In gcc 4.0, no errors are reported for =.jar.so= files that fail to load for some reason; they are just silently ignored. This policy may change in the future.

There are two likely reasons for code to be incorrectly running in interpreted mode:

a) You forgot to compile code that uses JNI (Java Native Interface) with the =-fjni= compiler option. In this case the VM will repeatedly try, and fail, to load and link the library each time it tries to load a class corresponding to that library. (The library fails to link because it tries to link in CNI style, which fails.)

or

b) Your mapping file (e.g. =classmap.db=) is out of date, or not on the =gnu.gcj.precompiled.db.path=.

3. When I run my application under gdb, gdb is loading my library many times, over and over again

You probably forgot to specify =-fjni= when you compiled. See 2a above.

How_to_BC_compile_with_GCJ (last edited 2008-01-10 19:38:46 by localhost)