<!DOCTYPE article PUBLIC "-//Davenport//DTD DocBook V3.0//EN">
<article>
<artheader>
<title>Java/C++ integration</title>
<subtitle>Writing native Java methods in natural C++</subtitle>
<authorgroup>
<author>
<firstname>Per</firstname><surname>Bothner</surname>
<affiliation>
<orgname>Cygnus Solutions</orgname>
<address>
<email>bothner@cygnus.com</email>
<street>1325 Chesapeake Terrace</street>
<city>Sunnyvale</city>, <state>CA</state> <postcode>94089</postcode>,
<country>USA</country>
</address>
</affiliation>
</author>
</authorgroup>
<date>November, 1997</date>
</artheader>

<sect1><title>Background</title>
<para>
Not all the code in a Java application can be written in Java.  Some
must be written in a lower-level language, either for efficiency
reasons, or to access low-level facilities not accessible in Java.
For this reason, Java methods may be specified as <quote>native</quote>.
This means that the method has no method body (implementation)
in the Java source code.  Instead, it has a special flag which
tells the Java virtual machine to look for the method using
some unspecified lookup mechanism.
</para>
<para>
Sun's original Java Development Kit (JDK) version 1.0 defined a
programming interface for writing native methods in C.
This provided rather direct and efficient access to the underlying
VM, but was not officially documented, and was tied to specifics
of the VM implementation.  (There was little attempt to make it an
abstract <acronym>API</acronym> that could work with any VM.)
</para>
<para><emphasis>This document is a proposal and a work-in-progress.
It is not a specification, and Cygnus makes no commitment to implement
any part of the proposal.
Note also that I use the word <quote>Java</quote> (a trademark of
Sun Microsystems) rather casually.  This needs to be cleaned up.
(Cygnus has not yet decided what we will call our implementation
of the Java language platform.)
</emphasis></para>
<para>
Asymetrix has a SuperCede Java environment that boasts
<quote>seamless</quote> C++/Java integration.
That needs to be investigated.
</para>
<sect2><title>The Java Native Interface</title>
<para>
In JDK 1.1, Sun introduced the <quote>Java Native Interface</quote>
(<acronym>JNI</acronym>), the
official portable programming interface for writing such
<quote>native methods</quote> in C or C++.
This is a binary interface (<acronym>ABI</acronym>), allowing someone
to ship a compiled library of <acronym>JNI</acronym>-compiled native code,
and have it work with any VM implementation
(for that platform).  The downside is that it is a rather heavy-weight
interface, with substantial overheads.  For example, for native code to
access a field in an object, it needs to make two function calls
(though the result of the first can be saved for future accesses).
This is cumbersome to write and slow at run-time.
Worse, for some applications, is that the field is specified by a
run-time string, and found by searching run-time <quote>reflective</quote>
data structures.
Thus the JNI requires the availability at run-time of complete
reflective data (names, types, and positions of all fields, methods,
and classes).  The reflective data has other uses (there is a standard
set of Java classes for accessing the reflective data), but when memory
is tight, it is a luxury many applications do not need.
</para>
<para>
As an example, here is a small Java class
intended for timing purposes.  (This could be written in portable
Java, but let us assume that for some reason we don't want to do that.)
<literallayout>
package timing ;
class Timer {
  private long last_time;
  private String last_comment;
  /** Return time in milliseconds since last call,
   * and set last_comment. */
  native long sinceLast(String comment);
}
</literallayout>
This is how it could be programmed using the <acronym>JNI</acronym>:
<literallayout> 
extern "C" /* specify the C calling convention */ 
    jlong Java_timing_Timer_sinceLast (
         JNIEnv *env,           /* interface pointer */
         jobject obj,           /* "this" pointer */
         jstring comment)   /* argument #1 */
{
  // Note that the results of the first three statements
  // could be saved for future use (though the results
  // have to be made "global" first).
  jclass cls = env->FindClass("timing/Timer");
  jfieldID last_time_id = env->GetFieldID(cls, "last_time", "J");
  jfieldID last_comment_id = env->GetFieldID(cls, "last_comment",
                                             "Ljava/lang/String;");

  jlong old_last_time = env->GetLongField(obj, last_time_id);
  jlong new_last_time = calculate_new_time();
  env->SetLongField(obj, last_time_id, new_last_time);
  env->SetObjectField(obj, last_comment_id, comment);
  return new_last_time - old_last_time;
}
</literallayout>
Note the first <literal>env</literal> parameter, which is a pointer to
a thread-specific area, which also includes a pointer to a table of
functions.  The entire JNI is defined in terms of these functions,
which cannot be inlined (since that would make JNI methods no
longer binary compatible across VMs).
</para>
<para>
The Cygnus Java product will support the JNI, but we will also offer
a more efficient, lower-level, and more natural native API.
The basic idea is to make GNU Java compatible with GNU C++ (G++), and provide
a few hooks in G++ so C++ code can access Java objects as naturally
as native C++ objects.  The rest of this paper goes into details
about this integrated Java/C++ model.
</para>
<para>
We call this interface the <quote>Kaffe Native Interface</quote>
(<acronym>KNI</acronym>).  The key is that the
calling conventions and data accesses for KNI are the same as for
normal non-native Java methods.  Thus there is no extra
<classname>JNIEnv</classname> parameter, and the C++ programmer gets
direct access to the VM representation.  This does require co-ordination
between the C++ and Java implementations.
</para>
<para>
Here is the earlier example written using KNI:
<literallayout>
#include "timing_Timer.h"

jlong
timing::Timer::sinceLast(jstring comment)
{
  jlong old_last_time = this->last_time;
  jlong new_last_time = calculate_new_time();
  this->last_time = new_last_time;
  this->last_comment = comment;
  return new_last_time - old_last_time;
}
</literallayout>
This uses the following automatically-generated
<filename>timing_Timer.h</filename>:
<literallayout>
#include &lt;kni.h&gt; // "Kaffe Native Interface"
class timing {
  class Timer : public java::lang::Object {
    jlong last_time;
    jstring last_comment;
  public:
    virtual jlong sinceLast(jstring comment);
  };
};
</literallayout>
</para>
</sect2>
</sect1>

<sect1><title>Utility macros</title>
<para>
Whether or not we are using the JNI, we still need a toolkit of utility
functions so C++ code can request various services of the VM.
For operations that have a direct correspondence in C++ (such as accessing
an instance field or throwing an exception), we want to use the C++ facility.
For other features, such as creating a Java string from a nul-terminated
C string, we need utility functions.
In such cases we define a set of interfaces whose names
and functionality are similar to those of the JNI functions, except that they do not
depend on a <literal>JNIEnv</literal> pointer.
</para>
<para>
For example, the JNI interface to get a Java string from a C string is
the following in C:
<literallayout>
jstring str = (*env)->NewStringUTF(env, "Hello");
</literallayout>
and the following in C++:
<literallayout>
jstring str = env->NewStringUTF("Hello");
</literallayout>
(The C++ interface is just a set of inline methods that wrap the C interface.)
</para>
<para>
In the KNI, we do not use a <literal>JNIEnv</literal> pointer, so the
usage is:
<literallayout>
jstring str = JvNewStringUTF("Hello");
</literallayout>
We use the prefix <literal>Jv</literal> to indicate the KNI facilities.
</para>
<para>
It is useful to be able to conditionally compile the same source to
use either the fast KNI or the portable JNI.
That is possible, with some minor inconvenience,
because when <literal>USE_JNI</literal> is defined, the <literal>Jv</literal>
features are defined as macros that expand to JNI functions:
<literallayout>
#if USE_JNI
#define JNIENV() JvEnv /* Must be available in scope. */
#define JvNewStringUTF(BYTES) \
  ((JNIENV())->NewStringUTF(BYTES))
#else /* ! USE_JNI */
extern "C" jstring JvNewStringUTF (const char*);
#endif /* ! USE_JNI */
</literallayout>
</para>
<para>
Field accesses are trickier.  When using JNI, we have to use
a <literal>jfieldID</literal>, but when using KNI we can access the
field directly.  We require that the programmer follow a convention where
the <literal>jfieldID</literal> used to access a field named
<literal>foo</literal> is <literal>foo_id</literal>.
<literallayout>
#if USE_JNI
#define JvGetLongField(OBJ, FIELD) \
  (JNIENV()->GetLongField(OBJ, FIELD##_id))
#else
#define JvGetLongField(OBJ, FIELD) ((OBJ)->FIELD)
#endif
</literallayout>
</para>
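<para>
The token-pasting convention can be tried out without a Java VM.
The following is only a sketch under stated assumptions:
<literal>MockTimer</literal>, <literal>MockFieldID</literal>,
<literal>MockGetLongField</literal>, and the <literal>USE_MOCK_JNI</literal>
switch are hypothetical stand-ins invented for illustration, and are not
part of JNI or KNI.  It shows one source line compiling to either an
indirect, ID-based access or a direct member access.
</para>

```cpp
#include <cassert>

// Hypothetical stand-in for a Java object with one long field.
struct MockTimer {
  long long last_time;
};

// Hypothetical stand-in for a JNI field ID: a pointer-to-member.
typedef long long MockTimer::*MockFieldID;

// Mock of a JNI-style accessor that reaches the field through an ID.
inline long long MockGetLongField(MockTimer *obj, MockFieldID id) {
  return obj->*id;
}

#if USE_MOCK_JNI
// Indirect flavour: FIELD##_id pastes tokens, so the field last_time
// is reached through a variable that must be named last_time_id.
#define JvGetLongField(OBJ, FIELD) (MockGetLongField((OBJ), FIELD##_id))
#else
// Direct flavour: the macro is an ordinary member access.
#define JvGetLongField(OBJ, FIELD) ((OBJ)->FIELD)
#endif

// The same source line compiles under either setting.
inline long long read_last_time(MockTimer *t) {
  assert(t != 0);
  // Only referenced when USE_MOCK_JNI is non-zero.
  MockFieldID last_time_id = &MockTimer::last_time;
  (void) last_time_id;
  return JvGetLongField(t, last_time);
}
```

<para>
Compiling with <literal>-DUSE_MOCK_JNI=1</literal> selects the indirect
path; leaving it undefined selects the direct one.  The caller sees the
same result either way.
</para>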
<para>
Here is how we can write the earlier example to support either interface:
<literallayout>
#if USE_JNI
extern "C" jlong
Java_timing_Timer_sinceLast (JNIEnv *JvEnv, jobject JvThis,
                             jstring comment)
#else
#define JvThis this /* assumed here; could equally live in a KNI header */
jlong
timing::Timer::sinceLast(jstring comment)
#endif
{
#if USE_JNI
  jclass cls = JvEnv->FindClass("timing/Timer");
  jfieldID last_time_id = JvEnv->GetFieldID(cls, "last_time", "J");
  jfieldID last_comment_id = JvEnv->GetFieldID(cls, "last_comment",
                                               "Ljava/lang/String;");
#endif
  jlong old_last_time = JvGetLongField(JvThis, last_time);
  jlong new_last_time = calculate_new_time();
  JvSetLongField(JvThis, last_time, new_last_time);
  JvSetObjectField(JvThis, last_comment, comment);
  return new_last_time - old_last_time;
}
</literallayout>
</para>
</sect1>

<sect1><title>Using the C language</title>
<para>
Some programmers might prefer to write Java native methods using C.
The main advantages of that are that C is more universally available
and more portable.  However, if portability to multiple Java implementations
is important, one should use the JNI.  Still, it might be nice to have
<literal>Jv</literal>-style macros that would allow one to select between
portable JNI-based C and Kaffe-optimized KNI.  The problem is that an
efficient KNI-style interface is much more inconvenient in C than in C++.
In C++, we can have the compiler handle inheritance, exception handling,
name mangling of methods, and so on.  In C the programmer would have to
do much more of this by hand.  It should be possible to come up with a
set of macros for programmers willing to do that.  I am not convinced
that this is a high priority, given that most environments that support
C and Java will also support C++.  The main issue is whether it is OK
to require a C++ compiler to build the Kaffe native methods.
If using C++ makes it easier to write core Java libraries more efficiently,
I think the trade-off is worth it.
</para>
</sect1>

<sect1><title>Packages</title>
<para>
The only global names in Java are class names and package names.
A <firstterm>package</firstterm> can contain zero or more classes, and
also zero or more sub-packages.
Every class belongs to either an unnamed package or a package that
has a hierarchical and globally unique name.
</para>
<para>
A Java package is mapped to a C++ <firstterm>namespace</firstterm>.
The Java class <literal>java.lang.String</literal>
is in the package <literal>java.lang</literal>, which is a sub-package
of <literal>java</literal>.  The C++ equivalent is the
class <literal>java::lang::String</literal>,
which is in the namespace <literal>java::lang</literal>,
which is in the namespace <literal>java</literal>.
</para>
<para>
The suggested way to do that is:
<literallayout>
// Declare the class(es), possibly in a header file:
namespace java {
  namespace lang {
    class Object;
    class String;
  }
}

class java::lang::String : public java::lang::Object
{
  ...
};
</literallayout>
</para>
<sect2><title>Leaving out package names</title>
<para>
Having to always type the fully-qualified class name is verbose.
It also makes it more difficult to change the package containing a class.
The Java <literal>package</literal> declaration specifies that the
following class declarations are in the named package, without having
to explicitly name the full package qualifiers.
The <literal>package</literal> declaration can be followed by zero or
more <literal>import</literal> declarations, each of which allows either
a single class or all the classes in a package to be named by a simple
identifier.  C++ provides something similar
with the <literal>using</literal> declaration and directive.
</para>
<para>
A Java simple-type-import declaration:
<literallayout>
import <replaceable>PackageName</replaceable>.<replaceable>TypeName</replaceable>;
</literallayout>
allows using <replaceable>TypeName</replaceable> as a shorthand for
<literal><replaceable>PackageName</replaceable>.<replaceable>TypeName</replaceable></literal>.
The C++ (more-or-less) equivalent is a <literal>using</literal>-declaration:
<literallayout>
using <replaceable>PackageName</replaceable>::<replaceable>TypeName</replaceable>;
</literallayout>
</para>
<para>
A Java import-on-demand declaration:
<literallayout>
import <replaceable>PackageName</replaceable>.*;
</literallayout>
allows using <replaceable>TypeName</replaceable>, for any type in the package, as a shorthand for
<literal><replaceable>PackageName</replaceable>.<replaceable>TypeName</replaceable></literal>.
The C++ (more-or-less) equivalent is a <literal>using</literal>-directive:
<literallayout>
using namespace <replaceable>PackageName</replaceable>;
</literallayout>
</para>
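<para>
As a concrete illustration of the two forms, here is a minimal sketch.
It assumes a C++ compiler that already implements namespaces, and the
class bodies (including the <literal>java::io::File</literal> class and
the <literal>length</literal> placeholder) are hypothetical stand-ins,
not the real library declarations.
</para>

```cpp
#include <cassert>

namespace java {
  namespace lang {
    class Object { public: virtual ~Object() {} };
    class String : public Object {
    public:
      int length() const { return 0; }  // placeholder body for the sketch
    };
  }
  namespace io {
    class File : public lang::Object {};
  }
}

// Like "import java.lang.String;": brings in exactly one name.
using java::lang::String;

inline int length_of_empty() {
  String s;            // short for java::lang::String
  return s.length();
}

inline bool file_is_object() {
  // Like "import java.io.*;", but limited to this function body.
  using namespace java::io;
  File f;              // short for java::io::File
  java::lang::Object *o = &f;
  assert(o != 0);
  return true;
}
```

<para>
Unlike a Java <literal>import</literal>, a <literal>using</literal>-directive
may appear at block scope, which keeps the imported names from leaking
into the rest of the file.
</para>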
</sect2>
<sect2><title>Nested classes as a substitute for namespaces</title>
<para>
G++ does not implement namespaces yet.
However, it does implement nested classes, which provide similar
(though less convenient) functionality.
This style seems to work:
<literallayout>
class java {
  class lang {
    class Object { } ;
    class String;
  };
};

class java::lang::String : public java::lang::Object
{ ... }
</literallayout>
Note that the generated code (including name mangling)
using nested classes is the same as that using namespaces.
</para>
</sect2>
</sect1>

<sect1><title>Object model</title>
<para>
From an implementation point of view we can consider Java to be a subset
of C++.  Java has a few important extensions, plus a powerful standard
class library, but on the whole that does not change the basic similarity.
Java is a hybrid object-oriented language, with a few native types,
in addition to class types.  It is class-based, where a class may have
static as well as per-object fields, and static as well as instance methods.
Non-static methods may be virtual, and may be overloaded.  Overloading is
resolved at compile time by matching the actual argument types against
the parameter types.  Virtual methods are implemented using indirect calls
through a dispatch table (virtual function table).  Objects are
allocated on the heap, and initialized using a constructor method.
Classes are organized in a package hierarchy.
</para>
<para>
All of the listed attributes are also true of C++, though C++ has
extra features (for example in C++ objects may also be allocated statically
or in a local stack frame in addition to the heap).
So the most important task in integrating Java and C++ is to
remove gratuitous incompatibilities.
</para>
<sect2><title>Object references</title>
<para>
We implement a Java object reference as a pointer to the start
of the referenced object.  It maps to a C++ pointer.
(We cannot use C++ references for Java references, since
once a C++ reference has been initialized, you cannot change it to
point to another object.)
The <literal>null</literal> Java reference maps to the <literal>NULL</literal>
C++ pointer.
</para>
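<para>
The difference between the two C++ mechanisms can be seen in a few lines.
This is only a sketch; <literal>Node</literal> is a hypothetical stand-in
for a Java class.
</para>

```cpp
#include <cassert>

struct Node { int value; };   // hypothetical stand-in for a Java object

inline int reseat_demo() {
  Node a = { 1 }, b = { 2 };
  Node *ref = &a;    // Java: Node ref = a;
  assert(ref->value == 1);
  ref = &b;          // Java: ref = b;  (the pointer now names b)
  Node &alias = a;   // a C++ reference is bound once, to a
  alias = b;         // this copies b's contents into a; alias still names a
  // ref sees b's value (2), and a has been overwritten with 2 as well.
  return ref->value * 10 + a.value;
}
```

<para>
Assigning through the pointer changes which object is named, as a Java
assignment does; assigning through the reference overwrites the object.
</para>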
<para>
Note that in JDK an object reference is implemented as
a pointer to a two-word <quote>handle</quote>.  One word of the handle
points to the fields of the object, while the other points
to a method table.  GNU Java does not use this extra indirection.
</para>
</sect2>
<sect2><title>Primitive types</title>
<para>
Java provides eight <quote>primitive</quote> types:
<literal>byte</literal>, <literal>short</literal>, <literal>int</literal>,
<literal>long</literal>, <literal>float</literal>, <literal>double</literal>,
<literal>char</literal>, and <literal>boolean</literal>.
These are the same as the following C++ <literal>typedef</literal>s
(which are defined in a standard header file):
<literal>jbyte</literal>, <literal>jshort</literal>, <literal>jint</literal>,
<literal>jlong</literal>, <literal>jfloat</literal>,
<literal>jdouble</literal>,
<literal>jchar</literal>, and <literal>jboolean</literal>.

<informaltable frame="all" colsep="1" rowsep="0">
<tgroup cols="3">
<thead>
<row>
<entry>Java type</entry>
<entry>C/C++ typename</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>byte</entry>
<entry>jbyte</entry>
<entry>8-bit signed integer</entry>
</row>
<row>
<entry>short</entry>
<entry>jshort</entry>
<entry>16-bit signed integer</entry>
</row>
<row>
<entry>int</entry>
<entry>jint</entry>
<entry>32-bit signed integer</entry>
</row>
<row>
<entry>long</entry>
<entry>jlong</entry>
<entry>64-bit signed integer</entry>
</row>
<row>
<entry>float</entry>
<entry>jfloat</entry>
<entry>32-bit IEEE floating-point number</entry>
</row>
<row>
<entry>double</entry>
<entry>jdouble</entry>
<entry>64-bit IEEE floating-point number</entry>
</row>
<row>
<entry>char</entry>
<entry>jchar</entry>
<entry>16-bit Unicode character</entry>
</row>
<row>
<entry>boolean</entry>
<entry>jboolean</entry>
<entry>logical (Boolean) values</entry>
</row>
<row>
<entry>void</entry>
<entry>void</entry>
<entry>no value</entry>
</row>
</tbody></tgroup>
</informaltable>

</para>
</sect2>
<sect2><title>Object fields</title>
<para>
Each object contains an object header, followed by the instance
fields of the class, in order.  The object header consists of
a single pointer to a dispatch or virtual function table.
(There may be extra fields <quote>in front of</quote> the object,
for example for
memory management, but this is invisible to the application, and
the reference to the object points to the dispatch table pointer.)
</para>
<para>
The fields are laid out in the same order, alignment, and size
as in C++.  Specifically, 8-bit and 16-bit native types
(<literal>byte</literal>, <literal>short</literal>, <literal>char</literal>,
and <literal>boolean</literal>) are <emphasis>not</emphasis>
widened to 32 bits.
Note that the Java VM does extend 8-bit and 16-bit types to 32 bits
when on the VM stack or temporary registers.
The JDK implementation
and earlier versions of Kaffe also extend 8-bit and 16-bit
object fields to use a full 32 bits.  However, GNU Java was recently changed
so that 8-bit and 16-bit fields now only take 8 or 16 bits in an object.
In general Java field sizes and alignment are now the same as C and C++.
</para>
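<para>
The layout rule can be checked against a plain C++ struct.  The sketch
below assumes fixed-width typedefs from <literal>stdint.h</literal> as
stand-ins for the real header's definitions, and
<literal>Sample</literal> is a hypothetical class with a one-word header.
The offsets noted in the comments are what mainstream ABIs produce; they
are not guaranteed by the C++ language itself.
</para>

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <stdint.h>

// Assumed stand-ins for the typedefs in the standard header.
typedef int8_t  jbyte;
typedef int16_t jshort;
typedef int32_t jint;

// Hypothetical object: a one-word header, then the fields in order.
struct Sample {
  const void *vtable;  // dispatch table pointer
  jbyte  b;            // 1 byte, not widened to 32 bits
  jshort s;            // 2 bytes, aligned to 2
  jint   i;            // 4 bytes, aligned to 4
};

// Distance of a field from the end of the object header.
inline std::size_t offset_after_header(std::size_t field_offset) {
  assert(field_offset >= sizeof(const void *));
  return field_offset - sizeof(const void *);
}
```

<para>
On typical 32-bit and 64-bit targets,
<literal>offset_after_header(offsetof(Sample, s))</literal> is 2 and
<literal>offset_after_header(offsetof(Sample, i))</literal> is 4, so the
three fields together occupy 8 bytes rather than the 12 they would need
if each were widened.
</para>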
</sect2>

<sect2><title>Arrays</title>
<para>
While in many ways Java is similar to C and C++,
it is quite different in its treatment of arrays.
C arrays are based on the idea of pointer arithmetic,
which would be incompatible with Java's security requirements.
Java arrays are true objects (array types inherit from
<literal>java.lang.Object</literal>).  An array-valued variable
is one that contains a reference (pointer) to an array object.
</para>
<para>
Referencing a Java array in C++ code is done using the
<literal>JArray</literal> template, which is defined as follows:
<literallayout>
class __JArray : public java::lang::Object
{
public:
  int length;
};

template&lt;class T&gt;
class JArray : public __JArray
{
  T data[0];
public:
  T&amp; operator[](jint i) { return data[i]; }
};
</literallayout>

The following convenience <literal>typedefs</literal>
(matching <acronym>JNI</acronym>) are provided.
<literallayout>
typedef __JArray *jarray;
typedef JArray&lt;jobject&gt; *jobjectArray;
typedef JArray&lt;jboolean&gt; *jbooleanArray;
typedef JArray&lt;jbyte&gt; *jbyteArray;
typedef JArray&lt;jchar&gt; *jcharArray;
typedef JArray&lt;jshort&gt; *jshortArray;
typedef JArray&lt;jint&gt; *jintArray;
typedef JArray&lt;jlong&gt; *jlongArray;
typedef JArray&lt;jfloat&gt; *jfloatArray;
typedef JArray&lt;jdouble&gt; *jdoubleArray;
</literallayout>
</para>

</sect2>


<sect2><title>Overloading</title>
<para>
Both Java and C++ provide method overloading, where multiple
methods in a class have the same name, and the correct one is chosen
(at compile time) depending on the argument types.
The rules for choosing the correct method are (as expected) more complicated
in C++ than in Java, but the fundamental idea is the same.
We do have to make sure that all the <literal>typedef</literal>s for
Java types map to distinct C++ types.
</para>
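<para>
To see why the <literal>typedef</literal>s must be distinct, consider
the sketch below.  The fixed-width definitions are assumptions made for
this example, not the contents of the real header, and
<literal>Printer</literal> is a hypothetical class.  If any two of the
typedefs named the same C++ type, the class would fail to compile
because two overloads would have identical signatures.
</para>

```cpp
#include <cassert>
#include <stdint.h>

// Assumed definitions, for this sketch only.
typedef int8_t   jbyte;
typedef int16_t  jshort;
typedef int32_t  jint;
typedef int64_t  jlong;
typedef uint16_t jchar;

// One overload per Java primitive type; each returns a distinct tag.
struct Printer {
  int print(jbyte)  { return 1; }
  int print(jshort) { return 2; }
  int print(jint)   { return 3; }
  int print(jlong)  { return 4; }
  int print(jchar)  { return 5; }
};
```

<para>
A call such as <literal>p.print((jchar) 'a')</literal> selects the
<literal>jchar</literal> overload at compile time, just as the
corresponding Java call would.
</para>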
<para>
Common assemblers and linkers are not aware of C++ overloading,
so the standard implementation strategy is to encode the
parameter types of a method into its assembly-level name.
This encoding is called <firstterm>mangling</firstterm>,
and the encoded name is the <firstterm>mangled name</firstterm>.
The same mechanism is used to implement Java overloading.
For C++/Java interoperability, it is important to use the
<emphasis>same</emphasis> encoding scheme.  (This is already
implemented in <command>jc1</command>, except for some minor
necessary adjustments.)
</para>
</sect2>

<sect2><title>Virtual method calls</title>
<para>
Virtual method dispatch is handled essentially the same
in C++ and Java -- <abbrev>i.e.</abbrev> by doing an
indirect call through a function pointer stored in a per-class virtual
function table.  C++ is more complicated because it has to support
multiple inheritance.  Traditionally, this is implemented
by putting an extra <literal>delta</literal> integer offset in
each entry in the virtual function table.
This is not needed for Java, which only needs a single function pointer
in each entry of the virtual function table.
There is a more modern C++ implementation technique, which uses
<firstterm>thunks</firstterm>, which does away with the need for the
<literal>delta</literal> fields in the virtual function tables.
This is now an option in G++, and will soon be the default on Linux.
We need to make sure that Java classes (<abbrev>i.e.</abbrev> those that
inherit from <literal>java.lang.Object</literal>) are implemented as
if using thunks.  (No actual thunks are needed for Java classes,
since Java does not have multiple inheritance.)
</para>
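<para>
The single-inheritance scheme can be written out by hand.  The
following is a sketch, not the layout G++ or GNU Java actually emits:
all names are hypothetical, and it ignores the special leading table
entries.  Each table entry is a bare function pointer with no
<literal>delta</literal> field.
</para>

```cpp
#include <cassert>

struct Obj;                          // hypothetical object type
typedef int (*Method)(Obj *self);    // every slot is one function pointer

struct VTable { Method slots[2]; };  // no per-entry delta offset
struct Obj { const VTable *vtable; int field; };

static int base_get(Obj *self)    { return self->field; }
static int base_twice(Obj *self)  { return 2 * self->field; }
static int derived_get(Obj *self) { return self->field + 100; }  // override

// One table per class; the derived table reuses the inherited slot 1.
static const VTable base_vtable    = { { base_get,    base_twice } };
static const VTable derived_vtable = { { derived_get, base_twice } };

// A virtual call: load the table from the object header, index it,
// and call through the function pointer.
inline int call_virtual(Obj *o, int slot) {
  assert(slot >= 0 && slot < 2);
  return o->vtable->slots[slot](o);
}
```

<para>
Because <literal>self</literal> never needs adjusting, the caller passes
the object pointer unchanged, which is what makes the thunk-style table
a natural fit for Java classes.
</para>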
<para>
The first one or two elements of the virtual function table
are used for special purposes in both GNU Java and C++;  in Java,
the first element points to the class that owns the virtual function table.
G++ needs to know that Java is slightly different.
</para>
</sect2>

<sect2><title>Allocation</title>
<para>
New Java objects are allocated using a
<firstterm>class-instance-creation-expression</firstterm>:
<literallayout>
new <replaceable>Type</replaceable> ( <replaceable>arguments</replaceable> )
</literallayout>
The same syntax is used in C++.  The main difference is that
C++ objects have to be explicitly deleted, whereas in Java they are
automatically deleted by the garbage collector.
For a specific class, we can define in C++ <literal>operator new</literal>:
<literallayout>
class CLASS {
  void* operator new (size_t size) { return soft_new(MAGIC); }
};
</literallayout>
However, we don't want a user to have to define this
magic <literal>operator new</literal> for each class.  It needs to be done
in <literal>java.lang.Object</literal>.  This is not possible
without some compiler support (because the <literal>MAGIC</literal>
argument is class-dependent); however, it is straight-forward to
implement such support.  Allocating an array is a special case,
since the space needed depends on the run-time length given.
</para>
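<para>
The limitation can be demonstrated with standard C++ alone.  In the
sketch below, <literal>runtime_alloc</literal> and the bookkeeping
variables are hypothetical stand-ins for the run-time allocator, and
this <literal>Object</literal> is not the real
<literal>java.lang.Object</literal>.  An
<literal>operator new</literal> defined once in the base class is
inherited by every subclass, but it is told only the size of the object,
not which class is being created.
</para>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static int live_allocations = 0;             // hypothetical bookkeeping
static std::size_t last_requested_size = 0;  // size seen by the allocator

// Hypothetical stand-in for the run-time allocator.
inline void *runtime_alloc(std::size_t size) {
  void *p = std::calloc(1, size);   // zero-filled, as Java fields start out
  if (!p) throw std::bad_alloc();
  ++live_allocations;
  last_requested_size = size;
  return p;
}

class Object {
public:
  virtual ~Object() {}
  // Inherited by all subclasses; receives the size, not the class.
  void *operator new(std::size_t size) { return runtime_alloc(size); }
  void operator delete(void *p) {
    if (p) { --live_allocations; std::free(p); }
  }
};

class Point : public Object {
public:
  int x, y;   // declares no operator new of its own
};

// Allocate a Point and report the size the base-class operator saw.
inline std::size_t size_seen_for_point() {
  Point *p = new Point;
  assert(live_allocations > 0);
  std::size_t seen = last_requested_size;
  delete p;   // explicit here; a collector would make this unnecessary
  return seen;
}
```

<para>
The allocator sees <literal>sizeof(Point)</literal>, so sizing works
without help; what is missing is the class identity, which is why some
compiler support is still needed.
</para>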
</sect2>
<sect2><title>Object construction</title>
<para>
In both C++ and Java, newly created objects are initialized by a
<firstterm>constructor</firstterm>.  In both languages, a
constructor is a method that is automatically called.
Java has some restrictions on how constructors are called,
but basically the calling convention (and overload resolution)
are as for standard methods.  In G++, constructors get passed an
extra magic argument, which is not passed for Java constructors.
G++ also has the constructors set up the vtable pointers.
In Java, the object allocator sets up the vtable pointer,
and the constructor does not change the vtable pointer.
Hence, the G++ compiler needs to know about these differences.
</para>
</sect2>
<sect2><title>Object finalization</title>
<para>
A Java method with the special name <function>finalize</function>
serves some of the same function as a C++ destructor method.
The latter is responsible for freeing up any resources owned
by the object before it is destroyed, including deleting
any sub-objects it points to.  In Java, the garbage collector will
take care of deleting no-longer-needed sub-objects, so there
is much less need for finalization, but it is occasionally needed.
</para>
<para>
It might make sense to consider the C++ syntax for a destructor:
<literal>~<replaceable>ClassName</replaceable></literal>
as being equivalent to the Java <function>finalize</function> method.
That would mean that if a class that inherits from
<literal>java.lang.Object</literal> defined a C++-style destructor,
it would be equivalent to defining a <function>finalize</function> method.
However, I see no real need that this would serve.
Instead:  If you want to define or invoke a Java finalizer from C++ code,
you will need to define or invoke a method named <function>finalize</function>.
</para>
<para>
In this proposed hybrid C++/Java environment, there is no clear
distinction between C++ and Java objects.  Java objects inherit
from <literal>java.lang.Object</literal>, and are garbage collected.
On the other hand, regular C++ objects are not garbage collected,
but must be explicitly deleted.
It may be useful to support C++ objects (that do <emphasis>not</emphasis>
inherit from <literal>java.lang.Object</literal>) that would be
garbage collected.  KNI will probably provide a way to do that,
by overloading <literal>operator new</literal>.
</para>
<para>
What happens if you explicitly <literal>delete</literal> an object
(Java or C++) that is garbage collected?  The Ellis/Detlefs garbage
collection proposal for C++ says that this should cause the finalizer
to be run, but otherwise whether the object memory is freed
is unpredictable;  that seems reasonable to me.
</para>
</sect2>
</sect1>

<sect1><title>Interfaces</title>
<para>
A Java class can <firstterm>implement</firstterm> zero or more
<firstterm>interfaces</firstterm>, in addition to inheriting from
a single base class. 
An interface is a collection of constants and method specifications;
it is similar to the <firstterm>signatures</firstterm> available
as a G++ extension.  An interface provides a subset of the
functionality of C++ abstract virtual base classes, but is
normally implemented differently.  Since the mechanism used to
implement interfaces in GNU Java will change, and since interfaces
are infrequently used by Java native methods, we will not say
anything more about them now.
</para>
</sect1>

<sect1><title>Exceptions</title>
<para>
It is a goal of the Gcc exception handling mechanism that it be as
language-independent as possible.  The existing support is geared towards C++,
but should be extended for Java.  Essentially, the Java features are
a subset of the G++ features, in that C++ allows near-arbitrary values
to be thrown, while Java only allows throwing of references to
objects that inherit from <literal>java.lang.Throwable</literal>.
So once the Gcc exception handling is more stable, it should be
trivial to add Java support.  The main change needed for Java is
how type-matching is done;  fixing that would benefit C++ as well.
The other main issue is that we need to make Kaffe's representation
of exception ranges be compatible with Gcc's.
</para>
<para>
The goal is that C++ code that needs to throw a Java exception would
just use the C++ <command>throw</command> statement.  For example:
<literallayout>
throw new java::io::IOException(JvNewStringUTF("I/O Error!"));
</literallayout>
</para>
<para>
There is also no difference between catching a Java exception,
and catching a C++ exception.
The following Java fragment:
<literallayout>
try {
  do_stuff();
} catch (java.io.IOException ex) {
  System.out.println("caught I/O Error");
} finally {
  cleanup();
}
</literallayout>
could be expressed this way in G++:
<literallayout>
try {
  try {
    do_stuff();
  } catch (java::io::IOException *ex) {
     printf("caught I/O Error\n");
  }
} catch (...) {
  cleanup();
  throw;  // re-throws exception
}
cleanup();  // the finally clause also runs on normal exit
</literallayout>
Note that in C++ we need to use two nested <literal>try</literal> statements.
</para>
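<para>
The nested-<literal>try</literal> pattern can be exercised with plain
C++ classes.  In this sketch, <literal>Throwable</literal>,
<literal>IOException</literal>, <literal>trace_log</literal>, and
<literal>run</literal> are hypothetical stand-ins, and the caught object
is deleted by hand because no garbage collector is present.  As with a
Java exception, the thrown value is a pointer to the exception object.
</para>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the Java exception classes.
struct Throwable { virtual ~Throwable() {} };
struct IOException : Throwable {};

static std::string trace_log;   // records which steps ran, in order

inline void do_stuff(bool fail) { if (fail) throw new IOException(); }
inline void cleanup() { trace_log += "cleanup;"; }

inline void run(bool fail) {
  try {
    try {
      do_stuff(fail);
      trace_log += "ok;";
    } catch (IOException *ex) {   // catch by pointer, matching the throw
      trace_log += "caught;";
      delete ex;                  // no collector in this sketch
    }
  } catch (...) {
    cleanup();                    // finally, on the exceptional path
    throw;                        // re-throws the exception
  }
  cleanup();                      // finally, on the normal path
}
```

<para>
Calling <literal>run(false)</literal> appends
<literal>ok;cleanup;</literal> to the log, and
<literal>run(true)</literal> appends <literal>caught;cleanup;</literal>,
so the cleanup step runs exactly once on either path.
</para>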
</sect1>

<sect1><title>Synchronization</title>
<para>
Each Java object has an implicit monitor.
The Java VM uses the instruction <literal>monitorenter</literal> to acquire
and lock a monitor, and <literal>monitorexit</literal> to release it.
The JNI has corresponding methods <literal>MonitorEnter</literal>
and <literal>MonitorExit</literal>.  The corresponding KNI macros
are <literal>JvMonitorEnter</literal> and <literal>JvMonitorExit</literal>.
</para>
<para>
The Java source language does not provide direct access to these primitives.
Instead, there is a <literal>synchronized</literal> statement that does an
implicit <literal>monitorenter</literal> before entry to the block,
and does a <literal>monitorexit</literal> on exit from the block.
Note that the lock has to be released even if the block is abnormally
terminated by an exception, which means there is an implicit
<literal>try</literal>-<literal>finally</literal>.
</para>
<para>
From C++, it makes sense to use a destructor to release a lock.
KNI defines the following utility class.
<literallayout>
class JvSynchronize {
  jobject obj;
public:
  JvSynchronize(jobject o) { obj = o; JvMonitorEnter(o); }
  ~JvSynchronize() { JvMonitorExit(obj); }
};
</literallayout>
The equivalent of Java's:
<literallayout>
synchronized (OBJ) { CODE; }
</literallayout>
can be simply expressed:
<literallayout>
{ JvSynchronize dummy(OBJ); CODE; }
</literallayout>
</para>
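<para>
The destructor-based release can be tested without a VM.  The sketch
below is single-threaded and uses hypothetical stand-ins throughout:
<literal>Monitor</literal> is just a counter, and
<literal>MockMonitorEnter</literal>, <literal>MockMonitorExit</literal>,
and <literal>ScopedSync</literal> take the place of the KNI facilities.
It shows only that the exit call runs on both normal and exceptional
exit from the block.
</para>

```cpp
#include <cassert>

// Hypothetical stand-in for an object's monitor: a nesting counter.
struct Monitor { int depth; };

inline void MockMonitorEnter(Monitor *m) { ++m->depth; }
inline void MockMonitorExit(Monitor *m)  { assert(m->depth > 0); --m->depth; }

// Same shape as the utility class above, over the mock monitor.
class ScopedSync {
  Monitor *obj;
  ScopedSync(const ScopedSync &);             // copying is disabled
  ScopedSync &operator=(const ScopedSync &);
public:
  explicit ScopedSync(Monitor *o) : obj(o) { MockMonitorEnter(o); }
  ~ScopedSync() { MockMonitorExit(obj); }
};

// Returns the nesting depth observed while the lock is held,
// or throws an int to simulate abnormal termination of the block.
inline int depth_inside(Monitor *m, bool fail) {
  ScopedSync guard(m);
  if (fail) throw 1;
  return m->depth;
}
```

<para>
Disabling copies matters here: a copied guard would call the exit
function twice for a single enter.
</para>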
<para>
Java also has methods with the <literal>synchronized</literal> attribute.
This is equivalent to wrapping the entire method body in a
<literal>synchronized</literal> statement.
Alternatively, the synchronization can be done by the caller
wrapping the method call in a <literal>synchronized</literal>.
That implementation is not practical for virtual method calls in compiled code,
since it would require the caller to check at run-time for the
<literal>synchronized</literal> attribute.  Hence our implementation of
Java will have the called method do the synchronization inline.
</para>
</sect1>

<sect1><title>Improved String implementation</title>
<para>
The standard Java string implementation is a bit inefficient, because
every string requires <emphasis>two</emphasis> objects:
a <literal>java.lang.String</literal> object, which contains a
reference to an internal <literal>char</literal> array, which
contains the actual character data.
If we allow the actual <literal>java.lang.String</literal> object
to have a size that varies depending on how many characters it contains
(just like array objects vary in size), we can save the overhead of
the extra object.  This would save space, reduce cache misses,
and reduce garbage collection overhead.
</para>
<literallayout>
class java::lang::String : public java::lang::Object
{
  jint length;  /* In characters. */
  jint offset;  /* In bytes, from start of base. */
  Object *base; /* Either this or another String or a char array. */

private:
  jchar&amp; operator[](jint i) { return ((jchar*)((char*)base+offset))[i]; }

public:
  jchar charAt(jint i)
  {
    if ((unsigned32) i >= length)
      throw new IndexOutOfBoundsException(i);
    return (*this)[i];
  }

  String* substring (jint beginIndex, jint endIndex)
  {
    ...  check for errors ...;
    String *s = new String();
    s->base = base;
    s->length = endIndex - beginIndex;
    s->offset = (char*) &amp;(*this)[beginIndex] - (char*) base;
    return s;
  }
  ...
};
</literallayout>
<para>
The tricky part about variable-sized objects is that we can no longer
cleanly separate object allocation from object construction,
since the size of the object to be allocated depends on the arguments
given to the constructor.  We can deal with this fairly straight-forwardly
from C++ or when compiling Java source code.  It is more complicated
(though quite doable) when compiling from Java byte-code.  We don't
have to worry about that, since in any case we have to support
the less efficient scheme with separate allocation and construction.
(This is needed for JNI and reflection compatibility.)
</para>
</sect1>

<sect1><title>Changes needed to G++</title>
<para>
Here is a list of tweaks needed to G++ before it can provide
the C++/Java interoperability we have discussed:
<itemizedlist>
<listitem><para>
We need a utility to translate Java class definitions into
equivalent C++ class declarations.  Most convenient would be
adding the ability for G++ to directly read class properties
from a <filename>.class</filename> file.  However, a simple
program that reads a <filename>.class</filename> and generates
a suitable C++ include file is almost as convenient.
</para></listitem>
<listitem><para>
We need a way to indicate to G++ that the class
<literal>java.lang.Object</literal> is magic, in that it, and all classes
that inherit from it should be implemented following Java conventions
instead of C++ conventions.  We say that such classes have
the <quote>Java property</quote>.
(Our goal is that on the whole it should not
matter, but there are a few places where it matters.  Hopefully,
these are all listed here.)
</para></listitem>
<listitem><para>
Virtual function tables and calls in classes
that have the Java property are different.
</para></listitem>
<listitem><para>
A <function>new</function> expression needs to be modified to call the
correct Kaffe function (for classes that have the Java property).
</para></listitem>
<listitem><para>
The interface to constructors needs to be changed so magic
vtable pointer initialization and the extra constructor argument
do not happen when constructing a Java object.
</para></listitem>
<listitem><para>
The <literal>typedef</literal>s for the primitive types (such as
<literal>jlong</literal>) map to concrete implementation types.
G++ needs some minor changes so that the mangling of those
implementation types are all disjoint (and preferably that the
manglings are the same on all platforms).
</para></listitem>
<listitem><para>
Change representation of exception ranges to be more suitable for Java.
</para></listitem>
</itemizedlist>
</para>
</sect1>
</article>
