This is the mail archive of the java-discuss@sourceware.cygnus.com mailing list for the Java project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: ReRe: performance (shellsort experiments)


Kresten Krab Thorup <krab@daimi.au.dk> writes:

> With a dynamic compiler that does code-specialization, mosts of such
> casts can be eliminated.  

At the cost of some code bloat and extra complication.

> In context of static compilation, I expect that we could get a
> significant win by just adding a mono-morphic inline cache.

I'm not a big fan of using caching, partly because it makes performance
less predictable.  But it is is a reasonable implementation option.

>  That is, for each expression at the form 
>    x = (Foo)y
> we'd generate code like:
> ...
>        _Jv_CheckArrayStore (y, Foo.class); // may throw exc

I hope not!

> Experiments with Self/Smalltalk in context of dynamic method
> dispatch have shown that this kind of thing improves performance
> dramatically, since most call sites are in fact mono-morphic.

> On top of this, we can improve the performace of isAssignambleFrom;
> but I thing the most significant single thing one could do right now
> would be to implement uniquing for Utf8Const's, both in the linker and
> in the runtime system.

I don't see what that would make a big difference.  In pre-compiled
code, Utf8Consts are primarily used during class loading, as well as
some reflexive operations.  I assume you are thinking of some
specific bottle-necks?  Is it different for interpreted code?

Now there are some bottle-necks such as isAssignableFrom, that
occur commonly.  I would much rather (someone) work on fixing those.
I wrote up design for doing this in constant time.  For those who
haven't seen it, here it is again:

/*** CLASS MEMBERSHIP TESTING ***/

/* The depth of a class is how many "extends" it is removed from Object.
   Thus the depth of java.lang.Object is 0, but the depth of
   javio.ioFilterOutputStream is 2.
   Depth is defined for regular classes and array classes,
   but not primitives types or interfaces. */
#define CLASS_DEPTH(CL) ((CL)->ui.cls.depth)

// We can do class member testing in constant time adding in each class
// using a small extra table of all the ancestor classes.
// (This trick is relatively old.)
// In Kaffe it makes sense to incorporate this table in the dispatchTable.
// Note also we do not need to include the java.lang.Object in the table
// of ancestors, since it is redundant.
// If we order this table from current classes down by decreasing depth,
// we can combine the ancestors table with the pointer to the actual classs.

typedef void *methodptr;

struct _dispatchTable {
  Class ancestors[CLASS_DEPTH(THIS_CLASS)-1];
  methodptr method[...];
};

#define OBJECT_CLASS(OBJ) ((OBJ)->dtable->ancestors[0])

#define OBJECT_INSTANCEOF_CLASS(OBJ, CLS) \
  CLASS_SUBCLASS_OF(OBJECT_CLASS(OBJ), CLS)
#define CLASS_SUBCLASS_OF(OCLS, CLS) \
  ((CLS) == &ObjectClass \
   || ((CLASS_DEPTH(OCLS) - CLASS_DEPTH(CLS) >= 0 \
       && (OCLS)->dtable->ancestors[CLASS_DEPTH(OCLS) - CLASS_DEPTH(CLS)] \
           == (CLS))))

// Notice that the two instances CLASS_DEPTH(OCLS) - CLASS_DEPTH(CLS) can be
// combined to a single subtraction.
// Note that CLS is normally constant, such the first test against ObjectClass
// be normally be eliminated at compile-time.

If we assume that CLASS_DEPTH is seldom more than 4, we can make an
optimization, at the cost of a few extra words per class:

struct _dispatchTable {
  Class clas;
  // If the CLASS_DEPTH is less than 4, allocate 4 words anyway.
  // The unused ones are NULLs.
  Class ancestors[MIN(CLASS_DEPTH(THIS_CLASS)-1, 4)];
  methodptr method[...];
};

#define OBJECT_CLASS(OBJ) ((OBJ)->dtable->clas)
#define CLASS_SUBCLASS_OF_FAST(OCLS, CLS) \
  (OCLS)->dtable->ancestors[CLASS_DEPTH(CLS)-1] == (CLS))))
#define CLASS_SUBCLASS_OF(OCLS, CLS) \
  ((CLS) == &ObjectClass \
   || (CLASS_DEPTH(CLS) <= 4 ?  CLASS_SUBCLASS_OF_FAST(OCLS, CLS) \
     : (CLASS_DEPTH(OCLS) >= CLASS_DEPTH(CLS) \
        && (OCLS)->dtable->ancestors[CLASS_DEPTH(CLS)-1] == (CLS))))

// This wins, because we normally know (at compile-time) the identify
// of CLS and it CLASS_DEPTH.  Hence CLASS_SUBCLASS_OF reduces (at
// compile-time) to  CLASS_SUBCLASS_OF_FAST.

// Reference:  N. Wirth:  Reply to "type-extension type tests can be
// performed in constant time".  ACM TOPLAS 13(4):630, 1991.

/*** INTERFACES TESTS AND CALLS ***/

// Doing (x instanceof I) and calls like (((I)x).method(...)) are more
// difficult.  This is because interfaces may be "multiple inherited".
// Looked an tests can be done in constant time with single inheritance,
// because we can associate a unique depth to each class and a unique
// method index to each virtual index, and in a way that the tables are
// compact.  This is more difficult with multiple inheritance.

// Here I propose a design which I think is fairly efficient
// in both time (constant-time) and space.

// We associate an index with each method declared in an interface.
// For each interface, we ignore any inherited interfaces, sort the methods
// by some well-defined citerium (which could be declaration order),
// and assign indexes starting with 1 (zero is used for type tests).
// We call this index the Interface Methods Index of the interface method.

// Each superinterface of a class (i.e. each interface that the class
// directly or indirectly implements) has a corresponding "Partial
// Interface Dispatch Table" whose size is (number of methods + 1) words.
// The first word is a pointer to the interface (i.e. the java.lang.Class
// instance for that interface).  The remaining words are pointers to the
// actual methods that implement the methods declared in the interface,
// in the order specified by the Interface Method Index of the previous
// peragraph.

// For each non-abstract class, we create a list of its superinterfaces
// These are sorted by some criterium (alphabetical seems as good as any).
// We then create the "Interface Dispatch Table" by concatenating all the
// Partial Interface Dispatch Tables for each superinterface, in the
// order given by the sort.

// The problem now is to find the correct offset in the Interface
// Dispatch Table so we select the correct Partial Interface Dispatch Table
// that we are intersted in.  Once we have that, invoking an interface
// method just requires indexing with the Interface Method Index (known at
// compile time) to get the correct method.  Doing a type test (cast or
// instanceof) is the same problem: Once we have a possible Partial
// Interface Dispatch Table, we just compare the first element to see if
// it matches the desired interface.

// So how can we calculate the correct offset?  Unfortunateky, we can't
// do it at compile time.  It depends on both the interface, and the
// actual class of the receiver.  Our solution is to keep a vector of
// candiate offsets in each interface (the ioffsets table), and
// in each class we have an index (the iindex) used to select the
// correct offset from ioffsets.

struct Class {
  ...;
  union {
    struct {
      /* Index into interface's imap. */
      short iindex;
      short depth;
      /* pointer to vector of pointer to dispatch tables. */
      methodptr **itable;
    } cls;
    struct {
      /* Index into actual class's itable. */
      short *imap;
    } iface;
  } ui;

};

#define CLASS_IINDEX(CL) ((CL)->ui.cls.iindex)
#define CLASS_ITABLE(CL) ((CL)->ui.cls.itable)
#define CLASS_IMAP(CL) ((CL)->ui.iface.imap)

#define LOOKUP_INTERFACE(CL, IFACE, IMETODINDEX) \
  (CLASS_ITABLE(CL)[CLASS_IMAP(IFACE)[CLASS_IINDEX(cl)]][IMETHODINDEX])

struct Class {
  ...;
  union {
    struct {
      /* Index into interface's imap. */
      short iindex;
      short itable_length;
      /* Pointer to Interface Dispatch Table. */
      methodptr *itable;
    } cls;
    struct {
      /* Offsets into actual class's itable. */
      /* First element is the length. */
      /* Negative values are unused. */
      short *ioffsets;
    } iface;
  } ui;

};

#define CLASS_IINDEX(CL) ((CL)->ui.cls.iindex)
#define CLASS_ITABLE(CL) ((CL)->ui.cls.itable)
#define CLASS_ITABLE_LENGTH(CL) ((CL)->ui.cls.itable_length)
#define CLASS_IOFFSETS(CL) ((CL)->ui.iface.ioffsets)
#define CLASS_IOFFSETS_LENGTH(CL) ((CL)->ui.iface.ioffsets[0])

The CLASS_ITABLE can be computed at compile-time, and statically
allocated.  The CLASS_IINDEX of a class, and the CLASS_IOFFSETS
of an interface, have to be computed at link time.  In the case
of dynamic loading, this means run-time.  As more classes are
loaded, CLASS_IINDEX does not change, but the CLASS_IOFFSETS
vector of an interface may need to get extended if new classes
are loaded.  The function find_iiface does this calculation.

#define LOOKUP_INTERFACE(CL, IFACE, IMETODINDEX) \
  (CLASS_ITABLE(CL)[CLASS_IOFFSETS(IFACE)[CLASS_IINDEX(CL)] + IMETHODINDEX])

static short null_offset[2] = { 1, 0 };

/* Calculate and return the CLASS_IINDEX for a new class.
 * IFACES is a vector of NUM interfaces that the class implements.
 * OFFSETS[J] (for 0<=J<NUM) is the offset in the Interface Dispatch Table
 * for the Partial Interface Dispatch table corresponding to IFACE[J].
 * May extend the CLASS_IOFFSETS of the IFACES if need be.
 */

int
find_iindex(Class* ifaces, short offsets, int num)
{
  int i;
  for (i = 1;  i++; )
    {
      for (j = 0;  ;  j++)
	{
	  if (j >= num)
	    goto found;
	  if (i >= CLASS_IOFFSETS_LENGTH(ifaces[j]))
	    continue;
	  int ioffset = CLASS_IOFFSETS(ifaces[j])[i];
	  if (ioffset >= 0 && ioffset != offsets[j])
            break;  /* Nope - try next i. */
	}
    }
 found:
  for (j = 0;  ;  j++)
    {
      int len = CLASS_IOFFSETS_LENGTH(ifaces[j]);
      if (i >= len)
	{
	  /* FIXME - check off-by-one erors.
	     Should len field be counted in len? */
	  int newlen = 2 * len;
	  if (i >= newlen)
	    newlen = i + 3;
	  short *old_ioffsets = CLASS_IOFFSETS(ifaces[j]);
	  short *new_ioffsets;
	  if (old_ioffsets == null_offset)
	    {
	      new_ioffsets = [gc]malloc(newlen * sizeof(short));
	      new_ioffsets[0] = 1;
	      new_ioffsets[1] = 0;
	    }
	  else
	    {
	      new_ioffsets = [gc]realloc(old_ioffsets, newlen*sizeof(short));
	    }
	  new_ioffsets[0] = newlen;
	  while (len < newlen)
	    new_ioffsets[len++] = -1;
	  CLASS_IOFFSETS(ifaces[j]) = new_ioffsets;
	}
      CLASS_IOFFSETS(ifaces[j])[i] = offsets[j];
    }
  return i;
}

Checking that an object is an instance of a class that implements a given
interface (i.e. OBJ instanceof IFACE) uses the same data structures,
and is also constant time:

#define OBJECT_INSTANCEOF_INTERFACE(OBJ, IFACE) \
  CLASS_INSTANCEOF_INTERFACE(OBJECT_CLASS(OBJ), IFACE)

#define CLASS_INSTANCEOF_INTERFACE(CL, IFACE) \
  (CLASS_IINDEX(CL) <= CLASS_IOFFSETS_LENGTH(IFACE) \
   && ({unsigned short __offset = CLASS_IOFFSETS(IFACE)[CLASS_IINDEX(CL)]; \
        __offset < CLASS_ITABLE_LENGTH(CL) \
        && CLASS_ITABLE(CL)[__offset]] == IFACE}))

This requries three tests.  The first two are just bounds tests.
We can remove the first test if we ensure that there is no CLASS_IINDEX
lonker than any CLASS_IOFFSETS_LENGTH.  That imples that all the ioffsets
table have to have the same length:  That of the largest CLASS_IINDEX.
Unused elements of ioffsets could be marked with -1 - that would be safe
as long as the comparsion against CLASS_ITABLE_LENGTH(CL) is unsigned.

Using that data from experiements with classes.zip, that would require a
maximum of 14 bytes for 105 interfaces that do not share null_offset.
In other words:  fairly minor.  More space can be saved by sharing ioffsets
vectors that are indentical; by a more clever allaoction of iindex values;
and by a more clever re-arrangement of the Partial Interface Dispatch Tables
with the Interface Dispatch Tables.

One complication is that if a new classes is added that requires longer
ioffsets vectors, then the ioffsets of all interfaces have to be relocated.
But this should have infrequently, and only when a new class is loaded.

CLASS_INSTANCEOF_INTERFACE is unfortunately a trifle long to be inlined.

// Related work:

// Vitek, Horspool and Krall: ""Efficient Type Inclusion Tests".
// OOPSLA 97.  ACM SIGPLAN Notices 32(10:142.  1997.

// Some of the same people are also behind Cacao, a fast JIT for Alpha.
// See http://www.complang.tuwien.ac.at/java/cacao/.
// Cacao uses "coloring":  Each interface or interface is assign a unique
// global index, rather like perfect hashing.  This makes dispatch fast,
// but the per-class lookup tables are not dense.  (In my proposed design,
// the per-class lookup tables are dense, but the per-interface
// ioffset vectors may be sparse.  However, the ioffsets vectors are
// still small and there are fewer interfaces than classes (that implement
// interfaces).  How much space is saved, and if it is worth it,
// is open.)

// It seem slikely that there will be more interfaces and more classes
// implemeting interfaces in JDK 1.1, and code based on it.  For example,
// JDK 1.2 has new collector interfaces (such as List) that may become
// popular.

// Some relevant papers (including overview) are available from University
// of Geneva's Objects Systems Group
// http://cuiwww.unige.ch/OSG/publications/index.html

// Numbers:
JDK 1.1.4 classes.zip
153 interfaces
1458 classes total
1145 classes implement 0 interfaces
 153 classes implement 1 interface (directly or indirectly)
 105 classes implement 2 interfaces (directly or indirectly)
  39 classes implement 3 interfaces (directly or indirectly)
  10 classes implement 4 interfaces (directly or indirectly)
   5 classes implement 5 interfaces (directly or indirectly)
   1 class (java.awt.AWTEventMulticaster) implements 12 interfaces
160 classes implement more than 1 interface
A very simple-minded assignment of ioffsets yields a maximum length of
7 2-byte shorts.  (We could use 1-byte elements of each ioffset table
to save some space, but that might prevent classes that implement many
interfaces with many methods.  There are none such in classes.zip.)

-- 
	--Per Bothner
bothner@pacbell.net  per@bothner.com   http://home.pacbell.net/bothner/

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]