This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



value types for Java


Here is my somewhat delayed proposal for adding value types
to Java in a backward-compatible way.  I don't really have
time to do much about it.  Ideally, this should be a JSR.
--
	--Per Bothner
per@bothner.com   http://per.bothner.com/

PASSING STRUCTS BY VALUE
-------------------------------------
(HIGHLIGHTS/CONCLUSIONS START WITH >>.)

C# has 'struct' (value) types as well as reference types.
These are useful for efficiency.  The following is a
proposal for value types in Java without modifying the
language, but instead defining a *convention* that implementations
may use to optimize value types.

The goal is to let a Java compiler (JIT or ahead-of-time) be able to
implement value types the way C/C++ structs are implemented.  We use a
"Point" with fields x and y as the running example.

Our goal is that a Java programmer can create a Point object,
and pass it to methods that expect a Point struct, with the
compiler automatically passing it by value.  It follows that the
Point class has to be "magic" so the compiler knows that
it needs to be handled specially.

>> A "value class" is a Java class that the compiler/run-time
passes by value instead of by reference.  A "value instance"
is an instance of a value class.

On the other hand, we want legacy Java compilers to be able to
implement value types with reasonable efficiency, so portable code can
use value types and still work on any Java implementation.

>> A "value class" is a normal Java class, but with a special
"value type annotation", and following certain conventions.

If a class T inherits from a value class S, then passing a T to a
method expecting an S would require the instance to be "truncated".
This causes unexpected and bad behavior.  Furthermore, if we disallow
value classes from having a vtable pointer (as discussed later), then
inheritance loses much of its point.
>>  A value class must be final.

Suppose a value class instance is passed to a method that
modifies one of its fields.  From the Java programmer's point of
view that should modify the original instance that was passed in.
However, because the compiler invisibly passes it by value that
does not happen.  So we need to prevent modification.
>> All non-static fields of a value class must be final.
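
To make the hazard concrete, here is a minimal sketch that simulates
by-value passing with an explicit copy.  MutablePoint, copy() and
shiftRight() are hypothetical names used only for illustration; a
value-class-aware compiler would make the copy invisibly.

```java
// Hypothetical illustration: MutablePoint and its explicit copy() are
// assumptions that simulate what a by-value compiler would do silently.
final class MutablePoint {
    float x, y;                       // deliberately NOT final
    MutablePoint(float x, float y) { this.x = x; this.y = y; }
    MutablePoint copy() { return new MutablePoint(x, y); }
}

final class MutationHazardDemo {
    // A callee that modifies a field of its argument.
    static void shiftRight(MutablePoint p) { p.x += 1.0f; }

    // Ordinary Java semantics: the callee sees the caller's object.
    static float byReference() {
        MutablePoint p = new MutablePoint(1.0f, 2.0f);
        shiftRight(p);
        return p.x;
    }

    // Simulated by-value passing: the callee only sees a copy,
    // so the caller's instance is left unchanged.
    static float byValue() {
        MutablePoint p = new MutablePoint(1.0f, 2.0f);
        shiftRight(p.copy());
        return p.x;
    }

    public static void main(String[] args) {
        System.out.println(byReference()); // prints 2.0
        System.out.println(byValue());     // prints 1.0
    }
}
```

With final fields the two variants cannot be told apart, which is why
the restriction makes the optimization safe.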

Stack-allocation and inlining of value-class instances: A major reason
for passing struct parameters is that they can be stack-allocated, so
we don't need heap allocation and gc.  For performance we want to
modify the Java implementation so that value class instances are
stack-allocated.  Specifically, given
  Point p = new Point(x, y);
  moveto(p);
or:
  moveto(new Point(x, y));
we want the compiler to stack-allocate the new Point.
>> The compiler can and should stack-allocate value instances.
This is simple enough once the compiler knows that Point is
a "value class", but there are a number of implications.

First, assume a method that takes a Point parameter p,
and that p is assigned to a field 'f' in some object 'o':
  o.f = p;
We cannot store in 'o' an address since the parameter 'p' is
a value on the stack, and it will become invalid as soon as
the current method returns.  Therefore, we actually have to
copy the value of p into the field f.
>> An object field whose type is a value class is implemented
as a struct field, not a reference field.
More generally:
>> All assignment and passing of value class instances is
done by structure copying, not reference copying.
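
As a rough sketch of what "struct field" means here, the following
contrasts the class a programmer writes with a layout a
value-class-aware compiler could plausibly use.  Pt, Holder and
HolderFlattened are hypothetical names, and the flattened form is only
an assumption about one possible implementation.

```java
// Hypothetical stand-in for the Point value class.
final class Pt {
    final float x, y;
    Pt(float x, float y) { this.x = x; this.y = y; }
}

// What the programmer writes: a field whose type is a value class.
final class Holder {
    Pt f;
}

// Roughly the layout a value-class-aware compiler could use instead:
// the fields of f are stored inline, so no pointer into the caller's
// stack frame is ever kept in the object.
final class HolderFlattened {
    private float f_x, f_y;

    // "o.f = p" becomes a field-by-field copy.
    void setF(Pt p) {
        f_x = p.x;
        f_y = p.y;
    }

    // Reading "o.f" copies the fields back out into a fresh value.
    Pt getF() { return new Pt(f_x, f_y); }
}
```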

Similarly, if we allow arrays of Point elements (which would
be a more extensive change than the current suggestion):
>> Elements of arrays of value classes have to be structures.

It follows that instances of value classes do not have object
identity:  If a value gets copied whenever it is accessed, then
the concept of a fixed identity has no meaning.
>> Equality of two value instances is defined in terms of
field-by-field value comparison.
Using the 'new' operator implies creating a new object with its
own identity.  This would be misleading for value classes.
>> Value classes have no public constructors.  Instead, new
values are created using static "factory" methods.
Conceptually, these factory methods "collapse" or "intern" values that
are equal.  Conceptually, this is done by hashing on the values of
fields, and keeping a table of existing objects, just like the
standard String intern method does.  That way value identity and
object identity would be the same, as required for value types.  Of
course the actual implementation, at least when using a
value-class-aware Java implementation, need not do the actual
interning, but instead translates object equality to value
(field-for-field) equality.
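
The conceptual interning described above could be sketched as follows
for a legacy (non-value-aware) implementation.  InternedPoint and its
table are hypothetical; keying on the float bit patterns is an
assumption of this sketch, and it means 0.0f and -0.0f are treated as
distinct values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a factory that interns equal values, in the
// spirit of String.intern().  Not part of the proposal itself.
final class InternedPoint {
    final float x, y;

    // Table of existing instances, keyed on the bit patterns of x and y.
    private static final Map<Long, InternedPoint> TABLE = new HashMap<>();

    private InternedPoint(float x, float y) { this.x = x; this.y = y; }

    static synchronized InternedPoint make(float x, float y) {
        // Pack both 32-bit patterns into one 64-bit key.
        long key = ((long) Float.floatToIntBits(x) << 32)
                 | (Float.floatToIntBits(y) & 0xFFFFFFFFL);
        // Reuse the existing instance if an equal value was made before.
        return TABLE.computeIfAbsent(key, k -> new InternedPoint(x, y));
    }
}
```

With such a factory, reference comparison (==) on two results of make
agrees with field-by-field comparison, at the cost of a table lookup
per creation and a table that never shrinks.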

Some proposals (paralleling C++ "plain old data types") prohibit
instance methods.  I don't think that such a restriction is needed.
We can have instance methods without a vtable pointer, as
long as the class is final.  In that case we can call instance
methods directly, without any run-time method lookup or vtable.
In essence we can treat an instance method 'foo(args)' as
syntactic sugar for 'static foo(Point this, args)'.  The 'this'
reference can be passed by reference or by value - it doesn't matter
(in terms of semantics), since all the fields are final.
>> Value classes may have both instance and static methods.
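
A small sketch of the "syntactic sugar" view, using a hypothetical
class Vec.  The static form is only an illustration of how a compiler
might treat the instance method; both forms are ordinary Java here.

```java
// Hypothetical final class with final fields, standing in for a value class.
final class Vec {
    final float x, y;
    Vec(float x, float y) { this.x = x; this.y = y; }

    // Instance method as the programmer writes it.
    float lengthSquared() { return x * x + y * y; }

    // Roughly how a compiler may treat it: a static method with an
    // explicit 'self' parameter.  Because Vec is final, the call target
    // is known at compile time and no vtable lookup is needed.
    static float lengthSquaredStatic(Vec self) {
        return self.x * self.x + self.y * self.y;
    }
}
```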

Is a value instance an Object?  Can you pass a Point to a
method that expects an Object?  Doing so would require that a
Point contain a vtable pointer. Also, if we pass a stack-allocated Point
to a method that expects an Object, then at run-time we have
a data pointer into the stack.  This may cause problems and
could confuse the GC, especially if we allow collectable
fields in value classes (as discussed next).
>> (Tentatively) A value instance should not be converted to or
coerced from Object.  (I.e. a value-class-aware compiler should
reject code that does this.)  This may be revisited later.

What if a value class contains a field that points to collectible
data?  Could this complicate GC, given that value instances may
live on the stack?  This should not cause a problem,
in that a parameter/variable/field containing (say) two Object
pointers should be implemented more-or-less the same as two
parameters/variables/fields that are plain Object pointers.  One
possible exception might be an unusual architecture that passes
structs in a funny place.  We might also have to modify the
reflection data to handle Object fields.
>> In the initial "release", value types cannot contain Object fields.  Only
primitive types, other value types, and gnu.gcj.RawData (a non-gc'd
'void*' pointer) are allowed.  We may revisit this later.

We've talked about "value classes", but skirted the issue of
how the compiler can tell which classes are value classes.
There are various options:
(1) Classes that satisfy the needed properties of a value class
(including being final and having only final instance variables) are value
classes.  This is difficult, because of the different semantics of
value classes (lack of identity, possible lack of 'isa'), and it seems
difficult for the compiler to verify (at least locally) that the
"value optimization" is safe.
(2) Use a special compiler flag to declare that a class is a value type.
I don't like this - if nothing else, it complicates Makefiles.
(3) Require value classes to inherit from some special class,
like gnu.gcj.ValueObject.  The problem with this approach is that
it makes it difficult to write Java code that is portable, but takes
advantage of the "value optimization" when available (i.e. when using GCJ).
(4) Use some special declaration understood by GCJ that would get
ignored by non-GCJ compilers.  For example we could use a special
JavaDoc comment.  JDK 1.5 annotations may make sense.
My suggestion is for the compiler to test for the
existence of a magic static field name.  I suggest gnu$gcj$VALUE_CLASS.
>> A value class is distinguished by the existence of a static
field named "gnu$gcj$VALUE_CLASS".  For performance this should
be a final static primitive field initialized to a constant.
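
For illustration, the marker test can be approximated at run time with
reflection.  A real compiler would inspect the class file instead, so
this is only a sketch; ValueClassCheck, MarkedPoint and PlainPoint are
hypothetical names, and the extra checks simply mirror the "final
class, final instance fields" rules stated earlier.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical checker; stands in for what a compiler would do
// while reading the class file.
final class ValueClassCheck {
    static final String MARKER = "gnu$gcj$VALUE_CLASS";

    static boolean isValueClass(Class<?> c) {
        // A value class must be final.
        if (!Modifier.isFinal(c.getModifiers())) return false;
        boolean marked = false;
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (Modifier.isStatic(m)) {
                // Look for the magic static field.
                if (f.getName().equals(MARKER)) marked = true;
            } else if (!Modifier.isFinal(m)) {
                // All non-static fields must be final.
                return false;
            }
        }
        return marked;
    }
}

// Carries the marker and follows the conventions.
final class MarkedPoint {
    static final boolean gnu$gcj$VALUE_CLASS = true;
    final float x, y;
    MarkedPoint(float x, float y) { this.x = x; this.y = y; }
}

// No marker, mutable fields: an ordinary reference class.
final class PlainPoint {
    float x, y;
}
```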

Note that value classes are similar to the "un-boxed" struct types
in .Net / CLI.  We should look at this more closely, keeping in
mind that we may want to support C#/.Net in the future.

public final class Point
{
  static final boolean gnu$gcj$VALUE_CLASS = true;

  // Fields are final.
  public final float x, y;

  private Point(float x, float y) { this.x = x;  this.y = y; }

  public static Point make (float x, float y)
  { Point p = new Point(x, y);
    // Conceptually:  p = intern(p);
    return p;
  }

  public static Point make(Point p)
  { return make(p.x, p.y); }

  public static boolean equals(Point p1, Point p2)
  {
    return p1.x == p2.x && p1.y == p2.y;
  }

  public float getX() { return x; }
  public float getY() { return y; }
}
