This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: I'm new and wanting to help!


Hi Mark,

On 8/3/06, Mark Wielaard <mark@klomp.org> wrote:
Hi,

On Thu, 2006-08-03 at 11:17 -0600, Tom Tromey wrote:
> Joel> I've just assumed that other VMs don't use copying collectors.
>
> Actually I'm pretty sure that all the high-performance ones do.  But,
> I do agree, whether this is a win depends on workload, at least AIUI.
>
> Joel> One option would be to have support for IdentityHashMaps built
> Joel> in to the GC, so that each instance is rehashed after GC.
>
> In Java the system hash code is guaranteed not to change over the
> lifetime of the object, hence the cache-it-in-the-header approach.

Robert Lougher added a compacting collector to JamVM. As far as I
understood, there is a bit in the object header to indicate that the
system hash code of an object has been taken; if so, the first time
the object is moved, the system hash code (the address of the object
at that time) is added as an extra int at the end of the object. I
CCed Robert; I am sure he will be able to provide a better/more
correct description.


Yes, JamVM moves objects during compaction, so I have the same hashcode issue as a copying collector. That is, if you base the hashcode on the object address, the hashcode will change if the object moves.

As has already been mentioned, the obvious approach is to just store
the hashcode in the object header.  The problem with this is that it
adds an extra overhead per object (most VMs use a two-word header:
the lockword, and a pointer to the class or vtable).  This is
generally unacceptable because a) most objects are small and b) very
few objects have their hashcode taken.
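
Roughly, the two layouts look something like this in C (the names
are illustrative only, not the actual JamVM definitions):

#include <stdint.h>

struct vm_class;                    /* class metadata, details omitted */

/* Typical two-word header. */
struct object {
    uintptr_t        lockword;      /* thin lock + status bits         */
    struct vm_class *class_ptr;     /* class (or vtable) pointer       */
    /* instance fields follow ...                                      */
};

/* Storing the hashcode in the header grows every object by a word,
 * even though very few objects ever have their hashcode taken. */
struct object_with_hash {
    uintptr_t        lockword;
    struct vm_class *class_ptr;
    int32_t          hashcode;
    /* instance fields follow ...                                      */
};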

Instead, I use a tri-state hashcode, encoded using two bits in the
object header.  The states are "unhashed", "hashcode-taken", and
"has-hashcode".  Its operation is relatively simple:

All objects are initially in the unhashed state.  If an object's
hashcode is taken, we use the object's address and move the state to
"hashcode-taken".

When compacting, the compactor examines the hashcode state.  If it is
"unhashed", we can just move the object, as nothing has taken the
hashcode, so it doesn't matter if the address changes.  But if it is
"hashcode-taken", we must maintain the original value in the future.
So when the object is moved, it is expanded by one word and the
original address is appended to the end of the object.  The state is
then moved to "has-hashcode".

When taking the hashcode of an object in state "has-hashcode", the
original address is retrieved from the end of the object.
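
To make that concrete, the hashcode path and the compactor's move
come out roughly like this in C (a sketch only; the names, bit
positions and helpers are assumptions, not the actual JamVM code):

#include <stdint.h>
#include <string.h>

struct vm_class;

/* Two bits of the lockword hold the hash state (positions assumed). */
#define HASH_STATE_MASK  ((uintptr_t)0x3)
#define HASH_UNHASHED    0x0   /* hashcode never taken                 */
#define HASH_TAKEN       0x1   /* address handed out as the hashcode   */
#define HASH_HASHCODE    0x2   /* original address stored after object */

struct object {
    uintptr_t        lockword;
    struct vm_class *class_ptr;
    /* instance fields follow ... */
};

static unsigned hash_state(struct object *ob) {
    return (unsigned)(ob->lockword & HASH_STATE_MASK);
}

static void set_hash_state(struct object *ob, unsigned state) {
    ob->lockword = (ob->lockword & ~HASH_STATE_MASK) | state;
}

/* System.identityHashCode() / default Object.hashCode() path.
 * 'size' is the object's size in bytes, excluding any appended word. */
int32_t identity_hash(struct object *ob, size_t size) {
    switch (hash_state(ob)) {
    case HASH_UNHASHED:
        set_hash_state(ob, HASH_TAKEN);  /* remember the hash escaped  */
        /* fall through */
    case HASH_TAKEN:
        return (int32_t)(uintptr_t)ob;   /* current address is the hash */
    default:                             /* HASH_HASHCODE */
        return *(int32_t *)((char *)ob + size);  /* saved old address  */
    }
}

/* Compactor: slide an object to its new address.  The address
 * computation pass must already have reserved an extra word for
 * "hashcode-taken" objects, since they grow on their first move. */
void compact_move(struct object *ob, void *new_addr, size_t size) {
    unsigned state = hash_state(ob);
    size_t   copy  = size + (state == HASH_HASHCODE ? sizeof(int32_t) : 0);

    memmove(new_addr, ob, copy);

    if (state == HASH_TAKEN) {
        struct object *moved = new_addr;
        /* Append the old address so the hashcode stays stable, and
         * record that the object now carries its own hashcode. */
        *(int32_t *)((char *)moved + size) = (int32_t)(uintptr_t)ob;
        set_hash_state(moved, HASH_HASHCODE);
    }
}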

This approach has the advantage that only objects that have had their
hashcode taken (and subsequently moved) get the overhead of the
hashcode.  The major drawback is the need for two extra bits in the
object header, which can be difficult to find.

Thanks,

Rob.

P.S.  For those interested, JamVM alternates between a mark/sweep and
a mark/compact collector.  Allocation is done using a "next-fit"
strategy, which is almost as fast as a copying collector's
"bump-pointer" allocator and maintains locality of reference, but has
been shown to cause heavier fragmentation than approximate best-fit
(as used by the Boehm GC).  Rather than change the allocation
strategy, I decided to incorporate a compactor, to compact the heap
when fragmentation becomes an issue.  The compactor uses a modified
Jonkers algorithm.
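
For anyone unfamiliar with it, next-fit is basically first-fit with a
roving pointer that resumes the search where the previous allocation
stopped.  Very roughly (a sketch of the general technique, not
JamVM's allocator):

#include <stddef.h>

/* A free chunk: 'size' is the total bytes the chunk spans, and the
 * chunks form a circular singly-linked list. */
struct free_chunk {
    size_t             size;
    struct free_chunk *next;
};

static struct free_chunk *rover;   /* where the last search stopped */

void *next_fit_alloc(size_t bytes) {
    if (rover == NULL)
        return NULL;                       /* heap empty / not set up */

    struct free_chunk *start = rover;
    do {
        /* Leave room for the chunk header; a real allocator also
         * handles exact fits, alignment and unlinking empty chunks. */
        if (rover->size >= bytes + sizeof(struct free_chunk)) {
            rover->size -= bytes;
            return (char *)rover + rover->size;  /* carve off the end */
        }
        rover = rover->next;
    } while (rover != start);

    return NULL;   /* nothing big enough: time to GC (and maybe compact) */
}

Because successive requests keep carving out of the same chunk until
it runs dry, allocation is close to pointer bumping in cost and
consecutively allocated objects end up next to each other, which is
where the locality comes from.  The downside is that the rover leaves
a trail of chunks too small to reuse, hence the fragmentation
mentioned above.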

P.P.S.  I discounted copying collectors early on in the design of
JamVM because I was (still am?) targeting embedded platforms, and
copying collectors require twice the memory to hold the two
semi-spaces ("to" and "from" space).


Cheers,

Mark



