Jv_AllocBytesChecked (Was: What is wrong with Vector?)

Boehm, Hans hans_boehm@hp.com
Thu Dec 21 16:33:00 GMT 2000


> -----Original Message-----
> From: Tom Tromey [ mailto:tromey@redhat.com ]
> This sounds good to me.
> 
> One issue though is that this means we'll have to build the GC with
> -fexceptions.  I don't know whether this has a performance impact or
> merely a size impact.
Good point, which I hadn't considered.  The exception would only pass
through a few functions, so hopefully this wouldn't be a big deal.  We could
confine -fexceptions to the files defining the top level allocation
functions.  The gcc man page claims this has only a size impact. (I assume
it does effectively add flow edges to the analysis, but I'm not sure that
has much impact if you only pass through the exception.)
> 
> Hans> And my real motivation is that I get the same benefit for
> Hans> another allocation function I need for hash synchronization.
> 
> I've been meaning to ask you more about this work.  Is your change
> going to simply put the current mutex structures into a hash table, or
> will it also involve moving to more lightweight synchronization
> primitives?  These two issues seem orthogonal to me.  Is there some
> hidden dependency?
I'm doing both.  I think they interact fairly well.  I statically allocate a
bunch of lightweight locks in the hash table, and dynamically allocate the
heavyweight ones as I need them, effectively using the lightweight locks to
lock hash table chains as well.
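
A minimal sketch of the arrangement described above, for illustration only.
It is not the actual libgcj code, and every name in it (HeavyLock, Slot,
kSlots, slot_for, heavy_lock_for) is hypothetical.  It assumes a fixed-size
table indexed by object address, a spin flag as the lightweight lock, and
std::mutex as the heavyweight lock.  It shows only the role of the
lightweight lock as a guard for the hash chain; using it directly as the
object's lock in the uncontended case, and reclaiming heavy locks, are left
out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Heavyweight lock, allocated on demand and linked into a hash chain.
struct HeavyLock {
    HeavyLock(const void* o, HeavyLock* n) : object(o), next(n) {}
    const void* object;  // object this lock belongs to
    std::mutex mutex;    // the heavyweight lock itself
    HeavyLock* next;     // next entry in the same chain
};

// One table slot: a statically allocated lightweight lock plus a chain
// of dynamically allocated heavyweight locks.
struct Slot {
    std::atomic<bool> busy{false};  // lightweight lock (spin flag)
    HeavyLock* chain = nullptr;
};

constexpr std::size_t kSlots = 1024;  // assumed table size
static Slot table[kSlots];

// Map an object address to its slot (assumes 8-byte object alignment).
static Slot& slot_for(const void* obj) {
    auto addr = reinterpret_cast<std::uintptr_t>(obj);
    return table[(addr >> 3) % kSlots];
}

// Return the heavyweight lock for obj, creating it on first use.
// The slot's lightweight lock protects the chain while it is searched
// and updated.
HeavyLock* heavy_lock_for(const void* obj) {
    assert(obj != nullptr);
    Slot& s = slot_for(obj);
    while (s.busy.exchange(true, std::memory_order_acquire)) {
        // spin until the lightweight lock is free
    }
    HeavyLock* h = s.chain;
    while (h != nullptr && h->object != obj) {
        h = h->next;
    }
    if (h == nullptr) {
        h = new HeavyLock(obj, s.chain);  // never freed in this sketch
        s.chain = h;
    }
    s.busy.store(false, std::memory_order_release);
    return h;
}
```

Repeated calls with the same object return the same HeavyLock, and the
common path touches only the statically allocated slot, with no allocation.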

I think the argument for combining them is that a lightweight mechanism will
probably need a heavyweight backup.  Thus you need two different kinds of
locks.  The hash table itself also pretty much needs two kinds of locks:
statically allocated and dynamically allocated.  Otherwise the common case
is too slow.  If you combine them orthogonally, you naturally end up with
four kinds of locks, where my current scheme gives you two.  In spite of
that, I believe that you could perhaps do it orthogonally.

My code is still buggy.  But so far, it looks like I reduce the runtime
of the UCSD synchronization benchmark by about 30%.  (This is on X86, where
I still spend too much time in my pthread_self() replacement, and
pthread_self() itself is atrociously slow.)  The results for programs that
do a lot of synchronization on thread-local variables should be better,
since I save a lot of allocations there as well.  (But those don't work yet
:-(  )
> 
> I've read that some VMs (Electrical Fire, at least) stack allocate
> synchronization objects, at least in the very common case that
> monitorenter and monitorexit are paired in a method.  How hard do you
> think this would be to implement?  Do you think it is worth doing?
> 
I should look at this more.  My impression is that it's hard to do, because we
don't have a second header word that we could point at the stack allocated
structure.  (A big motivation on my part is to get us down to 1 word.)  I
also thought that there was a Sun patent on some of this?

Hans 

