This is the mail archive of the
java@gcc.gnu.org
mailing list for the Java project.
RE: criticalsections are very fast / pooling mutexes / how does hash-sync work?
- From: "Boehm, Hans" <hans_boehm at hp dot com>
- To: "'Adam Megacz '" <gcj at lists dot megacz dot com>, "'java at gcc dot gnu dot org '" <java at gcc dot gnu dot org>
- Date: Sun, 23 Dec 2001 17:08:25 -0800
- Subject: RE: criticalsections are very fast / pooling mutexes / how does hash-sync work?
I don't understand your concern about moving CriticalSections. The
collector doesn't move anything. Why would the memory address change?
Your description of hash synchronization is mostly right, but there is a bit
more to it. In the easy case, no pthreads/OS lock is actually acquired.
Each hash table entry contains both a "lightweight lock" data structure, and
a chain of "heavyweight" lock data structures. The lightweight lock is used
if at most one object hashing to a given entry is locked at a time, and no
object hashing to that location is being waited on with Object.wait().
Lightweight locks are acquired and released using primarily a
compare-and-swap operation.
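For illustration only, the lightweight fast path described above might look roughly like the sketch below. It is not the natObject.cc code: HashEntry, g_table, entry_for, try_light_lock and light_unlock are hypothetical names, the table size and hash are arbitrary, and recursion counts, the heavyweight chain and Object.wait() are all left out. It assumes a C++11 compiler and uses std::atomic for the compare-and-swap.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout; the real entry also carries a chain of heavyweight locks.
struct HashEntry {
    std::atomic<std::uintptr_t> light_owner{0};  // 0 = free, otherwise owning thread id
    const void* light_object = nullptr;          // object using the lightweight slot
};

constexpr std::size_t kTableSize = 1024;         // arbitrary size for the sketch
HashEntry g_table[kTableSize];

// Map an object address to its table entry (low bits dropped; they are alignment).
HashEntry& entry_for(const void* obj) {
    auto addr = reinterpret_cast<std::uintptr_t>(obj);
    return g_table[(addr >> 3) % kTableSize];
}

// Fast path: one compare-and-swap, no OS call. Returns false when the entry
// is already in use, in which case a real implementation would fall back to
// a heavyweight lock.
bool try_light_lock(const void* obj, std::uintptr_t self) {
    assert(self != 0);                           // 0 is reserved for "free"
    HashEntry& e = entry_for(obj);
    std::uintptr_t expected = 0;
    if (!e.light_owner.compare_exchange_strong(expected, self,
                                               std::memory_order_acquire))
        return false;
    e.light_object = obj;                        // only the owner writes this field
    return true;
}

// Release: only the owning thread, for the object it locked, may clear the entry.
bool light_unlock(const void* obj, std::uintptr_t self) {
    HashEntry& e = entry_for(obj);
    if (e.light_owner.load(std::memory_order_relaxed) != self ||
        e.light_object != obj)
        return false;
    e.light_object = nullptr;
    e.light_owner.store(0, std::memory_order_release);
    return true;
}
```

In the uncontended case both functions touch only the hash entry, which is why the easy case needs no pthreads/OS lock.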
I don't think it would be that difficult to port this to win32, and that may
also be a way around some of these issues, since the fast path no longer
involves any win32 calls. You will have to add the appropriate
compare-and-swap definition (natObject.cc), and supply a few other odds and
ends.
Hans
-----Original Message-----
From: Adam Megacz
To: java@gcc.gnu.org
Sent: 12/23/01 3:16 PM
Subject: criticalsections are very fast / pooling mutexes / how does
hash-sync work?
ye gads.
Using win32 criticalsections (userspace) instead of mutexes
(kernelspace) makes a huge, huge difference -- it shaved at least 30%
off my startup time and the obnoxious delays are almost gone. I say
"almost" because I'm still seeing a tiny delay -- barely
human-noticeable. I suspect that this is because
InitializeCriticalSection() isn't a very fast operation.
I'm thinking of pooling CriticalSections -- when the blocked count on
a CriticalSection drops to zero, it goes back into the pool. I already
had to introduce a level of indirection (_Jv_Mutex is now a pointer to
a heap object instead of an actual CRITICAL_SECTION) because win32
barfs if the memory address of a CRITICAL_SECTION changes. So this
should be easy -- when the blocked count on a CRITICAL_SECTION drops
to zero, the corresponding _Jv_Mutex is set to NULL. When you try to
block on a NULL _Jv_Mutex, we check a new one out of the pool.
The downside is that blocking on an unheld mutex will require two
EnterCriticalSection() calls, but these are so fast in comparison to
InitializeCriticalSection() that I think it will be a performance win
in just about every usage scenario.
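A rough sketch of the pooling idea, under loud assumptions: std::mutex stands in for a heap-allocated CRITICAL_SECTION so the example is portable (like a CRITICAL_SECTION, it must not be moved), and PooledLock, LockPool, checkout and checkin are hypothetical names, not anything in libgcj. It assumes C++14 for std::make_unique.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Stand-in for a heap-allocated CRITICAL_SECTION. It lives behind a pointer,
// so its address never changes while it moves in and out of the pool.
struct PooledLock {
    std::mutex cs;
};

class LockPool {
public:
    // Check a lock out of the pool; construct one only when the pool is empty.
    std::unique_ptr<PooledLock> checkout() {
        std::lock_guard<std::mutex> guard(pool_guard_);
        if (free_.empty()) {
            ++created_;                          // the expensive initialization path
            return std::make_unique<PooledLock>();
        }
        std::unique_ptr<PooledLock> lock = std::move(free_.back());
        free_.pop_back();
        return lock;
    }

    // Return a lock once nothing is blocked on it any more.
    void checkin(std::unique_ptr<PooledLock> lock) {
        assert(lock != nullptr);
        std::lock_guard<std::mutex> guard(pool_guard_);
        free_.push_back(std::move(lock));
    }

    // Number of locks that had to be constructed from scratch.
    std::size_t created() const {
        std::lock_guard<std::mutex> guard(pool_guard_);
        return created_;
    }

private:
    mutable std::mutex pool_guard_;              // protects the free list
    std::vector<std::unique_ptr<PooledLock>> free_;
    std::size_t created_ = 0;
};
```

The pool's own guard is the first of the two lock acquisitions mentioned above; the second is the checked-out lock itself.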
BTW, here's how I understand hash synchronization to work: all
_Jv_Mutexes are stored in a hash table, keyed on the memory address
of their owner. The advantage is that many Objects are now PTRFREE
and can be stored in heap areas which aren't scanned (thus speeding up
GC). The other advantage is that you save memory on objects which are
never synchronized on -- no space is needed for a _Jv_Mutex, or even
a pointer to one.
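To make that understanding concrete, here is a minimal sketch of a side table keyed on the owner's address. It only illustrates the description above, not how libgcj actually lays things out: LockTable and lock_for are hypothetical names, std::mutex stands in for _Jv_Mutex, and std::unordered_map stands in for the real hash table. It assumes C++14.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>

class LockTable {
public:
    // Find the lock for an object, creating it on first use. The object
    // itself stores nothing, so never-locked objects cost no lock space.
    std::mutex& lock_for(const void* owner) {
        std::lock_guard<std::mutex> guard(table_guard_);
        std::unique_ptr<std::mutex>& slot = locks_[owner];
        if (!slot)
            slot = std::make_unique<std::mutex>();
        return *slot;                            // heap address stays stable
    }

    // Number of objects that have ever been locked through this table.
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(table_guard_);
        return locks_.size();
    }

private:
    mutable std::mutex table_guard_;             // protects the map itself
    std::unordered_map<const void*, std::unique_ptr<std::mutex>> locks_;
};
```

Because the lock is found through the table rather than through a field in the object, an object with no other pointer fields needs no pointer at all.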
Is this understanding correct? If so, why doesn't it work
out-of-the-box on all platforms, automatically?
- a