This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Fri, 14 Sep 2007, David Daney wrote:
Hans Boehm wrote:Right. The code is counting on being able to read the (volatile) address field concurrently with a compare_and_swap and seeing a value that was there at some point. The only case in which it does much with the data without relying on the success of another compare_and_swap is the immediately next one ((address & ~(HEAVY | REQUEST_CONVERSION)) == addr && he -> light_thr_id == self). In that case, it saw that that specific hash entry was in use for a lightweight lock held by the current thread. Since it was held by the current thread, it could not have been concurrently released. Thus the address field also could not have been changed asynchronously in the meantime. Thus those field values in fact cannot change asynchronously, and I hold the lightweight lock. Hence I'm allowed to update light_count.The compare_and_swap just succeeded in overwriting the address field, which makes us the holder of the lightweight lock on the given address, and gives us exclusive access to light_thr_id and light_count for the corresponding hash entry. (The LOCKED bit in the address field protects other parts of the hash entry, but not those two fields.)
I am talking about the part where the compare_and_swap fails. The problem I see is all the things that happen after the 'if' block I quoted. Specifically, how can you do anything sane with the value of retrieved from he->address (the last line I quoted)? It could be zero (the thread the owned the lock released it), or it could be an object that hashes to the same value, or the same object we are synchronizing on, there does not seem to be anything that keeps multiple threads (where compare_and_swap fail) out of this area after a first thread succeeds at compare_and_swap.
Yes. It is a SiByte SB1 CPU, which is a dual core mips64 device.This is admittedly a little dubious, since I'm treating the fact that address and light_thr_id are volatile as a guarantee that I can read it atomically. That isn't really true in general. But it should be true on workstation or server-class MIPS machines.
(There is a proposal before the C++ committee to add real atomics to the language, so this will probably get fixed. Until then, this problem is unavoidable.)This is on a MIPS system?
This code is extremely tricky. It's implementing locks, and hence doesn't itself follow a standard locking discipline. It's unlikely there's anything wrong with this particular piece of code, since it's the most common path.
As I said above, I am worried about the less common path. I am seeing occasional failures of the SyncTest.java testcase on a mips system with 2 CPUs. This is a test where four threads acquire and release a single monitor 1,000,000 times each. There are a couple of possibilities I am investigating:
1) Races in the HASH_SYNCHRONIZATION code. 2) Modifications I made to the compare_and_swap primitive have bugs. 3) A hardware error.
Since I cannot convince myself that the HASH_SYNCHRONIZATION code is correct, I thought I would ask about how it was supposed to work.
The failure is what seems to be a deadlock. Normally the test completes in about 10 seconds. Occasionally however it seems never to complete.The failure symptom is that the final value is one or two too low?
I am likewise suspicious about memory ordering. I think that perhaps adding memory barriers at the appropriate places might fix it.You can't get it to fail with assertions enabled here? Or it fails the same way with assertions enabled?
This code was originally tested on Itanium and X86 multiprocessors, so it seems unlikely, but not impossible, that it's generically broken across architectures. I would normally be suspicious about memory ordering issues. But MIPS is normally either sequentially consistent or at least close.
This could be the problem also. I have completely re-written all the atomic memory primitives for MIPS and it is possible that I made a mistake. I should note that the code I replaced had bugs. The compare_and_swap was returning the old value instead of the result of the compare.It might be worth making sure that the compiler isn't moving memory references across a compare_and_swap.
Thanks, David Daney.
Hans
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |