Bug 27598 - WeakHashMap can fail with concurrent readers
Summary: WeakHashMap can fail with concurrent readers
Alias: None
Product: classpath
Classification: Unclassified
Component: classpath
Version: unspecified
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on: 18187
Reported: 2006-05-14 05:15 UTC by Hans Boehm
Modified: 2006-10-30 22:16 UTC
CC List: 2 users

See Also:
Host: any
Target: any
Build: any
Known to work:
Known to fail:
Last reconfirmed:

Attachments:
The test case that passes (538 bytes, text/plain), 2006-10-30 20:51 UTC, Audrius Meškauskas

Description Hans Boehm 2006-05-14 05:15:10 UTC
Java Collections are generally expected to be safe in the presence of concurrent reads.

However, many operations that read a WeakHashMap process its reference queue as part of the operation, and may thus delete elements from the map. As a result, if two threads concurrently call, e.g., size(), both may end up deleting elements from the WeakHashMap at the same time, resulting in a damaged data structure.
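To make the mechanism concrete, here is a minimal Java sketch, assuming a simplified chained hash table rather than GNU Classpath's actual WeakHashMap. `NaiveWeakMap` and its test hook `simulateCollection` are hypothetical; `cleanQueue` and `internalRemove` borrow the method names cited later in this thread, but their bodies here are illustrative only. The hook clears and enqueues an entry by hand so the demo does not depend on when the collector runs.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical, simplified sketch; NOT GNU Classpath's WeakHashMap source.
public class ExpungeOnRead {

    // Bucket entry: holds its key weakly and is registered with the map's queue.
    static final class Entry<K, V> extends WeakReference<K> {
        final V value;
        final int bucket;
        Entry<K, V> next;

        Entry(K key, V value, int bucket, Entry<K, V> next, ReferenceQueue<K> queue) {
            super(key, queue);
            this.value = value;
            this.bucket = bucket;
            this.next = next;
        }
    }

    static final class NaiveWeakMap<K, V> {
        private final ReferenceQueue<K> queue = new ReferenceQueue<>();
        @SuppressWarnings("unchecked")
        private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry<?, ?>[16];
        private int size;

        private int bucketOf(Object key) {
            return key.hashCode() & (table.length - 1);
        }

        // Simplified: no duplicate-key check and no resizing.
        void put(K key, V value) {
            cleanQueue();
            int b = bucketOf(key);
            table[b] = new Entry<>(key, value, b, table[b], queue);
            size++;
        }

        // Looks like a pure read, but it may unlink entries and change `size`.
        int size() {
            cleanQueue();
            return size;
        }

        // Drains every entry whose key has been cleared and unlinks it.
        private void cleanQueue() {
            for (Object ref; (ref = queue.poll()) != null; ) {
                @SuppressWarnings("unchecked")
                Entry<K, V> dead = (Entry<K, V>) ref;
                internalRemove(dead);
            }
        }

        // Unsynchronized read-modify-write of the bucket chain and the counter.
        private void internalRemove(Entry<K, V> dead) {
            Entry<K, V> prev = null;
            for (Entry<K, V> e = table[dead.bucket]; e != null; prev = e, e = e.next) {
                if (e == dead) {
                    if (prev == null) table[dead.bucket] = e.next;
                    else prev.next = e.next;
                    size--;
                    return;
                }
            }
        }

        // Hypothetical test hook standing in for the collector: clear the
        // entry for `key` and enqueue it, as the GC would once `key` died.
        void simulateCollection(K key) {
            for (Entry<K, V> e = table[bucketOf(key)]; e != null; e = e.next) {
                if (e.get() == key) {
                    e.clear();
                    e.enqueue();
                    return;
                }
            }
        }
    }

    // Returns {size() before, size() after} simulating collection of one key.
    public static int[] demo() {
        NaiveWeakMap<String, Integer> map = new NaiveWeakMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        int before = map.size();
        map.simulateCollection("b");
        int after = map.size(); // this "read" unlinks the entry for "b"
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("size before: " + r[0] + ", after: " + r[1]);
    }
}
```

Running `main` prints `size before: 3, after: 2`: the second `size()` call, nominally a read, unlinked an entry and decremented the counter. Two threads doing this at once on the same bucket chain race on `prev.next` and on `size--`.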

(This report is based on code inspection; the failure has not been observed. But an actual failure would be irreproducible, and hence would probably not result in a bug report.)

An off-line discussion concluded that the right way to fix this is probably to include enough internal locking to ensure that two concurrent readers cannot interfere.
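One possible reading of "enough internal locking" is sketched below, assuming a single private lock that every operation acquires, including reads that drain the reference queue. This is an illustration only, not code from Classpath or from Sun's JDK; `LockedWeakMap`, `linkedEntries` and `simulateCollection` are hypothetical names. After several readers call `size()` at once, the counter should still match the number of linked entries.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "enough internal locking"; NOT GNU Classpath's code.
public class LockedWeakMapSketch {
    static final class Entry<K, V> extends WeakReference<K> {
        final V value;
        final int bucket;
        Entry<K, V> next;
        Entry(K key, V value, int bucket, Entry<K, V> next, ReferenceQueue<K> queue) {
            super(key, queue);
            this.value = value;
            this.bucket = bucket;
            this.next = next;
        }
    }

    static final class LockedWeakMap<K, V> {
        private final Object lock = new Object(); // guards table and size
        private final ReferenceQueue<K> queue = new ReferenceQueue<>();
        @SuppressWarnings("unchecked")
        private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry<?, ?>[16];
        private int size;

        void put(K key, V value) {
            synchronized (lock) {
                cleanQueue();
                int b = key.hashCode() & (table.length - 1);
                table[b] = new Entry<>(key, value, b, table[b], queue);
                size++;
            }
        }
        // Readers still expunge, but only one thread at a time can do so.
        int size() {
            synchronized (lock) {
                cleanQueue();
                return size;
            }
        }
        // Entries actually linked into the table; should always equal size().
        int linkedEntries() {
            synchronized (lock) {
                int n = 0;
                for (Entry<K, V> head : table)
                    for (Entry<K, V> e = head; e != null; e = e.next) n++;
                return n;
            }
        }
        // Caller must hold `lock`: drains cleared entries and unlinks each one.
        private void cleanQueue() {
            for (Object ref; (ref = queue.poll()) != null; ) {
                @SuppressWarnings("unchecked")
                Entry<K, V> dead = (Entry<K, V>) ref;
                Entry<K, V> prev = null;
                for (Entry<K, V> e = table[dead.bucket]; e != null; prev = e, e = e.next) {
                    if (e == dead) {
                        if (prev == null) table[dead.bucket] = e.next; else prev.next = e.next;
                        size--;
                        break;
                    }
                }
            }
        }
        // Hypothetical test hook standing in for the collector clearing `key`.
        void simulateCollection(K key) {
            synchronized (lock) {
                for (Entry<K, V> e = table[key.hashCode() & (table.length - 1)]; e != null; e = e.next)
                    if (e.get() == key) { e.clear(); e.enqueue(); return; }
            }
        }
    }

    // Adds n keys, "collects" every other one, lets `readers` threads call size().
    public static int[] run(int n, int readers) {
        LockedWeakMap<Object, Integer> map = new LockedWeakMap<>();
        List<Object> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(new Object());
            map.put(keys.get(i), i);
        }
        for (int i = 0; i < n; i += 2) map.simulateCollection(keys.get(i));
        Thread[] ts = new Thread[readers];
        for (int t = 0; t < readers; t++) (ts[t] = new Thread(map::size)).start();
        try { for (Thread t : ts) t.join(); }
        catch (InterruptedException e) { throw new IllegalStateException(e); }
        int[] result = {map.size(), map.linkedEntries()}; // {size(), linked entries}
        Reference.reachabilityFence(keys); // surviving keys stay strongly reachable (Java 9+)
        return result;
    }

    public static void main(String[] args) {
        int[] r = run(1000, 8);
        System.out.println("size: " + r[0] + ", linked: " + r[1]);
    }
}
```

With `run(1000, 8)`, half of the keys are marked collected and `main` should print `size: 500, linked: 500`. A single lock is the easiest design to reason about, but it serializes all readers; a finer-grained scheme would need its own argument that concurrent lookups and unlinks cannot interfere.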

A corresponding bug report (6425537) was filed against the Sun JVM, based on the same discussion.
Comment 1 Andrew Pinski 2006-05-14 05:23:32 UTC
I wonder if this is related to PR 18187.
Comment 2 Audrius Meškauskas 2006-10-30 20:51:03 UTC
Created attachment 12515
The test case that passes

I wrote a simple test case in which 100 threads read from the same weak hash map while the map gradually shrinks as its entries are garbage-collected. In both Sun's implementation and ours, the test appears to pass (each lookup returns either null or the correct entry). Hence I cannot reproduce this bug and suggest closing it as unreproducible.
Comment 3 Hans Boehm 2006-10-30 22:16:42 UTC
I strongly disagree with closing this.  This is a threading bug.  It's nasty precisely because it is not systematically reproducible.  That's no reason to close it.

The problem is obviously still there. Various readers call cleanQueue, which calls internalRemove, which updates the data structure, all without synchronization.

The test case may have failed to catch it because it doesn't do a good job of testing for the kinds of corruption that may occur here (lost deletions, lost size decrements), because it was run on too few processors to make the failure likely, or because a failure in a hash table this large is probably unlikely anyway. A test case that actually reproduces an obscure threading bug like this is valuable; but in my opinion, the fact that a test case doesn't fail doesn't mean much in cases like this.
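A sketch of a stress check aimed at the two symptoms named above, assuming the same kind of unsynchronized, expunge-on-read table (hypothetical; it is neither the attached test case nor Classpath's code). After the readers finish, it compares the size counter with the number of entries actually linked (lost size decrements) and counts cleared entries that are still linked (lost deletions). A `CountDownLatch` releases all readers together to maximize overlap.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stress check; NOT the attached test case and NOT Classpath code.
public class CorruptionCheck {

    static final class Entry extends WeakReference<Object> {
        final int bucket;
        Entry next;
        Entry(Object key, int bucket, Entry next, ReferenceQueue<Object> queue) {
            super(key, queue);
            this.bucket = bucket;
            this.next = next;
        }
    }

    // Unsynchronized expunge-on-read table of weak keys (values omitted).
    static final class NaiveWeakTable {
        final ReferenceQueue<Object> queue = new ReferenceQueue<>();
        final Entry[] table = new Entry[16];
        int size;

        void add(Object key) {
            int b = key.hashCode() & 15;
            table[b] = new Entry(key, b, table[b], queue);
            size++;
        }

        // A "read" that unlinks cleared entries with no locking at all.
        int size() {
            for (Object ref; (ref = queue.poll()) != null; ) {
                Entry dead = (Entry) ref;
                Entry prev = null;
                for (Entry e = table[dead.bucket]; e != null; prev = e, e = e.next) {
                    if (e == dead) {
                        if (prev == null) table[dead.bucket] = e.next; else prev.next = e.next;
                        size--;
                        break;
                    }
                }
            }
            return size;
        }

        // Hypothetical stand-in for the collector clearing `key`.
        void simulateCollection(Object key) {
            for (Entry e = table[key.hashCode() & 15]; e != null; e = e.next)
                if (e.get() == key) { e.clear(); e.enqueue(); return; }
        }
    }

    // Returns {size counter, linked entries, cleared entries still linked}.
    // A healthy run has the first two equal and the third zero.
    public static int[] trial(int n, int readers) {
        NaiveWeakTable map = new NaiveWeakTable();
        List<Object> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(new Object());
            map.add(keys.get(i));
        }
        for (int i = 0; i < n; i += 2) map.simulateCollection(keys.get(i));
        CountDownLatch go = new CountDownLatch(1);
        Thread[] ts = new Thread[readers];
        for (int t = 0; t < readers; t++) {
            ts[t] = new Thread(() -> {
                try { go.await(); } catch (InterruptedException e) { return; }
                map.size();
            });
            ts[t].start();
        }
        go.countDown(); // release all readers together to maximize overlap
        try { for (Thread t : ts) t.join(); }
        catch (InterruptedException e) { throw new IllegalStateException(e); }
        int linked = 0, staleLinked = 0;
        for (Entry head : map.table)
            for (Entry e = head; e != null; e = e.next) {
                linked++;
                if (e.get() == null) staleLinked++; // a lost deletion
            }
        Reference.reachabilityFence(keys); // surviving keys stay strongly reachable (Java 9+)
        return new int[] {map.size, linked, staleLinked};
    }

    public static void main(String[] args) {
        int bad = 0;
        for (int i = 0; i < 50; i++) {
            int[] r = trial(2000, 8);
            if (r[0] != r[1] || r[2] != 0) bad++;
        }
        System.out.println("inconsistent trials: " + bad + " of 50");
    }
}
```

With a single reader the result is always consistent (`trial(2000, 1)` yields `{1000, 1000, 0}`). With several readers, how many of the 50 trials in `main` come out inconsistent depends on the machine, the JIT and the processor count, and a run that reports zero does not show the code is correct.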

It makes sense to close an unreproducible bug only if being unable to reproduce it keeps us from tracking it down. We already understand the problem here.