This is the mail archive of the
java@gcc.gnu.org
mailing list for the Java project.
Re: 4.0.0 JNI/reflection invocation 2/4/8 times as slow as in 3.3.3
Ranjit Mathew writes:
> Tom Tromey wrote:
> > Jost> Imagine a program which allocates and immediately removes a key. If you
> > Jost> let the hash() distribute the keys all over the table, your table gets
> > Jost> filled with tombstones. And since a tombstone prevents a key to be
> > Jost> entered immediately, our hash table has degenerated into a linear list
> > Jost> which could have a length of up to 75% of it. What would help is to
> > Jost> change the implementation to allow more than one key at a location, for
> > Jost> example by storing the additional keys in a separate list or by building
> > Jost> clusters.
> >
> > That's interesting. Could you write a patch that includes an
> > explanation of what is going on internally? And ideally do some
> > profiling so we can see the result?
>
> Instead of changing over to chaining, the current open
> addressing implementation can perhaps be redeemed by
> using a compacting remove(), eliminating the need for
> tombstones.
>
> This is also described in Knuth's TAOCP Vol. 3 (Sorting
> and Searching) and he shows that it doesn't impact
> performance that much. See section 6.4, Algorithm R (IIRC).
That's a good idea if we're using linear probing to handle collisions.
But I can't see that we handle collisions at all. Look at this:
public Object put(Object key, Object value)
{
// Rehash if the load factor is too high.
if (size > threshold)
{
...
}
int h = hash(key);
if (table[h] == key)
{
Object r = table[h + 1];
table[h + 1] = value;
return r;
}
// At this point, we add a new mapping.
modCount++;
size++;
table[h] = key;
table[h + 1] = value;
return null;
}
Aren't we overwriting an existing key here? Surely we need to
increment h until an empty entry is found.
> Also note that at least in the JavaDocs for this class
> in JDK 1.4.2, Sun says that this is an open addressing
> hash table using a linear probe with alternating key/value
> pairs. It also says that HashMap uses chaining instead.
>
> BTW, why is our default capacity 21 and not a prime
> like (say) 19 or 23? Primes are much better for hash
> table sizes (again, refer to TAOCP Vol. 3).
Well, that depends. If the key really is a random hash function it
doesn't matter much. However, our hash functions aren't very random,
so we should do this.
> On each expansion, instead of going for just 2n+1, we
> should go for the nearest prime to 2n. This is not
> as bogus as it sounds as we can easily pre-compute and
> store the 26 or so primes that would be needed for a
> default starting of, say, 23. Note that the array
> length is capped at 2^31.
>
> If the user supplies an explicit expectedMaxSize, we
> can still choose the nearest larger prime from our
> table.
Yes. The code to do this is in BigInteger, which has all the speedups
for small integers.
Andrew.