4.0.0 JNI/reflection invocation 2/4/8 times as slow as in 3.3.3

Andrew Haley aph@redhat.com
Fri Jan 28 09:48:00 GMT 2005

Ranjit Mathew writes:
 > Tom Tromey wrote:
 > > Jost> Imagine a program which allocates and immediately removes a key.  If you
 > > Jost> let the hash() distribute the keys all over the table, your table gets
 > > Jost> filled with tombstones.  And since a tombstone prevents a key to be
 > > Jost> entered immediately, our hash table has degenerated into a linear list
 > > Jost> which could have a length of up to 75% of it.  What would help is to
 > > Jost> change the implementation to allow more than one key at a location, for
 > > Jost> example by storing the additional keys in a separate list or by building
 > > Jost> clusters.
 > > 
 > > That's interesting.  Could you write a patch that includes an
 > > explanation of what is going on internally?  And ideally do some
 > > profiling so we can see the result?
 > Instead of changing over to chaining, the current open
 > addressing implementation can perhaps be redeemed by
 > using a compacting remove(), eliminating the need for
 > tombstones.
 > This is also described in Knuth's TAOCP Vol. 3 (Sorting
 > and Searching) and he shows that it doesn't impact
 > performance that much. See section 6.4, Algorithm R (IIRC).

That's a good idea if we're using linear probing to handle collisions.
But I can't see that we handle collisions at all.  Look at this:

  public Object put(Object key, Object value)
    // Rehash if the load factor is too high.
    if (size > threshold)

    int h = hash(key);
    if (table[h] == key)
        Object r = table[h + 1];
        table[h + 1] = value;
        return r;

    // At this point, we add a new mapping.
    table[h] = key;
    table[h + 1] = value;
    return null;

Aren't we overwriting an existing key here?  Surely we need to
increment h until an empty entry is found.

 > Also note that at least in the JavaDocs for this class
 > in JDK 1.4.2, Sun says that this is an open addressing
 > hash table using a linear probe with alternating key/value
 > pairs. It also says that HashMap uses chaining instead.
 > BTW, why is our default capacity 21 and not a prime
 > like (say) 19 or 23? Primes are much better for hash
 > table sizes (again, refer to TAOCP Vol. 3).

Well, that depends.  If the key really is a random hash function it
doesn't matter much.  However, our hash functions aren't very random,
so we should do this.

 > On each expansion, instead of going for just 2n+1, we
 > should go for the nearest prime to 2n. This is not
 > as bogus as it sounds as we can easily pre-compute and
 > store the 26 or so primes that would be needed for a
 > default starting of, say, 23. Note that the array
 > length is capped at 2^31.
 > If the user supplies an explicit expectedMaxSize, we
 > can still choose the nearest larger prime from our
 > table.

Yes.  The code to do this is in BigInteger, which has all the speedups
for small integers.


More information about the Java mailing list