This is the mail archive of the
mailing list for the Java project.
Re: String hashCode
Tom Tromey wrote:
> FYI I've been planning to change the data structure used for
> String.intern. I have a patch to do this already; I'll check it in
> after the branch is made.
> The new data structure is an STL map<> (a red-black tree). I chose
> this because it has better worst-case performance than the existing
> hash table and is therefore better for real-time usage. The hash
> table is bad because rehashing makes the worst case expensive.
Tom, I'm not sure that a btree-based structure is the best choice for the
intern table. Its easy to imagine situations where a bunch of Strings
would be interned in lexographical order, in which case the tree would
perform poorly. A red-black tree does solve this problem somewhat by
ensuring that the tree gets restructured to prevent worst-case
performance, but having to do this insert fixup is relatively expensive
if it is happening too often (and the deletefixup
I agree that rehashing is currently expensive, but this cost will be
massively reduced by using a hashcode field, as most of the cost is in
the hashcode calculation. It will still be slower in the worst case than
an rbtree, but overall it will be much faster. rbtrees also have a larger
memory overhead per entry and require an allocation for every new entry.
Also, by using an STL structure you're adding a dependency on libstdc++
which we have managed to avoid so far - adding more shared libs increases
startup overhead. If you wanted an rbtree, TreeMap might be a better
choice, although I suspect it wouldn't perform as well given the current
compiler due to interface calls and lack of inlining.
I think the best solution for the intern table is a hashtable, but with
buckets to resolve collisions ala java.util.HashMap/Hashtable. This gets
around the problem of uninterning a string messing up a collision chain.
[ bryce ]