String hashCode

Tom Tromey
Fri Feb 2 14:29:00 GMT 2001

>>>>> "Bryce" == Bryce McKinlay <> writes:

Bryce> A red-black tree does solve this problem somewhat by ensuring
Bryce> that the tree gets restructured to prevent worst-case
Bryce> performance, but having to do this insert fixup is relatively
Bryce> expensive if it is happening too often

My understanding is that red-black trees have log(n) performance even
in the worst case.  This is one reason I chose to replace the existing
hash table implementation.

Bryce> I agree that rehashing is currently expensive, but this cost
Bryce> will be massively reduced by using a hashcode field, as most of
Bryce> the cost is in the hashcode calculation. It will still be
Bryce> slower in the worst case than an rbtree, but overall it will be
Bryce> much faster. rbtrees also have a larger memory overhead per
Bryce> entry and require an allocation for every new entry.

The memory might be a concern.  Maybe I will do measurements to see,
if I can find the time.

I'm not sure I care too much about the average case performance of
String.intern().  People typically use String.intern() to speed up
string comparison at a later date.  To me this says that typical
programs won't be using String.intern() itself in a
performance-critical way.  That is, these calls will be part of the
startup overhead, won't be done in tight loops, etc.

Bryce> Also, by using an STL structure you're adding a dependency on
Bryce> libstdc++ which we have managed to avoid so far - adding more
Bryce> shared libs increases startup overhead.

I agree that we don't want a libstdc++ dependency.  However, I haven't
introduced one.  The STL map<> is implemented entirely in header
files; all the templates are instantiated in our library.  No new
libraries are linked in.

I agree this is a slippery slope.  We must review every use of STL
carefully to make sure we don't end up pulling in too much.

Bryce> If you wanted an rbtree, TreeMap might be a better choice,
Bryce> although I suspect it wouldn't perform as well given the
Bryce> current compiler due to interface calls and lack of inlining.

For interning we need a map which is invisible to the GC.


More information about the Java mailing list