PATCH: Add objc-lang.c, further cleanup - TAKE TWO
Zack Weinberg
zack@codesourcery.com
Sun Dec 9 20:29:00 GMT 2001
On Tue, Dec 04, 2001 at 06:29:10AM +0000, Neil Booth wrote:
>
> > Just yesterday I found an old letter from someone suggesting I try just
> >
> > hash = hash * 9 + ch
> >
> > for the hash step. If you have a harness set up to do these tests
> > already, mind trying that one?
>
> I think that might have been one they tested. ... OK you persuaded
> me. [Digs around]. Here's the results with my data with one test
> harness a guy posted (someone else had a larger set of hashes but
> didn't post his harness):
[snip]
Finally I get back to this.
I augmented the harness to indicate occupancy, mean, std.dev as well
as max. (Occupancy replaces "used" as it's easier to compare across
runs with different table sizes.) Then I tried various table sizes
with the provided hashes, gcc's hash, the "*9" hash, and one more -
the "linear congruential" hash used by Daniel Philips' ext2 indexed
directories (the approach is similar to FNV).
The data set was all the identifier-like strings found in all the
source code I have lying around - I made no attempt to exclude
words in comments or other junk; just every string that matched
/[a-zA-Z_][a-zA-Z0-9_]*/. There are 552610 of them after sort -u.
Results are attached, as is the revised harness. Mail me if you want
the data set, it's 2.2MB bzipped.
The conclusion I draw is that the perl, gcc, fnv, and dphil hashes are
all about tied for bucket distribution, on identifiers, and perform
well on power-of-two size tables. (Which is good, since hashtable.c
uses a power-of-two table.) CRC is decent but only if given a prime
table size.
lennart and nine are fast, but you pay for it with longer chains and
lower occupancy - lennart in particular consistently has a mean bucket
size of ~8, not ~1 the way all the other good ones do. mouse and dumb
are just not worth mentioning.
It would be interesting to attempt to simulate hashtable.c's rehashing
algorithm and see what kind of figures result.
zw
-------------- next part --------------
There are 552610 unique items-which-look-like-identifiers
(/[a-zA-Z_][a-zA-Z0-9_]*/) in my source code collection.
'gcc' is htab-hash-string from libiberty/hashtab.c.
'nine' is the simple multiply-by-nine hash.
'dphil' is Daniel Philips' linear congruential hash, from his
ext2-directory-index patch.
524288: largest power of two smaller than N
dumb occ 1.0% mean 108.23 dev 131.16 max 656 time 0.308sec
mouse occ 44.2% mean 2.39 dev 6.76 max 229 time 0.488sec
crc occ 58.6% mean 1.80 dev 1.71 max 135 time 0.423sec
lennart occ 12.5% mean 8.43 dev 2.89 max 23 time 0.388sec
nine occ 63.8% mean 1.65 dev 0.93 max 12 time 0.376sec
perl occ 64.9% mean 1.62 dev 0.85 max 9 time 0.417sec
gcc occ 65.1% mean 1.62 dev 0.84 max 9 time 0.422sec
fnv occ 65.2% mean 1.62 dev 0.84 max 9 time 0.458sec
dphil occ 65.1% mean 1.62 dev 0.84 max 9 time 0.537sec
1048576: smallest power of two larger than N
dumb occ 0.5% mean 108.23 dev 131.16 max 656 time 0.308sec
mouse occ 25.6% mean 2.06 dev 6.26 max 229 time 0.493sec
crc occ 36.4% mean 1.45 dev 1.35 max 135 time 0.430sec
lennart occ 6.2% mean 8.43 dev 2.89 max 23 time 0.388sec
nine occ 39.8% mean 1.32 dev 0.66 max 11 time 0.385sec
perl occ 40.8% mean 1.29 dev 0.57 max 7 time 0.423sec
gcc occ 40.9% mean 1.29 dev 0.56 max 7 time 0.429sec
fnv occ 41.0% mean 1.29 dev 0.56 max 7 time 0.465sec
dphil occ 40.9% mean 1.29 dev 0.56 max 6 time 0.559sec
552611: smallest prime larger than N
dumb occ 0.9% mean 108.23 dev 131.16 max 656 time 0.308sec
mouse occ 47.4% mean 2.11 dev 3.58 max 113 time 0.494sec
lennart occ 11.9% mean 8.43 dev 2.89 max 23 time 0.398sec
nine occ 61.9% mean 1.62 dev 0.91 max 12 time 0.376sec
perl occ 63.0% mean 1.59 dev 0.82 max 9 time 0.417sec
gcc occ 63.1% mean 1.58 dev 0.82 max 9 time 0.422sec
fnv occ 63.2% mean 1.58 dev 0.81 max 9 time 0.469sec
dphil occ 63.2% mean 1.58 dev 0.81 max 9 time 0.537sec
crc occ 63.0% mean 1.59 dev 0.82 max 8 time 0.436sec
1048759: smallest prime larger than 1048756
dumb occ 0.5% mean 108.23 dev 131.16 max 656 time 0.308sec
mouse occ 28.6% mean 1.85 dev 3.30 max 113 time 0.498sec
lennart occ 6.2% mean 8.43 dev 2.89 max 23 time 0.388sec
nine occ 39.8% mean 1.32 dev 0.66 max 12 time 0.382sec
gcc occ 40.9% mean 1.29 dev 0.56 max 8 time 0.429sec
fnv occ 40.9% mean 1.29 dev 0.56 max 8 time 0.465sec
crc occ 40.7% mean 1.29 dev 0.57 max 7 time 0.435sec
perl occ 40.8% mean 1.29 dev 0.56 max 7 time 0.423sec
dphil occ 40.9% mean 1.29 dev 0.56 max 6 time 0.545sec
10541534: N log2 N
dumb occ 0.0% mean 108.23 dev 131.16 max 656 time 0.308sec
mouse occ 3.3% mean 1.59 dev 3.01 max 113 time 0.515sec
lennart occ 0.6% mean 8.43 dev 2.89 max 23 time 0.389sec
nine occ 4.9% mean 1.07 dev 0.36 max 10 time 0.399sec
perl occ 5.1% mean 1.03 dev 0.19 max 5 time 0.441sec
gcc occ 5.1% mean 1.03 dev 0.18 max 5 time 0.447sec
crc occ 5.1% mean 1.03 dev 0.19 max 5 time 0.455sec
dphil occ 5.1% mean 1.03 dev 0.16 max 4 time 0.566sec
fnv occ 5.1% mean 1.03 dev 0.16 max 3 time 0.486sec
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hashtest.tar.gz
Type: application/octet-stream
Size: 4608 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20011209/7a61bdca/attachment.obj>
More information about the Gcc-patches
mailing list