This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: htab_hash_string
- To: dan at cgsoftware dot com
- Subject: Re: htab_hash_string
- From: Zack Weinberg <zackw at panix dot com>
- Date: Fri, 17 Aug 2001 02:15:56 -0400
- Cc: gcc-patches at gcc dot gnu dot org
> And by intended use, I meant the hash function, not htab_hash_string
> (I.E. Whether it was meant for hashing strings because of some
> property, or was just some random hash function picked out of thin air).
> > + r = r * 67 + c - 113;
I recognize this formula - it's the hash function that I worked
out for cpplib's identifier hash (now gcc/hashtable.c).
I got it by extracting all the identifiers from all the source code
I had lying around in mid-1999, and testing many recurrences of
the form "H_n = H_{n-1} * K + c_n * L + M" where K, L, M were either
prime numbers or the appropriate identity. This was the best one.
I don't remember exactly what constituted "best", except I was
looking at bucket-length distributions mostly.
So it should be very good at hashing identifiers, but might not be
as good at arbitrary strings.
I'll add that it thoroughly trounces the hash functions recommended
for this use at http://burtleburtle.net/bob/hash/index.html, both
on speed and bucket distribution. I haven't tried it against the
function they just started using for Perl's hashes.
zw