This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Re: htab_hash_string

To: dan at cgsoftware dot com
Subject: Re: htab_hash_string
From: Zack Weinberg <zackw at panix dot com>
Date: Fri, 17 Aug 2001 02:15:56 -0400
Cc: gcc-patches at gcc dot gnu dot org

> And by intended use, I meant the hash function, not htab_hash_string
> (I.E. Whether it was meant for hashing strings because of some
> property, or was just some random hash function picked out of thin air).

> > +     r = r * 67 + c - 113;

I recognize this formula - it's the hash function that I worked
out for cpplib's identifier hash (now gcc/hashtable.c).

I got it by extracting all the identifiers from all the source code
I had lying around in mid-1999, and testing many recurrences of
the form "H_n = H_{n-1} * K + c_n * L + M" where K, L, M were either
prime numbers or the appropriate identity.  This was the best one.
I don't remember exactly what constituted "best", except I was
looking at bucket-length distributions mostly.

So it should be very good at hashing identifiers, but might not be
as good at arbitrary strings.

I'll add that it thoroughly trounces the hash functions recommended
for this use at http://burtleburtle.net/bob/hash/index.html, both
on speed and bucket distribution.  I haven't tried it against the
function they just started using for Perl's hashes.

zw

Follow-Ups:
- Re: htab_hash_string
  - From: Neil Booth
- Re: htab_hash_string
  - From: Richard Henderson

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]