This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Hash Function for "switch statement"


Jae Hyuk Kwak wrote:
Hi Robert,


On Fri, Mar 19, 2010 at 2:13 PM, Robert Dewar <dewar@adacore.com> wrote:
Jae Hyuk Kwak wrote:

Regarding the modulus operation, it does not need to be a divide or a
hardware modulus operation.
Let me give you an example here:
13 % 2 = 1
13 & (2-1) = 1
Yes, of course, we all know that, and of course the compiler does
these simple optimizations. However, for computing hashes you
typically want a modulus with something OTHER than a power of 2,
so as to involve all the bits.

The point I made is that we don't need the "OTHER".


For example, when we have 200 cases in a switch, we don't need to use
200 as the OTHER; we can just use 256, because there is no reason to
stick to the exact number of cases when sizing the hash table.

You miss my point: doing a mod with 256 is an AWFUL hash algorithm, since it only uses the low-order 8 bits! But if you are just doing the AND to get the final result, then yes, of course, again that's totally familiar to everyone involved with compilers.
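To make both points above concrete, here is a minimal C sketch, assuming 32-bit keys (the constants are the ones used in this exchange): the mask equals the modulus only because the table size is a power of two, and the mask by itself keeps only the low bits, which is why the key must be mixed by a real hash function first.

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t key = 13;

    /* For a power-of-two modulus m, key % m == key & (m - 1). */
    assert((key % 2)   == (key & (2 - 1)));
    assert((key % 256) == (key & (256 - 1)));

    /* With 200 cases, round the table size up to 256: the mask is
       cheap, but on its own it keeps only the low 8 bits of key. */
    uint32_t table_size = 256;               /* next power of two >= 200 */
    uint32_t slot = key & (table_size - 1);
    (void)slot;
    return 0;
}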

I think we have different assumptions about the hash function.
The hash function I am thinking of is not just taking the lower bits.
The AND operation takes place after the hash function is done.
Those hash functions can be one of the functions on this web page:
http://www.concentric.net/~Ttwang/tech/inthash.htm

Sure, that's a possibility.


Let me copy and paste the function to prevent misunderstanding.
// Thomas Wang's 32-bit mix function, in Java (">>>" is Java's unsigned right shift)
public int hash32shift(int key)
{
  key = ~key + (key << 15); // key = (key << 15) - key - 1;
  key = key ^ (key >>> 12);
  key = key + (key << 2);
  key = key ^ (key >>> 4);
  key = key * 2057;         // key = (key + (key << 3)) + (key << 11);
  key = key ^ (key >>> 16);
  return key;
}

int value = given_from_user_at_runtime();
int hashKey = hash32shift( value );
int actualHashKey = hashKey & ( 512 - 1 );
HashBucket * bucket = hashTable[ actualHashKey ];
if( bucket->key == value ) (*(bucket->function))();


In order to prevent false positives, we need to store the key value,
as you correctly pointed out.
So the final process will be: first calculate the hash key, then go
to the hash table, and compare whether the stored key matches the
given value or not.
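Putting those pieces together, here is a self-contained C sketch of that process: hash32shift ported to C (uint32_t makes every shift unsigned, matching Java's ">>>"), one entry per slot, and a stored key to reject false positives. The handlers, the run-time table fill, and the default handling are illustrative assumptions, not a description of what any compiler actually emits.

#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t key;              /* original case value, to reject collisions */
    void (*function)(void);    /* code for that case; NULL marks an empty slot */
} HashBucket;

static void case_13(void)      { puts("running case 13"); }
static void default_case(void) { puts("running default"); }

/* hash32shift ported to C: unsigned shifts stand in for Java's ">>>". */
static uint32_t hash32shift(uint32_t key)
{
    key = ~key + (key << 15);
    key = key ^ (key >> 12);
    key = key + (key << 2);
    key = key ^ (key >> 4);
    key = key * 2057;
    key = key ^ (key >> 16);
    return key;
}

#define TABLE_SIZE 512   /* power of two, so "& (TABLE_SIZE - 1)" is a valid mod */
static HashBucket hashTable[TABLE_SIZE];

int main(void)
{
    /* A compiler would build this table at compile time; it is filled at
       run time here only to keep the example self-contained. */
    hashTable[hash32shift(13) & (TABLE_SIZE - 1)] = (HashBucket){ 13, case_13 };

    uint32_t value = 13;   /* stands in for given_from_user_at_runtime() */
    HashBucket *bucket = &hashTable[hash32shift(value) & (TABLE_SIZE - 1)];

    if (bucket->function != NULL && bucket->key == value)
        bucket->function();   /* exact match: dispatch */
    else
        default_case();       /* empty slot or colliding key */
    return 0;
}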
Again, you can assume we all know the elementary algorithms for
managing hash tables.

One thing to realize is that hashing only has good average performance.
So your O(N) is talking about average case performance, whereas the
O(NlogN) for a tree search is worst case. That's comparing apples and
oranges. It is worrisome to have the possibility of really bad
performance for a particular bad luck case.

As I have already corrected, the performance of the hash-table approach is O(1), not O(N).

Well, I was giving the performance of N executions, which is O(N) for the hash table and O(NlogN) for the tree. If you just want to consider a single execution, then the result is O(1) for the hash and O(logN) for the tree search, so my comparison was consistent.

You may think it is not a proper comparison, but I think comparing "tree search" with a hash table is valid. Let me cite a sentence from Wikipedia: http://en.wikipedia.org/wiki/Hash_table

"In many situations, hash tables turn out to be more efficient than
search trees or any other table lookup structure."

Yes, of course; that's why all compilers use hash tables extensively. I think you will find that people on this mailing list know all about hash tables (I would not be surprised if someone here wrote the (rather elementary) Wikipedia article). BTW, I would recommend a more thorough source, e.g. Knuth's The Art of Computer Programming, for reading up on hashing.

The above quote from Wikipedia is indeed misleading, because it does
not make a clear contrast between average behavior and worst case
behavior. (The situation is similar to comparing quick sort with heap
sort: they are both NlogN on average, but the constant is smaller
for quick sort, so it is generally better; however, the worst case is
N**2 for quick sort and NlogN for heap sort.)

So this does not get around the possibility of a bad luck worst case.
It is one thing for a program to compile slowly because by bad luck
it has identifier names that hash badly; it is QUITE another to hit
bad luck at execution time, which results in a particular program
running horribly slowly. I am dubious about introducing this kind of
uncertainty.

I trust this point is clear to you?
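To make the worst case concrete, consider a chained table (chaining is an assumption here; the snippet earlier in the thread used one entry per slot). If a bad-luck case set sends every key to the same slot, each lookup walks the whole chain:

#include <stddef.h>
#include <stdint.h>

struct chain_node {
    uint32_t key;                 /* original case value */
    void (*function)(void);       /* code for that case */
    struct chain_node *next;      /* colliding entries chained off the slot */
};

/* Average case: chains hold about N/table_size entries, so a lookup is O(1).
   Worst case: every case value lands in the same slot and this loop walks
   all N entries, i.e. O(N) per lookup, O(N*N) over N executions. */
struct chain_node *
switch_lookup(struct chain_node **table, uint32_t mask,
              uint32_t (*hash)(uint32_t), uint32_t value)
{
    struct chain_node *node = table[hash(value) & mask];
    while (node != NULL && node->key != value)
        node = node->next;
    return node;                  /* NULL: fall through to the default case */
}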

You seem to write as though the discovery of hash algorithms is new
and interesting in this context. In fact the use of hash tables for
switch statements is very old (BCPL comes to mind (*)), as is the
understanding of when it is a useful approach. Along with
others in this thread, I very much doubt it is useful in practical
cases. And certainly you would have to do extensive, careful
benchmarks to show it was a worthwhile replacement, especially
given the worst-case behavior.

(*) Here is a 1969 BCPL reference (more than 40 years old!)

http://portal.acm.org/citation.cfm?id=1476880

A quote from this article:


" ... the switch very efficiently, even constructing a hash table if this method of switching is ..."

