What are the properties of a good key hash?

Question

I'm attempting to create a key hash for a custom struct with two integer values, x and y - for use in a hashtable.

I only really care about values between -10000 and +10000, and 90% of the time it will likely be operating between -100 and +100. I don't care at all about security.

From some basic research, I've learned that an optimally performing hash has very few or no collisions. I believe I can achieve this across the relevant set by treating x and y as 16-bit values.

It is also supposedly beneficial for adjacent input values to produce non-adjacent, and preferably very different, output values.

How important is non-adjacency for performance? Are there any other major considerations? Would one want to allow some collisions in favor of non-adjacency? I am using this as a key in a hashtable, not for security, so lookup speed is important. (I'm also caching the hash code in a readonly variable at initialization, so the cost of computing it on every lookup is not a concern.)

public override int GetHashCode() {
    //A very simple unique hash, but values are adjacent
    //hash = (x<<16) +y;
    //EDIT: This is the function to use
    hash = (x<<16) +y +32768;

    /*A mildly complicated hash from the internets,
      non adjacent values, caused some collisions.
      (checking from -250 to +250 in x and y required 90s
      Checking 20000 would take a very long time)*/
    //hash -= (hash<<6);
    //hash ^= (hash>>17);
    //hash -= (hash<<9);
    //hash ^= (hash<<4);
    //hash -= (hash<<3);
    //hash ^= (hash<<10);
    //hash ^= (hash>>15);

    return hash;
}

Answer 1

From some basic testing, it seems as though the benefit of any superior hashing algorithm in spacing and adjacency is far outweighed by the cost of actually calculating the hashcode, particularly in structs where they are frequently recalculated.

Simple hashcodes would appear to be superior.

That said, collisions do appear to have a very significant impact on performance.

One hash algorithm recommended on MSDN for this kind of problem was 100x slower for lookup than my short sample code above, and nearly froze my computer during hashtable generation, most likely because of the large number of collisions in the data set.

int x;
int y;
//hash = x^y; //DO NOT USE THIS, QUITE TERRIBLE
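To see why XOR does so badly on this data: it is symmetric, so (a, b) and (b, a) always collide, and every (a, a) hashes to 0. Coordinates between -100 and +100 fit in 8 signed bits, so their XOR does too, which means the 40401 points in that range share at most 256 hash codes. A minimal sketch to check this, assuming a standalone console program (`XorDemo`, `XorHash`, and `DistinctHashes` are hypothetical names for illustration):

```csharp
using System;
using System.Collections.Generic;

public static class XorDemo
{
    // The hash that should NOT be used for small 2D integer coordinates.
    public static int XorHash(int x, int y)
    {
        return x ^ y;
    }

    // Number of distinct hash codes produced over [-range, range] x [-range, range].
    public static int DistinctHashes(int range)
    {
        var seen = new HashSet<int>();
        for (int x = -range; x <= range; x++)
        {
            for (int y = -range; y <= range; y++)
            {
                seen.Add(XorHash(x, y));
            }
        }
        return seen.Count;
    }

    public static void Main()
    {
        Console.WriteLine(XorHash(3, 7) == XorHash(7, 3)); // True: XOR is symmetric
        Console.WriteLine(XorHash(5, 5) == XorHash(0, 0)); // True: any (a, a) hashes to 0
        int points = 201 * 201;
        Console.WriteLine(DistinctHashes(100) + " distinct hashes for " + points + " points");
    }
}
```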

Whereas this simple computation is very fast:

//hash = (x<<16) +y;  //does not use full 16 bits
hash = (x<<16) +y +32768;  //need the offset
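One way to convince yourself that the packed value is collision-free for 16-bit inputs is that it can be unpacked again: x sits in the high 16 bits, and the offset moves y from -32768..32767 into 0..65535 so it fills the low 16 bits without borrowing from x. A small sketch, assuming both coordinates are already within the signed 16-bit range (`PackDemo`, `Pack`, and `Unpack` are hypothetical names):

```csharp
using System;

public static class PackDemo
{
    // Same formula as above: x in the high half, offset y in the low half.
    public static int Pack(int x, int y)
    {
        return (x << 16) + y + 32768;
    }

    // Inverse of Pack for x and y in [-32768, 32767].
    public static void Unpack(int hash, out int x, out int y)
    {
        x = hash >> 16;              // arithmetic shift restores the sign of x
        y = (hash & 0xFFFF) - 32768; // low 16 bits hold y shifted into 0..65535
    }

    public static void Main()
    {
        int x, y;
        Unpack(Pack(-100, 37), out x, out y);
        Console.WriteLine(x + ", " + y); // prints "-100, 37"
    }
}
```

Since every pair maps back to itself, two different pairs can never share a hash.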

From this it would seem that if you are generating key values, avoiding collisions should be the number 1 concern, and a fast hash code algorithm number 2.
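Since collisions matter most, it helps to be able to count them quickly for any candidate hash. Adding each hash to a `HashSet<int>` and counting the rejected inserts takes time linear in the number of points. This is only a sketch, assuming a standalone console program; `HashCheck`, `Pack`, and `CountCollisions` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class HashCheck
{
    // The packed hash from above: x in the high 16 bits, offset y in the low 16 bits.
    public static int Pack(int x, int y)
    {
        return (x << 16) + y + 32768;
    }

    // Counts how many points in [-range, range] x [-range, range] produce
    // a hash that an earlier point already produced.
    public static int CountCollisions(Func<int, int, int> hash, int range)
    {
        var seen = new HashSet<int>();
        int collisions = 0;
        for (int x = -range; x <= range; x++)
        {
            for (int y = -range; y <= range; y++)
            {
                // HashSet<int>.Add returns false when the value is already present.
                if (!seen.Add(hash(x, y)))
                {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void Main()
    {
        Console.WriteLine(CountCollisions(Pack, 250)); // prints 0
    }
}
```

Any other candidate can be passed as a lambda, for example `CountCollisions((x, y) => x ^ y, 250)`. Keep the range modest: the set holds one entry per distinct hash, so the full -10000 to +10000 square would need a very large amount of memory.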

The result looks like this:

public Point(int x, int y){
    this.x = Mathf.Clamp(x, -32768, 32767);
    this.y = Mathf.Clamp(y, -32768, 32767);

    hash = (this.x<<16) +this.y +32768;
}
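For completeness, here is a hedged sketch of how such a key might look as a whole, assuming plain .NET rather than Unity (so `Math.Max`/`Math.Min` stand in for `Mathf.Clamp`). `GridPoint` and `GridPointDemo` are hypothetical names. Unlike the constructor above, this version computes the hash on demand instead of caching it: a cached field would be 0 in `default(GridPoint)`, which would then compare equal to `GridPoint(0, 0)` but hash differently.

```csharp
using System;
using System.Collections.Generic;

public struct GridPoint : IEquatable<GridPoint>
{
    public readonly int X;
    public readonly int Y;

    public GridPoint(int x, int y)
    {
        // Clamp to the signed 16-bit range so each coordinate fits in half of the int.
        X = Math.Max(-32768, Math.Min(32767, x));
        Y = Math.Max(-32768, Math.Min(32767, y));
    }

    // Typed Equals avoids boxing when the struct is used as a dictionary key.
    public bool Equals(GridPoint other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is GridPoint && Equals((GridPoint)obj);
    }

    // One shift and two adds; unique for every clamped (X, Y) pair.
    public override int GetHashCode()
    {
        return (X << 16) + Y + 32768;
    }
}

public static class GridPointDemo
{
    public static void Main()
    {
        var cells = new Dictionary<GridPoint, string>();
        cells[new GridPoint(-3, 7)] = "wall";
        Console.WriteLine(cells[new GridPoint(-3, 7)]);             // prints "wall"
        Console.WriteLine(cells.ContainsKey(new GridPoint(7, -3))); // prints "False"
    }
}
```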
