I need a performance-oriented hash function implementation in C++ for a hash table that I will be coding. I have already looked around, but only found questions asking what a good hash function is "in general". I've considered CRC32 (but where can I find a good implementation?) and a few cryptographic algorithms. My table, though, has very specific requirements.
Here's what the table will be like:
100,000 items max
200,000 capacity (so the load factor is at most 0.5)
hashing a 6-character string which is part of an English sentence
examples: "become" "and he" ", not "
The number one priority of my hash table is quick search (retrieval). Quick insertion is not important, but it will come along with quick search. Deletion is not important, and re-hashing is not something I'll be looking into. To handle collisions, I'll probably be using separate chaining as described here. I have already looked at this article, but would like the opinion of those who have handled such a task before.
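For concreteness, the separate-chaining lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Key`, `make_key`, `ChainedTable` and `placeholder_hash` are hypothetical names, the stored value is assumed to be an `int`, and the hash shown (32-bit FNV-1a) is a stand-in, not a recommendation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <forward_list>
#include <vector>

// Fixed-size key: exactly 6 characters, no terminator (assumption).
using Key = std::array<char, 6>;

// Hypothetical helper: copy the first 6 bytes of a C string into a Key.
inline Key make_key(const char* s) {
    assert(std::strlen(s) == 6);
    Key k;
    std::memcpy(k.data(), s, k.size());
    return k;
}

// Placeholder hash (assumption): 32-bit FNV-1a over the 6 bytes.
inline std::uint32_t placeholder_hash(const Key& k) {
    std::uint32_t h = 2166136261u;
    for (char c : k) {
        h ^= static_cast<unsigned char>(c);
        h *= 16777619u;
    }
    return h;
}

// Separate chaining: each bucket holds a singly linked list of entries.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t buckets) : buckets_(buckets) {
        assert(buckets > 0);
    }

    // Insert a new key, or overwrite the value of an existing one.
    void insert(const Key& k, int value) {
        auto& chain = buckets_[placeholder_hash(k) % buckets_.size()];
        for (auto& node : chain) {
            if (node.key == k) { node.value = value; return; }
        }
        chain.push_front(Node{k, value});
    }

    // Return a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const Key& k) const {
        const auto& chain = buckets_[placeholder_hash(k) % buckets_.size()];
        for (const auto& node : chain) {
            if (node.key == k) return &node.value;
        }
        return nullptr;
    }

private:
    struct Node { Key key; int value; };
    std::vector<std::forward_list<Node>> buckets_;
};
```

With 200,000 buckets and at most 100,000 items, chains should stay short on average, so `find` typically compares only one or two 6-byte keys. Linked chains do cost a pointer dereference per node, so a layout with better cache locality may be worth measuring against this one.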
unsigned int by, say, 3 bits and add in the next byte, then reduce modulo a prime; that one works OK for text. Is the input somehow under the (partial) control of potentially malicious parties (they could give you 100,000 strings hashing to a few buckets...)? Then you need to make it hard to cook up data for such an attack, perhaps by "salting" (start with a secret random value for each table) and some cryptographic hash thrown in (but for short strings that might not be very effective). – Basso
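A minimal sketch of the shift-and-add idea from the comment above, with the optional salt as the starting value. Several things here are assumptions: the beginning of the comment is cut off, so a plain left shift is assumed (a rotate may have been meant); the name `shift_add_hash` and its parameters are hypothetical; and the caller is expected to supply a prime modulus close to the table capacity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Shift-and-add string hash, reduced modulo a caller-supplied prime.
//   salt  : starting value; use 0 for an unsalted hash, or a secret
//           per-table random value if inputs may be adversarial.
//   prime : modulus, assumed to be a prime near the bucket count.
inline std::uint32_t shift_add_hash(const std::string& s,
                                    std::uint32_t salt,
                                    std::uint32_t prime) {
    assert(prime > 0);
    std::uint32_t h = salt;
    for (char ch : s) {
        // Shift the running value left by 3 bits, then add the next byte.
        // Unsigned arithmetic wraps on overflow, which is well defined.
        h = (h << 3) + static_cast<unsigned char>(ch);
    }
    return h % prime;
}
```

For a 6-character key the first byte ends up shifted by only 15 bits, so nothing from the key itself overflows a 32-bit value. The salt is shifted by 18 bits in total, so only its low 14 bits survive into the result. That limits how much protection this particular salting gives, in line with the comment's caveat about short strings.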