I'm implementing a minhashing algorithm in Java that requires me to generate an arbitrary number of random hash functions (240 in my case) and run any number of integers through them (2000 at the moment).
To do that, I generate random numbers a, b, and c (each from the range 1 to 2001) for each of the 240 hash functions. Each hash function then returns h = ((a*x) + b) % c, where h is the return value and x is one of the integers run through it.
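For concreteness, here is a minimal sketch of the scheme as described, not a recommendation. The class name `RandomHashFamily`, the seed parameter, and the `signature` helper are hypothetical names introduced only for illustration; the arithmetic is widened to `long` so that a*x cannot overflow for larger inputs.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration of the scheme described above:
// numHashes functions of the form h(x) = ((a*x) + b) % c.
class RandomHashFamily {
    private final int[] a;
    private final int[] b;
    private final int[] c;

    // maxValue = 2001 reproduces the range 1 to 2001 from the question.
    RandomHashFamily(int numHashes, int maxValue, long seed) {
        Random rng = new Random(seed);
        a = new int[numHashes];
        b = new int[numHashes];
        c = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rng.nextInt(maxValue); // uniform in 1..maxValue
            b[i] = 1 + rng.nextInt(maxValue);
            c[i] = 1 + rng.nextInt(maxValue);
        }
    }

    // Value of the i-th hash function at x.
    int hash(int i, int x) {
        long v = (long) a[i] * x + b[i]; // long avoids int overflow
        return (int) Math.floorMod(v, (long) c[i]);
    }

    // Minhash signature of a set: for each hash function,
    // the minimum hash value over all elements.
    int[] signature(int[] set) {
        int[] sig = new int[a.length];
        Arrays.fill(sig, Integer.MAX_VALUE);
        for (int x : set) {
            for (int i = 0; i < sig.length; i++) {
                sig[i] = Math.min(sig[i], hash(i, x));
            }
        }
        return sig;
    }
}
```

With the numbers from the question, `new RandomHashFamily(240, 2001, seed).signature(values)` would return 240 minimum values, one per hash function.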
Is this an efficient implementation of random hashing, or is there a more common/acceptable way to do it?
This post asks a similar question, but I'm still somewhat confused by the wording of the answer: "Minhash implementation how to find hash functions for permutations"
Pseudorandom. This sounds like an academic issue so it might be worth noting the distinctions. – Colleague
bottom-k hashing, where you just use one hash function, but keep the k smallest values, rather than only one. – Heterogamete
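The bottom-k idea from the comment above can be sketched roughly as follows. This is only an illustration under assumptions: the class name `BottomKSketch`, the `bottomK` method, and the particular multiplier, offset, and modulus in `hash` are hypothetical choices, not taken from the thread.

```java
import java.util.TreeSet;

// Hypothetical illustration of bottom-k hashing: a single hash
// function, keeping the k smallest hash values of the set.
class BottomKSketch {
    // Illustrative hash function; the constants are assumptions.
    static int hash(int x) {
        long v = 2654435761L * x + 12345L;
        return (int) Math.floorMod(v, 2147483647L); // modulus 2^31 - 1
    }

    // Returns the k smallest distinct hash values in ascending order
    // (fewer than k if the set yields fewer distinct hashes).
    static int[] bottomK(int[] items, int k) {
        TreeSet<Integer> smallest = new TreeSet<>();
        for (int x : items) {
            smallest.add(hash(x));
            if (smallest.size() > k) {
                smallest.pollLast(); // drop the current largest value
            }
        }
        int[] out = new int[smallest.size()];
        int i = 0;
        for (int h : smallest) {
            out[i++] = h;
        }
        return out;
    }
}
```

For example, `BottomKSketch.bottomK(values, 240)` would keep 240 values from one hash function instead of one minimum from each of 240 functions.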