Fastest way to generate hash function from k wise independent hash family for small k(<=5)
Asked Answered
H

2

22

I need a hash function h[n]:[t] from k wise independent hash family when k is small (<= 5). Or I need n hash values chosen uniformly randomly from [1-t] such that they are k wise independent. I am trying to implement some randomized algorithms where I needed this. I was generating n random number from range [1-t] using

scipy.stats.randint(0,self._t).rvs(self._n)

but this seems to be too slow for my application. Since I don't need full randomness but only 4 wise independence I was wondering if I could speed this up. I know I can use polynomial hash family to get k wise independence but is this the best? If yes, is there any fast implementation of this which I can plug in? If no, what are the alternative ways (libraries, possibly in Python)?

I have looked at this thread Obtaining a k-wise independent hash function but I am not sure what the accepted answer means by this: "if you need k different hash, just re-use the same algorithm k times, with k different seeds".

Any suggestions are greatly appreciated. Thanks.

Harbard answered 27/9, 2017 at 14:45 Comment(2)
If I am right your goal is to generate n random numbers (at least 4-wise independent) in range [1-t] in fastest way possible.Hepza
@EngineeredBrain Yes exactly and those number should be uniformly randomly picked from [1-t]. Thanks.Harbard
B
1

You can try to use numba's jit with numpy's random.randint():

import scipy.stats
import numpy as np
from numba import jit

def randint_scipy(n):
    return scipy.stats.randint(0, 10000).rvs(n)

def randint_numpy(n):
    return np.random.randint(0, 10000, n)

@jit
def randint_numpy_jit(n):
    return np.random.randint(0, 10000, n)

%timeit randint_scipy(5)
%timeit randint_numpy(5)
%timeit randint_numpy_jit(5)

Output:

1.09 ms ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4.63 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
960 ns ± 50.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Thus, numpy + numba is 1135 times faster than scipy's implementation of randint().

Burgomaster answered 16/8, 2018 at 15:33 Comment(0)
R
0

The fastest way to get a real k-wise independent hash function is to evaluate a degree-k polynomial over a finite field. The fastest way to do that is probably using carry less multiplication. For example, code see https://github.com/speedyhash/shorthash/blob/master/include/clhash.h#L248

In general, you should see the Techniques section on the Wikipedia article: https://en.wikipedia.org/wiki/K-independent_hashing#Techniques

Rebel answered 11/6, 2021 at 15:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.