I have a large sparse numpy/scipy matrix where each row corresponds to a point in high-dimensional space. I want make queries of the following kind:
Given a point P (a row in the matrix) and a distance epsilon, find all points with distance at most epsilon from P.
The distance metric I am using is Jaccard-similarity, so it should be possible to use Locality Sensitive Hashing tricks such as MinHash.
Is there an implementation of MinHash for sparse numpy arrays somewhere (I can't seem to find one) or is there an easy way to do this?
The reason I am not just pulling something built for non-sparse arrays off of Github is that the sparse data structures in scipy might cause explosions in time complexity.