How to bucket locality-sensitive hashes?

I already have an algorithm that produces locality-sensitive hashes, but how should I bucket them to take advantage of their characteristics (i.e. similar elements have nearby hashes under the Hamming distance)?

In the MATLAB code I found, they simply build a distance matrix between the hashes of the query points and the hashes of the points in the database, to keep the code simple, while referencing a so-called Charikar method for a properly efficient implementation of the search.
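
For reference, that brute-force distance-matrix step can be sketched like this (Python/NumPy; the sizes and data here are made up for illustration):

```python
import numpy as np

# Illustrative data: each row is one 32-bit hash stored as a 0/1 vector.
rng = np.random.default_rng(0)
db_hashes = rng.integers(0, 2, size=(100, 32))   # 100 database items
query_hashes = rng.integers(0, 2, size=(5, 32))  # 5 query points

# Distance matrix: entry (i, j) is the Hamming distance between
# query i and database item j (number of differing bits).
dist = (query_hashes[:, None, :] != db_hashes[None, :, :]).sum(axis=2)

# Nearest database item for each query.
nearest = dist.argmin(axis=1)
```

This is linear in the database size for every query, which is exactly what a proper bucketing scheme is meant to avoid.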

I tried to search for that, but I'm not sure how to apply any of the methods I found (like the multi-probe method) to my case. None of these techniques seems easily pluggable if you already have the hashes. Is there any simple example code for this? Or any suggestion?

This is the link to the page with the matlab code I'm talking about: http://www.eecs.berkeley.edu/~kulis/klsh/klsh.htm

Rapprochement answered 30/6, 2011 at 19:54 Comment(5)
Doing some research on the matter, I've come up with an algorithm which basically consists in creating a table for each bit (in this case) and dividing all the elements between those that have that bit set and those that haven't. Do this for all the bits. Then, while searching, you visit the right table for each bit of the query, and this way you collect all the candidates whose distance to the query you compute (once you erase the dupes). – Rapprochement
All this taking into consideration an obvious optimization: since we're talking about bits, they're either 0 or 1, so you don't really need to store both lists (that is, if you list the elements that have the bit set, all the others implicitly haven't). – Rapprochement
If your comments answer your own question, could you post them as an answer and accept it (which you can do, I think, after two days)? This way other people can more easily see that the problem is solved. – Oruro
@Rapprochement Hello, I see that this is quite an old question, but did you implement it? I think you are talking about KLSH, which I'm trying to implement too. I didn't find much about the method you described in the comment; could you tell me more about it? – Roseliaroselin
I found an implementation of the algorithm here. – Roseliaroselin
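
The per-bit bucketing described in the first two comments can be sketched like this (Python; hashes are plain integers, and names like `NUM_BITS` are illustrative, not from the original post):

```python
from collections import defaultdict

NUM_BITS = 16  # assumed hash width

def build_tables(hashes):
    """One table per bit position, bucketing item indices by that bit's value."""
    tables = [defaultdict(list) for _ in range(NUM_BITS)]
    for idx, h in enumerate(hashes):
        for bit in range(NUM_BITS):
            tables[bit][(h >> bit) & 1].append(idx)
    return tables

def query(q, hashes, tables):
    """Collect every item that agrees with q on at least one bit,
    then rank the deduplicated candidates by Hamming distance to q."""
    candidates = set()
    for bit in range(NUM_BITS):
        candidates.update(tables[bit][(q >> bit) & 1])
    return sorted(candidates, key=lambda i: bin(hashes[i] ^ q).count("1"))
```

With the optimization from the second comment you would store only the "bit set" lists and treat absence as "bit unset". Note also that single-bit tables yield very large candidate sets; practical LSH implementations bucket on groups of bits instead.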

Based on Search in locality sensitive hashing, and after reading Similarity Estimation Techniques from Rounding Algorithms, I would say this:

This question is somewhat broad, so I am just going to give a minimal (abstract) example here:

We have 6 (= n) vectors in our dataset, with d bits each. Let's assume that we do 2 (= N) random permutations.

Let the 1st random permutation begin! Remember that we permute the bits of each vector, not the order of the vectors. After permuting the bits, we sort the vectors, which gives an order, for example:

v1
v5
v0
v3
v2
v4

Now the query vector q arrives, but it is (almost certainly) not going to be identical to any vector in our dataset (after the permutation), so we won't find it by binary search alone.

However, the binary search will land us between two vectors. So now we can imagine the scenario to be like this (for example, q lies between v0 and v3):

v1
v5
v0 <-- up pointer
   <-- q lies here
v3 <-- down pointer
v2
v4

Now we move either the up or the down pointer, seeking the vector vi that matches q in the most bits. Let's say it was v0.

Similarly, we do the second permutation and find its best vector vi, let's say v4. We now compare v0 from the first permutation with v4, to see which one is closer to q, i.e. which one has more bits equal to q's.
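
The whole procedure above can be sketched like this (Python; `NUM_BITS`, `NUM_PERMS`, and the neighbor window are illustrative choices, not values prescribed by the paper):

```python
import random
from bisect import bisect_left

NUM_BITS = 16   # assumed hash width
NUM_PERMS = 2   # N in the example above

def permute_bits(h, perm):
    """Apply a bit permutation to an integer hash: output bit i = input bit perm[i]."""
    out = 0
    for new_pos, old_pos in enumerate(perm):
        out |= ((h >> old_pos) & 1) << new_pos
    return out

def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def build_index(hashes, seed=0):
    """For each random permutation, keep the item indices sorted by permuted hash."""
    rng = random.Random(seed)
    index = []
    for _ in range(NUM_PERMS):
        perm = list(range(NUM_BITS))
        rng.shuffle(perm)
        order = sorted(range(len(hashes)), key=lambda i: permute_bits(hashes[i], perm))
        keys = [permute_bits(hashes[i], perm) for i in order]
        index.append((perm, order, keys))
    return index

def search(q, hashes, index, window=2):
    """Binary-search each sorted permuted list, scan a few neighbors up and down
    (the two pointers above), and keep the candidate closest to q in Hamming distance."""
    best = None
    for perm, order, keys in index:
        pos = bisect_left(keys, permute_bits(q, perm))
        for j in range(max(0, pos - window), min(len(order), pos + window)):
            i = order[j]
            if best is None or hamming(hashes[i], q) < hamming(hashes[best], q):
                best = i
    return best
```

A real implementation would use many permutations and scan neighbors until the shared prefix degrades, as in Charikar's paper; this sketch only mirrors the two-permutation walkthrough above.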


However, if you are seeking a ready-made implementation, you should ask on Software Recommendations. I would also check whether the authors of the paper I linked to made their code public, or contact them to ask if they would share it.

Fdic answered 23/5, 2016 at 6:29 Comment(1)
Found an implementation of the algorithm here. – Roseliaroselin
