How to solve nearest neighbor through the R-nearest neighbor?
Asked Answered
I

1

1

Citing the E2LSH manual (it's not important that's about this specific library, this quote should be true for NN problem in general):

E 2LSH can be also used to solve the nearest neighbor problem, where, given the query q, the data structure is required the report the point in P that is closest to q. This can be done by creating several R-near neighbor data structures, for R = R1, R2, . . . Rt , where Rt should be greater than the maximum distance from any query point to its nearest neighbor. The nearest neighbor can be then recovered by querying the data structures in the increasing order of the radiae, stopping whenever the first point is found

Could someone rephrase this please? I don't this procedure to find the nearest neighbor using the R-near neighbor approach.

Ignatzia answered 8/6, 2016 at 12:57 Comment(0)
I
1

I will provide an example, which should clear things up. Assume that our dataset consists of only one point, p and a query point arrives, q. Let's assume* that the distance of p and q is 3,9.

Now, by using E2LSH$, I can create a data structure that solves the R-nearest neighbor problem, i.e. it will answer yes (and fetch me the point) that lies inside a radius R. If no such point exists, it will answer no.

Let's say that I choose to build 5 data structures of that kind, starting from R = 1 to 5. In our mind, this is what we have done so far:

enter image description here

So now remember, that d(p, q) = 3,9, thus we expect to ask the data structure that is built with R = 4 and find for us the query point q.

Now let's pretend that we do not know d(p, q), so we start searching from the smallest Radius we have picked, that's 1. So, we ask, is there anything in Radius (equal to 1) from our dataset? No!

From R = 2? No! From R = 3? No! From R = 4? Yes, and that's q! So now we are done. 4 is the Rt you mention in your question.


* That's a strong assumption and E2LSH suffers for having to put the user to input that parameter R, since usually we do not know what value should R have, too big and we will waste space and time, too small and we won't find our query!

$ I heard there is something more modern than E2LSH now, in Ilya Razenshteyn's homepage.

Impatient answered 10/6, 2016 at 8:33 Comment(7)
Holy...! This is SO naive and inefficient :D I have to say it: I already thought that this was the approach described in the question, but it appeared to me with so many problems that I wanted to be sure about it! First of all: how do we decide the radius-increasing step? If it's too large probably from finding no neighbor we'll find a bunch of them, but if it's too small probably we'll have to do SO MANY iterations! In few words: tuning like hell (which is usually not a good thing). Anyway thanks so much for FALCONN linking and your answer!Ignatzia
And your answer was so clear! Thanks so much (hoping not to bother you with all these questions about LSH ;) )Ignatzia
@Ignatzia tuning! I am not aware of another technique, but it's an open question and lots of research is being done on this! No my friend, not at all! In fact, I enjoyed your questions, because they made me refresh what I know and learn some new things! So I thank you! Cheers!Impatient
Well, then could you please give a quick look to this question? :)Ignatzia
Could we start a chat? That could be really useful and time saving :)Ignatzia
I created this chat could you join it?Ignatzia
@Ignatzia sorry I have exams. I would advice you to post a new question (so that another person can also answer) and if not, feed me with the link! :)Impatient

© 2022 - 2024 — McMap. All rights reserved.