Is LSH about transforming vectors to binary vectors for hamming distance?
I have read some papers about LSH and I know that it is used to solve the approximate k-NN problem. We can divide the algorithm into two parts:

  1. Given a vector of any values in D dimensions (where D is big), translate it with a set of N (where N << D) hash functions into a binary vector in N dimensions.

  2. Using the Hamming distance, apply some search technique to the set of binary codes obtained in phase 1 to find the k-NN.

The key point is that computing the Hamming distance between vectors in N dimensions is fast using XOR.
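As a minimal sketch (not from the original post), the XOR trick can be shown in a few lines of Python: XOR the two codes, then count the set bits of the result.

```python
# Hamming distance between two N-bit binary codes stored as Python ints.
# Real implementations pack the codes into machine words and use a
# hardware popcount instruction; this is just the idea.

def hamming(a: int, b: int) -> int:
    """XOR the codes, then count the 1-bits in the result."""
    return bin(a ^ b).count("1")

# Example: 0b1011 and 0b0010 differ in bits 0 and 3.
print(hamming(0b1011, 0b0010))  # 2
```

On Python 3.10+ the bit count can also be written as `(a ^ b).bit_count()`.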

Anyway, I have two questions:

  1. Is point 1 still necessary if we use a binary descriptor, like ORB? Since ORB descriptors are already binary and we compare them with the Hamming distance, why should we perform the first step?

  2. How does the conversion happen for images described by SIFT? Each SIFT descriptor is a 128-dimensional vector, and each image is described by a set of descriptors. So we have a desc × 128 matrix (where desc is the number of descriptors used), while LSH usually accepts a vector as input.
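To make the conversion in point 1 concrete, here is a hypothetical sketch of one common LSH family for real-valued vectors, random-hyperplane hashing (SimHash): each of the N hash functions is a random direction, and the corresponding bit records which side of that hyperplane the vector falls on. The dimensions and seed below are illustrative, not from the question.

```python
# Random-hyperplane LSH sketch: map a D-dimensional real vector to an
# N-bit binary code (N << D). Illustrative only; real systems use
# multiple hash tables and tuned parameters.
import random

def make_hyperplanes(D: int, N: int, seed: int = 0):
    """One random Gaussian direction per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]

def binary_code(vec, hyperplanes):
    """Bit i is 1 iff vec has a positive dot product with hyperplane i."""
    return [1 if sum(v * h for v, h in zip(vec, plane)) > 0 else 0
            for plane in hyperplanes]

planes = make_hyperplanes(D=128, N=16)            # 128-d in, 16-bit code out
vec = [random.Random(1).uniform(0, 1) for _ in range(128)]
code = binary_code(vec, planes)
print(len(code))  # 16
```

Nearby vectors tend to fall on the same side of most hyperplanes, so their codes have a small Hamming distance.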

Wellthoughtof answered 24/5, 2016 at 8:55 Comment(0)

1) You can bypass it, but then you will work in D dimensions, not N (where N << D). That means that the algorithm has to work in D dimensions as well.


2) No.

Read about SIFT in OpenCV:

  1. Keypoint Descriptor

Now the keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken. It is divided into 16 sub-blocks of 4x4 size. For each sub-block, an 8-bin orientation histogram is created. So a total of 128 bin values are available. It is represented as a vector to form the keypoint descriptor. In addition to this, several measures are taken to achieve robustness against illumination changes, rotation etc.
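The arithmetic in the quoted description can be checked directly: a 16x16 neighbourhood split into 4x4 sub-blocks gives 16 sub-blocks, and 8 orientation bins per sub-block gives the 128-dimensional descriptor.

```python
# Checking the descriptor length from the quoted OpenCV description.
neighbourhood = 16          # 16x16 pixel neighbourhood around the keypoint
sub_block = 4               # each sub-block is 4x4 pixels
bins_per_block = 8          # 8-bin orientation histogram per sub-block

sub_blocks = (neighbourhood // sub_block) ** 2   # 4 * 4 = 16 sub-blocks
descriptor_length = sub_blocks * bins_per_block
print(descriptor_length)  # 128
```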

Here is how I have it in my mind, hope that will be enough:

LSH takes as input a point set of n points, where every point lies in d dimensions.

So, a query is a point in d dimensions, and the goal is to find its NN*.


  1. Now every point represents an image descriptor. So, we have n images in our dataset.

  2. The query, which is also a point (i.e. a vector with d coordinates), represents another image descriptor.

  3. We are seeking to match (i.e. to find the Nearest Neighbor) the query image descriptor with an image descriptor from our dataset.

So the conversion you are talking about is applied to a vector, not a matrix.
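The matching in steps 1-3 can be sketched as a brute-force nearest-neighbor search (what LSH approximates); the tiny dataset and names below are hypothetical, for illustration only.

```python
# Brute-force NN sketch: each row of `dataset` is one image descriptor
# (a d-dimensional vector), and the query is a single vector, not a matrix.
import math

def nearest_neighbor(query, dataset):
    """Return the index of the dataset descriptor closest to the query."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(range(len(dataset)), key=lambda i: dist(query, dataset[i]))

dataset = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # n = 3 descriptors, d = 2
print(nearest_neighbor([0.9, 1.2], dataset))  # 1
```

LSH replaces this linear scan (which costs O(n·d) per query) with hashing, at the price of an approximate answer.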


Edit:

Moreover, from our paper High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests, see this in the Experiments section:

SIFT is a 128-dimensional vector that describes a local image patch by histograms of local gradient orientations.


*or the Fixed-radius near neighbors problem

Catalan answered 25/5, 2016 at 6:34 Comment(10)
But your quoted text refers to only one descriptor. This means that in SIFT each descriptor is a vector in 128 dimensions. But an image is represented by multiple descriptors; that's why I say that an image is represented through a matrix, since we need multiple descriptor vectors (each one of them in 128 dimensions).Wellthoughtof
Then this, from the link I provided, should help: "Keypoints between two images are matched by identifying their nearest neighbours." @justHelloWorld.Catalan
I'm sorry for my stupidity, but I don't understand how that could help with my previous observation :DWellthoughtof
I think the fault lies on me @justHelloWorld, not you! See my edit. In LSH we search for image descriptors.Catalan
Now it's much clearer: we base our search in LSH on just one descriptor among the many available in the whole image. So we identify one image through one descriptor relative to a small patch in the image. Now I wonder how we can extract a meaningful (or better, the most meaningful) patch in the image, but I think that I'll open another question on the topic :)Wellthoughtof
Exactly @justHelloWorld, you got it! Hmm, that's a big question and many scientists are working on it, it is an open question I guess.Catalan
And this is the question :)Wellthoughtof
That's a rather broad question for me, I will let someone else answer @justHelloWorld, good luck!Catalan
I just found out that the GIST descriptor refers to an entire image, and not just a patch/keypoint inside it, so we can represent an image through a vector (in 534 dimensions, if I'm not wrong). This is great :)Wellthoughtof
That's what I was saying from the very start @justHelloWorld, but you got me tangled up a bit, sorry! In our High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests paper we have top performance on GIST, and it is 960 dimensions. ;)Catalan
