What is a good metric for feature vector comparison and how to normalize them before comparison?

Background:

I am working on a bottom-up approach to image segmentation wherein I first over-segment the image into small regions / superpixels / supervoxels, and then iteratively merge adjoining over-segmented regions based on some criterion. One criterion I have been playing with is to measure how similar in appearance the two regions are. To quantify the appearance of a region, I use several measures -- intensity statistics, texture features, etc. I lump all the features I compute for a region into a long feature vector.

Question:

Given two adjacent over-segmented regions R1 and R2, let F1 and F2 be the corresponding feature vectors. My questions are the following:

-- What are good metrics to quantify the similarity between F1 and F2?

-- How best to normalize F1 and F2 before quantifying their similarity with a metric? (Using any supervised approach to normalization is not feasible, because I don't want my algorithm to be tied to one set of images.)

Solution in my mind:

Similarity(R1, R2) = dot_product(F1 / norm(F1), F2 / norm(F2))

In words, I first normalize F1 and F2 to be unit vectors and then use the dot product between the two vectors as a similarity measure.
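
A minimal sketch of what I mean in Python/NumPy (the feature values below are just placeholders, not real measurements):

import numpy as np

def cosine_similarity(f1, f2, eps=1e-12):
    # Similarity(R1, R2): dot product of the L2-normalized feature vectors
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    return float(np.dot(f1 / (np.linalg.norm(f1) + eps),
                        f2 / (np.linalg.norm(f2) + eps)))

# Hypothetical feature vectors for two adjacent regions R1 and R2
F1 = np.array([0.4, 12.0, 3.1, 0.02])
F2 = np.array([0.5, 11.5, 2.9, 0.01])
print(cosine_similarity(F1, F2))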

I wonder if there are better ways to normalize the vectors and compare them with a metric. I would be glad if the community could point me to some references and explain why something else is better than the similarity measure I am using.

Urochrome answered 20/6, 2013 at 17:14 Comment(0)

State-of-the-art image segmentation algorithms use Conditional Random Fields over superpixels (IMO the SLIC algorithm is the best option for generating them). This type of algorithm captures the relationship between adjacent superpixels while classifying each superpixel at the same time (normally using an SSVM).

For superpixel classification you will normally collect a bag of features for each of them, such as SIFT descriptors, histograms, or whatever feature you think might help.
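
As a rough illustration (not the exact pipeline of any particular paper), you could over-segment with SLIC via scikit-image and collect a simple gray-level histogram per superpixel as a stand-in for richer features:

import numpy as np
from skimage import data
from skimage.segmentation import slic
from skimage.color import rgb2gray

image = data.astronaut()                               # any RGB image
labels = slic(image, n_segments=200, compactness=10)   # SLIC superpixels
gray = rgb2gray(image)                                 # intensities in [0, 1]

features = {}
for lab in np.unique(labels):
    mask = labels == lab
    hist, _ = np.histogram(gray[mask], bins=32, range=(0.0, 1.0), density=True)
    features[lab] = hist                               # one feature vector per superpixel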

There are many papers that describe this process; here are some of them which I find interesting:

However, there are not many libraries or software packages for dealing with CRFs. The best you can find out there is this blog entry.

Yand answered 20/6, 2013 at 18:3 Comment(1)
Thanks a lot for the references. There certainly are some ideas that I could adapt to my problem. – Urochrome

I lump all the features I compute for a region into a long feature vector. [...]

What are good metrics to quantify the similarity between F1 and F2? [...]

How best to normalize F1 and F2?

tl;dr: use a TF-IDF kind of scoring as described here (see Discrete Approach, slides 18-35).


There is a (quite old) CBIR engine called GIFT (a.k.a. the GNU Image-Finding Tool) that follows precisely such an approach to compute similarity between images.

What is particularly interesting about GIFT is that it applies techniques from text retrieval directly to CBIR, which has in some ways become a classic approach (see A Text Retrieval Approach to Object Matching in Videos).

In practice GIFT extracts a large number of local and global low-level color and texture features, where each individual feature (e.g. the amount of the i-th color within a histogram) can be thought of as a visual word:

  1. global color (HSV color histogram): 166 bins = 166 visual words
  2. local color (color histogram analysis obtained by recursively subdividing the input image into sub-regions): 340 (sub-regions) x 166 (bins) = 56,440 visual words
  3. global texture (Gabor histogram): 3 (scales) x 4 (orientations) x 10 (ranges) = 120 visual words
  4. local texture (Gabor histogram in a grid of sub-regions): 256 (sub-regions) x 120 (bins) = 30,720 visual words

So for any input image GIFT is able to extract an 87,446-dimensional feature vector F, keeping in mind that a feature is considered either present (with a certain frequency F[i]) or not present in the image (F[i] = 0).

The trick then consists in first indexing every image (here, every region) into an inverted file for efficient querying. In a second step (query time) you are free to use each region as a query image.
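
As a toy sketch (not GIFT's actual code), assuming each region has already been reduced to a sparse {visual word index: term frequency} mapping, the inverted file could look like this in Python:

from collections import defaultdict

def index_regions(region_tfs):
    # Build an inverted file: visual word -> list of (region_id, term frequency)
    inverted = defaultdict(list)
    for region_id, tfs in region_tfs.items():
        for word, tf in tfs.items():
            inverted[word].append((region_id, tf))
    return inverted

# Hypothetical sparse term-frequency vectors for three regions
region_tfs = {
    "R1": {3: 0.2, 17: 0.5, 42: 0.3},
    "R2": {3: 0.1, 42: 0.6},
    "R3": {17: 0.4, 99: 0.6},
}
inverted = index_regions(region_tfs)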

At query time the engine uses a classical TF-IDF scoring:

/* Sum: sum over each visual word i of the query image
 * TFquery(i): term frequency of visual word i in the query image
 * TFcandidate(i): term frequency of visual word i in the candidate image
 * CF(i): collection frequency of visual word i in the indexed database
 */
score(query, candidate) = Sum [ TFquery(i) * TFcandidate(i) * log**2(1/CF(i)) ]
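
A hedged Python sketch of that scoring rule, reusing region_tfs and inverted from the sketch above. I read log**2(1/CF(i)) as the square of log(1/CF(i)), and take CF(i) to be the fraction of indexed regions containing visual word i -- an assumption for illustration, not necessarily GIFT's exact definition:

import math

def tfidf_score(query_tfs, candidate_tfs, inverted, n_regions):
    # score(query, candidate) = sum over i of TFq(i) * TFc(i) * log(1/CF(i))^2
    score = 0.0
    for word, tf_q in query_tfs.items():
        tf_c = candidate_tfs.get(word, 0.0)
        if tf_c == 0.0:
            continue
        cf = len(inverted.get(word, ())) / n_regions   # collection frequency of word i
        if cf > 0.0:
            score += tf_q * tf_c * math.log(1.0 / cf) ** 2
    return score

# Score region R1 (used as the query) against its neighbour R2
print(tfidf_score(region_tfs["R1"], region_tfs["R2"], inverted, len(region_tfs)))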

Internally things are a bit more complex since GIFT:

  • performs sub-queries by focusing separately on each kind of low-level feature (sub-query 1 = color histogram only, sub-query 2 = color blocks, etc.) and merges the scores,
  • includes feature pruning to evaluate only a certain percentage of the features.

GIFT is pretty efficient, so I'm fairly sure you will find interesting ideas there that you could adapt. Of course, you could avoid using an inverted index if you have no performance constraints.

Barriebarrientos answered 21/6, 2013 at 8:47 Comment(1)
Thanks a lot for the slides and pointers. Though they don't seem to apply as-is, I certainly found a few ideas which I could adapt to my scenario. – Urochrome

Just want to point out that you don't really need to create unit vectors from F1 or F2 before computing the cosine similarity (which is then just the dot product): dividing by norm(F1) and norm(F2) is exactly what the cosine similarity already does to compare directions.

Other metrics for vector comparison include the Euclidean distance, the Manhattan distance, and the Mahalanobis distance. The last one may not be quite applicable in your scenario. Please read Wikipedia for more.

I myself have debated a few times which one is better to choose, Euclidean or cosine. Note that the right choice depends on the context. If, in Euclidean space, you just want to measure whether two points are aligned in the same direction, the cosine measure makes sense. If you want an explicit distance metric, Euclidean is better.
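
To make the contrast concrete, here is a small NumPy sketch with two vectors that point in the same direction but sit far apart: cosine similarity reports them as perfectly aligned, while the Euclidean distance reports a large gap.

import numpy as np

a = np.array([1.0, 1.0])
b = np.array([5.0, 5.0])   # same direction as a, but much farther from the origin

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean = np.linalg.norm(a - b)

print(cosine)      # 1.0   -> identical direction
print(euclidean)   # ~5.66 -> far apart in space (sqrt(32))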

Hut answered 20/6, 2013 at 17:26 Comment(5)
Did you mean Euclidean/Manhattan after making them unit vectors? If not, then it's not a good measure as it isn't normalized. To cite an example, take two pairs of adjacent regions (R1, R2) and (R3, R4). Now imagine (R1, R2) to be located in a place of the image where the illumination is dark and (R3, R4) located where the illumination is relatively bright, and assume non-uniform illumination. Also, imagine (R1, R3) to have the same texture and (R2, R4) to have the same texture. In this case, a good measure should give Similarity(R1, R2) = Similarity(R3, R4), and both Euclidean/Manhattan may not produce this. – Urochrome
If you compute cosine similarity it would be identical to your proposed metric. That is, Similarity(R1, R2) = dot_product(F1 / norm(F1), F2 / norm(F2)) = dot_product(F1, F2) / (norm(F1) * norm(F2)) = Cosine_Similarity(R1, R2) – Deviant
Why would I want to compute unit vectors before obtaining the Euclidean distance? E.g. consider two points in 2-D Euclidean space: (1,1) and (5,5). The Euclidean distance between the two points, as one can observe, would be sqrt(32). If I convert each of them to a unit vector, both will reflect the same point. Is that what we want? – Hut
You don't want to convert to unit vectors for calculating Euclidean distance. No one is suggesting that. – Deviant
@danf yes indeed. And thanks for the comment. One concern I had with this metric is if the features are of vastly different scales -- e.g. one feature confined to [0, 1] and another varying widely between [10, 1000] -- when we make them a unit vector we end up dividing the narrow-scale features by a huge value, and I wonder if that would have a negative effect on the performance of the similarity measure? – Urochrome
