Right way to compute cosine similarity between two arrays?

I am working on a project that extracts features from two input images (handwritten signatures) and compares them using cosine similarity. Of the two input images, one is the original and the other is a duplicate. Say I extract 15 such features from the original image and store them in one array (say, Array_ORG), and the features of the other image are stored similarly in Array_DUP. Now I am trying to calculate the cosine similarity between these two arrays. Both arrays are of type double.

Here are the two methods I tried:

1) Manual calculation of cosine similarity:

main(){

for(int i=0;i<15;i++)
    sum_org += (Array_org[i]*Array_org[i]);
for(int i=0;i<15;i++)
    sum_dup += (Array_dup[i]*Array_dup[i]);
double magnitude = sqrt(sum_org +sum_dup );
double cosine_similarity = dot_product(Array_org, Array_dup, sizeof(Array_org)/sizeof(Array_org[0]))/magnitude;
}

double dot_product(double *a, double* b, size_t n){
double sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
            sum += a[i] * b[i];
    }

    return sum;
}

2) Storing the values in a Mat and calling the dot function:

Mat A = Mat(1,15,CV_32FC1,&Array_org);
Mat B = Mat(1,15,CV_32FC1,&Array_dup);
double similarity = cal_theta(A,B);

double cal_theta(Mat A, Mat B){
double ab = A.dot(B);
double aa = A.dot(A);
double bb = B.dot(B);
return -ab / sqrt(aa*bb);
}

I have read that cosine similarity ranges from -1 to 1, with -1 meaning the two are exactly opposite and 1 meaning the two are the same. But the first function gives me values in the thousands, and the second function gives me values greater than 1.
Please tell me which approach is right, and why. Also, how do I interpret the similarity if the cosine similarity value is greater than 1?

Regolith answered 22/5, 2015 at 19:2 Comment(0)

The correct definition of cosine similarity is:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2}\,\sqrt{\sum_{i} B_i^2}}$$

Your code computes the denominator incorrectly, hence the values are wrong.

#include <math.h>  /* sqrt */

double cosine_similarity(double *A, double *B, unsigned int Vector_Length)
{
    double dot = 0.0, denom_a = 0.0, denom_b = 0.0;
    for (unsigned int i = 0u; i < Vector_Length; ++i) {
        dot += A[i] * B[i];       /* A . B */
        denom_a += A[i] * A[i];   /* |A|^2 */
        denom_b += B[i] * B[i];   /* |B|^2 */
    }
    return dot / (sqrt(denom_a) * sqrt(denom_b));
}
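
A quick way to check an implementation of this formula is to feed it vectors whose similarity is known in advance. The sketch below is a C++ variant of the same calculation; the name `cosine_similarity_checked`, the use of `std::vector`, and the choice to return NaN for an all-zero vector are assumptions made for illustration, not part of the original answer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: dot(a, b) / (|a| * |b|) with explicit guards.
double cosine_similarity_checked(const std::vector<double>& a,
                                 const std::vector<double>& b)
{
    assert(a.size() == b.size());  // both feature vectors must have equal length
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot    += a[i] * b[i];   // accumulates a . b
        norm_a += a[i] * a[i];   // accumulates |a|^2
        norm_b += b[i] * b[i];   // accumulates |b|^2
    }
    // Assumption: an all-zero vector has no direction, so report NaN
    // instead of dividing by zero silently.
    if (norm_a == 0.0 || norm_b == 0.0)
        return std::numeric_limits<double>::quiet_NaN();
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

With this helper, `{1, 2, 3}` compared with itself should give a value very close to 1, compared with `{-1, -2, -3}` a value very close to -1, and `{1, 0}` compared with `{0, 1}` should give 0.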
Runnel answered 22/5, 2015 at 19:8 Comment(8)
I am sorry, there seems to be a typo; I am dividing the dot_product by magnitude only. I changed the names of the variables for better understanding while posting, and a typo occurred. - Regolith
OK, go through my modified code. You are calculating the denominator wrong! It should be sqrt(sum_org*sum_dup). You were adding instead of multiplying. - Runnel
Thank you! Got the mistake :) Now both methods yield the same answer, but do you know how I should interpret the similarity if the value is more than 1? - Regolith
@ShruthiKodi the magnitude of the value cannot be more than 1, as |A·B| <= norm(A)norm(B) (Cauchy-Schwarz inequality) - Assemblyman
@vsoftco, Exactly! That is my concern. The code above is giving me a value like "2.6821e+006". I am seriously confused as to what I should understand from this value. Any clues will be helpful. - Regolith
@ShruthiKodi make sure you're not accessing out of bounds, as it cannot possibly give you a value greater than 1 - Assemblyman
As @Assemblyman has said, the function cannot return anything greater than 1.0 or less than -1.0. So do check what you are accessing. - Runnel
Thanks. I will check the code once again. Thanks for the inputs. :) - Regolith
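
Mathematically the result stays within [-1, 1], but floating-point rounding can push it a hair past either end (for example 1.0000000000000002), which matters if the value is later passed to `acos`. A value such as 2.6821e+006, by contrast, points to a bug in the inputs or the formula rather than to rounding. The sketch below separates the two cases; the function name `clamp_cosine` and the default tolerance of `1e-9` are assumptions chosen for illustration.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical helper: clamp tiny rounding overshoot, reject anything else.
double clamp_cosine(double value, double tolerance = 1e-9)
{
    if (std::isnan(value))
        throw std::invalid_argument("cosine similarity is NaN (all-zero vector?)");
    // Far outside [-1, 1]: not rounding error, so the inputs or formula are wrong.
    if (std::fabs(value) > 1.0 + tolerance)
        throw std::out_of_range("cosine similarity outside [-1, 1]: check the inputs");
    // Within tolerance of the boundary: snap back onto it.
    if (value > 1.0)  return 1.0;
    if (value < -1.0) return -1.0;
    return value;
}
```

Failing loudly on a large value makes problems such as out-of-bounds reads or a wrong denominator visible immediately, instead of hiding them behind a clamp.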

Just adding a method that uses OpenCV (C++) to calculate the cosine similarity of two feature vectors:

float cosSim = f1.dot(f2) / (cv::norm(f1) * cv::norm(f2));

where f1 and f2 are both single-row cv::Mat objects of size (1, xx).

Flofloat answered 5/11, 2020 at 3:22 Comment(0)