Python: check cosine similarity between mongoDB database documents

I am using Python with a MongoDB collection in which every document has the following format:

{"_id": ObjectId("53590a43dc17421e9db46a31"),
 "latlng": {"type": "Polygon", "coordinates": [[[....],[....],[....],[....],[.....]]]},
 "self": {"school": 2, "home": 3, "hospital": 6}
}

The "self" field lists the venue types found inside the polygon and the count of each type. Different documents have different keys in "self", for example {"KFC":1,"building":2,"home":6} or {"shopping mall":1, "gas station":2}.

I need to calculate the cosine similarity between the "self" fields of two documents. Previously, all my documents were saved as dictionaries in a pickle file, and I used the following code to compute the cosine distance (1 minus the similarity):

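import scipy.spatial.distance
from sklearn.feature_extraction import DictVectorizer

# Vectorize both lists together so they share one feature space.
vec = DictVectorizer()
total_arrays = vec.fit_transform(data + citymap).toarray()
vector_matrix = total_arrays[:len(data)]
citymap_base_matrix = total_arrays[len(data):]

def cos_cdist(matrix, vector):
    # Cosine distance (1 - similarity) from each row of matrix to vector.
    v = vector.reshape(1, -1)
    return scipy.spatial.distance.cdist(matrix, v, 'cosine').reshape(-1)

for vector in vector_matrix:
    distance_result = cos_cdist(citymap_base_matrix, vector)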

Here, data and citymap are both lists of dictionaries such as [{"KFC":1,"building":2,"home":6},{"school":2,"home":3,"hospital":6},{"shopping mall":1, "gas station":2}]

Now that the data lives in MongoDB, I would like to know whether MongoDB offers a more direct way to calculate this similarity. Any ideas?

Indraft answered 28/4, 2015 at 8:39 Comment(1)
It's not a feature of MongoDB to compute cosine similarity. I think using approximately the same method, but getting the inputs from the database instead of a pickle file, is the way to go. - Ambulacrum
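
As a rough illustration of the comment's suggestion, here is a minimal sketch that reads the "self" fields from the database and computes the similarity client-side. It is not the original poster's code: the helper names (cosine_similarity, load_self_fields, similarities_to) are made up for this example, `collection` is assumed to be a pymongo Collection, and the similarity is computed directly on the dictionaries with the standard library instead of through DictVectorizer. Note that it returns the similarity itself, whereas scipy's 'cosine' metric returns 1 minus that value.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count dicts such as the "self" field."""
    # Only keys present in both dicts contribute to the dot product.
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch when a dict is empty
    return dot / (norm_a * norm_b)


def load_self_fields(collection):
    """Map each document's _id to its "self" dict.

    `collection` is assumed to be a pymongo Collection; the projection
    {"self": 1} asks the server to return only _id and "self".
    """
    cursor = collection.find({}, {"self": 1})
    return {doc["_id"]: doc.get("self", {}) for doc in cursor}


def similarities_to(query, fields_by_id):
    """Similarity of one "self" dict against every loaded document."""
    return {
        doc_id: cosine_similarity(query, fields)
        for doc_id, fields in fields_by_id.items()
    }
```

With a real connection this might be used as similarities_to({"KFC":1,"building":2,"home":6}, load_self_fields(db.some_collection)), where db.some_collection is a placeholder for your own collection. For large collections the vectorized DictVectorizer and cdist approach from the question is likely to be faster, since this version loops in pure Python.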
