cosine-similarity Questions

1

Solved

I'm working on a corpus of ~100k research papers. I'm considering three fields: plaintext title abstract I used the TfIdfVectorizer to get a TfIdf representation of the plaintext field and feed...
Unquestioned asked 6/2, 2017 at 13:1

0

I am trying to calculate similarity. First of all i used RAKE library to extract the keywords from the crawled jobs. Then I put the keywords of every jobs into separate array and then combined all ...
Achromatic asked 5/11, 2016 at 8:12

2

Solved

I was trying to use DBSCAN algorithm from scikit-learn library with cosine metric but was stuck with the error. The line of code is db = DBSCAN(eps=1, min_samples=2, metric='cosine').fit(X) wh...

1

Solved

As the title says, I'm trying to apply a function over each pair of columns of a dataframe under some conditions. I'm going to try to illustrate this. My df is of the form: Code | 14 | 17 | 19 | ....
Flexure asked 19/7, 2016 at 10:0

1

I am handling one hundred thousand(100,000) documents(mean document length is about 500 terms). For each document, I want to get the top k (e.g. k = 5) similar documents by cosine similarity. So ho...

2

I want to calculate the cosine similarity of two lists like following: A = [u'home (private)', u'bank', u'bank', u'building(condo/apartment)','factory'] B = [u'home (private)', u'school', u'bank'...
Werby asked 2/3, 2015 at 20:55

1

Solved

I am working on a document search problem where given a set of documents and a search query I want to find the document closest to the query. The model that I am using is based on TfidfVectorizer i...
Taxicab asked 11/8, 2015 at 0:41

3

I have a Pandas DataFrame that contains several string values. I want to replace them with integer values in order to calculate similarities. For example: stores[['CNPJ_Store_Code','region','total...
Selfstarter asked 6/8, 2015 at 7:1

5

Solved

I want to pass myVector to another class (Case.java) but I get this kind of error message. Type mismatch: cannot convert from Object[] to int[]. Can anybody tell me how to solve this? User.java J...
Meal asked 29/7, 2015 at 10:5

1

Solved

I'm using word2vec to represent a small phrase (3 to 4 words) as a unique vector, either by adding each individual word embedding or by calculating the average of word embeddings. From the experime...
Joel asked 9/5, 2015 at 16:23

4

Given a query I have a cosine score for a document. I also have the documents pagerank. Is there a standard good way of combining the two? I was thinking of multiply them Total_Score = cosine-s...
Ivelisseivens asked 18/2, 2013 at 16:12

0

I am using python. Now I have a mongoDB database collection, in which all documents have such a format: {"_id":ObjectId("53590a43dc17421e9db46a31"), "latlng": {"type" : "Polygon", "coordinates":[...
Indraft asked 28/4, 2015 at 8:39

1

I have a scenario where i have retreived information/raw data from the internet and placed them into their respective json or .txt files. From there on i would like to calculate the frequecies of...

2

Solved

How to express the cosine similarity ( http://en.wikipedia.org/wiki/Cosine_similarity ) when one of the vectors is all zeros? v1 = [1, 1, 1, 1, 1] v2 = [0, 0, 0, 0, 0] When we calculate accord...
Marylandmarylee asked 2/11, 2014 at 13:13

3

Solved

I have a graph of N vertices where each vertex represents a place. Also I have vectors, one per user, each one of N coefficients where the coefficient's value is the duration in seconds spent at th...

2

Solved

I have n vectors, each with m elements (real number). I want to find the pair where there cosine similarity is maximum among all pairs. The straightforward solution would require O(n2m) time. Is ...
Wean asked 1/12, 2012 at 16:39

1

I am trying to use scikit's Nearest Neighbor implementation to find the closest column vectors to a given column vector, out of a matrix of random values. This code is supposed to find the nearest...

1

Solved

Data Structure: User has many Profiles (Limit - no more than one of each profile type per user, no duplicates) Profiles has many Attribute Values (A user can have as many or few attribute values...
Langill asked 11/2, 2014 at 18:40

1

Solved

I'm working on a high-dimensional problem (~4k terms) and would like to retrieve top k-similar (by cosine similarity) and can't afford to do a pair-wise calculation. My training set is 6million x ...
Moonier asked 22/9, 2013 at 17:54

2

I need to compute the cosine similarity between strings in a list. For example, I have a list of over 10 million strings, each string has to determine similarity between itself and every othe...

3

I have a large data set that I would like to cluster. My trial run set size is 2,500 objects; when I run it on the 'real deal' I will need to handle at least 20k objects. These objects have a cos...

2

Solved

Suppose we have buyers and sellers that are trying to find each other in a market. Buyers can tag their needs with keywords; sellers can do the same for what they are selling. I'm interested in fin...
Peshitta asked 28/2, 2011 at 13:21

© 2022 - 2024 — McMap. All rights reserved.