cosine-similarity Questions
1
Solved
I'm working on a corpus of ~100k research papers. I'm considering three fields:
plaintext
title
abstract
I used the TfidfVectorizer to get a TF-IDF representation of the plaintext field and feed...
Unquestioned asked 6/2, 2017 at 13:1
0
I am trying to calculate similarity. First of all, I used the RAKE library to extract the keywords from the crawled jobs. Then I put the keywords of every job into a separate array and then combined all ...
Achromatic asked 5/11, 2016 at 8:12
2
Solved
I was trying to use the DBSCAN algorithm from the scikit-learn library with the cosine metric but got stuck on an error.
The line of code is
db = DBSCAN(eps=1, min_samples=2, metric='cosine').fit(X)
wh...
Albertson asked 23/9, 2015 at 17:10
1
Solved
As the title says, I'm trying to apply a function over each pair of columns of a dataframe under some conditions. I'm going to try to illustrate this. My df is of the form:
Code | 14 | 17 | 19 | ....
Flexure asked 19/7, 2016 at 10:0
1
I am handling one hundred thousand (100,000) documents (mean document length is about 500 terms). For each document, I want to get the top k (e.g. k = 5) similar documents by cosine similarity. So ho...
Startle asked 24/12, 2015 at 3:44
2
I want to calculate the cosine similarity of two lists like the following:
A = [u'home (private)', u'bank', u'bank', u'building(condo/apartment)','factory']
B = [u'home (private)', u'school', u'bank'...
Werby asked 2/3, 2015 at 20:55
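One common way to read a question like the one above is to treat each list as a bag of tokens and compare the token counts. The following is only a minimal sketch under that assumption; the function name `cosine_similarity_of_lists` and the example lists `first` and `second` are hypothetical and are not taken from the question.

```python
import math
from collections import Counter


def cosine_similarity_of_lists(a, b):
    """Cosine similarity between two token lists, treated as bags of words."""
    count_a, count_b = Counter(a), Counter(b)
    # Only tokens present in both lists contribute to the dot product.
    shared = count_a.keys() & count_b.keys()
    dot = sum(count_a[token] * count_b[token] for token in shared)
    norm_a = math.sqrt(sum(c * c for c in count_a.values()))
    norm_b = math.sqrt(sum(c * c for c in count_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty list is scored as 0.0; the value is
        # mathematically undefined in this case.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example lists (not the ones from the question).
first = ["bank", "bank", "school"]
second = ["bank", "factory"]
score = cosine_similarity_of_lists(first, second)
print(score)
```

Repeated tokens raise a list's count for that token, so duplicates such as a second 'bank' do affect the score.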
1
Solved
I am working on a document search problem where given a set of documents and a search query I want to find the document closest to the query. The model that I am using is based on TfidfVectorizer i...
Taxicab asked 11/8, 2015 at 0:41
3
I have a Pandas DataFrame that contains several string values.
I want to replace them with integer values in order to calculate similarities.
For example:
stores[['CNPJ_Store_Code','region','total...
Selfstarter asked 6/8, 2015 at 7:1
5
Solved
I want to pass myVector to another class (Case.java) but I get this kind of error message. Type mismatch: cannot convert from Object[] to int[]. Can anybody tell me how to solve this?
User.java
J...
Meal asked 29/7, 2015 at 10:5
1
Solved
I'm using word2vec to represent a small phrase (3 to 4 words) as a unique vector, either by adding each individual word embedding or by calculating the average of word embeddings.
From the experime...
Joel asked 9/5, 2015 at 16:23
4
Given a query, I have a cosine score for a document. I also have the document's PageRank. Is there a standard, good way of combining the two?
I was thinking of multiplying them
Total_Score = cosine-s...
Ivelisseivens asked 18/2, 2013 at 16:12
0
I am using Python. I have a MongoDB collection in which all documents have the following format:
{"_id":ObjectId("53590a43dc17421e9db46a31"),
"latlng": {"type" : "Polygon", "coordinates":[...
Indraft asked 28/4, 2015 at 8:39
1
I have a scenario where I have retrieved information/raw data from the internet and placed it into respective JSON or .txt files.
From there on, I would like to calculate the frequencies of...
Liturgics asked 16/12, 2014 at 4:29
2
Solved
How to express the cosine similarity (http://en.wikipedia.org/wiki/Cosine_similarity)
when one of the vectors is all zeros?
v1 = [1, 1, 1, 1, 1]
v2 = [0, 0, 0, 0, 0]
When we calculate accord...
Marylandmarylee asked 2/11, 2014 at 13:13
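For context on the question above: the cosine formula divides by the product of the two vector norms, so it is undefined when either vector is all zeros. The sketch below makes that case explicit instead of dividing by zero. The function name `safe_cosine_similarity` and the `zero_value` parameter are hypothetical, and returning `None` (or 0.0) for the undefined case is a convention chosen here, not a standard result.

```python
import math


def safe_cosine_similarity(v1, v2, zero_value=None):
    """Cosine similarity that returns `zero_value` when a norm is zero."""
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(y * y for y in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        # The similarity is undefined here; the caller picks the fallback.
        return zero_value
    return dot / (norm1 * norm2)


v1 = [1, 1, 1, 1, 1]
v2 = [0, 0, 0, 0, 0]
result = safe_cosine_similarity(v1, v2)                     # undefined case
as_zero = safe_cosine_similarity(v1, v2, zero_value=0.0)    # caller's choice
```

Which fallback is appropriate depends on the application; treating the pair as "not similar" (0.0) is one possible choice, while `None` forces the caller to handle the case explicitly.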
3
Solved
I have a graph of N vertices where each vertex represents a place. Also I have vectors, one per user, each one of N coefficients where the coefficient's value is the duration in seconds spent at th...
Shaum asked 7/8, 2014 at 11:15
2
Solved
I have n vectors, each with m elements (real numbers). I want to find the pair whose cosine similarity is maximum among all pairs.
The straightforward solution would require O(n^2 m) time.
Is ...
Wean asked 1/12, 2012 at 16:39
1
I am trying to use scikit's Nearest Neighbor implementation to find the closest column vectors to a given column vector, out of a matrix of random values.
This code is supposed to find the nearest...
Levity asked 12/4, 2014 at 15:50
1
Solved
Data Structure:
User has many Profiles
(Limit - no more than one of each profile type per user, no duplicates)
Profiles has many Attribute Values
(A user can have as many or few attribute values...
Langill asked 11/2, 2014 at 18:40
1
Solved
I'm working on a high-dimensional problem (~4k terms) and would like to retrieve top k-similar (by cosine similarity) and can't afford to do a pair-wise calculation.
My training set is 6 million x ...
Moonier asked 22/9, 2013 at 17:54
2
I need to compute the cosine similarity between strings in a list. For example, I have a list of over 10 million strings, each string has to determine similarity between itself and every othe...
Madai asked 23/2, 2013 at 14:34
3
I have a large data set that I would like to cluster. My trial run set size is 2,500 objects; when I run it on the 'real deal' I will need to handle at least 20k objects.
These objects have a cos...
Scapegoat asked 22/6, 2012 at 5:14
2
Solved
Suppose we have buyers and sellers that are trying to find each other in a market. Buyers can tag their needs with keywords; sellers can do the same for what they are selling. I'm interested in fin...
Peshitta asked 28/2, 2011 at 13:21