locality-sensitive-hash Questions

3

Solved

How can use fuzzy matching in pandas to detect duplicate rows (efficiently) How to find duplicates of one column vs. all the other ones without a gigantic for loop of converting row_i toString(...
Antilogarithm asked 14/9, 2016 at 12:13

6

Solved

I noticed that LSH seems a good way to find similar items with high-dimension properties. After reading the paper http://www.slaney.org/malcolm/yahoo/Slaney2008-LSHTutorial.pdf, I'm still co...

5

I'm looking for a lightweight Java library that supports Nearest Neighbor Searches by Locality Sensitive Hashing for nearly equally distributed data in a high dimensional (in my case 32) dataset wi...
Sawtelle asked 28/3, 2012 at 14:57

1

Solved

I am looking for a performant algorithm to match a large number of people by location, gender and age according to this data structure: Longitude (denotes the persons location) Latitude (denotes ...

1

I am trying to implement LSH spark to find nearest neighbours for each user on very large datasets containing 50000 rows and ~5000 features for each row. Here is the code related to this. MinHash...
Collyrium asked 22/2, 2018 at 12:17

2

Solved

I have read a lot of tutorials, documents, and pieces of code implementing LSH (locality-sensitive hashing) with min-hash. LSH tries to find the Jaccard coefficient of two sets by hashing random s...
Radiomicrometer asked 7/1, 2013 at 21:11

1

I would like to approximately match Strings using Locality sensitive hashing. I have many Strings>10M that may contain typos. For every String I would like to make a comparison with all the other s...
Falls asked 4/8, 2014 at 8:17

1

is there any plugin allowing LSH on Elasticsearch? If yes, could you point me to the location and tell me a little how to use it? Thanks Edit: I found out that ES uses MinHash plugin. How could I ...
Hypso asked 25/9, 2015 at 8:8

1

I'm reading this survey about LSH, in particular citing the last paragraph of section 2.2.1: To improve the recall, L hash tables are constructed, and the items lying in the L (L ′ , L ′ < L...
Silage asked 13/6, 2016 at 6:39

1

Solved

Citing the E2LSH manual (it's not important that's about this specific library, this quote should be true for NN problem in general): E 2LSH can be also used to solve the nearest neighbor proble...

1

Solved

I read some paper about LSH and I know that is used for solving the approximated k-NN problem. We can divide the algorithm in two parts: Given a vector in D dimensions (where D is big) of any val...

1

I already have the algorithm to produce locality-sensitive hashes, but how should I bucket them to take advantage of their characteristics(i.e. similar elements have near hashes(with the hamming di...
Rapprochement asked 30/6, 2011 at 19:54

1

Solved

I'm trying to understand the section 5. of this paper about LSH, in particular how to bucket the generated hashes. Quoting the linked paper: Given bit vectors consisting of d bits each, we choos...

1

Solved

I'm trying to implement a general fingerprint memoizator: we have a file that can be expressed through an intelligent fingerprint (like pHash for images or chromaprint for audio) and if our desider...
Profant asked 9/5, 2016 at 8:23

1

Solved

Lists are not hashable. However, I am implementing LSH and I am seeking for a hash function that will correspond a list of positive integers (in [1, 29.000]) to k buckets. The number of lists is D,...
Intoxication asked 9/5, 2016 at 21:11

1

Solved

Matrix M is the signatures matrix, which is produced via Minhashing of the actual data, has documents as columns and words as rows. So a column represents a document. Now it says that every stri...

3

Solved

Following this question, we know that two different dictionaries, dict_1 and dict_2 for example, use the exact same hash function. Is there any way to alter the hash function used by the dictionar...

2

I am implementing a near-neighbor search application which will find similar documents. So far I have read a good portion of LSH related materials (theory behind LSH is some kind of confusing and I...
Mckinney asked 8/4, 2014 at 20:4

1

I have a large sparse numpy/scipy matrix where each row corresponds to a point in high-dimensional space. I want make queries of the following kind: Given a point P (a row in the matrix) and a dis...
Nucleoprotein asked 17/7, 2014 at 11:19

2

Solved

I'm programming a minhashing algorithm in Java that requires me to generate an arbitrary number of random hash functions (240 hash functions in my case), and run any number of integers through it (...
Dolomites asked 10/7, 2014 at 12:11

1

Currently I'm studying how to find a nearest neighbor using Locality-sensitive hashing. However while I'm reading papers and searching the web I found two algorithms for doing this: 1- Use L numbe...
Lennon asked 23/6, 2013 at 7:45

4

Solved

Are there any relatively simple to understand (and simple to implement) locality-sensitive hash examples in C/C++/Java/C#? I'd like to learn more about the concept and so want to try an impl...
Mccormick asked 24/4, 2011 at 10:10
1

© 2022 - 2024 — McMap. All rights reserved.