record-linkage Questions

3

Solved

How can I use fuzzy matching in pandas to detect duplicate rows (efficiently)? How to find duplicates of one column vs. all the other ones without a gigantic for loop of converting row_i toString(...
Antilogarithm asked 14/9, 2016 at 12:13

2

Solved

I'm reasonably new to machine learning; I've done a few projects in Python. I'm looking for advice on how to approach the problem below, which I believe could be automated. A user in a data quality...

1

Solved

Let's say that I have an MDM system (Master Data Management), whose primary application is to detect and prevent duplication of records. Every time a sales rep enters a new customer in the system...

1

My team has been stuck running a fuzzy logic algorithm on two large datasets. The first (subset) is about 180K rows and contains names, addresses, and emails for the people that we need to match...

2

I have a question that is somewhat high level, so I'll try to be as specific as possible. I'm doing a lot of research that involves combining disparate data sets with header information that refer...

6

Solved

I have a large database (potentially in the millions of records) with relatively short strings of text (on the order of street address, names, etc). I am looking for a strategy to remove inexact d...
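One common way to approach a question like this is blocking: group records by a cheap key first, then run the expensive fuzzy comparison only within each group. The following is a minimal sketch of that idea, not the asker's or any answerer's method. The function names, the three-character blocking key, the 0.85 threshold, and the sample addresses are all assumptions made up for illustration, and the similarity measure is the standard library's `difflib.SequenceMatcher` rather than anything named in the question.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def blocking_key(text):
    """Hypothetical blocking key: first three characters of the normalized string."""
    return normalize(text)[:3]


def find_near_duplicates(records, threshold=0.85):
    """Return (i, j, score) for same-block pairs whose similarity >= threshold."""
    # Group record indices by blocking key so only plausible pairs are compared.
    blocks = defaultdict(list)
    for index, record in enumerate(records):
        blocks[blocking_key(record)].append(index)

    matches = []
    for indices in blocks.values():
        # Pairwise comparison is quadratic, but only within one (small) block.
        for i, j in combinations(indices, 2):
            score = SequenceMatcher(
                None, normalize(records[i]), normalize(records[j])
            ).ratio()
            if score >= threshold:
                matches.append((i, j, score))
    return matches


# Illustrative sample data (assumed, not from the question).
records = [
    "123 Main Street",
    "123 Main Stret",
    "456 Oak Avenue",
    "123  main street",
]
pairs = find_near_duplicates(records)
for i, j, score in pairs:
    print(f"{records[i]!r} ~ {records[j]!r} (score {score:.2f})")
```

The trade-off is that two records whose blocking keys differ are never compared, so a typo in the first three characters would be missed here. With millions of records, the key (or several keys combined) would need to be chosen so that blocks stay small without splitting true duplicates.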

2

Solved

I'm trying to use the Dedupe package to merge a small, messy dataset into a canonical table. Since the canonical table is very large (122 million rows), I can't load it all into memory. The current appr...
Reuter asked 15/7, 2015 at 18:09

2

Solved

I have the following problem and was thinking I could use machine learning but I'm not completely certain it will work for my use case. I have a data set of around a hundred million records contai...

3

Solved

I'm developing an application which must be able to find and merge duplicates among hundreds of thousands of contact records stored in a SQL Server DB. I have to compare all the columns in the t...
Louise asked 4/10, 2013 at 11:54
