How to migrate MySQL data to Elasticsearch in real time
I have a MySQL database with a couple of tables, and I want to migrate the MySQL data to Elasticsearch. It's easy to migrate the whole database to ES via a batch job, but how should I update ES from MySQL in real time? That is, if there is an update operation in MySQL, I should perform the same operation in ES. I researched the MySQL binlog, which reflects every change in MySQL, but I would have to parse the binlog into ES syntax, which seems really painful. Thanks! (The same question applies to Solr.)

Tref answered 9/10, 2015 at 0:32 Comment(0)

There is an existing project which takes your binlog, transforms it, and ships it to Elasticsearch. You can check it out at: https://github.com/siddontang/go-mysql-elasticsearch

Another one would be this one: https://github.com/noplay/python-mysql-replication.
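Whichever tool you pick, the core job is the same: map each binlog row event onto an Elasticsearch bulk action. A minimal sketch of that mapping as a pure function (the function name, the event-type labels, and the `uid` id column are illustrative assumptions, not the exact API of either project):

```python
def binlog_event_to_bulk_action(event_type, table, row, id_field="uid"):
    """Translate one binlog row event into Elasticsearch bulk-API lines.

    event_type: "insert", "update", or "delete" (assumed labels, not the
    exact constants either library emits).
    row: dict of column name -> value taken from the binlog event.
    """
    doc_id = row[id_field]
    if event_type == "delete":
        return [{"delete": {"_index": table, "_id": doc_id}}]
    # Inserts and updates both become full index operations, so replaying
    # the binlog in order converges on the current MySQL state.
    return [{"index": {"_index": table, "_id": doc_id}}, row]

# An UPDATE on t1 becomes an index action for the document with uid=1.
action = binlog_event_to_bulk_action("update", "t1", {"uid": 1, "name": "bob"})
```

The resulting lines can be fed straight to the ES `_bulk` endpoint.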

Note, however, that whichever you pick, it's a good practice to pre-create your index and mappings before indexing your binlog. That gives you more control over your data.
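For example, with the official Python client you could define an explicit mapping up front instead of letting ES guess field types dynamically (the index name, field names, and types here are illustrative):

```python
# Explicit mapping so Elasticsearch doesn't dynamically infer field types
# from the first documents it sees.
user_mapping = {
    "properties": {
        "uid": {"type": "integer"},
        "name": {"type": "keyword"},
        "age": {"type": "integer"},
    }
}

# Against a live cluster you would then create the index before replaying
# the binlog, e.g. with elasticsearch-py:
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# es.indices.create(index="users", mappings=user_mapping)
```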

UPDATE:

Here is another interesting blog article on the subject: How to keep Elasticsearch synchronized with a relational database using Logstash
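The approach in that article polls MySQL with the Logstash JDBC input plugin and writes the rows to Elasticsearch. A minimal pipeline along those lines might look like this (connection details, the `updated_at` tracking column, and the table/index names are assumptions; you also need the MySQL JDBC driver jar available to Logstash):

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    schedule => "* * * * *"
    statement => "SELECT * FROM t1 WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "t1"
    document_id => "%{uid}"
  }
}
```

Note that polling on a timestamp column catches inserts and updates but not deletes, which is why the binlog-based tools above are a better fit if you need delete propagation.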

Kenny answered 9/10, 2015 at 3:32 Comment(2)
Thank you so much! go-mysql-elasticsearch is awesome! Just one question: do you have any idea how it does an upsert operation? E.g. I have two tables, t1(uid, name) and t2(uid, age); they share the same id and both map to one index. But when one table is updated, it overwrites (removes) the existing record in ES. I actually want an update, not an overwrite. Tref
@Tref answering your comment so that it may help someone else. An update operation always creates a new document, increments the document's version, and marks the previous version for deletion. This is the standard way ES works. Ardellardella
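Regarding the overwrite problem in the comment above: instead of re-indexing the whole document, you can send a partial update with `doc_as_upsert`, so the fields coming from t1 and from t2 merge into the same document rather than replace it. A sketch of the bulk actions (the helper name and the `users` index are assumptions):

```python
def partial_upsert_action(index, doc_id, fields):
    """Build a bulk 'update' action that merges fields into an existing
    document, creating the document if it doesn't exist (doc_as_upsert)."""
    return [
        {"update": {"_index": index, "_id": doc_id}},
        {"doc": fields, "doc_as_upsert": True},
    ]

# A row change from t1 and one from t2 target the same document id, so
# after both actions the document contains uid, name, and age together.
a1 = partial_upsert_action("users", 1, {"uid": 1, "name": "bob"})
a2 = partial_upsert_action("users", 1, {"uid": 1, "age": 30})
```

Internally ES still writes a new document version each time (as the comment above explains), but from the client's point of view the fields are merged instead of replaced.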

The best open source solution would be this. You can run it from the command line and supply the incremental logic in the command as well.

Go through this session to get a complete idea.

Tufted answered 9/10, 2015 at 3:21 Comment(1)
Thanks, but it's not what I'm looking for. The doc only shows how to fetch incremental data, and I also need to track deleted and updated data. Tref

I guess the best option is to simply use the Kafka Connect plugin called Debezium: use the Debezium MySQL connector as the source and the Elasticsearch sink connector as the destination.
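With that setup, Debezium streams the MySQL binlog into Kafka topics and the Elasticsearch sink connector indexes them. A minimal source-connector config posted to the Kafka Connect REST API might look like this (host names, credentials, and the exact property names, which vary between Debezium versions, are assumptions):

```
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "localhost",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "secret",
    "database.server.id": "1",
    "topic.prefix": "mydb",
    "database.include.list": "mydb",
    "table.include.list": "mydb.t1,mydb.t2"
  }
}
```

Because Debezium reads the binlog rather than polling, it captures inserts, updates, and deletes, which addresses the original requirement.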

Durrace answered 19/5, 2021 at 11:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.