I am running a simple 3-node Kafka cluster, with a 5-node Zookeeper ensemble supporting it. I would like to know what a good way of backing up my Kafka is, and the same for my Zookeeper.
For the moment I just export my data directory to an S3 bucket...
Thanks.
Zalando has recently published a pretty good article on how to back up Kafka and Zookeeper. Generally there are 2 paths for Kafka backup:
1. Maintain a second Kafka cluster into which all topics are replicated (for example with Kafka MirrorMaker).
2. Use connectors (e.g. via Kafka Connect) to dump the topics to cold storage such as an S3 bucket.
The preferred backup solution will depend on your use case. E.g. for streaming applications the first solution may give you less pain, while when using Kafka for event sourcing the second solution may be more desirable.
Regarding Zookeeper, Kafka keeps information about topics there (persistent store), as well as data for broker discovery and leader election (ephemeral). Zalando settled on using Burry, which simply iterates over the Zookeeper tree structure, dumps it to a file structure, and then zips it and pushes it to cloud storage. It suffers from a little problem, but most probably it does not impact the backup of Kafka's persistent data (TODO verify). Zalando describes there that, when restoring, it is better to first create the Zookeeper cluster, then connect a new Kafka cluster to it (with new, unique broker IDs), and then restore Burry's backup. Burry will not overwrite existing nodes, so it will not restore the ephemeral information about old brokers that is stored in the backup.
Note: although they mention using Exhibitor, it is not really needed when backing up with Burry.
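If you want a feel for what such a dump involves, here is a minimal sketch of walking the Zookeeper tree and writing each node's data to a local directory, using the Python kazoo client. The connection string, output directory and the decision to skip ephemeral nodes are assumptions for illustration; this is not Burry's actual implementation.

```python
import os
from kazoo.client import KazooClient

def dump_tree(zk, path, out_dir):
    """Recursively copy a Zookeeper subtree into a local directory."""
    data, stat = zk.get(path)
    # Skip ephemeral nodes (e.g. live broker registrations); they should
    # not be restored into a new cluster anyway.
    if stat.ephemeralOwner != 0:
        return
    node_dir = os.path.join(out_dir, path.lstrip("/"))
    os.makedirs(node_dir, exist_ok=True)
    with open(os.path.join(node_dir, "data"), "wb") as f:
        f.write(data or b"")
    for child in zk.get_children(path):
        dump_tree(zk, path.rstrip("/") + "/" + child, out_dir)

zk = KazooClient(hosts="localhost:2181")  # assumed connection string
zk.start()
dump_tree(zk, "/", "zk-backup")  # zip "zk-backup" and push it to cloud storage afterwards
zk.stop()
```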
Apache Kafka already keeps your data distributed and also provides strong, consistent replication capabilities.
From an architectural design point of view, we first need to understand what a backup means for us: is it for surviving a data center failure?
As you said in the comment, imagine the case where your entire datacenter is down: everything running in that datacenter is gone, not just Kafka. To handle that kind of failure you need to design a real-time replication strategy to a different datacenter, and you can use Kafka MirrorMaker for that. You set up a Kafka cluster in a different data center (not necessarily with the same hardware resources) and then configure your current data center's Kafka to be mirrored onto that other datacenter.
In the case of a datacenter-wide failure, all of your services will run from this fallback datacenter, using the mirrored Kafka as the primary Kafka.
Then, once the original data center is back, you can set up the mirror in the opposite direction and move back to your old (previously failed) datacenter.
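In practice you would run the kafka-mirror-maker tool that ships with Kafka. Purely to illustrate what mirroring amounts to, here is a minimal consume-and-republish loop using the kafka-python package; the broker addresses and topic name are placeholders.

```python
from kafka import KafkaConsumer, KafkaProducer

SOURCE_BROKERS = ["dc1-kafka-1:9092"]   # placeholder: primary datacenter
TARGET_BROKERS = ["dc2-kafka-1:9092"]   # placeholder: fallback datacenter

# Consume from the source cluster...
consumer = KafkaConsumer(
    "my-topic",                          # placeholder topic
    bootstrap_servers=SOURCE_BROKERS,
    group_id="mirror-maker-sketch",
    auto_offset_reset="earliest",
)
# ...and republish every record to the target cluster.
producer = KafkaProducer(bootstrap_servers=TARGET_BROKERS)

for record in consumer:
    producer.send("my-topic", key=record.key, value=record.value)
```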
Kafka Connect has a couple of out-of-the-box connectors for transporting data from Kafka with consistency guarantees. So you could choose AWS S3 as your backup store, and an S3 sink connector (for example the Confluent S3 sink connector) can do that for you.
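As a rough illustration, such a connector is registered through the Kafka Connect REST API. The connector name, topic, bucket, region and flush settings below are placeholders you would adapt to your setup.

```python
import json
import requests

connector = {
    "name": "s3-backup-sketch",                  # placeholder connector name
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": "1",
        "topics": "my-topic",                    # placeholder topic list
        "s3.bucket.name": "my-kafka-backup",     # placeholder bucket
        "s3.region": "eu-west-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "1000",
    },
}

# Kafka Connect's REST API usually listens on port 8083.
resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
```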
Pinterest has the Secor service, which transfers data to AWS S3, Google and Microsoft cloud storage. I am sure you can also find dedicated connectors for all the big cloud providers. A few things need to be considered when backing up Kafka data to highly available cloud storage:
Kafka has a per-topic data retention policy, so old data will be removed from the Kafka brokers by Kafka itself, but it will still stay in your AWS S3 bucket. If you copy it straight back in a restore event, you will see much more data on the Kafka brokers than expected, and it is also not a good idea to restore the entire data set into an existing, running Kafka cluster, because you would then start processing old data again. So be selective and careful in this process.
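To make that concrete, a restore should pick only the objects you actually need rather than replaying the whole bucket. The sketch below lists backup objects for one topic, keeps only those written after a cutoff, and republishes their records; the bucket name, key prefix, JSON-lines format (matching the connector config sketched above) and the separate restore topic are all assumptions.

```python
from datetime import datetime, timezone

import boto3
from kafka import KafkaProducer

BUCKET = "my-kafka-backup"                  # placeholder bucket
PREFIX = "topics/my-topic/"                 # assumed key layout written by the sink connector
CUTOFF = datetime(2024, 1, 1, tzinfo=timezone.utc)  # restore only data newer than this

s3 = boto3.client("s3")
producer = KafkaProducer(bootstrap_servers=["localhost:9092"])

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["LastModified"] < CUTOFF:
            continue  # skip data older than we want to restore
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        # Assuming JSON-lines files, one record per line.
        for line in body.splitlines():
            # Restore into a separate topic so the running cluster does not
            # start reprocessing old data on its original topic.
            producer.send("my-topic-restored", value=line)

producer.flush()
```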
For Zookeeper, you can also copy the data to AWS S3, but you need to be careful when restoring because of the ephemeral nodes. I have found a few links which can help:
https://jobs.zalando.com/tech/blog/backing-up-kafka-zookeeper/
https://www.elastic.co/blog/zookeeper-backup-a-treatise
https://medium.com/@Pinterest_Engineering/zookeeper-resilience-at-pinterest-adfd8acf2a6b
In the end, "Prevention is better than cure". So if you are running in a cloud provider setup like AWS then you can deploy your cluster setup by keeping failures upfront in your mind. Below link has some information.
https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-kafka-on-aws/