Redis cluster recovery downtime when master goes down [closed]

Asked 18/6, 2020 at 18:35 Answered 28/7, 2022 at 18:48

When a master goes down in a Redis cluster, Redis waits for the node timeout before promoting a slave to master, and the promotion itself may take additional time. Between the master going down and the slave being promoted, reads and especially writes will fail. How do I ensure zero downtime?

Lactic answered 18/6, 2020 at 18:35 Comment(0)

I think it's a common problem with most databases. Say you have a MongoDB replica set and the master goes down: it takes a while for a slave to be promoted, and you lose writes in the meantime. The same applies to a MongoDB shard or to MySQL.

Even if Redis could provide an instant failover (which is not possible), your writes would not be guaranteed unless you use AOF with a write to disk on every operation (appendfsync always), but that would be terribly slow and defeat the whole purpose of Redis.

One way to get better write guarantees would be to push the data to a queue such as Kafka and write to Redis or any other datastore asynchronously. But then you introduce one more component into the stack, and you have to worry about its failover as well.
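The queue idea above can be sketched roughly as follows. This is a minimal illustration, not a Kafka integration: an in-process `deque` stands in for the durable queue, `drain_once` and `store_write` are hypothetical names, and the built-in `ConnectionError` stands in for whatever error your Redis client raises while the master is unavailable.

```python
from collections import deque


def drain_once(buffer, store_write):
    """Write buffered (key, value) items to the store in order.

    Stops at the first failed write and leaves that item at the front of
    the buffer, so a later pass can pick up where this one stopped.
    Returns the number of items written in this pass.
    """
    written = 0
    while buffer:
        key, value = buffer[0]      # peek; remove only after a successful write
        try:
            store_write(key, value)
        except ConnectionError:
            break                   # store unavailable; keep the item for the next pass
        buffer.popleft()
        written += 1
    return written
```

Producers only append to the buffer, so they are not blocked by a failover; a consumer calls `drain_once` repeatedly and nothing is dropped while the store is down. With an in-memory buffer the data is still lost if the process dies, which is why the answer suggests a durable queue.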

So I think we should treat Redis as a cache, not as a permanent datastore.

Tineid answered 18/6, 2020 at 19:44 Comment(1)

Thanks for the insight. In my case, I am using Redis as a cache, but data is persisted to the long-term store via a Redis dump. So if user data (which is a constant stream) doesn't reach Redis, we lose it forever. Do you suggest changing this design? Should I be persisting to the long-term store outside of Redis? – Lactic 18/6, 2020 at 21:18

When designing an architecture, we need to think about the tradeoffs. Yes, whenever a Redis master goes down, there is some wait time before one of the slaves is promoted to master, and some writes may be lost in the meantime. That's the nature of Redis.
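On the client side, one common way to ride out that wait is to retry failed writes with a growing delay. The sketch below is only an illustration under stated assumptions: `write_with_retry` and its parameters are hypothetical, and the built-in `ConnectionError` is a placeholder (a real client library usually raises its own exception classes, which you would pass in `retry_on`). It cannot give zero downtime; it only hides a short failover from the caller.

```python
import time


def write_with_retry(write, key, value, attempts=6, base_delay=0.1,
                     retry_on=(ConnectionError,), sleep=time.sleep):
    """Call write(key, value), retrying with exponential backoff.

    Returns True as soon as one attempt succeeds, False if all attempts fail.
    """
    for attempt in range(attempts):
        try:
            write(key, value)
            return True
        except retry_on:
            # Do not sleep after the final attempt; just give up.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return False
```

The total time covered by the retries should be longer than the expected failover time, otherwise the caller still sees the failure and has to decide what to do with the write.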

If you have a cluster with 1 master and 3 slaves, your writes go to the master, which then syncs them to the slaves. But Redis doesn't wait for acknowledgements from the slaves before sending its own acknowledgement back to the client. If it did, it couldn't be this fast.
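If you are willing to pay that latency for selected writes, Redis does offer the WAIT command, which blocks until a given number of replicas have acknowledged earlier writes or a timeout expires. A minimal sketch, assuming a client object with `set` and `wait(num_replicas, timeout)` methods as on redis-py's standalone client; `set_and_wait` itself is a hypothetical helper:

```python
def set_and_wait(client, key, value, min_replicas=1, timeout_ms=100):
    """SET a key, then wait for replica acknowledgements.

    Returns True if at least min_replicas replicas acknowledged the write
    within timeout_ms, False otherwise. The write itself is not rolled
    back on False; the caller decides how to react.
    """
    client.set(key, value)
    acked = client.wait(min_replicas, timeout_ms)
    return acked >= min_replicas
```

This narrows the window in which an acknowledged write can be lost on failover, but it does not turn Redis into a strongly consistent store.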

In the end, Redis is useful as cache storage, not as durable disk storage. Whenever you hit a cache miss, you can look the data up in permanent storage such as a database. Don't use Redis as permanent storage; it wasn't built for that.
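The cache-miss fallback described above is usually called the cache-aside pattern. A minimal sketch, where `get_with_fallback` and the three callables are hypothetical stand-ins for your Redis client and database access code:

```python
def get_with_fallback(key, cache_get, cache_set, db_get):
    """Read through the cache, falling back to the permanent store on a miss.

    Returns the value, or None if neither the cache nor the database has it.
    """
    value = cache_get(key)
    if value is not None:
        return value            # cache hit
    value = db_get(key)         # cache miss: ask the permanent store
    if value is not None:
        cache_set(key, value)   # repopulate the cache for the next reader
    return value
```

With this layout, losing the cache contents during a failover costs extra database reads, not data.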

Shelf answered 28/7, 2022 at 18:48 Comment(0)
