Redis HA setup on AWS with sentinels - redis nodes seen by different sentinels end up in an endless loop

Our setup

  • 3x redis sentinels, one in each AWS Sydney AZ
  • 2 to 500 redis nodes, with one master and multiple slaves that scale horizontally and automatically using AWS Auto Scaling group policies
  • 1x Write ELB that pushes traffic to the master
  • 1x Read ELB that pushes traffic to the slaves
  • 1x Sentinel ELB that pushes traffic to the sentinels
  • 1x Facilitator (more on this below)

This setup is replicated across two clusters for what we call metadata and cache. We want to deploy more clusters.

Facilitator

A Python daemon we built that subscribes to the sentinels' pub/sub channel and listens for +switch-master messages. Here are the actions the facilitator takes (a rough sketch of this flow is included after the list):

  1. Detects a master failover triggered by +switch-master
  2. Queries sentinels for the new master using SENTINEL get-master-addr-by-name mymaster
  3. Tags old master with RedisClusterNodeRole = slave
  4. Tags new master with RedisClusterNodeRole = master
  5. Adds new master into our write ELB
  6. Removes new master from our read ELB
  7. Removes old master from our write ELB
  8. Tries to add the old master to our read ELB (this will fail if the server is down, which is fine)
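
Below is a rough sketch of that flow using redis-py and boto3. The sentinel endpoint, ELB names and master name are placeholders rather than our real values, instance lookup assumes the node still exists, and error handling is stripped out:

    # Sketch of the facilitator loop: listen for +switch-master, then re-tag
    # the nodes and swap them between the write and read ELBs.
    import boto3
    import redis

    SENTINEL_HOST = "sentinel.internal"                  # placeholder: Sentinel ELB endpoint
    MASTER_NAME = "mymaster"
    WRITE_ELB, READ_ELB = "redis-write", "redis-read"    # placeholder ELB names

    ec2 = boto3.client("ec2")
    elb = boto3.client("elb")

    def instance_id_for(ip):
        """Look up the EC2 instance id behind a private IP."""
        res = ec2.describe_instances(
            Filters=[{"Name": "private-ip-address", "Values": [ip]}])
        return res["Reservations"][0]["Instances"][0]["InstanceId"]

    def handle_switch_master(old_ip, new_ip):
        old_id, new_id = instance_id_for(old_ip), instance_id_for(new_ip)
        # 3/4. Re-tag the old and new master
        ec2.create_tags(Resources=[old_id],
                        Tags=[{"Key": "RedisClusterNodeRole", "Value": "slave"}])
        ec2.create_tags(Resources=[new_id],
                        Tags=[{"Key": "RedisClusterNodeRole", "Value": "master"}])
        # 5. Add the new master to the write ELB
        elb.register_instances_with_load_balancer(
            LoadBalancerName=WRITE_ELB, Instances=[{"InstanceId": new_id}])
        # 6. Remove the new master from the read ELB
        elb.deregister_instances_from_load_balancer(
            LoadBalancerName=READ_ELB, Instances=[{"InstanceId": new_id}])
        # 7. Remove the old master from the write ELB
        elb.deregister_instances_from_load_balancer(
            LoadBalancerName=WRITE_ELB, Instances=[{"InstanceId": old_id}])
        # 8. Try to put the old master back into the read ELB
        try:
            elb.register_instances_with_load_balancer(
                LoadBalancerName=READ_ELB, Instances=[{"InstanceId": old_id}])
        except Exception:
            pass  # fine: the old master may be gone for good

    sentinel = redis.StrictRedis(host=SENTINEL_HOST, port=26379, decode_responses=True)
    pubsub = sentinel.pubsub()
    pubsub.subscribe("+switch-master")        # 1. detect the failover
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        # payload: "<master-name> <old-ip> <old-port> <new-ip> <new-port>"
        name, old_ip, _, new_ip, _ = message["data"].split()
        if name != MASTER_NAME:
            continue
        # 2. confirm the new master with the sentinels
        new_ip, _ = sentinel.sentinel_get_master_addr_by_name(MASTER_NAME)
        handle_switch_master(old_ip, new_ip)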

Problem

Because slaves can come and go multiple times per day depending on traffic, we end up with the sentinels from both clusters fighting over the same slave. This happens because the IP pools are shared between clusters and, as far as we are aware, slaves are identified by their IPs.

Here's how to replicate:

  1. Cluster cache has a master with IP 172.24.249.152
  2. Cluster cache has a master failover, promoting the slave with IP 172.24.246.142 to master. The node with IP 172.24.249.152 is now down
  3. Cluster metadata scales up and the DHCP assigns the IP 172.24.249.152 (previous master on cluster cache)
  4. Cluster cache sees that its previous master is back up and tries to reconfigure it with slaveof 172.24.246.142 (the new master on the cache cluster)
  5. Cluster metadata triggers a +sdown on 172.24.246.142 and, after a while, a -sdown, followed by a +slave-reconf-sent to try to reconfigure it as a slave of the metadata cluster
  6. Cluster cache tries to do the same thing cluster metadata is doing in step 5.

The sentinels get stuck in this endless loop, fighting over that node forever. This happens even when a single sentinel group manages both redis clusters under different master names. This led us to believe that sentinels are not aware of resources shared between clusters; each just does what is logical for its own cluster in isolation.
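
For what it's worth, the overlap is easy to see by asking each cluster's sentinels for their slave list and flagging any IP claimed by both. A quick sketch of that check (the sentinel endpoints and master name here are placeholders):

    # Diagnostic sketch: flag slave IPs claimed by the sentinels of more than
    # one cluster (endpoints and master name are placeholders).
    from collections import defaultdict
    import redis

    SENTINELS = {
        "cache":    ("cache-sentinel.internal", 26379),
        "metadata": ("metadata-sentinel.internal", 26379),
    }
    MASTER_NAME = "mymaster"

    claims = defaultdict(list)
    for cluster, (host, port) in SENTINELS.items():
        conn = redis.StrictRedis(host=host, port=port, decode_responses=True)
        for slave in conn.sentinel_slaves(MASTER_NAME):
            claims[slave["ip"]].append(cluster)

    for ip, clusters in claims.items():
        if len(clusters) > 1:
            print("%s is claimed by the sentinels of: %s" % (ip, ", ".join(clusters)))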

Solutions we tried

  1. Triggering a SENTINEL reset mymaster after a +sdown event, to make the sentinels forget about that node (a stripped-down sketch of this follows the list). The problem is that it can create a race condition if that cluster is performing a master failover at the same time. We reproduced exactly that and were left with sentinels out of sync, one pointing to one master and the other two pointing to another.
  2. Segregating the network into IP pools, one per cluster. This works because IPs are never reused across clusters, but it makes things a lot less agile and more complicated whenever we need a new cluster. This is the solution we ended up going with, but we'd like to avoid it if possible.
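
Attempt 1 looked roughly like the following (simplified; the sentinel endpoint and master name are placeholders, and in reality the reset has to be sent to every sentinel individually, which only widens the race window):

    # Sketch of attempt 1: forget down slaves via SENTINEL RESET after +sdown.
    # This is the variant that races with an in-flight failover, as described above.
    import redis

    SENTINEL_HOST, MASTER_NAME = "sentinel.internal", "mymaster"   # placeholders

    sentinel = redis.StrictRedis(host=SENTINEL_HOST, port=26379, decode_responses=True)
    pubsub = sentinel.pubsub()
    pubsub.subscribe("+sdown")

    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        # payload: "<instance-type> <name> <ip> <port> @ <master-name> <master-ip> <master-port>"
        if not message["data"].startswith("slave "):
            continue
        # Ask the sentinel to drop its current state for this master so the
        # down slave is forgotten. If a failover is in flight, this is exactly
        # where the sentinels can end up disagreeing about who the master is.
        sentinel.sentinel_reset(MASTER_NAME)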

Ideal Solution(s)

  1. Redis sentinel providing a SENTINEL removeslave 172.24.246.142 mymaster command that we could run every time there's a +sdown event on a slave. This would make that cluster forget the slave ever existed, without the side effects that a SENTINEL reset mymaster has.

  2. Stop identifying slaves solely by their IP. For example, add a redis server start timestamp or some other token so that a slave that is shut down and a new one that comes up with the same IP are not seen as the same node.
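
On point 2: every redis process already reports a unique run_id in INFO, and the sentinels record the run id they last saw for each slave, so in principle that could serve as such a token. A sketch of an external check built on that idea (placeholder endpoint and master name; this only detects the recycling, it doesn't change sentinel's behaviour):

    # Sketch for ideal solution 2: use the redis run_id as the per-instance token.
    # Compares the run id the sentinels recorded for each slave with the run id
    # the process currently at that IP reports.
    import redis

    SENTINEL_HOST, MASTER_NAME = "sentinel.internal", "mymaster"   # placeholders

    sentinel = redis.StrictRedis(host=SENTINEL_HOST, port=26379, decode_responses=True)

    for slave in sentinel.sentinel_slaves(MASTER_NAME):
        node = redis.StrictRedis(host=slave["ip"], port=int(slave["port"]),
                                 socket_timeout=1)
        try:
            current_run_id = node.info("server")["run_id"]
        except redis.RedisError:
            continue  # node is down, nothing to compare against
        if slave.get("runid") and slave["runid"] != current_run_id:
            print("%s:%s has been recycled: sentinel remembers run id %s, "
                  "but the node now reports %s" % (slave["ip"], slave["port"],
                                                   slave["runid"], current_run_id))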

Question

Can you think of any other solution that doesn't involve changing the redis sentinel code and doesn't require segregating IP pools between clusters?
