Zookeeper multiple leader election issue

Asked 27/5, 2016 at 8:14 Answered 25/6, 2021 at 23:44

Solved java apache-zookeeper distributed distributed-computing

I have a distributed application that uses ZooKeeper for leader election. Only the elected leader can commit to the database. I recently discovered that there is a potential situation which could lead to multiple leaders. The situation arises when the elected leader is paused for a long GC and can lose the heartbeat to the ZooKeeper, leading to the election of a new leader. At this point, both the nodes think themselves to be the leader and can lead to conflict.

Any suggestions on how to avoid such situation ?

Dehydrogenase answered 27/5, 2016 at 8:14 Comment(5)

I would first investigate on why the GC paused for so long, It may lead you to performance issues. If GC is the only cause then there must be a setting to wait longer? – Whiffletree 27/5, 2016 at 8:28

Thanks Minh. GC Pauses are kinda unavoidable in most large-scale application. You can optimize to have fewer pauses but totally avoiding a long GC pause is not practically possible. Also, the problem with the setting to wait longer is that this could lead to a situation where there is no leader for a long time. – Dehydrogenase 27/5, 2016 at 8:38

Sure but to pause so long that caused the system to think it's down is not ideal. Either you tune the wait time or reduce the pause time. – Whiffletree 27/5, 2016 at 8:40

I was thinking about adding a check to verify that the node is still a leader, just before it tries to commit to DB. Wouldn't that address the issue ? – Dehydrogenase 27/5, 2016 at 8:42

But long GC cause the whole application to pause, Java can't response until GC recovered. Unless your running a seperate process I can't see how you get around that. There was an article talking about the world come to a stand still when GC kicked in :) however I don't remember the link. Hopes this help. – Whiffletree 27/5, 2016 at 8:47

When you use ZooKeeper for leader election you can't guarantee uniqueness of the leader.It's possible to run into this situation even without GC pauses. For example, when a leader is isolated from the ZooKeeper quorum during a network partitioning or when a leader issues a long running query, dies and a new leader can issue a new query while the current is still active.

The workaround is to use compare-and-set when you update the database. Once new leader is elected you should get an increasing leader id (e.g. by updating a node in ZooKeeper and using its version or mzxid) and use it to guard each transaction issued by that leader.

For example if you want to change the state of the db then instead of the following transaction:

BEGIN TRANSACTION;
db.update($change);
END TRANSACTION;

you should use something like

BEGIN TRANSACTION;
if (db.leaderID <= $leaderID) {
    db.leaderID = $leaderID;
    db.update($change);
}
END TRANSACTION;

This trick will protect your system from uncertainties caused by concurrent leaders. Of course your database should be linearizable and support compare-and-set.

Murtagh answered 27/5, 2016 at 22:29 Comment(1)

Ah, I see. so you persists the leaderId to the DB immediately after the election, and then compare and set at the time of commit. Excellent idea !! – Dehydrogenase 28/5, 2016 at 5:8

To correct one of the answers, Zookeeper does guarantee leader uniqueness on network partitioning with quorum-based consistency. Upon a network partitioning, if a leader is isolated from a quorum, it will lose its leadership due to incapable of connecting to a quorum of nodes. In the meanwhile, a new leader is elected in the other partition. For the same reason, the partition where the old leader is in is unable to elect a new leader. The situation is resolved after the network partition is recovered by issuing a new leader election.

Paprika answered 25/6, 2021 at 23:44 Comment(0)

Recommended topics

Hot tags