Difference between ensemble and quorum in zookeeper
Asked Answered
S

4

15

I am new to zookeeper. I have configured it on a single machine. But I came across the words "ensemble" and "quorum" in the documentation of zookeeper.

Can anyone please tell me the difference between these?

  • Ensemble
  • Quorum
Sag answered 7/8, 2014 at 5:11 Comment(1)
H
32

This answer is for those who still have doubt understanding Ensemble and Quorum. Ensemble is nothing but a cluster of Zookeeper servers, where in Quorum defines the rule to form a healthy Ensemble. Which is defined using a formula Q = 2N+1 where Q defines number of nodes required to form a healthy Ensemble which can allow N failure nodes. You will understand about this formula in the following example.

Before I start with an example, I want to define 2 things-
Cluster: Group of connected nodes/servers (now on will use node) with one node as Leader/Master and rest as Followers/Slaves.
Healthy Ensemble: A cluster with only one active Leader at any given point of time, hence fault tolerant.

Let me explain with an example, which is used commonly across while defining Ensemble and Quorum.

  1. Lets say you have 1 zookeeper node. No need to worry here as we need more than 1 node to form a cluster.
  2. Now take 2 nodes. There is no problem forming a cluster but there is problem to form a healthy Ensemble, because - Say the connection between these 2 nodes are lost, then both nodes will think the other node is down, so both of them try to act as Leader, which leads to inconsistency as they can't communicate with each other. Which means cluster of 2 nodes can't even afford even a single failure, so what is the use of this cluster??. They are not saying you can't make a cluster of 2 nodes, all they are saying is - it is same as having single node, as both don't allow even a single failure. Hope this is clear
  3. Now take 3 nodes. There is no problem forming a cluster or healthy Ensemble - as this can allow 1 failure according the formula above 3 = 2N+1 => N = (3-1)/2 = 1. So when the next failure occurs (either connection or node failure), no node will be elected as Leader, hence the Ensemble won't serve any write/update/delete services, hence the states of the client cluster remains consistent across zookeeper cluster nodes. So the Leader election won't happen until there is majority nodes available and connected, where Majority m = (n/2)+1, where n stands for number of nodes available when the previous election happened. So here, 1st election happened with 3 nodes (as its a 3 node cluster). Then there was a 1st failure, so remaining 2 nodes can conduct election, as they have majority m = (3/2)+1 = 2. Then 2nd failure happened, now they don't have majority as there is only one node available for election, but the majority required is m = (2/2)+1 = 2.
  4. Now take 4 nodes. There is no problem forming a cluster or healthy Ensemble, but having 4 nodes is same as 3 nodes, because both allows only 1 failure. Lets derive it from the Quorum formula 4 = 2N+1 => N = (4-1)/2 = ⌊1.5⌋ = 1 //floor(1.5)=1
  5. Now take 5 nodes. There is no problem forming a cluster or healthy Ensemble - as this can allow 2 failure according the formula above 5 = 2N+1 => N = (5-1)/2 = 2.
  6. Now take 6 nodes. There is no problem forming a cluster or healthy Ensemble, but having 6 nodes is same as 5 nodes, because both allows only 2 failure. Lets derive it from the Quorum formula 6 = 2N+1 => N = (6-1)/2 = ⌊2.5⌋ = 2


Conclusion:

  1. To form a Quorum we need atleast 3 nodes - as 2 node cluster can't even handle single failure
  2. Its good to form an Ensemble of odd number of nodes - as n (even number) nodes tends to allow same number of failure as of n-1 (odd number) nodes
  3. Its not good to have more nodes, as they add latency into performance. Suggested Production cluster size is 5 - if one server is down for maintenance, it still can handle one more failure.
Hornback answered 8/3, 2019 at 5:56 Comment(0)
S
11

Ensemble is an array of nodes (or servers, if you like) that form your Distributed Computer Ecosystem.

Quorum is when things get interesting. On a specific assignment/job, Quorum ensures that a healthy leader-follower majority can be maintained. In other words, a conduct which ensures that majority vote can be obtained to proceed with an activity (e.g. commit/update/delete etc.). In Replication strategy, quorum is a must to have.

Lets try and use non-technical examples:

1) In your company - there is a board formed by 5 directors (ensemble).

|d1, d2, d3, d4, d5|----- BoD

2) Each director has equal say in each decision. But a majority if 3 directors at anytime should agree on a project. If no majority is there, the company will be dysfunctional.

3) One a particular project, P1 - they randomly voted to have a majority of d1,d2,d3 to be decision makers in the project. but d4 and d5 are fully aware of what's going on (so that they can step in anytime).

4) Now (God forbid), d3 passes away after a few months, again everyone agrees that the majority will be formed using d1,d2,d4. d5 is still aware of what's going on. NOte that we only have 4 directors left.

5) Disaster strikes again. d5 leaves the company for another competitor. But that doesn't change anything because the company is still functional with a 3-member BoD.

6) At any point of another disaster strikes the BoD and any of the directors become "unavailable" - company is dysfunctional i.e. we have lost the quorum forming criterion.

Zookeeper uses ceil(N/2) - 1 formula to get the maximum number of failures allowed for an Ensemble and maintain a stable quorum. In this case, the minimum recommended ensemble nodes are 3 (tolerates 1 failure maximum).

Segalman answered 20/7, 2017 at 10:25 Comment(2)
if minimum recommended ensemble nodes are 3, then if 1 failure happens, who will be elected out of two? Why minimum of 2 is not sufficient, if one master dies, still the remaining one slave can act as master right?Inkwell
@KanagaveluSugumar It also depends on how you would like to consider a healthy ensemble status for your system. In Apache kafka, you could configure consistency level. But in another complex system with strong consistency requirement, having an unstable quorum isn't acceptable, which means as soon as you have only 1 master 1 slave, there is no guarantee that the service would be stable.Segalman
E
6

When you want to have high availability in zookeeper server you use multiple zookeeper servers to create an ensemble. Basically zookeeper has master-slave architecture. In an ensemble there will be one master and rest will be the slaves. If the master fails one of the slaves will act as a master.

The sequence in which a master is assigned is called as quorum. When you create an ensemble, zookeeper internally creates a sequence ID for the slave severs. When the main master fails it will check the next sequence ID to create a new master. This concept of quorum also used while creating nodes in zookeeper.

Electronegative answered 9/8, 2014 at 7:43 Comment(0)
W
0

ensemble: Numbers of nodes in the group. Quorum: Number of required nodes to take the action. Example: you have 5 nodes. ensemble is 5. But according to majority rule Quorum should be 3. If we write no 3 nodes successfully, then we send success response to the client. Apache Zookeeper Quorum

Windstorm answered 6/8, 2019 at 11:44 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.