How to elect a master node among the nodes running in a cluster?
Asked Answered
A

4

34

I'm writing a managed cloud stack (on top of hardware-level cloud providers like EC2), and a problem I will face soon is:

How do several identical nodes decide which one of them becomes a master? (I.e. think of 5 servers running on EC2. One of them has to become a master, and other ones have to become slaves.)

I read a description of the algorithm used by MongoDB, and it seems pretty complicated, and also depends on a concept of votes — i.e. two nodes left alone won't be able to decide anything. Also their approach has a significant delay before it produces the results.

  1. I wonder if there are any less complicated, KISS-embrasing approaches? Are they used widely, or are they risky to adopt?

  2. Suppose we already have a list of servers. Then we can just elect the one that is up and has a numerically smallest IP address. What are downsides of this approach?

  3. Why is MongoDB's algorithm so complicated?

This is a duplicate of How to elect new Master in Cluster?, which gives less details and has not been answered for 6 months, so I feel it is appropriate to start a new question.

(The stack I'm working on is open-source, but it's on a very early stage of development so not giving a link here.)

UPDATE: based on the answers, I have designed a simple consensus algorithm, you can find a JavaScript (CoffeeScript) implementation on GitHub: majority.js.

Adkinson answered 23/12, 2010 at 23:22 Comment(0)
C
17

Leader election algorithms typically consider the split brain as a fault case to support. If you assume that it's not the nodes that fail but the networking, you may run into the case where all nodes are up, but fail to talk to each other. Then, you may end up with two masters.

If you can exclude "split brain" from your fault model (i.e. if you consider only node failures), your algorithm (leader is the one with the smallest address) is fine.

Corniculate answered 23/12, 2010 at 23:30 Comment(5)
Thanks. But I think MongoDB completely refuses to operate when less than a majority of the nodes are online. Any other ideas on why it needs to be complicated?Adkinson
@Andrey: this is the split brain cause. If you have the cluster split into two parts, the smaller part must not elect a leader. By doing so, you can guarantee that only the larger half of nodes (which is assumed to be online, but unreachable) will have a leader, guaranteeing that there will be at most one leader at any point in time.Goodell
OK, seems that for MongoDB, the protocol needs to be complex because they want to base it on the maxLocalOpOrdinal (to avoid losing excessive data on master re-election). I would be fine with any node elected, so basing my election on IPs seems to make things simpler. And yes, I get your point about network splits, I guess initially I will require 3+ nodes, and they won't do election unless a majority of nodes is online. Thanks.Adkinson
Also many thanks for “leader election” keywords. This brings in some useful Wikipedia knowledge.Adkinson
Martin, thanks again for you answer, it helped me get a first version of my consensus alg implemented, see majority.js on GitHub. We'll see how well it will work in practice.Adkinson
A
6

Use Apache ZooKeeper. It solves exactly this problem (and many more).

Antlia answered 24/12, 2010 at 0:9 Comment(4)
Hey, thanks for your answer, but I've decided I do not want to bring in any monster stuff into my stack. And omfg it's written in Java, it's never a good sign. :) If you're interested, see first ver of my impl as majority.js on GitHub. We'll see how well it will work in practice.Adkinson
Your code doesn't handle concurrent voting. Suppose node 1 runs the protocol and elects master A. Concurrently, node 2 runs the protocol and node A is netsplit from node 2. Node 2 elects node B as the master. Now you've got two masters. You need to implement the en.wikipedia.org/wiki/Paxos_algorithm . This is harder than it first appears. Getting ZooKeeper running (regardless of language bias) is a lot easier than implementing Paxos. An ad-hoc leader election algorithm will work mostly, then fail suddenly on you.Antlia
Spike, thanks for your comment (only saw it now, sorry). In my case, I pull an authoritative list of nodes from a trusted third-party (Amazon S3), so nodes just avoid changing their votes unless more than half of all nodes are reachable. I believe this handles network splits. Right?Adkinson
I likewise implemented Apache Zookeper (via Netflix Curator APIs) to manage the write-leader for a globally distributed (Germany, Virigina - US, Oregon - US) 9-node Kafka cluster and can attest to this working well. The key to avoiding split-brain cases is using an odd number of nodes and a majority election algorithm.Yukyukaghir
S
4

If your nodes also need to agree on things and their total order, you might want to consider Paxos. It's complicated, but nobody's come up with an easier solution for distributed consensus.

Saving answered 24/12, 2010 at 2:22 Comment(4)
Hey, thanks for the answer. For now I've implemented a really simple library available as majority.js on GitHub. We'll see soon, maybe I'll need something more complex — but I wish to keep it as simple as possible.Adkinson
Since your answer, a new algorithm has been created called Raft. In the paper "In search of an understandable consensus algorithm", it was shown to have the same properties and guarantees as multi-Paxos, but a user study showed it to be noticeably easier to learn. It also has many open source implementations.Eve
@AndreyTarantsov any specific reason to elect minimal ID as a master ?Kenny
@Kenny Well it doesn't have to be minimal, but it has to be something all nodes will agree on (i.e. that all nodes will compute in exactly the same way).Adkinson
R
3

Me like this algorithm:

  • Each node calculates the lowest known node id and sends a vote for leadership to this node
  • If a node receives sufficiently many votes and the node also voted for itself, then it takes on the role of leader and starts publishing cluster state.

and at link below have some many algorithm of election master-node in cluster: https://www.elastic.co/blog/found-leader-election-in-general#the-zen-way

Also can see Raft-algorithm: https://raft.github.io

Routine answered 6/11, 2016 at 15:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.