How does the Hadoop Namenode failover process work? [closed]
Hadoop: The Definitive Guide says:

Each Namenode runs a lightweight failover controller process whose job it is to monitor its Namenode for failures (using a simple heartbeat mechanism) and trigger a failover should a namenode fail.

How can a namenode run something that detects its own failure?

Who sends the heartbeat to whom?

Where does this process run?

How does it detect a namenode failure?

Whom does it notify about the transition?

Rentfree answered 23/10, 2015 at 21:21 Comment(0)

According to the Hadoop docs, which you can find here, implementing automatic failover requires adding two components to an HDFS deployment:

1: a ZooKeeper quorum

2: the ZKFailoverController process.

The docs describe failure detection as follows:

Each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.

So to answer your questions:

Q: How can a namenode run something that detects its own failure?

A: Each name-node maintains a session on ZooKeeper via a ZKFailoverController (ZKFC) service that runs on the same machine. When this session expires the other name-node will be notified that a failover should be triggered.

The ZKFC health monitor also periodically pings its local name-node (this is your heartbeat). If the name-node crashes, the health monitor marks it as unhealthy.
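The ping-and-mark-unhealthy behaviour can be illustrated with a minimal Python sketch. This is not the real ZKFC code: the class name `HealthMonitor`, the `ping` callback and the `max_failures` threshold are all hypothetical names chosen for illustration.

```python
from typing import Callable


class HealthMonitor:
    """Toy stand-in for the ZKFC health monitor (all names are hypothetical)."""

    def __init__(self, ping: Callable[[], bool], max_failures: int = 1) -> None:
        self.ping = ping                  # callable that checks the local name-node
        self.max_failures = max_failures  # assumed threshold, not a real HDFS setting
        self.failures = 0
        self.healthy = True

    def check_once(self) -> bool:
        """Ping the local name-node once and update the health state."""
        try:
            ok = self.ping()
        except Exception:
            # A ping that raises (e.g. connection refused) counts as a failure.
            ok = False
        if ok:
            self.failures = 0
            self.healthy = True
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.healthy = False
        return self.healthy


# Simulated name-node whose liveness we can toggle.
namenode_state = {"alive": True}
monitor = HealthMonitor(lambda: namenode_state["alive"])
monitor.check_once()             # name-node responds, stays healthy
namenode_state["alive"] = False  # simulate a crash
monitor.check_once()             # ping fails, monitor marks it unhealthy
```

In a real deployment the check would run in a loop on a timer; here each call to `check_once` stands for one heartbeat.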

While the local name-node is healthy and active, its ZKFC holds a special "lock" znode in ZooKeeper. When the name-node is marked as unhealthy, or the ZooKeeper session expires, this lock is deleted. When the ZKFC of another healthy name-node sees that no node currently holds the lock znode, it tries to acquire the lock. If it succeeds, its name-node becomes the active name-node.
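The lock-based election can be sketched in plain Python. This is a simplified in-memory simulation, not the actual ZooKeeper or ZKFC API: `FakeZooKeeper`, `FailoverController`, `LOCK_PATH` and `simulate` are hypothetical names, and steps a real failover performs beyond taking the lock are left out.

```python
from typing import Dict, List, Optional

LOCK_PATH = "/ha/lock"  # hypothetical znode path, for illustration only


class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper that tracks ephemeral znodes."""

    def __init__(self) -> None:
        self.ephemeral: Dict[str, str] = {}  # path -> owning session

    def create_ephemeral(self, path: str, session: str) -> bool:
        """Create the znode if absent; return True if this session now owns it."""
        if path in self.ephemeral:
            return False
        self.ephemeral[path] = session
        return True

    def delete(self, path: str, session: str) -> None:
        """Delete the znode, but only if this session owns it."""
        if self.ephemeral.get(path) == session:
            del self.ephemeral[path]

    def expire_session(self, session: str) -> None:
        """Simulate a machine crash: all znodes of the session vanish."""
        for path in [p for p, s in self.ephemeral.items() if s == session]:
            del self.ephemeral[path]

    def holder(self, path: str) -> Optional[str]:
        return self.ephemeral.get(path)


class FailoverController:
    """Toy ZKFC: releases the lock when unhealthy, grabs it when free."""

    def __init__(self, name: str, zk: FakeZooKeeper) -> None:
        self.name = name
        self.zk = zk
        self.healthy = True   # would be set by the health monitor
        self.active = False

    def step(self) -> None:
        if not self.healthy:
            self.zk.delete(LOCK_PATH, self.name)
            self.active = False
        elif self.zk.holder(LOCK_PATH) is None:
            self.active = self.zk.create_ephemeral(LOCK_PATH, self.name)
        else:
            self.active = self.zk.holder(LOCK_PATH) == self.name


def simulate() -> List[Optional[str]]:
    """Return the lock holder before and after nn1 becomes unhealthy."""
    zk = FakeZooKeeper()
    nn1, nn2 = FailoverController("nn1", zk), FailoverController("nn2", zk)
    history = []
    nn1.step(); nn2.step()
    history.append(zk.holder(LOCK_PATH))
    nn1.healthy = False          # health monitor marks nn1 unhealthy
    nn1.step(); nn2.step()
    history.append(zk.holder(LOCK_PATH))
    return history


print(simulate())  # ['nn1', 'nn2']
```

Calling `expire_session` instead of setting `healthy = False` models the case where the whole machine crashes: the lock disappears without the failed controller doing anything, and the surviving controller picks it up on its next step.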

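Q: Who sends the heartbeat to whom? How does it detect a namenode failure?

A: Again, the ZooKeeper session. Each ZKFC pings its local name-node and keeps a session open in ZooKeeper, so a failure shows up either as a failed health check or as an expired session.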

Q: Where does this process run?

A: The ZKFC runs on the same machine as each name-node. ZooKeeper itself can be installed on a single machine or on a cluster. You can read the docs here.

Q: Whom does it notify about the transition?

A: This is all handled by the ZKFailoverController process running on each name-node machine.

There's another good article here, which visualises this a bit better than my words.

Petrolic answered 2/4, 2023 at 6:38 Comment(0)
