Election of new zookeeper leader shuts down the Spark Master
I have noticed that the Spark master becomes unresponsive when I kill the ZooKeeper leader (I have, of course, delegated leader election to ZooKeeper). Below is the error log I see on the Spark master node. Do you have any suggestions for resolving it?

15/06/22 10:44:00 INFO ClientCnxn: Unable to read additional data from server sessionid 0x14dd82e22f70ef1, likely server has closed socket, closing socket connection and attempting reconnect
15/06/22 10:44:00 INFO ClientCnxn: Unable to read additional data from server sessionid 0x24dc5a319b40090, likely server has closed socket, closing socket connection and attempting reconnect
15/06/22 10:44:01 INFO ConnectionStateManager: State change: SUSPENDED
15/06/22 10:44:01 INFO ConnectionStateManager: State change: SUSPENDED
15/06/22 10:44:01 WARN ConnectionStateManager: There are no ConnectionStateListeners registered.
15/06/22 10:44:01 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
15/06/22 10:44:01 ERROR Master: Leadership has been revoked -- master shutting down.
Alverson answered 22/6, 2015 at 17:3 Comment(1)
What are your exact config parameters for spark.deploy.recoveryMode and spark.zookeeper.url? Do you launch with --supervise? What's your cluster manager?Sesquicentennial

This is the expected behaviour. You have to set up 'n' masters, and you need to specify the ZooKeeper URL in spark-env.sh on every master:

SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181"
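
As a slightly fuller sketch (the zk1-zk3 hostnames and the /spark znode path are placeholders, not values from the question), the spark-env.sh entry on each master could look like this:

# conf/spark-env.sh on every Spark master (zk1-zk3 and /spark are placeholders)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"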

Note that ZooKeeper maintains a quorum. This means you need an odd number of ZooKeeper nodes, and the ZooKeeper ensemble is only up while the quorum is maintained. Since Spark depends on ZooKeeper, the Spark cluster will not be up unless the ZooKeeper quorum is maintained.
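
For reference, a minimal three-node ensemble keeps its quorum (two of three) when one ZooKeeper node is lost; a zoo.cfg sketch, again with zk1-zk3 as placeholder hostnames:

# zoo.cfg, identical on all three ZooKeeper nodes; a 3-node ensemble tolerates one failure
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888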

When you set up two (or n) masters and bring down a ZooKeeper node, the current master will go down, a new master will be elected, and all the worker nodes will attach themselves to the new master.

You should have started your workers with:

./start-slave.sh spark://master1:port1,master2:port2
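
For example, assuming the default master port 7077 and placeholder hostnames master1/master2, the startup sequence could be:

# On each master node (both read the ZooKeeper settings from spark-env.sh):
./sbin/start-master.sh

# On each worker node, list every master so the worker can re-register after a failover:
./sbin/start-slave.sh spark://master1:7077,master2:7077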

You have to wait 1-2 minutes to notice this failover.

Celio answered 26/7, 2016 at 9:20 Comment(3)
Is there any configuration that can restart the master that went down, or do we have to start it manually every time?Aerodrome
A master is elected automatically. It won't go down.Celio
I meant restarting the Spark master that went down because of ZooKeeper.Aerodrome
