Watches and ephemeral nodes don't work when the ZooKeeper state changes automatically?

I have a very strange case with the Python Kazoo library. Here is what my code below does:

As soon as I connect to ZooKeeper using the Kazoo library, I create an ephemeral node, set a watch on another node, and then keep the program running forever in an infinite loop. I have also added a listener that monitors the connection state.

Everything works fine at first: the ephemeral node is up and the watch on my znode fires as expected.

Sometimes, though, I see odd behaviour after connection interruptions or drops. The listener I mentioned above prints each state change, and I see Lost, Suspended, Connected printed, I believe because of the connection interruptions. After that, my ephemeral node dies and my watch on the znode stops working too.

Below is my code, which runs forever:

#!/usr/bin/python

import time

from kazoo.client import KazooClient
from kazoo.client import KazooState
from kazoo.protocol.states import EventType


def watch_host(event):
    # Called when the watch set below fires
    print(event)


def my_listener(state):
    if state == KazooState.LOST:
        # Register somewhere that the session was lost
        print("Lost")
    elif state == KazooState.SUSPENDED:
        # Handle being disconnected from Zookeeper
        print("Suspended")
    else:
        # Handle being connected/reconnected to Zookeeper
        # what are we supposed to do here?
        print("Being Connected/Reconnected")


zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

zk.add_listener(my_listener)

# start an ephemeral node
zk.create("/my/example/h0", b"some value", ephemeral=True)

# put a watch on my znode
children = zk.get_children("/my/example/test1", watch=watch_host)


while True:
    time.sleep(5)

Is there any way to overcome this problem? Whenever the ZooKeeper state changes to Lost, Suspended, or Connected, I want my ephemeral node to be up, by creating it again if that is the right approach, and I want my watch on the znode to keep working.

Because the program will run forever, if the ZooKeeper state changes for whatever reason due to connection interruptions and the client reconnects automatically, I need to make sure that my ephemeral node is up again and that my watches on the znode start working again automatically.

Currently my ephemeral node dies and the watches stop working whenever the state changes automatically.

Any idea how to overcome this problem?

Binion answered 24/11, 2013 at 5:25 Comment(2)
It's strange, but the Lost -> Suspended transition seems to be invalid. Is Lost -> Suspended -> Connected really the sequence of transitions that you see? - Faucher
Couldn't you create the ephemeral node in the branch of my_listener where you know you've been connected or reconnected? It sounds like you want to create the ephemeral node every time you become connected/reconnected, not just on the initial connection - but your code only does the latter. - Haemostasis

Here is the thing: when there is a state change in the connection, your watcher will also get triggered. An event is passed to the watcher; it can be something like nodeDataChanged or nodeChildrenChanged. However, since it would be impossible to be notified of an event you're interested in while your session is terminated or there is a connection issue, your watcher is also notified of these session issues. I believe the event type for this is "None."

From http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches

Things to Remember about Watches

  • Watches are one time triggers; if you get a watch event and you want to get notified of future changes, you must set another watch.
  • Because watches are one time triggers and there is latency between getting the event and sending a new request to get a watch you cannot reliably see every change that happens to a node in ZooKeeper. Be prepared to handle the case where the znode changes multiple times between getting the event and setting the watch again. (You may not care, but at least realize it may happen.)
  • A watch object, or function/context pair, will only be triggered once for a given notification. For example, if the same watch object is registered for an exists and a getData call for the same file and that file is then deleted, the watch object would only be invoked once with the deletion notification for the file.
  • When you disconnect from a server (for example, when the server fails), you will not get any watches until the connection is reestablished. For this reason session events are sent to all outstanding watch handlers. Use session events to go into a safe mode: you will not be receiving events while disconnected, so your process should act conservatively in that mode.

So, long story short, your watcher should crack open the event to see what kind it is and respond appropriately to the None type by going into some kind of failover mode.
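
A minimal sketch of what "cracking open the event" could look like in Python. The type strings are assumptions modelled on the constants in Kazoo's `EventType`, the `WatchedEvent` below is a stand-in namedtuple rather than the library class, and `classify_event` and its action names are hypothetical:

```python
from collections import namedtuple

# Stand-in for the event object handed to a watch callback (assumption:
# it exposes `type`, `state` and `path`, as Kazoo's WatchedEvent does).
WatchedEvent = namedtuple("WatchedEvent", ["type", "state", "path"])


def classify_event(event):
    """Return a hypothetical action name for a watch event."""
    if event.type in (None, "NONE"):
        # Session or connection event, not a change to the znode itself:
        # act conservatively until the connection is back.
        return "safe_mode"
    if event.type == "CHILD":
        return "reread_children"
    if event.type == "DELETED":
        return "node_gone"
    # CREATED or CHANGED: the node's data should be read again.
    return "reread_data"
```

Because a watch is a one-time trigger, every branch except "safe_mode" would normally end by setting a new watch on the same path.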

What I usually do is make my Watcher objects listeners as well. When a reconnection happens, I respond by resetting my watches, checking whether the appropriate znodes are present, and creating them when necessary.
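
One way this pattern might look with the question's Python setup is sketched below. `SessionKeeper` and its method names are hypothetical; the client is assumed to offer `exists`, `create` and `get_children` with Kazoo-style arguments, and the state is compared against the plain string "CONNECTED" only to keep the sketch free of imports (real code would compare against `KazooState.CONNECTED`). The callbacks only set a flag, on the assumption that listener and watch callbacks should not make blocking ZooKeeper calls themselves; the main loop does the actual work.

```python
import threading


class SessionKeeper(object):
    """Hypothetical helper: both the state listener and the watch callback."""

    def __init__(self, client, ephemeral_path, watched_path, value=b""):
        self.client = client
        self.ephemeral_path = ephemeral_path
        self.watched_path = watched_path
        self.value = value
        self.needs_restore = threading.Event()
        self.needs_restore.set()  # do the first setup from the main loop

    def on_state(self, state):
        # State listener: only flag the work, do not call ZooKeeper here.
        if state == "CONNECTED":
            self.needs_restore.set()

    def on_watch(self, event):
        # Watches fire once, so ask the main loop to set a new one.
        self.needs_restore.set()

    def restore(self):
        """Create the ephemeral node if it is missing and set the watch again."""
        self.needs_restore.clear()
        if self.client.exists(self.ephemeral_path) is None:
            self.client.create(self.ephemeral_path, self.value,
                               ephemeral=True, makepath=True)
        return self.client.get_children(self.watched_path, watch=self.on_watch)

    def run_once(self, timeout=5):
        """Wait up to `timeout` seconds for work; return children or None."""
        if self.needs_restore.wait(timeout):
            return self.restore()
        return None
```

With a real client this would presumably be wired up as `zk.add_listener(keeper.on_state)`, with the question's `while True` loop calling `keeper.run_once()` in place of `time.sleep(5)`.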

Procrustean answered 24/1, 2014 at 1:38 Comment(0)

I don't know anything about Python, but I will highlight some basic points about znodes.
Znodes are of two types: ephemeral or persistent.

  • An ephemeral znode is deleted by ZooKeeper as soon as the creating client's session ends.
  • A persistent znode, once created, is only deleted when explicitly deleted by a client (not necessarily the one that created it).
  • An ephemeral znode can never have any children, not even ephemeral ones.
  • Watchers on a node are triggered only once (in the Java API), so you need to re-register the watch to be notified of future updates to the node.

In the Java API, if the client is configured with more than one server and it disconnects from the server it is connected to, we get an event with KeeperState.Disconnected, but the client retries and connects to another server. During this time the ephemeral znode and all the watches are intact, i.e. they are not destroyed. However, once an event with KeeperState.Expired arrives (when the client is unable to establish a connection with any of the servers within the specified time), the ephemeral znode is destroyed. We then have to create a new client connection (instantiate a new ZooKeeper instance) if we want to access the ensemble, and re-establish everything, i.e. create the nodes and add the watches again.

So I think this might also apply in your case, as mentioned in the "Understanding Kazoo States" section of the Kazoo documentation:

When a connection transitions to LOST, any ephemeral nodes that have been created will be removed by Zookeeper. This affects all recipes that create ephemeral nodes, such as the Lock recipe. Lock’s will need to be re-acquired after the state transitions to CONNECTED again. This transition occurs when a session expires or when you stop the clients connection.
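
As a rough illustration of the distinction above, a listener could remember whether LOST was seen and only rebuild everything when CONNECTED follows it. This is a sketch under assumptions: `SessionTracker` and the action names are hypothetical, and the states are plain strings standing in for the `KazooState` constants.

```python
class SessionTracker(object):
    """Hypothetical helper: maps each state change to what it implies."""

    def __init__(self):
        self.previous = None
        self.session_lost = False

    def on_state(self, state):
        if state == "LOST":
            # Session is gone: ephemeral nodes have been removed.
            self.session_lost = True
            action = "ephemerals_removed"
        elif state == "SUSPENDED":
            # Connection dropped, but the session may still be alive.
            action = "pause"
        elif self.session_lost:
            # CONNECTED after LOST: a new session, so rebuild everything.
            self.session_lost = False
            action = "recreate_nodes_and_watches"
        elif self.previous == "SUSPENDED":
            # CONNECTED after SUSPENDED: same session, nodes are intact.
            action = "resume"
        else:
            action = "first_connect"
        self.previous = state
        return action
```

The point of keeping the flag is that a plain SUSPENDED to CONNECTED round trip would not trigger a needless re-creation of nodes that still exist.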

Hope this info helps you understand the various states and when to re-configure everything.

Holliholliday answered 23/1, 2014 at 12:44 Comment(1)
How much time is there between the Disconnected and Expired events? - Montgolfier
