What's the difference between ZooKeeper and other distributed key-value stores?

2

22

I am new to ZooKeeper and distributed systems, and am learning them on my own.

From what I understand so far, ZooKeeper seems to be simply a key-value store whose keys are paths and whose values are strings, which is no different from, say, Redis. (And apparently we can use slash-separated paths as keys in Redis as well.)

So my question is: what is the essential difference between ZooKeeper and other distributed key-value stores? Why does ZooKeeper use so-called "paths" as keys instead of simple strings?

Way answered 16/7, 2015 at 17:34 Comment(1)
Terrific question! - Fulvous
26

You're comparing ZooKeeper's high-level data model to that of other key-value stores, but the data model isn't what makes it unique. From a distributed systems standpoint, ZooKeeper differs from many other key-value stores (especially Redis) because it is strongly consistent and tolerates failures as long as a majority of the cluster remains connected. Additionally, although data is held in memory, every write is synchronously replicated to a majority of the cluster and persisted to disk, so once a write succeeds it will not be lost (barring a missile strike). This makes ZooKeeper very useful for storing small amounts of mission-critical state, such as configuration.
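To make the "majority of the cluster" condition concrete, here is a minimal arithmetic sketch. The function names are made up for illustration and are not part of any ZooKeeper API; the only assumption is that a quorum means a strict majority of the servers.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    # Prints: 3 2 1, then 4 3 1, then 5 3 2
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that 4 servers tolerate no more failures than 3 do, which is why clusters are usually sized with an odd number of servers.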

Conversely, Redis was not designed as a strongly consistent distributed system and does not provide the same sorts of guarantees that ZooKeeper does; its replication is asynchronous by default, so an acknowledged write can be lost if the primary fails. Many other key-value stores that are distributed are "eventually consistent." In other words, there's no guarantee that once a value is written, all other processes in the distributed system can immediately see that value.

Finally, in addition to the file-system-like interface for storing state, ZooKeeper provides fairly low-level features on which solutions to more complex problems can be built. For examples of this, look at Apache Curator. Curator uses ZooKeeper's ephemeral nodes (nodes that disappear when the session of the client that created them ends, for example after a disconnect) to build things like locks and leader elections, which are extremely useful for coordinating distributed systems. So, from that perspective, ZooKeeper's data model and associated features serve as primitives on which higher-level tools for distributed coordination can be built.
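As a rough illustration of how ephemeral nodes can support leader election, here is a toy in-memory model. It is a sketch only: `ToyCoordinator` and its methods are hypothetical names, not the ZooKeeper or Curator API, and everything runs in one process with no networking or replication. It assumes each candidate registers an ephemeral node with an increasing sequence number and that the lowest number leads.

```python
import itertools


class ToyCoordinator:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self):
        self._nodes = {}               # path -> id of the owning session
        self._seq = itertools.count()  # monotonically increasing suffix

    def create_ephemeral_sequential(self, prefix, session_id):
        # Zero-pad the counter so lexical order matches creation order.
        path = f"{prefix}{next(self._seq):010d}"
        self._nodes[path] = session_id
        return path

    def children(self, parent):
        # All node paths that start with "<parent>/", sorted lexically.
        prefix = parent.rstrip("/") + "/"
        return sorted(p for p in self._nodes if p.startswith(prefix))

    def close_session(self, session_id):
        # Ephemeral nodes vanish when the owning session ends.
        self._nodes = {p: s for p, s in self._nodes.items() if s != session_id}

    def leader(self, parent):
        # The candidate holding the lowest sequence number leads.
        nodes = self.children(parent)
        return self._nodes[nodes[0]] if nodes else None


zk = ToyCoordinator()
for session in ("client-a", "client-b", "client-c"):
    zk.create_ephemeral_sequential("/election/candidate-", session)

print(zk.leader("/election"))  # client-a
zk.close_session("client-a")   # simulate a crash or disconnect
print(zk.leader("/election"))  # client-b
```

The point of the sketch is that no client has to announce its own failure: the leader's node goes away with its session, and the next candidate in line takes over.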

Arleanarlee answered 16/7, 2015 at 18:0 Comment(0)
7

You can compare ZooKeeper with other distributed key-value stores such as etcd and Consul. These tools offer many of the same benefits as Apache ZooKeeper. A main advantage of ZooKeeper is that it provides coordination primitives that help avoid deadlocks and race conditions in distributed applications. ZooKeeper is not only a key-value store; it can also be used for service discovery and as a centralised service for maintaining configuration information in a distributed application.

ZooKeeper stores its key-value pairs a bit differently from other key-value stores: each key is a znode. The znodes form a tree that looks like a Unix filesystem, with paths starting at a slash (/). A znode may be persistent or ephemeral, and each znode has its own ACL. Data is served from RAM, while ZooKeeper keeps a transaction log and snapshots on disk so that a server can recover after a failure. It is designed to be a fault-tolerant, distributed key-value store, so it should be deployed as a cluster. A group of ZooKeeper servers is called a ZooKeeper ensemble. One server in the ensemble is the leader and the rest are followers; these roles are decided by a leader election among the ZooKeeper servers in the cluster.
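To show what a tree of paths gives you over flat string keys, here is a toy in-memory sketch. `ToyZNodeTree` and its methods are hypothetical names for illustration, not the ZooKeeper client API. It assumes two behaviours: a node can only be created under a parent that already exists, and the children of a node can be listed directly.

```python
class ToyZNodeTree:
    """Toy in-memory tree of path -> data (NOT the ZooKeeper API)."""

    def __init__(self):
        self._data = {"/": b""}  # the root always exists

    @staticmethod
    def _parent(path):
        # "/services/web" -> "/services"; "/services" -> "/"
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError(f"node already exists: {path}")
        if self._parent(path) not in self._data:
            raise KeyError(f"parent does not exist: {path}")
        self._data[path] = data

    def get(self, path):
        return self._data[path]

    def get_children(self, path):
        # Names of the nodes exactly one level below `path`.
        return sorted(
            p.rsplit("/", 1)[1]
            for p in self._data
            if p != "/" and self._parent(p) == path
        )


tree = ToyZNodeTree()
tree.create("/services")
tree.create("/services/web", b"10.0.0.1:8080")
tree.create("/services/db", b"10.0.0.2:5432")

print(tree.get_children("/services"))  # ['db', 'web']
print(tree.get("/services/web"))       # b'10.0.0.1:8080'
```

With flat string keys, "list everything registered under /services" would need a scan or a naming convention; with a tree it is a first-class operation, which is what makes patterns like service discovery natural.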

ZooKeeper is commonly used in high-availability (HA) setups of the Hadoop NameNode and the YARN ResourceManager, where it takes care of deciding which daemon is active and which is standby. Kafka was likewise designed to use ZooKeeper for storing topic metadata and, in older versions, consumer offsets.

etcd plays a comparable role in the Kubernetes control plane, although Kubernetes does not natively support ZooKeeper as a replacement for etcd.

Otey answered 10/12, 2018 at 5:41 Comment(2)
#1 You say: "Zookeeper is not only a key-value store, It can be also used for service discovery and centralised service for maintaining configuration information in a distributed application." But how is that different from etcd? My understanding is that etcd can do this as well. - Nodose
#2 You also say: "Zookeeper stores transaction log and snapshot for recovering node in case of disaster, It is designed to behave as a fault tolerant and distributed k-v store, So it should be deployed as a cluster." But isn't etcd also fault tolerant? And doesn't etcd also store data (the log) to disk? - Nodose
