Apache Zookeeper: distribution of nodes across data centers

I am working on a brand new SolrCloud - ZooKeeper infrastructure.

Some background information:

  • all other services (mostly web site infrastructure) are distributed across two data centers, with active-active configurations.
  • at the network level, the servers are set up on extended LANs, with dark fibre between the data centers, so latency is minimal.
  • the SolrCloud - ZooKeeper infrastructure will be used by most of these applications.

I have a SolrCloud cluster and a ZooKeeper ensemble running. Implementation at this level is fine.

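But I wonder how to distribute my ZooKeeper servers. I must have an odd number of servers, but I only have two data centers, so one of them will always host the majority. If one data center fails, there is a 50-50 chance that it is the one holding the majority, and the ensemble loses quorum.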

What should I do? So far I have thought of:

  • requesting a third data center (not likely to happen, $$$!)

  • host two per data center and two on an external cloud provider (Amazon or ...?). Again $$$

  • set up an odd number at data center 1 and use an observer on site 2. What then happens if site 1 fails? Can SolrCloud work with only one observer?
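
To compare the options above, here is a minimal sketch that checks which layouts keep a voting majority when one whole site goes down. The site names and voter counts are hypothetical, and it relies on two general ZooKeeper facts: the ensemble needs a strict majority of voting members, and observers do not vote.

```python
from typing import Dict


def has_quorum(voters_alive: int, voters_total: int) -> bool:
    """ZooKeeper stays available only with a strict majority of voting members."""
    return voters_alive > voters_total // 2


def survives_site_loss(voters_per_site: Dict[str, int]) -> Dict[str, bool]:
    """For each site, report whether quorum survives if that whole site is lost."""
    total = sum(voters_per_site.values())
    return {
        site: has_quorum(total - count, total)
        for site, count in voters_per_site.items()
    }


# Hypothetical layouts mirroring the options in the question.
layouts = {
    "two sites, 2+1 voters": {"dc1": 2, "dc2": 1},
    "three sites, 1+1+1 voters": {"dc1": 1, "dc2": 1, "dc3": 1},
    "three sites, 2+2+2 voters": {"dc1": 2, "dc2": 2, "cloud": 2},
    # Observers do not vote, so the observer at dc2 counts as 0 voters.
    "3 voters at dc1, observer at dc2": {"dc1": 3, "dc2": 0},
}

for name, layout in layouts.items():
    print(name, survives_site_loss(layout))
```

Under these assumptions, only the three-site layouts survive the loss of any single site. With the observer layout, losing site 1 removes every voter, so the observer alone cannot form a quorum.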

Sonar answered 25/4, 2013 at 18:4 Comment(2)
Stupefaction: Are you sure it's a good idea to spread a single SolrCloud cluster across two data centers? Is it a mirrored cluster?
Terti: Can you please share your SolrCloud deployment diagram? We have distributed SolrCloud across three data centers, but we are struggling with latency.

I got a third site to host the other ZooKeeper instance. This site is another office of my company, not a "full data center". So each site has one ZooKeeper instance.

What allowed me to have one cluster spread over three data centers was that they are close enough together to get a dark fiber between them. The latency is very low and does not impact ZooKeeper performance.

Then for Solr, I have full replicas in the two main data centers. The third office only hosts a ZooKeeper instance for quorum. Using full replicas, I have all the data in each data center. If my Solr needs grow later, I will shard, but for now our index is small.
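
As a rough illustration of this layout, the sketch below generates the `server.N` lines for `zoo.cfg` and the ZooKeeper connection string that the Solr nodes would use. The hostnames are hypothetical placeholders, not the ones from my setup, and the ports are the conventional defaults (2888 for peer traffic, 3888 for leader election, 2181 for clients).

```python
from typing import List


def zoo_cfg_server_lines(
    hosts: List[str], peer_port: int = 2888, election_port: int = 3888
) -> List[str]:
    """Build the server.N entries shared by every ensemble member's zoo.cfg."""
    return [
        f"server.{i}={host}:{peer_port}:{election_port}"
        for i, host in enumerate(hosts, start=1)
    ]


def zk_host_string(hosts: List[str], client_port: int = 2181, chroot: str = "") -> str:
    """Build the comma-separated connection string, with an optional chroot suffix."""
    return ",".join(f"{host}:{client_port}" for host in hosts) + chroot


# Hypothetical hostnames: one ZooKeeper instance per site.
hosts = ["zk-dc1.example.com", "zk-dc2.example.com", "zk-office.example.com"]

print("\n".join(zoo_cfg_server_lines(hosts)))
print(zk_host_string(hosts, chroot="/solr"))
```

Each server also needs a `myid` file whose number matches its `server.N` entry. The `/solr` chroot is optional and only shown as an example.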

It has proven solid for four years now, with one failure. And it was at the third office, not in a data center.

Sonar answered 29/9, 2017 at 18:15 Comment(0)

If your requirement is to serve all search requests from the local data center (the one where the request originated), then you don't need a cross data center ZooKeeper deployment.

A cross data center ZooKeeper deployment is only needed to survive the loss of an entire data center. That is very unlikely to happen (avoiding it is what you pay $$$$ for), so in that case there is no need to run a ZooKeeper ensemble across multiple data centers.

Terti answered 28/10, 2013 at 9:23 Comment(0)
