I am working on a brand new SolrCloud - ZooKeeper infrastructure.
Some background information:
- all other services (mostly web site infrastructure) are distributed across two data centers, with active-active configurations.
- at the network level, the servers are set up on extended LANs, with dark fibre between the data centers, so latency is minimal.
- the SolrCloud - ZooKeeper infrastructure will be used by most of these applications.
I have a SolrCloud cluster and a ZooKeeper ensemble running. Implementation at this level is fine.
But I am not sure how to distribute my ZooKeeper servers. I need an odd number of servers, but I only have two data centers, so one site always holds the majority. If a data center fails, there is a 50-50 chance it is the one holding the majority, and the ensemble loses quorum.
What should I do? So far I have thought of:
- requesting a third data center (not likely to happen, $$$!)
- host two per data center and two on an external cloud provider (Amazon or ...?). Again $$$
- set up an odd number at data center 1 and use an observer on site 2. What then happens if site 1 fails? Can SolrCloud work with only one observer?
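
To make the trade-offs concrete, here is a rough sketch of the quorum arithmetic as I understand it: ZooKeeper needs a strict majority of its voting members, and observers do not vote. The site names and the server counts per layout (for example 3 + 2, or one voter per site with a third data center) are my own assumptions for illustration, not a tested configuration.

```python
from typing import Dict


def quorum_size(voters: int) -> int:
    """Strict majority of the voting members."""
    return voters // 2 + 1


def survives(layout: Dict[str, int], failed_site: str) -> bool:
    """True if the voters outside failed_site still form a majority."""
    total = sum(layout.values())
    remaining = total - layout[failed_site]
    return remaining >= quorum_size(total)


# Voting members per site (assumed counts). Observers are counted as 0
# because they do not take part in the vote.
layouts = {
    "two sites, 3 + 2": {"dc1": 3, "dc2": 2},
    "three sites, 1 + 1 + 1": {"dc1": 1, "dc2": 1, "dc3": 1},
    "two sites plus cloud, 2 + 2 + 2": {"dc1": 2, "dc2": 2, "cloud": 2},
    "3 voters in dc1, observer in dc2": {"dc1": 3, "dc2": 0},
}

for name, layout in layouts.items():
    # For each site, report whether quorum survives the loss of that site.
    outcome = {site: survives(layout, site) for site in layout}
    print(name, outcome)
```

If this arithmetic is right, only the layouts with a third location keep quorum no matter which single site is lost, and the observer layout loses quorum as soon as site 1 goes down.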