ZooKeeper installation on multiple AWS EC2 instances

I am new to ZooKeeper and AWS EC2. I am trying to install ZooKeeper on 3 EC2 instances.

As per the ZooKeeper documentation, I have installed ZooKeeper on all 3 instances, created zoo.cfg, and added the configuration below:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
clientPort=2181
server.1=localhost:2888:3888
server.2=<public ip of ec2 instance 2>:2889:3889
server.3=<public ip of ec2 instance 3>:2890:3890

I have also created the myid file on all 3 instances at /opt/zookeeper/data/myid, as per the guidelines.

I have a couple of queries:

  1. Whenever I start the ZooKeeper server on an instance, it starts in standalone mode (as per the logs).

  2. Will the above configuration really connect the servers to each other? What are the ports 2889:3889 and 2890:3890 for? Do I need to configure them on the EC2 machines, or should I use different ports?

  3. Do I need to create a security group to open these connections? I am not sure how to do that on an EC2 instance.

  4. How do I confirm that all 3 ZooKeeper servers have started and can communicate with each other?

Ramos answered 28/3, 2015 at 22:23 Comment(0)

The ZooKeeper configuration is designed such that you can install the exact same configuration file on all servers in the cluster without modification. This makes ops a bit simpler. The component that specifies the configuration for the local node is the myid file.

The configuration you've defined is not one that can be shared across all servers. All of the servers in your server list should bind to a private IP address that is accessible to the other nodes in the network. You're seeing your server start in standalone mode because you're binding to localhost, and the other servers in the cluster can't reach localhost.

Your configuration should look more like:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
clientPort=2181
server.1=<private ip of ec2 instance 1>:2888:3888
server.2=<private ip of ec2 instance 2>:2888:3888
server.3=<private ip of ec2 instance 3>:2888:3888

The two ports listed in each server definition are respectively the quorum and election ports used by ZooKeeper nodes to communicate with one another internally. There's usually no need to modify these ports, and you should try to keep them the same across servers for consistency.

Additionally, as I said you should be able to share that exact same configuration file across all instances. The only thing that should have to change is the myid file.
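For example (matching the server.N entries above), the myid file on each instance contains just that instance's own ID:

# On instance 1
echo "1" > /opt/zookeeper/data/myid
# On instance 2
echo "2" > /opt/zookeeper/data/myid
# On instance 3
echo "3" > /opt/zookeeper/data/myid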

You probably will need to create a security group and open up the client port to be available for clients and the quorum/election ports to be accessible by other ZooKeeper servers.
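As a rough sketch using the AWS CLI (the security group ID and CIDR below are placeholders for your own values), the inbound rules might look like:

# Client port, open to whatever needs to talk to ZooKeeper
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 2181 --cidr 10.0.0.0/16
# Quorum and election ports, open to the other ZooKeeper servers
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 2888 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 3888 --cidr 10.0.0.0/16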

Finally, you might want to look in to a UI to help manage the cluster. Netflix makes a decent UI that will give you a view of your cluster and also help with cleaning up old logs and storing snapshots to S3 (ZooKeeper takes snapshots but does not delete old transaction logs, so your disk will eventually fill up if they're not properly removed). But once it's configured correctly, you should be able to see the ZooKeeper servers connecting to each other in the logs as well.
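To answer your last question more directly, you can also check each server's mode once the ensemble is up (the path below assumes the install location used in your config; on newer 3.5+ releases the stat command may need to be enabled via 4lw.commands.whitelist):

# Reports "Mode: leader" or "Mode: follower" once the ensemble has formed
/opt/zookeeper/bin/zkServer.sh status
# Or query the client port directly (requires netcat)
echo stat | nc localhost 2181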

EDIT

@czerasz notes that starting from version 3.4.0 you can use the autopurge.snapRetainCount and autopurge.purgeInterval directives to keep your snapshots clean.
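For example, adding something like the following to zoo.cfg keeps the three most recent snapshots and runs the purge every 24 hours:

autopurge.snapRetainCount=3
autopurge.purgeInterval=24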

@chomp notes that some users have had to use 0.0.0.0 for the local server IP to get the ZooKeeper configuration to work on EC2. In other words, replace <private ip of ec2 instance 1> with 0.0.0.0 in the configuration file on instance 1. This is counter to the way ZooKeeper configuration files are designed but may be necessary on EC2.
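In that case, the server list in the configuration file on instance 1 would become:

server.1=0.0.0.0:2888:3888
server.2=<private ip of ec2 instance 2>:2888:3888
server.3=<private ip of ec2 instance 3>:2888:3888

and instance 2 would instead use 0.0.0.0 for its own server.2 entry, and so on.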

Spleeny answered 28/3, 2015 at 23:54 Comment(6)
Starting from version 3.4.0 you can use the autopurge.snapRetainCount and autopurge.purgeInterval directives to keep your snapshots clean.Octopus
Instead of putting the private IP of the machine in the zoo.cfg file of instance 1 (for example), it should be "0.0.0.0".Mysticism
@Mysticism are you saying if this were the configuration file for server.1 you could put 0.0.0.0 for that specific entry in the file? ZooKeeper configuration files are designed so that they can be duplicated across machines without being edited, which is why the myid file is separate. We can't put 0.0.0.0 as the IP of all the servers since you can't reference a remote server that way. Thus, the configuration file should have real IPs.Spleeny
That is true in a "normal" ZooKeeper environment, but ZooKeeper didn't work in AWS without doing that, at least for me. Look at #30941481 for a better explanation; all I know is that it works for me that way! ;=)Mysticism
I haven't had any issues with this that I can recall, but interesting to know! Will update my answer.Spleeny
I would probably recommend not using IPs at all, but instead a link to an ELB in case those instances ever need to be recreated in the future. Amazon makes no guarantee that you'll be assigned the same IP when recreating instances.Antepenult

Adding additional info regarding ZooKeeper clustering inside Amazon's VPC.

Using the VPC IP addresses should be the preferred solution; '0.0.0.0' should be your last option. In particular, if you are running Docker on your EC2 instances, '0.0.0.0' will not work properly with ZooKeeper 3.5.x after a node restart.

The issue lies in how '0.0.0.0' is resolved and how the ensemble shares node addresses and SID order (if you start your nodes in descending order, the issue may not occur).

So far, the only working solution is to upgrade to version 3.6.2 or later.

Supranatural answered 1/3, 2021 at 13:55 Comment(0)
