How to solve redis cluster "Waiting for the cluster to join" issue?

I have 3 machines running 6 nodes as a Redis cluster. I created it successfully months ago, but it has now gone down. I tried my best to fix it without success, so I wiped all the data and re-created the cluster from scratch. When I use the following command to create the cluster, it blocks while waiting for the nodes to join. I did some research, cleaned my data and logs, and retried again and again, but it still does not work.

redis-trib.rb create --replicas 1 10.2.1.208:6379 10.2.1.208:6380 10.2.1.209:6379 10.2.1.209:6380 10.2.1.15:6379 10.2.1.15:6380

It shows this result:

redis-trib.rb create --replicas 1 10.2.1.208:6379 10.2.1.208:6380 10.2.1.209:6379 10.2.1.209:6380 10.2.1.15:6379 10.2.1.15:6380
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
10.2.1.208:6379
10.2.1.209:6379
10.2.1.15:6379
Adding replica 10.2.1.209:6380 to 10.2.1.208:6379
Adding replica 10.2.1.208:6380 to 10.2.1.209:6379
Adding replica 10.2.1.15:6380 to 10.2.1.15:6379
M: 73b3b99bb17de63aa99eaf592376f0a06feb3d66 10.2.1.208:6379
   slots:0-5460 (5461 slots) master
S: 05b33ed6691797faaf7ccec1541396472b9d2866 10.2.1.208:6380
   replicates f14702ebb1462b313dd7eb4809ec50e30e4eef36
M: f14702ebb1462b313dd7eb4809ec50e30e4eef36 10.2.1.209:6379
   slots:5461-10922 (5462 slots) master
S: 3a9f433a8503281b0ddfc6ec69016908735053b8 10.2.1.209:6380
   replicates 73b3b99bb17de63aa99eaf592376f0a06feb3d66
M: 2fd97e8842828dba6b425b6a30e764fb06915737 10.2.1.15:6379
   slots:10923-16383 (5461 slots) master
S: c46db592d49bc1e9d8b5efb27b9799929c5186a4 10.2.1.15:6380
   replicates 2fd97e8842828dba6b425b6a30e764fb06915737
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join...........................................................................^C/usr/local/bin/redis-trib.rb:652:in `sleep': Interrupt
        from /usr/local/bin/redis-trib.rb:652:in `wait_cluster_join'
        from /usr/local/bin/redis-trib.rb:1305:in `create_cluster_cmd'
        from /usr/local/bin/redis-trib.rb:1695:in `<main>'
Downstate answered 19/9, 2016 at 8:27 Comment(1)
I confirm this is an issue with 5.0.4 (Ubuntu 16.04 LTS). I just tried to create a 6-node cluster on 3 servers; the message appeared and Redis hung. - Tyeshatyg
D
20

From the cluster tutorial on the official Redis website:

Every Redis Cluster node requires two TCP connections open. The normal Redis TCP port used to serve clients, for example 6379, plus the port obtained by adding 10000 to the data port, so 16379 in the example.

This second high port is used for the Cluster bus, that is a node-to-node communication channel using a binary protocol. The Cluster bus is used by nodes for failure detection, configuration update, failover authorization and so forth. Clients should never try to communicate with the cluster bus port, but always with the normal Redis command port, however make sure you open both ports in your firewall, otherwise Redis cluster nodes will be not able to communicate.

The command port and cluster bus port offset is fixed and is always 10000.

I was running on AWS and had not opened ports 16379 and 16380 (the cluster bus ports for 6379 and 6380). That was what caused this issue.
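To check this from another machine, a plain TCP connect to each client port and its bus port is usually enough. The following is a minimal sketch, assuming the 10000 offset quoted above; the helper names (`cluster_bus_port`, `is_port_open`, `check_nodes`) are made up for illustration and are not part of Redis or redis-trib.

```python
import socket

# Offset between client port and cluster bus port, as quoted from the tutorial above.
CLUSTER_BUS_OFFSET = 10000


def cluster_bus_port(client_port: int) -> int:
    """Return the cluster bus port that pairs with a client port."""
    return client_port + CLUSTER_BUS_OFFSET


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connect; True means something accepted the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all of these as "not open".
        return False


def check_nodes(nodes):
    """Yield (host, port, reachable) for the client and bus port of each node."""
    for host, client_port in nodes:
        for port in (client_port, cluster_bus_port(client_port)):
            yield host, port, is_port_open(host, port)


# Example usage (hosts taken from the question; run it from one of the other nodes):
# for host, port, ok in check_nodes([("10.2.1.208", 6379), ("10.2.1.208", 6380)]):
#     print(host, port, "open" if ok else "blocked or not listening")
```

A port reported as not open can mean either a firewall or security group rule, or that Redis is not listening on that address, so it is worth checking both.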

Downstate answered 20/9, 2016 at 4:10 Comment(1)
Hi, I am facing the same issue. Is this second high port a TCP port or a binary port? I used firewall-cmd to open it, e.g. firewall-cmd --permanent --add-port=16379/tcp. I still get the same issue even after opening all the high ports on the firewall. Is there anything I am missing? - Scorch
U
13

If there is no firewall problem between these 6 nodes, check the bind setting in redis.conf.

You should bind the Redis service to the LAN IP, of course, but there is one more thing:

Remove 127.0.0.1, or move it to the end, after the LAN IP!

Like this: bind 10.2.1.x 127.0.0.1 or bind 10.2.1.x

I hit this issue when creating a cluster of 3 nodes on 3 servers: it waited for the cluster to join forever. This may be a bug in Redis, at least in Redis 5.0, when 127.0.0.1 is listed before the LAN IP.
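As an illustration of the reordering described in this answer, here is a small sketch that rewrites a bind line so loopback addresses come last. The function name `reorder_bind` is hypothetical, and the handling of a leading `-` on an address is an assumption about newer redis.conf syntax; it only manipulates text and does not touch a running server.

```python
import ipaddress


def is_loopback(addr: str) -> bool:
    """True for loopback addresses such as 127.0.0.1 or ::1."""
    try:
        # Assumption: a leading '-' may mark an optional address in newer configs.
        return ipaddress.ip_address(addr.lstrip("-")).is_loopback
    except ValueError:
        # Wildcards or hostnames are left where they are.
        return False


def reorder_bind(line: str) -> str:
    """Move loopback addresses to the end of a redis.conf `bind` line."""
    directive, *addresses = line.split()
    if directive != "bind":
        raise ValueError("expected a line starting with 'bind'")
    lan = [a for a in addresses if not is_loopback(a)]
    loopback = [a for a in addresses if is_loopback(a)]
    return " ".join([directive, *lan, *loopback])


print(reorder_bind("bind 127.0.0.1 10.2.1.208"))
```

After changing the bind line, Redis has to be restarted for the new order to take effect.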

Upholsterer answered 3/11, 2018 at 3:31 Comment(2)
GREAT FIND HERE!! - Berbera
This solution worked for me. My Redis version is 5.0.9, so this is a bug in Redis. - Scorch
T
3

You may also see this issue if you pass 127.0.0.1 as the node address instead of the machine's actual IP address. In that case, change the command to use the real IP address of each node. https://mcmap.net/q/748180/-redis3-cluster-infinite-waiting-for-the-cluster-to-join
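One way to catch this before running the create command is to scan the host:port arguments for loopback addresses. This is only a sketch; `loopback_nodes` is a made-up helper, and it assumes IPv4-style `host:port` entries like the ones in the question.

```python
import ipaddress


def loopback_nodes(nodes):
    """Return the host:port entries whose host is loopback or 'localhost'."""
    flagged = []
    for node in nodes:
        # Split on the last ':' so the port is separated from the host.
        host, _, _port = node.rpartition(":")
        if host == "localhost":
            flagged.append(node)
            continue
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append(node)
        except ValueError:
            # Not an IP literal (for example a DNS name); nothing to flag here.
            pass
    return flagged


print(loopback_nodes(["127.0.0.1:6379", "10.2.1.208:6379", "localhost:6380"]))
```

Any entry it returns is an address that other machines would resolve to themselves, which is why the nodes never find each other.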

Tintometer answered 2/2, 2021 at 11:40 Comment(2)
This was my problem - thanks! - Junto
Now I didn't use 127.0.0.1 and it worked like a charm! Thanks. - Selfstyled
D
1

None of these answers worked for me, but I found the following blog post that helped:

https://linux.m2osw.com/redis-infamous-waiting-cluster-join-message

The problem was that I had created one server and cloned it to produce the other two nodes. The cloned nodes were all using the same node ID, and Redis did not like that.

The solution is to stop the Redis server, then remove the nodes.conf file. Its actual name is defined in your redis.conf file; mine was called nodes-6379.conf. Then restart the Redis server. Do this on all the nodes.
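To confirm that cloned nodes really share an ID before deleting anything, the IDs can be compared side by side. The sketch below assumes you have already collected each node's ID by hand, for example with `redis-cli -h <host> -p <port> cluster myid`; the helper `find_duplicate_ids` and the sample data are illustrative only.

```python
from collections import defaultdict


def find_duplicate_ids(node_ids):
    """Map each node ID used by more than one address to those addresses."""
    by_id = defaultdict(list)
    for address, node_id in node_ids.items():
        by_id[node_id].append(address)
    return {nid: sorted(addrs) for nid, addrs in by_id.items() if len(addrs) > 1}


# Hypothetical data: the second and third servers were cloned from the first.
collected = {
    "10.2.1.208:6379": "73b3b99bb17de63aa99eaf592376f0a06feb3d66",
    "10.2.1.209:6379": "73b3b99bb17de63aa99eaf592376f0a06feb3d66",
    "10.2.1.15:6379": "73b3b99bb17de63aa99eaf592376f0a06feb3d66",
}
for node_id, addresses in find_duplicate_ids(collected).items():
    print(node_id, "is shared by", ", ".join(addresses))
```

If the result is empty, the IDs are unique and the cause is more likely one of the network or bind issues from the other answers.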

Disobedience answered 27/1, 2021 at 23:44 Comment(0)
M
0

This can happen when you copy VMs or containers that have Redis preinstalled. The copied image already contains a cluster config file with a pre-generated node ID, so the same ID shows up on every copy:

# cat /etc/redis/redis.conf
...
cluster-config-file nodes-6379.conf
...

Check whether this is the case, and if so, remove the file:

# rm /var/lib/redis/nodes-6379.conf

Then restart Redis. These paths are for Ubuntu. The post that helped me: https://linux.m2osw.com/redis-infamous-waiting-cluster-join-message
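Since the paths differ between distributions, the location of the file can be worked out from redis.conf itself. This is a rough sketch, assuming the file named by `cluster-config-file` lives in the directory given by `dir`, and assuming the defaults `./` and `nodes.conf` when either directive is missing; `nodes_conf_path` is a hypothetical helper.

```python
import posixpath


def nodes_conf_path(redis_conf_text: str) -> str:
    """Work out where the cluster config file should live, from redis.conf text."""
    # Assumed defaults when a directive is absent from the file.
    settings = {"dir": "./", "cluster-config-file": "nodes.conf"}
    for raw in redis_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in settings:
            settings[parts[0]] = parts[1].strip().strip('"')
    return posixpath.join(settings["dir"], settings["cluster-config-file"])


sample = """
dir /var/lib/redis
cluster-config-file nodes-6379.conf
"""
print(nodes_conf_path(sample))
```

With the Ubuntu settings from this answer it yields the same /var/lib/redis/nodes-6379.conf path that the rm command above removes.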

Morna answered 2/1, 2022 at 23:0 Comment(0)
D
0

Set cluster-announce-ip in redis.conf to the same value as bind.

I tried every suggestion here, but this was what finally worked for me in my Kubernetes / Docker setup.

Indeed, in this section of redis.conf I found:

########################## CLUSTER DOCKER/NAT support  ########################

# In certain deployments, Redis Cluster nodes address discovery fails, because
# addresses are NAT-ted or because ports are forwarded (the typical case is
# Docker and other containers).
#
# In order to make Redis Cluster working in such environments, a static
# configuration where each node knows its public address is needed. The
# following four options are used for this scope, and are:
#
# * cluster-announce-ip ...
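A quick way to verify the setting described in this answer is to read both directives back out of redis.conf and compare them. The sketch below does that on the config text only; `read_directive` and `announce_matches_bind` are hypothetical helpers, and it assumes that when a directive appears more than once, the last uncommented occurrence is the one in effect.

```python
def read_directive(conf_text: str, name: str):
    """Return the arguments of the last uncommented `name` directive, or None."""
    found = None
    for raw in conf_text.splitlines():
        parts = raw.split()
        # Commented lines start with '#', so their first token never equals `name`.
        if parts and parts[0] == name:
            found = parts[1:]
    return found


def announce_matches_bind(conf_text: str) -> bool:
    """True if cluster-announce-ip is set and is one of the bind addresses."""
    announce = read_directive(conf_text, "cluster-announce-ip")
    bind = read_directive(conf_text, "bind") or []
    return bool(announce) and announce[0] in bind


conf = """
bind 10.2.1.208 127.0.0.1
cluster-announce-ip 10.2.1.208
"""
print(announce_matches_bind(conf))
```

It returns False both when the directive is missing (or still commented out) and when it points at an address Redis is not bound to.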
Discrete answered 8/5, 2022 at 12:16 Comment(0)