How to remove a dead node from a Cassandra cluster?
  1. I have a Cassandra cluster of 12 nodes on EC2.
  2. Because of a failure we lost one of the nodes completely; that machine does not exist anymore.
  3. So I created a new EC2 instance with a different IP and the same token as the dead node. I also had a backup of that node's data, so it works fine.
  4. But the problem is that the dead node's IP still appears as an unreachable node in describe cluster.
  5. As that node (EC2 instance) does not exist anymore, I cannot use nodetool decommission or nodetool disablegossip.

How can I get rid of this unreachable node?

Roede answered 21/12, 2011 at 12:36 Comment(0)

Normally when replacing a node you want to set the new node's token to (failed node's token) - 1 and let it bootstrap. As of 1.0 there is a flag you can specify on startup to replace a dead node: "cassandra.replace_token=<token>".
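For example, one common way with a packaged 1.0 install (a sketch, assuming you edit conf/cassandra-env.sh on the replacement node before its first start) is to pass the flag as a JVM system property:

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_token=<failed node's token>"

Remove the option again once the node has finished bootstrapping.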

Since you have already added the new node with the same token there's an extra step:

  1. Move the new node's token to (failure node's token) - 1 using nodetool move
  2. Run nodetool removetoken <failed node's token> from one of the up nodes
  3. Run nodetool cleanup on each node

These are basically the pre 1.0 instructions for replacing a dead node with the additional token move.
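Concretely, the sequence might look like this (host names and token values are placeholders; run from a machine that can reach the nodes over JMX):

nodetool -h <new node's ip> move <failed node's token minus 1>
nodetool -h <any live node's ip> removetoken <failed node's token>
nodetool -h <each node's ip> cleanup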

Fount answered 21/12, 2011 at 17:12 Comment(6)
Thank you Psanford. But in my case I have already started the new node with the same token as the dead node had. Now the ring is fine and balanced, but describe cluster still shows the dead node as unreachable. That is still fine for most cases, but we load data with sstableloader, and sstableloader does not work with an unreachable node. What could be the possible solution for this?Roede
You need to clear out the knowledge of the old node from the ring. You can do that with nodetool removetoken. Your problem is that this would also remove the replacement node, so you need to move the replacement node's token to token - 1 before you do the removetoken.Fount
I have tried the solution you suggested. We successfully moved the new node, but the removal of the dead node gets stuck, saying: 'RemovalStatus: Removing token (62676456546693435176060154681903071729). Waiting for replication confirmation from [cassandra-1/10.101.101.01'Roede
In that case you can run the command nodetool removetoken force, which will tell Cassandra not to wait for confirmation from the dead node.Fount
When all of this is done you likely need to restart all of the nodes to remove stale gossip information. I had failed to do this, and when I tried to truncate a CF I no longer needed I got an error message that not all of the nodes were up, in spite of the fact that nodetool ring showed them all up. Running nodetool gossipinfo showed that the cluster still retained some knowledge of the dead nodes. Restarting all nodes fixed this.Citarella
I was able to solve my problem by just running nodetool removenode <uuid of the node that died> and then nodetool move <token of the new node with the wrong token id>; the new node then appeared in the right spot of the ring and obtained the proper token ID. (DSE 5.2.4)Roundfaced

I had the same problem and I resolved it with removenode, which does not require you to find and change the node token.

First, get the node UUID:

nodetool status

DN  192.168.56.201  ?          256     13.1%  4fa4d101-d8d2-4de6-9ad7-a487e165c4ac  r1
DN  192.168.56.202  ?          256     12.6%  e11d219a-0b65-461e-babc-6485343568f8  r1
UN  192.168.2.91    156.04 KB  256     12.4%  e1a33ed4-d613-47a6-8b3b-325650a2bbd4  RAC1
UN  192.168.2.92    156.22 KB  256     13.6%  3a4a086c-36a6-4d69-8b61-864ff37d03c9  RAC1
UN  192.168.2.93    149.6 KB   256     11.3%  20decc72-8d0a-4c3b-8804-cc8bc98fa9e8  RAC1

As you can see, the .201 and .202 nodes are dead and on a different network. These were changed to .91 and .92 without proper decommissioning and recommissioning. I was working on installing the network and made a few mistakes...

Second, remove the .201 node with the following command:

nodetool removenode 4fa4d101-d8d2-4de6-9ad7-a487e165c4ac

(in older versions it was nodetool remove ...)

But just like nodetool removetoken ..., it blocks (see the comment by samarth on psanford's answer). However, it has a side effect: it puts that UUID in a list of nodes to be removed. So next we can force the removal with:

nodetool removenode force

(again, in older versions it was nodetool remove ...)

Now the node accepts the command and tells me that it is removing the invalid entry:

RemovalStatus: Removing token (-9136982325337481102). Waiting for replication confirmation from [/192.168.2.91,/192.168.2.92].

We also see that it communicates with the two other nodes that are up, so it takes a little time, but it is still quite fast.
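If the removal is taking a while, you can also check on its progress from any live node; nodetool has a status sub-command for this:

nodetool removenode status

which prints the current RemovalStatus line, like the one shown above.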

Next, nodetool status no longer shows the .201 node. I repeated with .202 and the status was then clean.
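For reference, the removal of .202 is the same command with its UUID taken from the status output above:

nodetool removenode e11d219a-0b65-461e-babc-6485343568f8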

After that you may also want to run a cleanup, as mentioned in psanford's answer:

nodetool cleanup

The cleanup should be run on all nodes, one by one, to make sure the change is fully taken into account.
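A minimal sketch of doing that from one machine, assuming each node's JMX port is reachable (the addresses come from the example cluster above):

for host in 192.168.2.91 192.168.2.92 192.168.2.93; do
    nodetool -h "$host" cleanup
done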

Eloquence answered 31/10, 2014 at 22:53 Comment(4)
Hello, please, I have a question. nodetool status returns only one node (with IP 127.0.0.1): 'UN 127.0.0.1 83.05 KB 256 100.0% 460ddcd9-1ee8-48b8-a618-c076056aad07 rack1'Ashaashamed
How can I change the IP address, and how can I get the list of all the nodes on the ring as you have done? Thanks a lot. Best regards.Ashaashamed
Search for a file named cassandra.yaml and, inside it, look for an entry named seed_provider; that has entries for IP addresses (see the sketch after these comments). Then look into the cassandra-rackdc.properties and cassandra-topology.properties files. There is also a cassandra-topology.yaml, which I think is not used anymore in newer versions.Eloquence
Note that in Cassandra 2.0+, nodetool's "remove" command is now "removenode".Sheelah
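For illustration, the seed_provider entry in cassandra.yaml referenced in the comments above typically looks like this (the IP addresses are placeholders borrowed from this answer's example cluster; if a node's IP changed, adjust listen_address and rpc_address as well, then restart the node):

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.2.91,192.168.2.92"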