Does Cassandra flush memtables on nodetool stopdaemon? If not, what can be done to avoid data loss?

I am using apache-cassandra-3.10

I understand that, instead of kill -9 pid, the only way to stop Cassandra gracefully is nodetool stopdaemon.

But I want to know whether nodetool stopdaemon also flushes the data in the memtables to SSTables before shutdown.

If it does not flush, then stopping the node with nodetool stopdaemon would lead to data loss.

Also, while researching this, I read about DURABLE_WRITES. What does a durable write actually do?

The DataStax documentation states, under the section Setting DURABLE_WRITES: "Do not set this attribute on a keyspace using the SimpleStrategy".

Reference: https://docs.datastax.com/en/cql/3.1/cql/cql_reference/create_keyspace_r.html

If my keyspace is configured with SimpleStrategy, does that mean I cannot benefit from DURABLE_WRITES, in case it helps with data loss on shutdown?

Is manually running nodetool flush before shutdown the only way to make sure we do not lose data on shutdown?

I read at https://issues.apache.org/jira/browse/CASSANDRA-3564 that the functionality to flush at shutdown has not been added.

There is also an open ticket on the same issue: https://issues.apache.org/jira/browse/CASSANDRA-12001

The intention is to avoid any data loss at shutdown when using nodetool stopdaemon: basically, to flush all tables before shutdown, given that SimpleStrategy is in use.

Teen answered 21/3, 2017 at 11:49 Comment(0)

nodetool drain will suffice.
From the DataStax documentation on nodetool drain:

Flushes all memtables from the node to SSTables on disk. Cassandra stops listening for connections from the client and other nodes. You need to restart Cassandra after running nodetool drain.
link: nodetool drain

Then you can either kill the process or run nodetool stopdaemon.

Covenanter answered 22/3, 2017 at 5:35 Comment(2)
The documentation also says "You typically use this command before upgrading a node to a new version of Cassandra". So why use nodetool drain? - Teen
drain does two things: it doesn't just flush the tables to disk, it also stops new connections, so the flush can complete without new data coming in. - Hon

Cassandra is very robust and crash-safe. Even if you kill or stop the daemon, you might not lose any data. But a safe shutdown saves startup time, because there is less commit log to replay when the node comes back up.

Follow the steps below for a safe shutdown:

  1. nodetool disablegossip
  2. nodetool disablethrift
  3. nodetool disablebinary (In case of Cassandra 2.0 and above)
  4. nodetool drain

Disabling gossip stops communication with the other nodes; disabling thrift and binary stops communication with the clients.

Finally, drain flushes all the tables.

Now stop Cassandra, either with kill or with nodetool stopdaemon.
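
The steps above could be scripted roughly as follows. This is only a minimal sketch: it assumes nodetool is on the PATH and that the node still has Thrift (on a version without it, the disablethrift step would need to be dropped). The names SHUTDOWN_STEPS, run_command and safe_shutdown are made up for illustration and are not part of Cassandra.

```python
import subprocess
from typing import Callable, List, Sequence

# Order taken from the steps above: cut off peers, then clients, then flush,
# and only then stop the process. Every entry is a plain nodetool subcommand.
SHUTDOWN_STEPS: List[List[str]] = [
    ["nodetool", "disablegossip"],
    ["nodetool", "disablethrift"],
    ["nodetool", "disablebinary"],
    ["nodetool", "drain"],
    ["nodetool", "stopdaemon"],
]


def run_command(cmd: Sequence[str]) -> None:
    """Run one command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(list(cmd), check=True)


def safe_shutdown(
    runner: Callable[[Sequence[str]], None] = run_command,
) -> List[List[str]]:
    """Run each step in order and return the steps that completed.

    If a step raises, the exception propagates and the later steps
    (including stopdaemon) are never attempted.
    """
    executed: List[List[str]] = []
    for cmd in SHUTDOWN_STEPS:
        runner(cmd)
        executed.append(cmd)
    return executed
```

Calling safe_shutdown() with no arguments runs the real commands on the local node. Passing a different runner (for example one that only prints each command) gives a dry run. Stopping at the first failure is a deliberate choice here, so that the node is not stopped if the drain did not succeed.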

Foetid answered 21/3, 2017 at 13:2 Comment(8)
So basically you're sure that Cassandra will not flush everything in memory to disk if we do not do a manual flush/drain? - Teen
When you write data, Cassandra stores it in the commit log (for faster access). It will not have in-memory data; when you flush/drain, the commit log data is written to the database. That's why the data is not lost even when you kill the process. - Foetid
But there is a memtable in between the commit log and the SSTable. I am concerned about the data held in the memtable, because the data in memtables is not shadowed in the commit log. Say memtable_cleanup_threshold is not yet reached and we shut down the node: then the unflushed data in the memtable is lost. Even if I do a flush/drain, I get back only the data that is in the commit log. - Teen
docs.datastax.com/en/cassandra/3.0/cassandra/configuration/… check "memtable_cleanup_threshold" - Teen
Yes, you are correct, there is a memtable. But Cassandra writes data to the commit log and the memtable simultaneously. When the memtable limit is reached, the data is flushed and the corresponding data in the commit log is purged. The commit log can thus recover the data in the memtable in the event of a failure. For more information, see how it works: teddyma.gitbooks.io/learncassandra/content/model/…. - Foetid
Thanks for the source. I read that. So does that mean there is no way data could be lost in case of a shutdown or failure? Also, I hope the content in the link you shared is up to date. - Teen
Yes, that's why Cassandra is highly reliable. - Foetid
Actually, the commit log is persisted to disk every 10 seconds, so it is possible to lose data inserted in the last 10 seconds that has been persisted by neither the commit log nor an SSTable. However, if RF and CL > 1, another copy of the data is kept on another node. - Nomination
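
The write path discussed in these comments can be illustrated with a toy model. This is a hedged sketch only: ToyNode and all of its attributes are invented for illustration, it is not how Cassandra is implemented, and it ignores replication entirely. It only shows why data synced to the commit log survives a kill, while writes made since the last sync do not.

```python
class ToyNode:
    """Toy model of one node's write path (illustrative, not real Cassandra)."""

    def __init__(self):
        self.commitlog_buffer = []  # appended, but not yet synced to disk
        self.commitlog_disk = []    # synced commit log entries (survive a crash)
        self.memtable = {}          # in memory only (lost on a crash)
        self.sstables = {}          # flushed data on disk (survives a crash)

    def write(self, key, value):
        # A write goes to the commit log and the memtable together.
        self.commitlog_buffer.append((key, value))
        self.memtable[key] = value

    def sync_commitlog(self):
        # Stands in for the periodic sync (every 10 seconds per the comment).
        self.commitlog_disk.extend(self.commitlog_buffer)
        self.commitlog_buffer.clear()

    def flush(self):
        # Like nodetool flush/drain: memtable contents become SSTable data,
        # after which the matching commit log entries are no longer needed.
        self.sstables.update(self.memtable)
        self.memtable.clear()
        self.commitlog_buffer.clear()
        self.commitlog_disk.clear()

    def crash_and_restart(self):
        # kill -9: everything in memory is gone; on restart the synced
        # part of the commit log is replayed into a fresh memtable.
        self.commitlog_buffer.clear()
        self.memtable.clear()
        for key, value in self.commitlog_disk:
            self.memtable[key] = value

    def read(self, key):
        return self.memtable.get(key, self.sstables.get(key))


node = ToyNode()
node.write("a", 1)
node.sync_commitlog()      # "a" is now safe in the commit log on disk
node.write("b", 2)         # "b" has not been synced yet
node.crash_and_restart()
print(node.read("a"), node.read("b"))  # prints: 1 None
```

In this model, calling flush() before the crash would have kept both keys, which matches the advice to run nodetool drain before stopping the node.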
