Cassandra is configured to lose 10 seconds of data by default?

Asked 24/6, 2015 at 16:34 Answered 24/6, 2015 at 18:09

cassandra data-integrity scylla data-loss durability

Data in the commit log is flushed to disk periodically, every 10 seconds by default (controlled by commitlog_sync_period_in_ms). So if all replicas crash within those 10 seconds, will I lose all that data? Does this mean that, theoretically, a Cassandra cluster can lose data?

Glycolysis asked 24/6, 2015 at 16:34

Data is not sent to the memtable first! It is first appended to the commit log, then stored in the memtable, and only then is the ack sent. Check the insert trace: datastax.com/dev/blog/tracing-in-cassandra-1-2 – Aspergillum 12/4, 2018 at 7:41
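To make the ordering described in this comment concrete, here is a minimal Python sketch of a single node's write path. `ToyNode` and its fields are hypothetical names for illustration, not Cassandra's internals. Note that "appended to the commit log" here says nothing about whether the append has been fsynced yet.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical model of one node's write path (not Cassandra's actual code)."""
    commit_log: list = field(default_factory=list)  # stands in for the on-disk commit log
    memtable: dict = field(default_factory=dict)    # in-memory table
    events: list = field(default_factory=list)      # records the order of the steps

    def write(self, key, value):
        # 1. Append the mutation to the commit log. Whether it is fsynced yet
        #    depends on commitlog_sync; this toy does not model that.
        self.commit_log.append((key, value))
        self.events.append("commitlog_append")
        # 2. Apply the mutation to the memtable.
        self.memtable[key] = value
        self.events.append("memtable_update")
        # 3. Only now acknowledge the write.
        self.events.append("ack")
        return "ack"


node = ToyNode()
node.write("k1", "v1")
print(node.events)  # ['commitlog_append', 'memtable_update', 'ack']
```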

If a node crashed right before updating the commit log on disk, then yes, you could lose up to ten seconds of data.

If you keep multiple replicas, by using a replication factor higher than 1 or by having multiple data centers, then much of the lost data would exist on other nodes and would be recovered on the crashed node when it is repaired.
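A crude way to see why extra replicas help: model each replica's surviving data after a crash as a set of write IDs. A write is only truly gone when no replica still holds it. The sketch below treats repair as a simple set union, which is a big simplification (real repair compares replicas and resolves conflicting versions); `recover_by_repair` and the write IDs are hypothetical.

```python
def recover_by_repair(acked_writes, replicas):
    """Return (recovered, lost) after a toy 'repair' that unions all replicas.

    acked_writes: set of write IDs the client was told succeeded.
    replicas: mapping of node name -> set of write IDs that node still has.
    """
    recovered = set()
    for surviving in replicas.values():
        recovered |= surviving
    # After repair every replica holds the union; anything missing everywhere is lost.
    return recovered, acked_writes - recovered


acked = {"w1", "w2", "w3"}

# node_a crashed and lost its unsynced w2 and w3; the other replicas are fine.
replicas = {"node_a": {"w1"}, "node_b": {"w1", "w2", "w3"}, "node_c": {"w1", "w2", "w3"}}
recovered, lost = recover_by_repair(acked, replicas)
print(sorted(lost))  # []

# All three replicas crashed before syncing w3.
replicas = {"node_a": {"w1", "w2"}, "node_b": {"w1", "w2"}, "node_c": {"w1", "w2"}}
recovered, lost = recover_by_repair(acked, replicas)
print(sorted(lost))  # ['w3']
```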

Also, the commit log may be written sooner than every ten seconds if the write volume is high enough to hit size limits before the ten seconds are up.

If you want more durability than this (at the cost of higher latency), you can change the commitlog_sync setting from periodic to batch. In batch mode, Cassandra uses the commitlog_sync_batch_window_in_ms setting to control how often batches of writes are written to disk, and writes are not acked until they have been written to disk.
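The practical difference between the two modes is when the ack happens relative to the sync. Below is a minimal Python model under simplifying assumptions: syncs fire on fixed boundaries every `interval_ms` (the period in periodic mode, the batch window in batch mode), and the crash is machine-level, so anything not yet synced is gone. The function names and example numbers are made up for illustration; Cassandra's real batching is more dynamic than fixed boundaries.

```python
def ack_and_sync_times(write_ms, mode, interval_ms):
    """Return (ack_ms, sync_ms) for one write under a toy commit log model."""
    # First sync strictly after the write, assuming syncs at 0, interval, 2*interval, ...
    sync_ms = (write_ms // interval_ms + 1) * interval_ms
    if mode == "periodic":
        ack_ms = write_ms   # acked immediately, synced later
    elif mode == "batch":
        ack_ms = sync_ms    # ack is held until the write is synced
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ack_ms, sync_ms


def acked_but_lost(write_times_ms, mode, interval_ms, crash_ms):
    """Writes the client saw acked but that were not yet synced when the machine crashed."""
    lost = []
    for t in write_times_ms:
        ack_ms, sync_ms = ack_and_sync_times(t, mode, interval_ms)
        if ack_ms <= crash_ms < sync_ms:
            lost.append(t)
    return lost


writes = [1_000, 9_500, 12_000]
print(acked_but_lost(writes, "periodic", 10_000, crash_ms=15_000))  # [12000]
print(acked_but_lost(writes, "batch", 2, crash_ms=15_000))          # []
```

In batch mode the "acked but lost" condition can never hold, because the ack and the sync coincide; the price is that every ack waits for the next sync.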

The ten-second default for periodic mode is designed for spinning disks: they are slow enough that blocking acks while waiting for commit log writes causes a noticeable performance hit. For the same reason, if you use batch mode, the recommendation is to put the commit log on a dedicated disk, so that the write head never has to seek and the added latency stays as low as possible.

If you are using SSDs, you can use more aggressive (shorter) sync timing, since their latency is much lower than that of a spinning disk.

Razzia answered 24/6, 2015 at 18:09

As far as I understand, the commit log is already on disk, so even if a node crashes in under 10 seconds and then restarts, shouldn't it replay everything from the commit log and recover the data? – Fimble 16/12, 2016 at 6:21

@Fimble The data is only written to disk every 10 seconds. Therefore, "you can potentially lose up to that much data if all replicas crash within that window of time." Please check out: wiki.apache.org/cassandra/Durability – Aspergillum 12/4, 2018 at 7:50

"written to disk" should be "fsynced" everywhere. Commit log writes happen before the mutation is completed, but the commit log itself is not fsynced on every write. Data loss is only expected if the crash is a machine-level crash (such as a power failure), not a process-level one. – Microgroove 20/8, 2018 at 8:19

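The process-crash vs. machine-crash distinction in the comment above comes from the OS page cache. Here is a minimal Python illustration using the standard `os.write` and `os.fsync` calls. Once `os.write` returns, the bytes sit in the kernel's page cache and survive the writing process dying; only `os.fsync` asks the kernel to push them to stable storage so they also survive power loss. `append_record` and the file name are hypothetical, and this is not how Cassandra itself performs commit log I/O.

```python
import os
import tempfile


def append_record(path, record: bytes, durable: bool) -> None:
    """Append bytes to a file; fsync only if durable=True (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        # The data is now in the OS page cache: a process crash won't lose it,
        # but a machine-level crash (e.g. power loss) still can.
        if durable:
            os.fsync(fd)  # ask the kernel to flush the file's data to stable storage
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "toy-commitlog.bin")
    append_record(log_path, b"w1\n", durable=False)  # like a write between periodic syncs
    append_record(log_path, b"w2\n", durable=True)   # like a write that waits for its sync
    with open(log_path, "rb") as f:
        contents = f.read()
    print(contents)  # b'w1\nw2\n'
```

Because `fsync` applies to the whole file, the second call also makes `w1` durable, which mirrors how one periodic sync covers every write appended since the previous one.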
Looks like mongodb-is-web-scale.com applies to Cassandra too – Olnay 22/10, 2018 at 21:39

Cassandra's default configuration sets the commitlog_sync mode to periodic, causing the commit log to be synced every commitlog_sync_period_in_ms milliseconds, so you can potentially lose up to that much data if all replicas crash within that window of time.
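To spell out the "all replicas crash within that window" condition: an acknowledged write only disappears if every replica holding it suffers a machine-level crash before its own next sync after the write. A minimal Python sketch, assuming each replica syncs every `period_ms` with its own fixed phase offset and that all replicas received the write at the same instant; `write_is_lost`, `next_sync_after`, and the node names are hypothetical.

```python
def next_sync_after(t_ms, period_ms, offset_ms):
    """First sync strictly after t for a replica syncing at offset + k * period."""
    k = (t_ms - offset_ms) // period_ms + 1
    return offset_ms + k * period_ms


def write_is_lost(write_ms, crash_ms_by_replica, period_ms, offset_ms_by_replica):
    """True only if every replica crashed before syncing the write.

    crash_ms_by_replica maps node -> crash time in ms, or None if it never crashed.
    """
    for node, crash_ms in crash_ms_by_replica.items():
        synced_at = next_sync_after(write_ms, period_ms, offset_ms_by_replica[node])
        if crash_ms is None or crash_ms >= synced_at:
            return False  # this replica synced the write before (or without) crashing
    return True


offsets = {"a": 0, "b": 3_000, "c": 7_000}
# Next syncs after a write at 12,000 ms: a -> 20,000, b -> 13,000, c -> 17,000.
print(write_is_lost(12_000, {"a": 15_000, "b": None, "c": 16_000}, 10_000, offsets))    # False
print(write_is_lost(12_000, {"a": 15_000, "b": 12_500, "c": 16_000}, 10_000, offsets))  # True
```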

Chimene answered 24/6, 2015 at 17:49
