Cassandra Compaction vs Repair vs Cleanup

After posting a question and reading several articles, I still do not understand how these three operations relate to each other:

  • Cassandra compaction tasks
  • nodetool repair
  • nodetool cleanup

Can a repair run while a compaction task is running, or a cleanup while a compaction task is running? Is cleanup an operation that needs to be executed weekly, like repair? Why does repair need to be executed manually rather than being part of Cassandra's default behavior?

What are the ground rules for healthy cluster maintenance?

Carencarena answered 7/6, 2016 at 16:22 Comment(0)

A cleanup is a compaction that just removes data outside the node's token range(s). A repair includes a "Validation Compaction" that builds a Merkle tree to compare with the other nodes, so part of nodetool repair is a compaction.
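To make the Merkle tree comparison concrete, here is a minimal sketch. It is not Cassandra's implementation: Cassandra hashes token ranges into a tree of bounded depth, while this toy version hashes individual rows, and the names `merkle_root`, `replica_a`, and `replica_b` are made up for illustration. The idea it shows is that two replicas can compare a single root hash instead of shipping all their data.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """Hash helper (SHA-256 chosen for the sketch only)."""
    return hashlib.sha256(data).digest()


def merkle_root(rows):
    """Return the root hash for a list of (key, value) string pairs."""
    # Leaf level: one hash per row, in key order so replicas agree on layout.
    level = [_h(f"{k}={v}".encode()) for k, v in sorted(rows)]
    if not level:
        return _h(b"")
    # Combine hashes pairwise until a single root remains.
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical replica contents: replica_b holds a stale value for k2.
replica_a = [("k1", "v1"), ("k2", "v2"), ("k3", "v3")]
replica_b = [("k1", "v1"), ("k2", "stale"), ("k3", "v3")]

needs_repair = merkle_root(replica_a) != merkle_root(replica_b)
```

When the roots differ, a real repair walks down the tree to find which ranges disagree and streams only those.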

Can a repair run while a compaction task is running, or a cleanup while a compaction task is running?

There is a shared pool for compactions across normal compactions, repairs, cleanups, scrubs, etc. It is sized by the concurrent_compactors setting in cassandra.yaml, which defaults to a combination of the number of cores and data directories: https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/config/DatabaseDescriptor.java#L572
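As a rough illustration of that default, the cassandra.yaml comments for that era describe it as the smaller of the number of disks and the number of cores, with a minimum of 2 and a maximum of 8. The sketch below restates that rule in Python; the function name is hypothetical, and the linked source should be checked for the exact behavior of your version.

```python
def default_concurrent_compactors(cores: int, data_dirs: int) -> int:
    """Approximate default: min(cores, data dirs), clamped to the range 2..8.

    This mirrors the rule described in cassandra.yaml; it is a sketch,
    not a copy of the Cassandra source.
    """
    return min(8, max(2, min(cores, data_dirs)))


# A 16-core node with a single data directory still gets 2 compactors.
single_disk = default_concurrent_compactors(cores=16, data_dirs=1)
# A 32-core node with 12 data directories is capped at 8.
many_disks = default_concurrent_compactors(cores=32, data_dirs=12)
```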

Is cleanup an operation that needs to be executed weekly, like repair?

No, only after topology changes, really.
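A toy sketch of why cleanup only matters after a topology change: when a new node takes over part of the ring, the old owner still has the rows on disk until cleanup drops everything outside its new range. The integer tokens, the `(start, end]` range convention with wrap-around, and the names `in_range` and `cleanup` are assumptions made for this illustration, not Cassandra APIs.

```python
def in_range(token: int, start: int, end: int) -> bool:
    """True if token falls in the ring range (start, end], allowing wrap-around."""
    if start < end:
        return start < token <= end
    # Wrapping range (or start == end, meaning the whole ring).
    return token > start or token <= end


def cleanup(rows: dict, owned_ranges: list) -> dict:
    """Keep only rows whose token the node still owns."""
    return {
        token: value
        for token, value in rows.items()
        if any(in_range(token, s, e) for s, e in owned_ranges)
    }


# Hypothetical node data, keyed by token.
rows_on_disk = {10: "a", 50: "b", 90: "c"}

# Before the change the node owns (0, 100]; afterwards only (0, 40].
before = cleanup(rows_on_disk, [(0, 100)])
after = cleanup(rows_on_disk, [(0, 40)])
```

With unchanged ownership, cleanup removes nothing, which is why running it on a schedule buys little.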

Why does repair need to be executed manually rather than being part of Cassandra's default behavior?

It's manual because the requirements can differ a lot depending on your data and gc_grace settings. https://issues.apache.org/jira/browse/CASSANDRA-10070 is bringing it inside Cassandra, though, so in the future it will be automatic.

What are the ground rules for healthy cluster maintenance?

I would (opinion) say:

  • Regular backups (depending on requirements and acceptable data loss, this can be anything from weekly/daily to continuous with incremental backups).
    • This is just as much for "internal" mistakes ("Oops, I deleted a customer") as for outages. Even with strong multi-DC replication you want some minimum backups.
  • Making sure a repair completes for all tables that have deletes at least once within the gc_grace time of those tables.
  • Metric and log storage is pretty important if you want to be able to debug issues.
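The repair rule above can be turned into a simple monitoring check. This is a sketch under assumptions: the function name and the idea of tracking a per-table "last successful repair" timestamp are hypothetical, and you would need to record that timestamp yourself. The 864000-second value is Cassandra's default gc_grace_seconds (10 days); tables can override it.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864000  # Cassandra's default: 10 days


def repair_overdue(last_repair: datetime, now: datetime,
                   gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """True if no repair has completed within the table's gc_grace window."""
    return (now - last_repair) >= timedelta(seconds=gc_grace_seconds)


# Hypothetical timestamps for one table.
now = datetime(2016, 7, 20)
recent = repair_overdue(datetime(2016, 7, 15), now)   # repaired 5 days ago
stale = repair_overdue(datetime(2016, 7, 6), now)     # repaired 14 days ago
```

In practice you would alert well before the window closes, since a repair itself can take a long time to finish.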
Schneider answered 7/6, 2016 at 17:30 Comment(2)
Good summary! Thank you. - Carencarena
Is it safe to perform cleanup without repair after adding a node to the cluster? - Eternal
