From the documentation:
Using the nodetool repair -pr (–partitioner-range) option repairs only the primary range for that node, the other replicas for that range still have to perform the Merkle tree calculation, causing a validation compaction. Because all the replicas are compacting at the same time, all the nodes may be slow to respond for that portion of the data.
There is probably never a time where I can accept all nodes to be slow for a certain portion of the data. But I wonder: Why does it do that (or is there maybe just a mixup with the "-par" option in the documentation?!), when nodetool repair
seems to be smarter:
By default, the repair command takes a snapshot of each replica immediately and then sequentially repairs each replica from the snapshots. For example, if you have RF=3 and A, B and C represents three replicas, this command takes a snapshot of each replica immediately and then sequentially repairs each replica from the snapshots (A<->B, A<->C, B<->C) instead of repairing A, B, and C all at once. This allows the dynamic snitch to maintain performance for your application via the other replicas, because at least one replica in the snapshot is not undergoing repair.
However, the datastax blog addresses this issue:
This first phase can be intensive on disk io, however. You can mitigate this to some degree with compaction throttling (since this phase is what we call a validation compaction.) Sometimes that isn’t enough though, and some people try to mitigate this further by using the -pr (–partitioner-range) option to nodetool repair, which repairs only the primary range for that node. Unfortunately, the other replicas for that range will still have to perform the Merkle tree calculation, causing a validation compaction. This can be a problem, since all the replicas will be doing it at the same time, possibly making them all slow to respond for that portion of your data. Fortunately, there is way around this by using the -snapshot option.
That could be nice, but actually, there is no -snapshot
option for nodetool repair
(see the manpage, or the documentation) (has this option been removed?!)
So overall,
- I cannot use
nodetool repair -pr
, it seems, because I always need at least to keep the system responsive enough to read/write with consistency ONE, without significant delay. (Note: We have only one data center.) Or am I missing/misunderstanding something? - Why is
nodetool repair
smart, keeping one node responsive, whilenodetool repair -pr
makes all nodes slow for a portion of data? - Where is the
-snapshot
option: Has it been removed, never implemented, or does it now maybe automatically work like that, also when usingnodetool repair -pr
?
-snapshot
option was made the default behavior in 2.0, while the-par
option was added to allow access to the previous behavior (which is still useful if you can accept the hit to performance and/or want the repair to complete faster. – Repugn