Clarifications about nodetool repair -pr

From the documentation:

Using the nodetool repair -pr (--partitioner-range) option repairs only the primary range for that node; the other replicas for that range still have to perform the Merkle tree calculation, causing a validation compaction. Because all the replicas are compacting at the same time, all the nodes may be slow to respond for that portion of the data.

There is probably never a time when I can accept all nodes being slow for a certain portion of the data. But I wonder: why does it do that (or is there perhaps just a mix-up with the "-par" option in the documentation?), when plain nodetool repair seems to be smarter:

By default, the repair command takes a snapshot of each replica immediately and then sequentially repairs each replica from the snapshots. For example, if you have RF=3 and A, B and C represent three replicas, this command takes a snapshot of each replica immediately and then sequentially repairs each replica from the snapshots (A<->B, A<->C, B<->C) instead of repairing A, B, and C all at once. This allows the dynamic snitch to maintain performance for your application via the other replicas, because at least one replica in the snapshot is not undergoing repair.

However, the DataStax blog addresses this issue:

This first phase can be intensive on disk I/O, however. You can mitigate this to some degree with compaction throttling (since this phase is what we call a validation compaction). Sometimes that isn't enough, though, and some people try to mitigate this further by using the -pr (--partitioner-range) option to nodetool repair, which repairs only the primary range for that node. Unfortunately, the other replicas for that range will still have to perform the Merkle tree calculation, causing a validation compaction. This can be a problem, since all the replicas will be doing it at the same time, possibly making them all slow to respond for that portion of your data. Fortunately, there is a way around this by using the -snapshot option.
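For what it's worth, the compaction throttling mentioned in the blog can be adjusted at runtime with nodetool; the 16 MB/s figure below is only an illustrative value, not a recommendation:

    # Throttle compaction on this node (validation compaction is subject to
    # the same limit); 16 MB/s is just an example value, 0 disables throttling
    nodetool setcompactionthroughput 16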

That -snapshot workaround could be nice, but actually there is no -snapshot option for nodetool repair (see the man page or the documentation). Has this option been removed?

So overall,

  • I cannot use nodetool repair -pr, it seems, because I always need to keep the system at least responsive enough to read/write with consistency ONE without significant delay. (Note: we have only one data center.) Or am I missing/misunderstanding something?
  • Why is nodetool repair smart, keeping one node responsive, while nodetool repair -pr makes all nodes slow for a portion of the data?
  • Where is the -snapshot option? Has it been removed, was it never implemented, or does it now perhaps work like that automatically, even when using nodetool repair -pr?
Bed answered 24/9, 2014 at 9:23 Comment(6)
Seems safe to assume there was a snapshot option at the time the blog was written (July 2013) because Cassandra 1.2 docs talk about a nodetool repair -snapshot option: datastax.com/documentation/cassandra/1.2/cassandra/operations/….Photocurrent
This doc has some nodetool usage info (-pr option not recommended) and includes examples of using the parallel (par) option. datastax.com/documentation/cassandra/2.1/cassandra/tools/…Photocurrent
@catpaws: Thanks, both your comments contain some very valuable information (which is in fact not in the 2.0 documentation)!Bed
I think the -snapshot option was made the default behavior in 2.0, while the -par option was added to allow access to the previous behavior (which is still useful if you can accept the hit to performance and/or want the repair to complete faster).Repugn
@Photocurrent - it's odd that this doc, which is linked from the doc you cite, says that the -pr option is recommended for routine maintenance.Chopine
@Chopine it's been almost a couple of years since that post, and docs do change, but in this case I think you clicked the wrong link? On that page, I'm still seeing "Performing partitioner range repairs by using the -pr option is generally not recommended."Photocurrent

The blog below addresses these issues:

http://www.datastax.com/dev/blog/repair-in-cassandra

A simple nodetool repair will not only kick off a repair on the node itself but also on all the nodes that hold replicas of its ranges. While this is fine, it is very expensive and typically not an operation you would carry out on a busy production system during peak times.
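To make that concrete, here is roughly what the two full-repair variants look like on a Cassandra 2.x node (the keyspace name is just a placeholder):

    # Default in Cassandra 2.0+: sequential, snapshot-based repair of every
    # range this node holds ("my_keyspace" is a placeholder)
    nodetool repair my_keyspace

    # -par restores the older parallel behaviour: faster, but every replica
    # runs its validation compaction at the same time
    nodetool repair -par my_keyspace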

By contrast, nodetool repair -pr carries out a repair of only the primary ranges owned by that node. You will need to run this on every node of the cluster, as the blog says. Customers with large production systems will typically use this in a rolling fashion across their cluster.
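A rolling primary-range repair could be scripted roughly along these lines; the host names and keyspace are hypothetical, and in practice you would stagger the runs rather than fire them back to back:

    # Sketch of a rolling repair: run -pr on one node at a time so only that
    # node's primary ranges are being repaired at any given moment
    for host in node1.example.com node2.example.com node3.example.com
    do
        nodetool -h "$host" repair -pr my_keyspace
    done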

On another note, DataStax OpsCenter offers the Repair Service, which continuously runs small sub-range repairs, so although you are always repairing, it happens in the background at a much lower resource level.
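If you are not using OpsCenter, you can approximate the same idea by hand with sub-range repairs via the -st/-et options; the token values below are placeholders chosen purely for illustration:

    # Repair only an explicit slice of the ring; -st/-et take token values,
    # and the numbers here are made up, not real tokens from a cluster
    nodetool repair -st 0 -et 1000000000000000000 my_keyspace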

As for the snapshots: running a regular (sequential) repair will take a snapshot as you stated, and you can also take a snapshot yourself using nodetool snapshot.
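For example, a manual snapshot and its cleanup look like this; the tag and keyspace names are placeholders:

    # Take a named snapshot of one keyspace (-t sets the snapshot tag)
    nodetool snapshot -t before_repair my_keyspace

    # Snapshots accumulate on disk, so clear the tag once it is no longer needed
    nodetool clearsnapshot -t before_repair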

Hope this helps!

Headley answered 17/1, 2015 at 20:36 Comment(0)
