Meaning of eventual consistency in Cassandra?
I

5

14

What does eventual consistency mean in Cassandra when the nodes in a single cluster do not hold copies of the same data, but the data is instead distributed among the nodes? Since a single piece of data is recorded in a single place (node), why wouldn't Cassandra return the most recent value from that single place of record? How do multiple copies arise in this situation?

Introspection answered 3/1, 2011 at 12:18 Comment(0)
A
4

It's up to the client to decide the appropriate consistency level (zero, any, one, quorum or all). (The consistency level controls both read and write behavior based on your replication factor.) In a single-node cluster the consistency levels any, one, quorum and all are equivalent.

Auriculate answered 3/1, 2011 at 12:33 Comment(4)
But consistency among what data? Since a single piece of data is located at a single place in a cluster, there are no multiple copies of the data, so what consistency is meant? - Introspection
On a single-node cluster you don't have to worry about consistency (as long as you don't do asynchronous writes with CL.ZERO; don't use that one). - Auriculate
I'm sorry, I think you got me wrong. I mean in a cluster with several nodes, where the data is distributed/sharded (and not replicated), so there are no multiple copies of a single piece of data among the different nodes of an n-node Cassandra cluster. How is consistency defined in this case, where there are in fact no multiple copies? I hope you get my point. - Introspection
If you have a replication factor of 1, then there will be just a single replica of each data set. As stated in my answer above: "The consistency level controls both read and write behavior based on your replication factor". - Auriculate
T
7

Cassandra's consistency is tunable. What can be tuned?

  • Number of nodes needed to agree on the data for reads... call it R
  • Number of nodes needed to agree on the data for writes... call it W

With 3 replicas, if we choose R = 2 and W = 2, then during a read, if 2 nodes agree on a value, that is taken as the true value. The 3rd may or may not have the same value.

For a write with W = 2, the write is considered successful once the data has been written to 2 nodes. This model IS consistent.

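If R + W > N, where N is the number of replicas of the data (the replication factor), every read overlaps the latest write and the result is consistent. If R + W <= N, it will only be eventually consistent.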

Cassandra maintains a timestamp with each column value so that replicas can eventually become consistent: when copies differ, the one with the newest timestamp wins. There is a background mechanism to reach a consistent state. But like I said, if R + W > N, then it is solidly consistent. That is why consistency is considered tunable in Cassandra.
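The R + W > N rule can be checked mechanically. Below is a minimal sketch, assuming N means the number of replicas of a key (the replication factor) rather than the total number of nodes. The function names are made up for illustration and are not part of any Cassandra API.

```python
def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """Return True when every read set must overlap every write set.

    r, w: replicas that must answer a read / acknowledge a write.
    n: number of replicas of the key (assumed to be the replication factor).
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def min_overlap(r: int, w: int, n: int) -> int:
    """Smallest possible number of replicas shared by a read set and a write set."""
    return max(0, r + w - n)


if __name__ == "__main__":
    # Try a few (R, W) combinations against 3 replicas.
    for r, w in [(2, 2), (1, 1), (1, 3), (3, 1)]:
        print(r, w, is_strongly_consistent(r, w, 3), min_overlap(r, w, 3))
```

With R = 2 and W = 2 on 3 replicas, at least one replica in any read set has seen the latest acknowledged write. With R = 1 and W = 1 the two sets may not overlap at all, which is where "eventual" comes in.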

Full consistency has to be reached at some point. This can be done using read repair: during a read from, say, 3 nodes, if 2 return the current value and the 3rd is out of date, Cassandra can repair the 3rd node. It can also be done by a batch job that runs from time to time.
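The read-repair idea can be illustrated with a toy model. This is only a sketch under simplifying assumptions: each replica is a plain Python dict, every value carries a single integer timestamp, and the newest timestamp wins. `Versioned` and `read_with_repair` are hypothetical names, and this is not how Cassandra itself is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # write timestamp; higher means newer


def read_with_repair(replicas: List[Dict[str, Versioned]], key: str) -> Optional[Versioned]:
    """Read `key` from each replica, return the newest version, and
    overwrite any replica that is missing it or holds an older one."""
    found = [replica[key] for replica in replicas if key in replica]
    if not found:
        return None
    newest = max(found, key=lambda v: v.timestamp)
    for replica in replicas:
        current = replica.get(key)
        if current is None or current.timestamp < newest.timestamp:
            replica[key] = newest  # the "repair" step
    return newest


if __name__ == "__main__":
    # Two replicas are current, the third is stale.
    nodes = [
        {"k": Versioned("new", 2)},
        {"k": Versioned("new", 2)},
        {"k": Versioned("old", 1)},
    ]
    print(read_with_repair(nodes, "k"))
    print(nodes[2]["k"])  # the stale replica has been brought up to date
```

The read both answers the client and brings the stale replica up to date, so repeated reads push the replicas toward agreement.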

Twana answered 3/3, 2017 at 23:52 Comment(0)
B
2

Even with replication factor = 1, consistency is not necessarily immediate because writes are buffered on the node that you send them to and hence don't necessarily immediately get sent to the node responsible for that key.

But it depends on what consistency level you choose.

Cassandra is mostly used with a replication factor > 1, which is where consistency becomes more of an issue. RF=3 seems to be a common setting (as it allows quorum reads/writes with one node unavailable).
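The arithmetic behind that RF=3 remark can be sketched as follows, assuming a quorum is a simple majority of the replicas (RF // 2 + 1). The helper names are illustrative only.

```python
def quorum(rf: int) -> int:
    """Replicas needed for a majority of `rf` replicas."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return rf - quorum(rf)


if __name__ == "__main__":
    for rf in range(1, 6):
        print(f"RF={rf}: quorum={quorum(rf)}, can lose {tolerated_failures(rf)}")
```

With RF=3 a quorum is 2 replicas, so one replica can be down. With RF=2 a quorum is also 2, so no replica can be down, which is one reason odd replication factors are popular.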

Batfish answered 4/1, 2011 at 7:3 Comment(1)
How long can it take before the write is flushed from the memtable (memory buffer) to disk (= to the responsible node)? - Wendish
T
2

Here is a nice explanation of eventual consistency: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Thad answered 4/3, 2011 at 5:42 Comment(0)
L
1

Cassandra tends to compromise latency and consistency for availability. It’s “eventually consistent,” a model for NoSQL database consistency that’s used with distributed setups. Rather than maintain strict consistency that could really slow things down at scale, eventual consistency enables high availability—just at the cost of every instance of your data not being synced up across all servers right away.

Lukas answered 30/12, 2016 at 8:15 Comment(0)
