CAP theorem - Availability and Partition Tolerance
E

14

325

While I try to understand the "Availability" (A) and "Partition tolerance" (P) in CAP, I found it difficult to understand the explanations from various articles.

I get a feeling that A and P can go together (I know this is not the case, and that's why I fail to understand!).

Explaining in simple terms, what are A and P and the difference between them?

Emie answered 10/9, 2012 at 6:25 Comment(3)
Don't go for the ready-made answers. Read, visualize and understand each of C, A and P separately. Design a distributed cluster architecture (maybe 3 DBs) and apply your understanding: see what happens to C, A and P when the distributed DBs fail. Once you understand, check the answers against your own logic. Remember that even if you understand, it might not be clear, so think and apply your understanding. ThanksUtopia
Somehow the above ksat.me link goes to 404 url because it ends with '/'. ksat.me/a-plain-english-introduction-to-cap-theorem This works fine and is very detailed explanation of each of 'C', 'A', 'P'Diphyodont
my answer here which describes what should be considered before choosing hbase?Prepense
I
648

Consistency means that data is the same across the cluster, so you can read or write from/to any node and get the same data.

Availability means the ability to access the cluster even if a node in the cluster goes down.

Partition tolerance means that the cluster continues to function even if there is a "partition" (communication break) between two nodes (both nodes are up, but can't communicate).

In order to get both availability and partition tolerance, you have to give up consistency. Consider two nodes, X and Y, in a master-master setup. Now there is a break in network communication between X and Y, so they can't sync updates. At this point you can either:

A) Allow the nodes to get out of sync (giving up consistency), or

B) Consider the cluster to be "down" (giving up availability)
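
The two options can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Node` and `Cluster` classes and the "AP"/"CP" mode labels are hypothetical names made up for illustration, and replication is just a dictionary copy rather than a real protocol.

```python
class Node:
    """One replica in a hypothetical two-node master-master cluster."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Toy cluster of nodes X and Y; not modelled on any real database."""

    def __init__(self, mode):
        # "AP" corresponds to option A, "CP" to option B (illustrative labels).
        self.mode = mode
        self.x, self.y = Node("X"), Node("Y")
        self.partitioned = False

    def write(self, node, key, value):
        other = self.y if node is self.x else self.x
        if self.partitioned:
            if self.mode == "CP":
                # Option B: refuse the write, so the cluster is "down" for this request.
                raise RuntimeError("unavailable during partition")
            # Option A: accept the write locally and let the nodes drift apart.
            node.data[key] = value
            return
        # Healthy network: apply the write on both nodes.
        node.data[key] = value
        other.data[key] = value
```

With `mode="AP"`, writing different values to X and Y during a partition leaves the two nodes disagreeing; with `mode="CP"`, the same writes raise an error and both nodes keep their last agreed state.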

All the combinations available are:

  • CA - data is consistent between all nodes - as long as all nodes are online - and you can read/write from any node and be sure that the data is the same, but if you ever develop a partition between nodes, the data will be out of sync (and won't re-sync once the partition is resolved).
  • CP - data is consistent between all nodes, and maintains partition tolerance (preventing data desync) by becoming unavailable when a node goes down.
  • AP - nodes remain online even if they can't communicate with each other and will resync data once the partition is resolved, but you aren't guaranteed that all nodes will have the same data (either during or after the partition)

You should note that CA systems don't practically exist (even if some systems claim to be so).

Irisation answered 10/9, 2012 at 8:14 Comment(16)
In AP, why is it not guaranteed that all nodes will have the same data? OK, it is because we do not have "C", but this is not clear to me. I want to know why this happens.Antiicer
@Antiicer Sorry for the late answer. If you have both availability (the cluster doesn't go down) and partition tolerance (the database can survive nodes being unable to communicate), then you can't guarantee that all nodes will always have all the data (consistency), because nodes are up and accepting writes, but can't communicate those writes to each other.Irisation
Late to the party, but it's worth showcasing some examples in each category, eg. blog.nahurst.com/visual-guide-to-nosql-systemsFrancophile
it'd really help to include a simple illustration/example about node-clusters meant here. is it a system or a data table/collections spread across different system or something else?Eclogite
Pragmatically, nodes are most often individual systems (or software running on those systems) connected by some networking mechanism.Irisation
“Availability means the ability to access the cluster ..”— this should be ‘extent of access to the cluster’. The cluster is still up, but only few nodes are accessible.Faience
For this statement: B) Consider the cluster to be "down" (giving up availability), doesn't it mean we lost both A and P? The cluster doesn't function now...Anestassia
This is a relevant read. Google Cloud Spanner claims to be CA system in practice (not technically) for most use cases : cloud.google.com/blog/products/gcp/…Korey
B) Consider the cluster to be "down" (giving up availability). In this case, how is the system partition tolerant?Callis
You can remain "partially available" in something like a master-slave setup by making slaves unavailable during a partition, while leaving the master online. You just can't keep the whole cluster available during a partition - only the parts capable of declaring what authoritative state is.Irisation
If it makes sense to ask, How availability works in the AP system? Consider 3 nodes A, B, C system, with RF=3, and B & C are down. Any write to node A with consistency ALL/QUORUM would fail as B, C is down. How availability is achieved here?Nichellenichol
Requiring ALL is essentially CP, not AP. QUORUM works by ensuring that writes only go to the majority cluster in a partition, but if no cluster of a majority of nodes can be formed, it can't continue. No system can maintain availability with a critical number of nodes being offline.Irisation
I feel partition tolerance and consistency goes hand-in-hand in case of CA. Example, if a partition occurs and I choose to be out of sync then data is immediately inconsistent. This is because I have to be available. Then how is system CA in that case, because I have lost consistency. I have rather become AP at this point of time. Am I the only one getting confused reading this theory from CAP?Vomitory
@Vomitory If you "choose to be out of sync" then you chosen to accommodate partitions by giving up consistency - you are indeed AP! An actual CA system - one which demands both consistency and availability and gives up partition tolerance - can only reliably function when no partitions are possible (ie, a single node network). An actual CA system cannot survive a partition because it's not partition tolerant! See the article linked in the bottom of my answer for more.Irisation
Yes! Actual CA systems cannot have a partition. I read your last line later after reading educative.io/blog/what-is-cap-theorem. Cheers!Vomitory
Also, I conclude that practically the systems can be either consistent or available as they always are gonna be having a partition. For me, partition tolerance isn't really a practical component of theorem, esp. the way it is represented as a triangle. It just confuses the reader to draw permutations and combinations of 2^3 rather than understanding it as a boolean :)Vomitory
G
90

Treating P on equal terms with C and A is a bit of a mistake; the '2 out of 3' notion among C, A and P is misleading. The succinct way I would explain the CAP theorem is: "In a distributed data store, at the time of a network partition you have to choose either Consistency or Availability and cannot get both". Newer NoSQL systems tend to focus on Availability, while traditional ACID databases had a higher focus on Consistency.

You really cannot choose CA. A network partition is not something anyone would like to have; it is just an undesirable reality of a distributed system, because networks can fail. The question is what trade-off you pick for your application when that happens. This article from the man who first formulated the term seems to explain this very clearly.

Glissade answered 23/1, 2014 at 17:10 Comment(3)
This is what I also understand from the CAP theorem. On a network partition, you can choose either consistency or availability.Nabalas
Agree, traditional SQL databases are CA, but they don't have any partitioning, only failover for HA. Can a system without P be even considered distributed?Wiskind
Indeed, there is a lot of noise and misinterpretation on the internet, and most of the answers in this thread are also confusing. However, this article really resolved the confusion I had.Floydflss
A
25

Here is how I'm discussing CAP, regarding P particularly.

CA is only possible if you are OK with a monolithic, single server database (maybe with replication but all data on one "failure block" - servers are not considered to partially fail).

If your problem requires a scale-out, distributed, multi-server system, network partitions can happen. You're already requiring P. Few problems I approach are amenable to single-server-always paradigms (or, as Stonebraker said, "distributed is table stakes"). If you can find a CA problem, solutions like a traditional non-scale-out RDBMS provide a lot of benefits.

For me, that is rare, so we move on to discussing AP vs CP.

You only choose between AP and CP operation when you have a partition. If the network and hardware are operating correctly, you get to have your cake and eat it too.

Let's discuss the AP / CP distinction.

AP - when there is a network partition, let the independent parts operate freely.

CP - when there is a network partition, shut down nodes or disallow reads and writes so there are deterministic failures.

I like architectures that can do both, because some problems are AP and some are CP - and some databases can do both. Among the CP and AP solutions, there are subtleties as well.

For example, in an AP dataset, you have the possibility of both inconsistent reads and write conflicts; these are two different possible AP modes. Can your system be configured for AP with high read availability while disallowing write conflicts? Or can your AP system accept write conflicts, with a strong and flexible resolution system? Will you need both eventually, or can you pick a system that only does one?
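
To make "write conflict resolution" a bit more concrete, here is a minimal sketch of two possible strategies. Everything in it is an assumption for illustration: versions are modelled as `(timestamp, value)` tuples, and the function names `last_write_wins` and `keep_siblings` are hypothetical rather than taken from any particular database.

```python
def last_write_wins(a, b):
    """Each version is a (timestamp, value) tuple; keep the newer one.

    The losing write is silently dropped. On equal timestamps the first
    argument is kept, which is an arbitrary tie-break.
    """
    return max(a, b, key=lambda version: version[0])


def keep_siblings(a, b):
    """Keep both conflicting versions so the application can merge them later."""
    return sorted({a, b})
```

The first strategy is simple but loses data; the second loses nothing but pushes the merge decision onto the application, which is the kind of subtlety worth asking about when comparing AP systems.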

In a CP system, how much unavailability do you get with small partitions (a single server), if any? Greater replication can increase unavailability in a CP system, so how does the system handle those tradeoffs?

These are all questions to ask with CP vs AP.

A great read in this area right now is Brewer's "12 years later" post. I believe this moves forward the CAP debate with clarity, and recommend it highly.

http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed

Ancelin answered 9/7, 2014 at 17:31 Comment(3)
CA system is indeed confusing, I have a question regarding your CA example of a monolithic database. If it is just a single server, where does the "A" come from, since it appears to me that the failure of the said server will result in no service being available?Similitude
Good question. Servers can have a disk fail, or even have DIMMs fail, or have power supplies fail if they are designed for high availability. Even imagine being on multiple power grids. You get higher and higher availability, but there is never a "network" inside that has the capability to partition and run with components disagreeing. While more esoteric hardware exists ( look up SQL NON-STOP ), examples of RAID arrays with failing and resuming components are still common these days, and provide very high availability in a single server.Ancelin
Hm, my read of your response @BrianBulkowski is that the "A" is saying "it'll still be available even if there's a network partition", not "it'll still be available if the node goes down". Is that accurate?Manhole
V
20

CAP Theorem

Consistency:

A read is guaranteed to return the most recent write (like ACID) for a given client. If any request comes in during that time, it has to wait until the data sync is completed across/in the node(s).


Availability:

Every node (if not failed) always executes queries and should always respond to requests. It does not matter whether it returns the latest copy or not.


Partition-tolerance:

The system will continue to function when network partitions occur.


Regarding AP, Availability (always accessible) can exist with (Cassandra) or without (RDBMS) partition tolerance.


Valse answered 18/4, 2017 at 9:19 Comment(0)
H
18

I have gone through a lot of links, but none of them gave me a satisfactory answer, except one.

Hence I am describing CAP in very simple words.

  • Consistency: Must return the same data, regardless of which node it comes from.

  • Availability: Node should respond (must be available).

  • Partition Tolerance: Cluster should respond (must be available), even if there is a partition (i.e. network failure) between nodes. (One main reason it confuses people is its poor naming. If I had the right, I would have called it the DNC theorem instead: Data Consistency, Node Availability, Cluster Availability, which correspond to Consistency, Availability and Partition Tolerance respectively.)

CP database: A CP database delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two nodes, the system has to shut down the non-consistent node (i.e., make it unavailable) until the partition is resolved.

AP database: An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available but those at the wrong end of a partition might return an older version of data than others. (When the partition is resolved, the AP databases typically resync the nodes to repair all inconsistencies in the system.)

CA database: A CA database delivers consistency and availability across all nodes. It can’t do this if there is a partition between any two nodes in the system, however, and therefore can’t deliver fault tolerance. In a distributed system, partitions can’t be avoided. So, while we can discuss a CA distributed database in theory, for all practical purposes, a CA distributed database can exist but should not exist.

This doesn’t mean you can’t have a CA database for your distributed application if you need one. Many relational databases, such as PostgreSQL, deliver consistency and availability and can be deployed to multiple nodes using replication.

Source: https://www.ibm.com/cloud/learn/cap-theorem

Haddington answered 29/6, 2020 at 19:26 Comment(0)
U
3

I feel partition tolerance is not explained well in any of the answers, so to explain things in a bit more detail, the CAP theorem means:

C: (Linearizability or strong consistency) roughly means

If operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state (but never an older state).

A:

“every request received by a non-failing [database] node in the system must result in a [non-error] response”. It’s not sufficient for some node to be able to handle the request: any non-failing node needs to be able to handle it. Many so-called “highly available” (i.e. low downtime) systems actually do not meet this definition of availability.

P:

Partition Tolerance (terribly misnamed) basically means that you’re communicating over an asynchronous network that may delay or drop messages. The internet and all our data centres have this property, so you don’t really have any choice in this matter.

Source: Martin Kleppmann's excellent work

To take an example: Cassandra can at most be an AP system. But if you configure it to read or write based on quorum, then it is no longer CAP-available (available as per the definition in the CAP theorem) and is only a P system.
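
The quorum point comes down to simple arithmetic, which can be sketched as follows. This is a model of the counting only, not Cassandra's API: the helper names `required_acks` and `can_serve` are hypothetical, and the level names "ONE", "QUORUM" and "ALL" are used as plain strings.

```python
def required_acks(level, replication_factor):
    """Number of replica acknowledgements a request needs at a given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown level: {level}")


def can_serve(level, replication_factor, reachable_replicas):
    """True if the node handling the request can reach enough replicas."""
    return reachable_replicas >= required_acks(level, replication_factor)
```

For a replication factor of 3, a node that can reach only itself can still serve requests at level "ONE" but not at "QUORUM" or "ALL". That non-failing node returns an error, which is why a quorum configuration does not meet the CAP definition of availability quoted above.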

Unseam answered 23/6, 2019 at 21:8 Comment(0)
P
3

[Image: diagram of nodes A, B, C and D, with C disconnected from the others]

According to the above diagram, C is disconnected but A, B and D can continue their work. Now we can say the system is partially working (Partition Tolerance).

Suppose a particular transaction needs only A, B and D. The system can perform it without creating any inconsistencies.

But when C has to be involved in a particular transaction, the system can behave in two ways.

1. A can reject the user request because C is not available.

So the system has Partition Tolerance and Consistency (P, C).
But no availability, because of the rejection.

2. A can hold the messages that should be received by C in A’s local memory and transfer them when C is connected back.

So the system has Partition Tolerance and Availability (P, A).
But no consistency, because C has not been updated.
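
A rough sketch of the two behaviours, under stated assumptions: the `Coordinator` class stands in for node A, every write is assumed to involve all peers, and the "CP"/"AP" mode labels and method names are hypothetical.

```python
class Coordinator:
    """Node A from the example, forwarding writes to its peers (illustrative only)."""

    def __init__(self, peers, mode):
        self.peers = peers              # e.g. {"B": {}, "C": {}, "D": {}}
        self.connected = set(peers)     # peers that are currently reachable
        self.mode = mode                # "CP" rejects, "AP" buffers
        self.pending = {name: [] for name in peers}

    def write(self, key, value):
        unreachable = [name for name in self.peers if name not in self.connected]
        if unreachable and self.mode == "CP":
            return False                # way 1: reject the user request
        for name, store in self.peers.items():
            if name in self.connected:
                store[key] = value
            else:
                # Way 2: hold the message in A's local memory.
                self.pending[name].append((key, value))
        return True

    def reconnect(self, name):
        """Replay held messages once the peer is connected back."""
        self.connected.add(name)
        for key, value in self.pending[name]:
            self.peers[name][key] = value
        self.pending[name].clear()
```

In "AP" mode the write succeeds while C is disconnected, but C stays out of date until `reconnect("C")` replays the held messages.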
Paleolithic answered 20/9, 2022 at 9:19 Comment(0)
U
2

In simple terms, the CAP theorem states that it's impossible for a distributed system to simultaneously provide all three guarantees:

Consistency

Every node contains the same data at the same time

Availability

At least one node must be available to serve data every time

Partition tolerance

The system continues to operate even when the network between nodes fails

Most systems can guarantee at most two of these properties: CA, AP, or CP.

Uttica answered 17/5, 2018 at 7:45 Comment(1)
You didn't answer the question. Actually, what you're saying is exactly the thing that made the OP confused.Disembowel
I
2

Simple way to understand CAP theorem:

In case of network partition, one needs to choose between perfect availability and perfect consistency.

Picking consistency means not being able to answer a client's query as the system cannot guarantee to return the most recent write. This sacrifices availability.

Picking availability means being able to respond to a client's request but the system cannot guarantee consistency, i.e., the most recent value written. Available systems provide the best possible answer under the given circumstance.

This explanation is from this excellent article. Hope it will help.

Ive answered 15/10, 2019 at 9:2 Comment(0)
I
2

I will explain in detail with the ATM example mentioned here

The CAP theorem talks about the trade-offs between consistency and availability that you have to make if your distributed system ever suffers partitions. Distributed system means you store the data in multiple nodes and partition means the connection between those nodes is somehow broken.

A partition is a communications break within a distributed system—a lost or temporarily delayed connection between two nodes. Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.

Imagine we have a small bank with only 2 ATMs. Customers can deposit, withdraw and check the balance. You have to make sure that the balance never goes below zero. The connection between those ATMs can be broken in 3 ways:

1- The ATM that you need to use is not working. You just put up a sign saying it is out of order.

2- The one that you are going to use is working but the other ATM is not working

3- They both are working but there is a network problem and they cannot communicate with each other.

This distributed system is suffering partition and we need to choose between availability and consistency:

  • If the bank chooses a consistent design, the ATM would not process your request because it cannot update the balance in the other ATM.

  • If the bank chooses availability, your ATM would process the request, keep track of what happened, and later, when the connection is re-established, tell the other ATM what has happened. The balance will be inconsistent in the meantime.
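
The two designs can be sketched as a small simulation. This is a toy model under stated assumptions: the `ATM` class, the "consistent"/"available" mode labels and the `reconcile` function are hypothetical names, and only behaviour during a partition is modelled.

```python
class ATM:
    """One of the two ATMs, holding its own local view of the account."""

    def __init__(self, balance, mode):
        self.balance = balance   # this ATM's view of the account balance
        self.mode = mode         # "consistent" or "available" (illustrative labels)
        self.log = []            # operations not yet sent to the other ATM

    def withdraw_during_partition(self, amount):
        if self.mode == "consistent":
            return False         # refuse: the other ATM cannot be updated
        if amount > self.balance:
            return False         # the local view says insufficient funds
        self.balance -= amount
        self.log.append(-amount) # keep track of what happened
        return True


def reconcile(a, b):
    """Exchange logged operations once the connection is back."""
    delta_a, delta_b = sum(a.log), sum(b.log)
    a.balance += delta_b
    b.balance += delta_a
    a.log.clear()
    b.log.clear()
```

If both ATMs start with a balance of 100 in "available" mode and each allows a withdrawal of 80 during the partition, both views end up at -60 after `reconcile`. That breaks the rule that the balance never goes below zero, which is the price of staying available.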

Intemperate answered 22/1, 2023 at 4:59 Comment(0)
S
1

Brewer's keynote, the Gilbert paper, and many other treatments place C, A and P on an equal footing as desirable properties of an implementation and effectively say 'choose two!'. However, this is often considered to be a misleading presentation, since you cannot build - or choose! - 'partition tolerance': your system either might experience partitions or it won't.

CAP is better understood as describing the tradeoffs you have to make when you are building a system that may suffer partitions. In practice, this is every distributed system: there is no 100% reliable network. So (at least in the distributed context) there is no realistic CA system. You will potentially suffer partitions, therefore you must at some point compromise C or A.

https://github.com/henryr/cap-faq#10-why-do-some-people-get-annoyed-when-i-characterise-my-system-as-ca

Siglos answered 7/5, 2021 at 4:4 Comment(0)
H
0

Consistency – When we send a read request and it returns a result, it should return the most recent write made by a client request.

Availability – Your read/write request should always succeed.

Partition tolerance – When a network partition occurs (some machines cannot talk to each other), the system should still work.

In a distributed system there is always a chance that a network partition will occur, so we cannot avoid the “P” of CAP. So we choose between “Consistency” and “Availability”.

http://bigdatadose.com/understanding-cap-theorem/

Huffman answered 5/3, 2015 at 8:46 Comment(0)
D
0

A distributed system has three characteristics, according to the CAP theorem:

Consistency (C) denotes that all system components have the same information.

The availability (A) of a system means that it does not stop working because another system fails.

Partition tolerance (P) indicates that a system will continue to function in the event of arbitrary network packet loss.

According to the CAP theorem, a system can have no more than two of these three features. (AP, CP, CA)

Dish answered 9/2, 2023 at 7:46 Comment(0)
L
0

I'll put my two cents in as well.

At some point I understood that the most confusing part of the CAP theorem for me is CA, especially how RDBMSs (Postgres, MySQL) relate to these magic CA letters.

It seems very natural that an RDBMS relates to CA, so why does it confuse me?

Because when I think of an RDBMS, I think of it as a production-ready system, which usually involves replication.

And when an RDBMS runs with replication, it is not related to CA anymore.

Why? Because replication breaks Availability. So a replicated RDBMS is related to CP.

Another question comes to mind here: why did A go away with RDBMS replication? Because a replicated RDBMS involves a replication process, which makes the system LESS available than a non-replicated one.

Lurette answered 6/3 at 16:26 Comment(0)
