Why is two-phase commit considered atomic?

Two-phase commit is described as an "atomic commitment protocol". I would expect this to mean that all clients see the state of the world from either before a transaction commits or after it commits, with no in-between state. It seems, though, that it can enter a state where a transaction is partially committed and clients see inconsistent data, breaking atomicity.

Consider the case with two databases, A and B. If there is a partition during the commit phase after A has committed but before B has committed, the transaction is partially committed. A user querying A and B will not see consistent data -- the transaction has committed on A, but B has data from before the commit.
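
To make that window concrete, here is a rough sketch of the commit phase as I picture it (the Resource interface is purely illustrative, not any real API):

    // Hypothetical commit phase of a 2PC coordinator; Resource is an
    // illustrative interface, not a real transaction API.
    interface Resource {
        void commit() throws Exception;
    }

    class CommitPhase {
        // Both participants have already voted "yes" in the prepare phase.
        static void commitAll(Resource a, Resource b) throws Exception {
            a.commit(); // A is now durably committed
            // <-- a partition here leaves the transaction half applied:
            //     readers of A see the new data, readers of B the old data
            b.commit();
        }
    }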

The "Consistent" part of ACID also seems to be broken -- a client querying A and B could see data that violates business rules.

I guess the idea is that the system will eventually be able to recover from this, when the partition is over and the transaction manager instructs B to commit. In the meantime though, the system is in an inconsistent "partially committed" state. Isn't the whole point of atomicity to prevent this? By the time consistency is restored, the damage could already be done.

What property is referred to when two-phase commit is said to be atomic?

Grumpy answered 25/2, 2016 at 5:46 Comment(4)
Possible duplicate of How ACID is the two-phase commit protocol?Siphonostele
I think this is a question of Isolation Level. Below Repeatable Read you can see some data as it was before the transaction committed, and some data as it was after the transaction committed. As long as there are no dirty reads, that's still considered ACID (and this can happen even with a non-distributed database). Isolation Level Serializable is not usually required.Siphonostele
The answer to the other question says "2PC really only promises that an operation is Atomic". My question is why it is considered atomic when it doesn't seem atomic. What is the "atomicity" that 2PC provides?Grumpy
The answer also says that "2PC is not resilient to all failure scenarios", and that may break each of the four ACID properties. For example, if you shut down B in the middle it may not have committed yet (while A has), and it will roll back during recovery, so you end up with an inconsistent state. But short of that, it is "atomic": the whole transaction will have been committed, or none of it. How the results are visible to other transactions is a question of Isolation.Siphonostele

Atomic means that either the operation takes effect or the system remains in the same state. The 2PC algorithm works like this: first the coordinator asks all the distributed machines to prepare for the transaction. After receiving a "yes" from every one of them, it sends the command to commit the transaction.

Only if the coordinator receives a success from all the machines is the transaction complete; otherwise, if there is a network outage after that point or any other failure, you run into the Two Generals' Problem. It's as atomic as a distributed system can be.
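
A minimal sketch of that flow, using an illustrative Participant interface rather than any real transaction-manager API:

    import java.util.List;

    // Illustrative 2PC coordinator; Participant is a hypothetical interface.
    interface Participant {
        boolean prepare();  // vote: true means "yes, I can commit"
        void commit();
        void rollback();
    }

    class Coordinator {
        // Phase 1: ask every participant to prepare.
        // Phase 2: commit only if all voted yes, otherwise roll back.
        static void run(List<Participant> participants) {
            boolean allYes = true;
            for (Participant p : participants) {
                if (!p.prepare()) {
                    allYes = false;
                    break;
                }
            }
            for (Participant p : participants) {
                if (allYes) {
                    p.commit();   // a failure between commit calls is the
                                  // "partially committed" window from the question
                } else {
                    p.rollback();
                }
            }
        }
    }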

Consistency can only be achieved through the isolation level: whether reads are allowed at all, and whether dirty reads are allowed.

Manualmanubrium answered 25/2, 2016 at 9:53 Comment(0)

I do not have an academic background in this, but from my practical experience (I'm a QE for the Narayana project) 2PC is not stated to be ACID. It only ensures that the transaction will be atomic: atomic meaning that everything or nothing is finished.

I think you've described the limitation of 2PC quite well in your question.

As the transaction is distributed over multiple DBs/JMS brokers/... there is no assurance that those will be isolated from each other. As stated, the transaction manager only manages the resources and says when to prepare (lock) and when to commit. For example, if the connection between the transaction manager and the second resource goes down during the commit phase, when the first resource has already committed, then yes, you can see already committed data at the first resource while the second one waits; its commit will be processed after the connection comes back up. But you are assured that it will be committed in the end. Isolation is (or could be) ensured at the level of a particular resource (all transactions working with data on the second resource will be isolated from each other).
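
A sketch of that recovery behaviour, assuming the commit decision has already been durably logged (the retry loop and the Resource interface are illustrative, not Narayana's actual API):

    // Illustrative recovery loop: once the commit decision is made, the
    // transaction manager keeps retrying the remaining resource until it succeeds.
    interface Resource {
        void commit() throws Exception;  // may fail while the connection is down
    }

    class Recovery {
        static void completeCommit(Resource pending) throws InterruptedException {
            while (true) {
                try {
                    pending.commit();     // eventually succeeds once the link is back up
                    return;
                } catch (Exception connectionStillDown) {
                    Thread.sleep(5_000);  // back off and retry later
                }
            }
        }
    }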

On the other hand, I think that consistency is not broken. Consistency means that a transaction brings the system from one valid state to another according to defined rules. The system will be consistent at the start of the transaction and at the end of the transaction.

As ACID is defined (https://en.wikipedia.org/wiki/ACID), even an ordinary DB transaction normally relaxes the isolation property by default. Most DBs use READ COMMITTED as the default isolation level, which does not protect you from all troubles (non-repeatable reads and phantom reads can still happen; exactly when depends on the DB implementation).
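
For example, with plain JDBC the isolation level is chosen per connection (READ COMMITTED here; the H2 in-memory URL is just a placeholder for your own database):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    class IsolationExample {
        public static void main(String[] args) throws SQLException {
            // "jdbc:h2:mem:test" is a placeholder; use your own database URL.
            try (Connection con = DriverManager.getConnection("jdbc:h2:mem:test")) {
                // READ COMMITTED: no dirty reads, but non-repeatable and
                // phantom reads are still possible.
                con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
                con.setAutoCommit(false);
                // ... run the queries of the transaction here ...
                con.commit();
            }
        }
    }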

Candiot answered 7/3, 2016 at 23:3 Comment(1)
I came back to this question by accident and would like to enhance my response here. The issue is that the question mixes the notion of consistency in ACID with consistency in CAP. Atomicity defines that the transaction is either all committed or all aborted; it says nothing about partial results being visible during transaction execution. Consistency says that "on the completion of a transaction, the database is structurally sound". I would point out the "on the completion". The ambiguity is that consistency in CAP says: "all executions of reads and writes seen by all nodes be atomic".Candiot
