2PC vs Sagas (distributed transactions)

Asked 21/2, 2018 at 13:10 Answered 12/1, 2024 at 23:27

Solved transactions cloud microservices distributed-computing saga

I'm developing my understanding of distributed systems and of how to maintain data consistency across them when business transactions span multiple services, bounded contexts and network boundaries.

Here are two approaches which I know are used to implement distributed transactions:

2-phase commit (2PC)
Sagas

2PC is a protocol for applications to transparently utilize global ACID transactions by the support of the platform. Being embedded in the platform, it is transparent to the business logic and the application code as far as I know.
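To make the protocol concrete, here is a minimal in-memory sketch of the two phases. The `Participant` class and the `two_phase_commit` function are hypothetical names chosen for illustration, not any vendor's API, and the sketch assumes no crashes or network failures between the phases (which is where real implementations spend most of their effort).

```python
class Participant:
    """Hypothetical resource manager taking part in a two-phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote. A real participant would durably log the pending
        # change and hold its locks until the coordinator decides.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes, otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

Nothing becomes "committed" anywhere unless every participant voted yes, which is the all-or-nothing property described above.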

Sagas, on the other hand, are a series of local transactions, where each local transaction mutates and persists the entities along with some flag indicating the phase of the global transaction and commits the change. In other words, the state of the transaction is part of the domain model. Rollback is a matter of committing a series of "inverted" transactions. Events emitted by the services trigger these local transactions in either case.
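The saga flow described above can be sketched roughly as follows. `run_saga` is a hypothetical helper, each action stands in for a local transaction that commits immediately, and the sketch assumes the compensations themselves always succeed, which is exactly the assumption a real system cannot make.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    Each action stands for a local transaction that commits immediately.
    If one raises, the compensations of the already completed steps are
    run in reverse order. Returns the final saga status.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            for compensate in reversed(completed):
                compensate()  # a real system must retry this if it fails
            return "compensated"
    return "completed"


log = []

def decline_payment():
    raise RuntimeError("payment declined")

status = run_saga([
    (lambda: log.append("reserve seat"), lambda: log.append("release seat")),
    (lambda: log.append("reserve hotel"), lambda: log.append("cancel hotel")),
    (decline_payment, lambda: log.append("refund payment")),
])
```

The returned status plays the role of the phase flag mentioned above: in a real service it would be persisted together with the entity rather than held in memory.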

Now, when and why would one use sagas over 2PC and vice versa? What are the use cases and pros/cons of both? Especially, the brittleness of sagas makes me nervous, as the inverted distributed transaction could fail as well.

Viceregent answered 21/2, 2018 at 13:10 Comment(1)

Interesting... No answers and no close votes... – Astronavigation 1/6, 2018 at 9:9

In my understanding (not a big user of 2PC since I consider it limiting):

Typically, 2PC is for immediate transactions.
Typically, Sagas are for long running transactions.

Use cases are obvious afterwards:

2PC can allow you to commit the whole transaction in a request or so, spanning this request across systems and networks. Assuming each participating system and network follows the protocol, you can commit or rollback the entire transaction seamlessly.
A saga allows you to split a transaction into multiple steps spanning long periods of time (not necessarily multiple systems and networks).

Example:

2PC: Save Customer for every received Invoice request, while both are managed by 2 different systems.
Sagas: Book a flight itinerary consisting of several connecting flights, while each individual flight is operated by different airlines.

I personally consider Saga capable of doing what 2PC can do. Opposite is not accurate.

I think Sagas are universal, while 2PC involves platform/vendor lockdown.

Updates/Additions (optional read):

My answer has been here for a while, and I see that the topic has gained some traction since.

I want to clarify a couple of points on this topic for those who come here and are not sure which route to take.

Saga is a domain modeling (i.e., technology-agnostic) concept, while 2PC is a technology-specific notion with some (maybe many) vendors implementing it. As an analogy, it's like comparing domain events (bare objects) with message brokers (such as RabbitMQ).
2PC can be a good choice if you are anyway married to platforms that implement such a protocol. Not all do, and thus I call this a limitation. I see that people found an argument that Saga is more limiting because it's harder to implement, but that's like saying orange is juicier than apple is sweet. Two different things.
Consider the human factor too. Some people (developers, architects) are technology geeks. They call business logic or domain model a boilerplate code. I belong to another group of people who consider the domain model the most valuable piece of code. Such a preference also affects decisions between Saga and 2PC, as well as who likes what. I can't explain why you should prefer domain-driven thinking over technology-driven solutions because it won't fit on this page and you will abandon reading my answer. Please find more online, maybe through my writings.

@freakish in the comments mentioned a fair point: 2PC prefers consistency, while Saga degrades it to "eventual consistency." If you have a situation where consistency is more important than availability (please read CAP), then maybe you do need a system transaction protocol like 2PC. Otherwise, I recommend going with business transactions such as Saga. Please read System Transactions vs Business Transactions e.g. in PEAA.

Gauguin answered 23/7, 2018 at 20:34 Comment(17)

nice answer but as Sagas are capable of what 2PC can do, they have the overhead of implementing the redo mechanism. I feel critique about the last line of your answer :D – Forwardlooking 17/12, 2018 at 16:19

The last line speaks about vendor lockdown vs staying universal and platform independent. What exactly do you feel is not accurate in it? – Gauguin 17/12, 2018 at 16:31

In that case you are right. 2PC lacks platform independence – Forwardlooking 18/12, 2018 at 14:24

Nice explanation. – Glasper 5/4, 2019 at 16:26

"I personally consider Saga capable of doing what 2PC can do." Saga has very weak consistency guarantees compared to 2PC. For example saga has no read isolation, at least out of the box like 2PC. It is the other way around: 2PC can do anything that saga can and more. – Stopple 2/12, 2021 at 14:46

It depends on how you define the constraints and what's more valuable to your solution. First, 2PC is only possible if all integrated systems/platforms implement the same protocol. That alone, to me, is not a universally flexible statement. e.g., does AWS implement some kind of 2PC-compliant protocol? (I don't know, I'm just asking) Second, your argument was about a weak consistency guarantee. I am fine with that, and the entire "eventual consistency" movement is also fine with that, apparently. – Gauguin 2/12, 2021 at 21:19

@Gauguin what I really wanted to say (which I turned into an answer) is that your answer doesn't address the most important thing which is the consistency model. You may be ok with eventual consistency, but someone else might be not. This is a huge issue with huge consequences. You cannot just use sagas and 2PC interchangeably. If you have strong consistency requirements then the flexibility of saga is completely irrelevant. These methods have different consequences. However I don't see this being mentioned in your answer at all. Which makes me wonder how it even got 50 votes... – Stopple 2/12, 2021 at 21:34

@Stopple If the XA protocol (mentioned in your answer) takes weeks, that is nothing other than eventual consistency. So you say you want consistency over availability out of CAP. That is a very rare case. Perhaps that is why people [almost = 99%] always prefer the other way around. That would explain the upvotes, too. I hope you do agree that if you run after consistency too much, you lose availability, in the context of this entire topic - distributed systems. – Gauguin 2/12, 2021 at 21:44

@Gauguin first of all you are completely wrong: whether a transaction takes a day, a week or a year has nothing to do with consistency. A long duration doesn't make it eventually consistent. Eventual consistency means that once the system starts mutating its state it will be consistent at some point in the future, but it might be inconsistent in the meantime. As opposed to strong consistency, which means the state is consistent at all times, even when a transaction waits a whole year to be committed. It simply means that from the outside it looks like it never happened and then after a year it magically appears. – Stopple 2/12, 2021 at 21:47

Secondly: how do you know that availability is preferred over consistency? And that it is not only rare but very rare to have the opposite requirement? Speculation. And what is more important: irrelevant. It is up to others to decide whether they need consistency or not, not up to you. – Stopple 2/12, 2021 at 21:49

Allow me to disagree, @freakish. I can't explain all this in a comment. I will agree on one thing though - I will add a note in my answer to say that it is a tradeoff. It really is. BTW, I added some updates to my answer already. – Gauguin 2/12, 2021 at 21:50

I added the last paragraph, which I hope addresses your feedback. However, I stay with my recommendation, and maybe the links in the paragraph also clarify why. Also, as I already said, I disagree with what you said about eventual consistency. Eventual consistency does not mean dirty reads and all that, especially with Sagas. It is easily solvable with proper aggregate designs and proper implementation of a Saga pattern. – Gauguin 2/12, 2021 at 21:57

@Gauguin I appreciate your update. But I also think that you heavily overestimate what sagas can do. I already gave in my answer an example of consistency requirement (account balance always nonnegative) that cannot be easily solved with sagas. And in fact I'm pretty confident that any solution will be more or less equivalent to 2PC, except that on top of saga. Still, I might be wrong and I'd love to see your design. – Stopple 2/12, 2021 at 22:3

@Gauguin also one more comment about strong consistency. There is a concrete example where strong consistency is mandatory: Bitcoin, and probably all cryptocurrencies and blockchain-based tech. Without strong consistency these types of networks would be simply broken and unusable. These do not use 2PC, though, but other mechanisms (proof of work, proof of stake, etc.) for different reasons. – Stopple 3/12, 2021 at 11:52

@Stopple good point. I don't know much about bitcoin and other alike implementations so I will not be able to argue about it. – Gauguin 3/12, 2021 at 13:41

@Gauguin for situations like your non-negative account requirement, it is quite easy to add a last step to a saga confirming the "commit" for the few services genuinely not tolerating eventual consistency. You can always maintain strict consistency even without 2PC, it simply requires making it explicit in the domain model. I personally find this approach better than hiding consistency requirements in the database layer. Such situations should be expressed and handled in the domain and not rely on vendor technology. 2PC is overrated due to RDBMS addiction and anemic domains. – Ceuta 27/6, 2022 at 4:9

2PC solves a problem actually created by the relational model in the first place. I've worked with a NoSQL database not supporting joins and transactions, and I've found that the need for database transactions always comes from a lack of domain expressiveness and resilience. Whenever I find myself thinking that I need a transaction, I see it as a red flag that something is missing in my domain model. 2PC is a bad solution hiding modeling problems but may be an acceptable shortcut in some cases. @Gauguin made a valid point about this coming down to seeing the domain as boilerplate code or not. – Ceuta 27/6, 2022 at 4:19

I'm adding my answer in order to address the main difference between sagas and 2PC which is a consistency model.

Sagas, on the other hand, are a series of local transactions, where each local transaction mutates and persists the entities along with some flag indicating the phase of the global transaction and commits the change.

Interesting description. What exactly is this flag? Is each node supposed to commit changes after the global transaction completes (and this is tracked by this flag)? And does each node keep local changes invisible to the outside until this happens? If that's the case, then how is that different from 2PC? If that's not the case, then what is this flag even for?

Generally, as far as I understand, a saga is a sequence of local transactions. If any of the nodes in the sequence fails then the flow is reversed and each node spawns a compensating transaction in the reversed order.

With this idea, however, we encounter several issues. The first one is what you've already noticed yourself: what if compensating transactions fail? What if any communication at any step fails? But there's more: with that approach, dirty reads are possible. Say Node1 succeeds and Node2 fails. We then issue a compensating transaction on Node1. But what if some other process reads data after Node1 was updated but before the compensating transaction reverts that update? Potential inconsistency (depending on your requirements).
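The dirty-read window can be shown with a small single-threaded simulation. The `node1` dictionary, the stock numbers and the `reader` function are made up for illustration, and the Node2 failure is simply hard-coded.

```python
node1 = {"stock": 5}       # state owned by the first service (made-up data)
observed = []              # what an unrelated reader sees over time

def reader():
    observed.append(node1["stock"])

reader()                   # before the saga starts
node1["stock"] -= 1        # step 1: local transaction on Node1 commits
reader()                   # concurrent read sees the intermediate state
step2_ok = False           # step 2 on Node2 fails (simulated)
if not step2_ok:
    node1["stock"] += 1    # compensating transaction on Node1
reader()                   # after compensation
```

Here `observed` ends up as `[5, 4, 5]`: the middle read returned a value belonging to a saga that was later undone, so any decision based on it was made on data that, globally speaking, never existed.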

Generally, sagas are: eventually consistent and efficient (no global resource locking) by design. If you have full control over all nodes then saga can be made strongly consistent but that requires a lot of manual (and not obvious, e.g. communication issues) effort, and likely will require some resource locking (and thus we will lose performance). In that case why not use 2PC to begin with?

On the other hand 2PC is strongly consistent by design, which makes it potentially less efficient due to resource locking.

So which one to use? That depends on your requirements. If you need strong consistency then 2PC. If not then saga is a valid choice, potentially more efficient.

Example 1. Say you create an accounting system where users may transfer money between accounts. Say that those accounts live on separate systems. Furthermore you have a strict requirement that the balance should always be nonnegative (you don't want to deal with implicit debts) and maybe a strict requirement that a maximum amount can be set and cannot be exceeded (think about dedicated accounts for repaying debts: you cannot put in more money than the entire debt). Then sagas may not be what you want, because due to dirty reads (and other consistency phenomena) we may end up with a balance outside of the allowed range. 2PC will be an easier choice here.

Example 2. Similarly you have an accounting system. But this time a balance outside of the range is allowed (whoever owns the system will deal with that manually). In that scenario perhaps sagas are better, because manually dealing with a very small number of troublesome states is maybe less expensive than maintaining strong consistency all the time.

Stopple answered 2/12, 2021 at 21:16 Comment(4)

Good line of thought in this answer. I am commenting because I want to clarify how a Saga could handle the always-non-negative balance example. User submits a transaction request, which is a Saga in a way. The Saga goes through phases: first phase, deduct the amount; second phase, add the amount. Deducting an amount is an atomic operation in itself, so if you deducted successfully, then you can add successfully. The only thing is that the amount is nowhere (or is in between) for a moment, but that's not a big deal. This approach is well within the Saga's competency. – Gauguin 2/12, 2021 at 22:12

Well, okay, for completeness. If the second system fails, you need to retry. Saga knows it has deducted the amount, so it needs to retry. If Saga's logic determines that it needs to revert the transaction, that is also trivial because the money was taken, and you put it back. The account stays always positive. Either way, what this solution tells us is that Saga is a business concept. i.e., you write each concrete Saga's logic from scratch. That's by design and not a bad thing as such. – Gauguin 2/12, 2021 at 22:19
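A rough sketch of the debit-first ordering described in the two comments above. The `Account` class, the `transfer` function and the `credit` callback are hypothetical, and the sketch assumes that a failed credit really did not happen; a credit that succeeded but whose reply was lost would additionally need an idempotent credit operation.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Local atomic check-and-debit: refuses to go below zero.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def deposit(self, amount):
        self.balance += amount


def transfer(source, target, amount, credit, max_retries=3):
    """Debit first, then try to credit; refund the source if credit keeps failing.

    `credit(target, amount)` stands for a call to the remote system and is
    assumed to raise ConnectionError when that system is unreachable.
    """
    source.withdraw(amount)            # step 1: money leaves the source
    for _ in range(max_retries):
        try:
            credit(target, amount)     # step 2: money arrives at the target
            return "completed"
        except ConnectionError:
            continue
    source.deposit(amount)             # compensation: put the money back
    return "refunded"
```

Because the only operation that can be rejected for business reasons (the withdrawal) runs first, neither the retry path nor the refund path can push either balance below zero in this two-step case.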

@Gauguin yes, everything can be done. But at what cost? In my simple scenario this is a matter of the correct order of the "add" and "remove" operations, that's right. However that is no longer the case if I extend the saga to some third step (e.g. save the result of the transfer to a third database) that may fail. My point is that now we dive into the dangerous territory of fixing potential inconsistencies manually and with ad hoc methods. Which is not easy at all. – Stopple 3/12, 2021 at 7:36

Also I've never said that saga is a bad design. I'm only saying that there are issues with it and whoever uses it should be aware of them. There are issues with 2PC as well (resource locking mostly). – Stopple 3/12, 2021 at 7:39

Your comparisons are not logically consistent. Older solutions like Sagas take more work to implement than XA/2PC.

Typically, 2PC is for immediate transactions. Typically, Sagas are for long running transactions.

This is incorrect: XA transactions can run for weeks if you want, since running without timeouts is an option. I've worked with systems where XA/2PC transactions run for a week, and some where they run for 1ms.

I personally consider Saga capable of doing what 2PC can do. Opposite is not accurate.

No, Sagas are a more primitive solution than XA. XA is the newer solution. In Sagas, boilerplate needs to be developed to handle the transactions. XA moves the common elements of transaction management to the underlying platform, reducing the boilerplate bloat developers have to manage.

I think Sagas are universal, while 2PC involves platform/vendor lockdown.

XA spec has been implemented by many vendors and is pretty universal. Implementing 2PC across multiple platforms across multiple organizations has not been a problem for over 30 years.

Spann answered 9/8, 2021 at 16:31 Comment(6)

Please clarify the last one. Maybe it is correct for databases, but in general I think it is not. For example, a backend has to (1) call the API of a first 3rd party payment provider to add balance, (2) call the API of a second 3rd party payment provider to reduce balance, (3) save a record of the two successful requests into a local db (to notify the admin via UI). So how do you implement such logic with 2PC? I don't have any experience with this technology. Please describe a specific technology and how it handles this scenario. – Berndt 21/9, 2021 at 16:6

You should be able to download a copy of the XA spec and see how it works. Databases, messaging servers, and application containers: I've used this for more than 20 years, mixing and matching products from many vendors and open source with multiple transport protocols. But I think you hit the nail on the head, you don't have experience with this technology. From Spring-not-Swing to Microservices, there is a great volume of ideology, rhetoric, virtue signalling, and negative marketing but not a lot of experience with technology already developed and problems already solved. – Spann 23/9, 2021 at 16:7

@Berndt how would you implement that with a saga? Say you have balance 0, you add 10, the saga continues and the last step fails (for whatever reason). You start compensating transactions. But what if in the meantime someone withdraws 5 from the account that has 10 now? The compensating transaction arrives, your balance is -5 now. Is that ok? Maybe it is, maybe it is not, maybe my system does not allow negative balance. Such a thing can be guaranteed with 2PC. But not with a saga (or at least not easily). My point is: these are not equivalent and should not be treated that way. – Stopple 2/12, 2021 at 15:17

Okay, I see here is an entire answer to dispute my answer. I still stand by my answer, because what I see is a technology enthusiast trying to sell a technology over the domain concept. Saga is a domain model concept, XA/2PC (I'm not familiar with XA yet) sounds very technology-specific implementation. That alone makes it less flexible and less favorable for many, myself included. – Gauguin 2/12, 2021 at 21:24

Sorry, your answer is not the one mentioning XA. I have not read yours yet. One more adventure. – Gauguin 2/12, 2021 at 21:45

@freakish, I did not say anything about comparing saga vs 2PC. I asked how XA/2PC or 2PC can handle the case above. Just curious about the technology. Nothing more :) – Berndt 3/12, 2021 at 19:32

Others have already explained Saga and 2PC. I'll take a shot at when and why one would use sagas over 2PC and vice versa. For a small number of participants and strong consistency, 2PC is recommended. For a large number of participants, a saga is preferred.

Taxable answered 12/1, 2024 at 23:27 Comment(0)
