Why is 2-phase commit not suitable for a microservices architecture?
I've read a post saying that:

We can not implement traditional transaction system like 2 phase commit in micro-services in a distributed environment.

I agree completely with this.

But it would be great if someone here could explain the exact reason for this. What issues would I face if I implemented 2-phase commit with microservices?

Thanks in advance

Garage answered 19/3, 2019 at 20:42 Comment(1)
I am not an expert, and I don't know how to address everything this question raises, but IMO it is purely about how the architecture works. For example, there is no way to know which service instance processed phase 1 of your transaction so that you can poll it again for confirmation; usually you call an API gateway, which routes to any of N running instances of the service. – African
Some things to note, to give some background:

  1. In most scenarios, microservices interact via HTTP (a stateless protocol), and as a result global/XA transactions are simply not applicable or possible.
  2. Exactly-once semantics are not possible, so you should aim for "at least once". This means all services should be idempotent.
  3. A good example of why "exactly once" semantics cannot be achieved in such a setup is that HTTP connections are very frequently lost on the way back to the client. This means that a POST has changed the state of the server while the client receives a timeout error.
  4. Inside the boundaries of a single microservice, transactions work just fine. Since you mentioned Kafka: you can quite easily consume (from one topic) and produce (to one or more topics) in a single atomic, all-or-nothing operation (exactly-once semantics).
  5. But if you want global, long-running transactions among microservices that interact via HTTP, the only practical option is to design for eventual consistency (you might find "global transactions via HTTP" if you google, but for a production system just ignore them). In brief this means: retry forever on recoverable errors (a whole chapter in itself), and expose compensating endpoints or produce compensating events that will eventually amend non-recoverable errors. Check out the saga pattern. Narayana Transaction Manager has good saga support and a good product comparison.
  6. Check out the related microservices patterns that offer an alternative to XA transactions (also called global transactions or 2-phase commit/2PC), like Transactional Outbox or Event Sourcing, which offer nice "at least once" semantics.
  7. Distributed systems are very complicated, and you should have a reason to go for such a solution. If you go distributed, operations that your monolith could safely delegate to its transaction manager will have to be dealt with by the developer/architect :-).
  8. Also, the majority of non-SQL databases/systems do not support XA transactions (i.e. global transactions) at all, as they slow processing dramatically.
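Point 2 above can be made concrete with a minimal sketch. The pattern (sometimes called an idempotency key) is not tied to any particular framework; all names here are illustrative. The client sends the same key on every retry, so "at least once" delivery still results in the operation being applied only once:

```python
# Minimal, in-memory sketch of an idempotent handler. A real service
# would persist the processed-keys table in its own database.
processed = {}  # idempotency_key -> stored result

def transfer(idempotency_key: str, account: str, amount: int, balances: dict) -> int:
    if idempotency_key in processed:
        # Retry of a request we already handled: return the stored
        # result instead of applying the change a second time.
        return processed[idempotency_key]
    balances[account] = balances.get(account, 0) + amount
    processed[idempotency_key] = balances[account]
    return balances[account]

balances = {}
transfer("req-1", "alice", 100, balances)
transfer("req-1", "alice", 100, balances)  # duplicate delivery: no double credit
print(balances["alice"])  # 100, not 200
```

This is what makes the timeout scenario in point 3 survivable: the client can safely retry a POST it never got a response for.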
Migrate answered 20/3, 2019 at 10:23 Comment(0)
The main reason for avoiding 2-phase commit is that the transaction coordinator is a kind of dictator: it tells all the other nodes what to do. Usually the transaction coordinator is embedded in the application server. The problem arises when, after the first (prepare) phase, the transaction coordinator or the application server goes down. Now the participating nodes don't know what to do. They cannot commit, because they don't know whether another participant replied "no" to the coordinator, and they cannot roll back, because another participant might have said "yes". So until the coordinator comes back, say 15 minutes later, and completes the second phase, the participating data stores remain in a locked state. This inhibits scalability and performance.

Worse things happen when the coordinator's transaction log gets corrupted after the first phase. In that case the data stores remain locked forever; even restarting the processes won't help. The only solution is to manually check the data to ensure consistency and then remove the locks. These situations usually arise under high pressure, so this is definitely a huge operational overhead. Hence the traditional 2-phase commit is not a good solution.
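The blocking scenario described above can be modelled with a toy, in-memory sketch (hypothetical classes, not a real transaction manager). Once a participant votes "yes" in the prepare phase, it must hold its locks until the coordinator delivers a decision; if the coordinator dies first, the participant can neither commit nor roll back:

```python
# Toy model of the 2PC blocking problem: the coordinator crashes
# between phase 1 (prepare) and phase 2 (commit/abort).
class Participant:
    def __init__(self, name: str):
        self.name = name
        self.state = "idle"  # idle -> prepared -> committed/aborted

    def prepare(self) -> bool:
        self.state = "prepared"  # locks acquired, vote "yes"
        return True

    def decide(self, commit: bool) -> None:
        # Only the coordinator's decision releases the locks.
        self.state = "committed" if commit else "aborted"

participants = [Participant("orders"), Participant("payments")]

# Phase 1: every participant votes yes and locks its resources.
all_yes = all(p.prepare() for p in participants)

coordinator_alive = False  # coordinator crashes before phase 2

# Phase 2 never runs, so everyone is stuck holding locks.
if coordinator_alive:
    for p in participants:
        p.decide(all_yes)

print([p.state for p in participants])  # ['prepared', 'prepared']
```

No participant can unilaterally leave the "prepared" state, which is exactly why the data stores stay locked until the coordinator recovers.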

However, it should be noted that some modern systems like Kafka have also implemented a 2-phase commit. This differs from the traditional solution in that every broker can act as a coordinator, so Kafka's leader-election algorithm and replication model alleviate the issues of the traditional model.

Saiff answered 20/3, 2019 at 1:12 Comment(0)
The "We can not" here really means "It's a bad idea, and I don't want to, and if I admit the possibility then I might not be able to convince you not to insist".

Of course you can implement 2-phase commit across microservices, but:

  • 2-phase commit requires a significant development effort in every service that can participate in a transaction;
  • it causes a lot of contention between clients, which grows with the communication latency between servers; and
  • all the services involved have to agree on a lot of protocol, configuration, deployment, and other details that determine how the 2-phase commit will actually work.

These problems are hard enough to manage among a few closely coupled services on co-located servers with dedicated networks. In the more heterogeneous environments, with more servers and higher latencies, that characterize microservice deployments, it becomes much, much harder.

Gefen answered 20/3, 2019 at 16:57 Comment(0)
The whole idea of microservices is loosely coupled, independent services. 2PC means we have two phases to commit the transaction: a controlling node drives the transaction, all the other nodes first respond that they are ready, and in the second phase they all commit or roll back depending on phase one.

What happens if the controlling node goes down? What happens when any of the other nodes goes down? Because of this limitation, your whole transaction can't get through. In distributed transactions your nodes can be in different data centers or regions, and the slowest node to respond keeps the other nodes waiting when they could move on. So atomicity hampers performance.

You can't scale the system, and the whole point, that the services should be independent and scalable, is lost.

2PC is not the wrong answer, but in most cases we settle for eventual consistency. If you have a system that requires strong consistency, then 2PC may be an option.

Guesswarp answered 19/3, 2019 at 21:5 Comment(0)
I would suggest that it is not that we cannot implement XA or 2PC for microservices, but rather that the HTTP-based API world hasn't found it politically acceptable yet. In older applications, components might represent a larger set of complex business-logic steps, but could also span hardware, geography, organizations, and technologies; i.e. my business components could be spread across multiple companies, with a different user interface in each. The network protocols that integrated all of these supported distributed transactions (2PC) as well as propagating user identities, etc.

This has a bureaucracy-and-legacy taint to it. Standardized distributed transactions supported by multiple interoperable platforms go back to the '80s. IT being heavily fashion-driven, what would happen in your workplace if you advocated too heavily for this type of feature?

Note, on the question of what would happen if the controlling node went down: in traditional applications, if the component committing the transaction died mid-commit, the transaction would eventually time out and be rolled back on each of the components. In some cases recoverable-transaction features were available: if the committer recovered before the timeout, it would recover the transaction and continue the commit. We see this a lot in enterprise applications; if an application server bounces, it recovers in-process work.

While I'm in a ranting mood :) some pundits claim that XA has to be implemented on a single platform, like JTA. I've never found this to be true; XA has always worked for me across databases, application servers, and mainframes seamlessly.

Matri answered 17/2, 2021 at 19:45 Comment(0)
Two-phase commit is a good and proven solution for distributed-transaction scenarios where you want to connect two systems or applications.

The key reason for avoiding 2-phase commit in a microservices architecture is scalability. Suppose you implemented 2PC in the first service, which triggers a long-running transaction: it updates its own DB and puts a message on a queue for the next service. There is no use for 2PC here, because the next service does not participate in the same transaction; rather, it starts its own local transaction.
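The "update my own DB and enqueue a message" step described above is exactly where the transactional-outbox pattern fits instead of 2PC. Here is a hedged, minimal sketch (table names and the `place_order` function are illustrative, using SQLite only for brevity): the business row and the outgoing event are written in one local transaction, so there is nothing to coordinate across services. A separate relay process, not shown, would poll the outbox table and publish to the queue.

```python
# Transactional-outbox sketch: both inserts happen in a single
# local transaction - either both rows exist, or neither does.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT);
""")

def place_order(order_id: int) -> None:
    with conn:  # sqlite3 connection context manager: commit or roll back atomically
        conn.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "OrderPlaced", "order_id": order_id}),),
        )

place_order(42)
print(conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0])  # 1
```

Because the event row commits with the order row, the downstream service gets its message "at least once" without any cross-service transaction.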

Brickey answered 14/1 at 15:18 Comment(0)