PBFT: Why can't the replicas perform the request after 2/3 have prepared? Why do we need the commit phase?

I know there are some questions on this site that ask the same thing. However, the answer is never clear:

In PBFT, why can't the replicas execute the requests after 2/3 have prepared? Why is the commit phase needed? If 2/3 + 1 replicas have agreed to prepare, then I would think they can execute the request without broadcasting again?

Eufemiaeugen answered 1/7, 2018 at 16:54 Comment(4)
How can each replica know it's time to commit without communicating?Wobble
Once the replicas received 2/3 prepared, they can commit.Eufemiaeugen
"Once the replicates received 2/3 prepared, they can commit" --> they only confirm the request execution order. If commit phase disappears from the protocol, and due to partially synchronous nature assumed by PBFT, replicas can have different state over time.Acromegaly
If there is no view change, we can commit after prepare. But if we have a view change, we need the commit phase. The commit phase ensures that once request m with sequence number n is committed at some replica (with prepared proofs from a quorum Q1 of 2f+1 replicas), this (n,m) pair is gathered by the new primary via some honest View-Change message from Q1. Then all replicas will redo (n,m) when accepting the New-View message. So safety is kept.Nebo

(Edited) In addition to the previous (incomplete) answer, a quote from Practical Byzantine Fault Tolerance and Proactive Recovery might help. Note that the author states that the prepare phase is enough for ordering requests within the same view, but it is not enough for ordering requests across view changes, which is why the commit phase is needed.

This ensures that replicas agree on a total order for requests in the same view but it is not sufficient to ensure a total order for requests across view changes. Replicas may collect prepared certificates in different views with the same sequence number and different requests. The commit phase solves this problem as follows.


Clients' requests should be totally ordered and should be executed in exactly the same order on every replica. Replicas reach consensus on the order of requests in the prepare phase by collecting prepare messages up to the quorum size you mentioned, but they don't execute right away in that phase, because they have to execute the same requests in the same order. (In a State Machine Replication system, all the state machines have to deterministically execute the same requests in the same order to satisfy the safety condition; execution order affects the state machine's state.)

So in the commit phase, they have to reach consensus on the execution timing, so that they all execute the same request at the same logical point, satisfying the safety condition.

Following your comment "Once the replicas received 2/3 prepared, they can commit", the internal states of the state machines (PBFT's nodes) would diverge, violating the safety condition. That is why the commit phase is needed.
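
To make this concrete, here is a minimal, hypothetical Python sketch (invented names, not the paper's pseudocode) of a replica that defers execution until it has a commit quorum and then executes strictly in sequence order. Without this gate, a replica that saw a prepare quorum early could run ahead of the others, which is exactly the divergence described above.

```python
F = 1                      # assumed number of tolerated faulty replicas
QUORUM = 2 * F + 1         # 2f+1 matching messages

class Replica:
    def __init__(self):
        self.commit_votes = {}   # (view, seq) -> set of sender ids
        self.executed_up_to = 0  # highest sequence number executed so far
        self.pending = {}        # seq -> request committed but not yet run

    def on_commit(self, view, seq, request, sender):
        votes = self.commit_votes.setdefault((view, seq), set())
        votes.add(sender)
        if len(votes) >= QUORUM:         # committed-local: safe to execute
            self.pending[seq] = request
            self.try_execute()

    def try_execute(self):
        # Execute strictly in sequence order so every correct replica's
        # state machine applies the same requests in the same order.
        while self.executed_up_to + 1 in self.pending:
            self.executed_up_to += 1
            request = self.pending.pop(self.executed_up_to)
            print(f"execute seq={self.executed_up_to}: {request}")
```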


Answer to your comment:

[Figure omitted: replicas executing the same request at different points in time, so their states diverge.]

The situation above is possible when replicas execute a request as soon as they gather prepare messages up to the quorum size. The important fact is that PBFT assumes partial synchrony: messages can be arbitrarily delayed due to unstable network speed or an adversary, but are eventually received. So each replica can execute the request at a different point in time, and one such situation is illustrated.


Answer to your second comment

I think I need to elaborate on the answer by illustrating a coordinated attack by malicious replicas, including the primary. Let's say there are n replicas, where n = 3f + 1 = 100 and f = 33, in a Byzantine fault tolerant system. The system can tolerate f Byzantine faulty replicas. Now I give a counterexample to answer your question. Consider the following setting, where I partition the n replicas into three groups:

  1. G1 = {b1, b2, ... , b33}, the Byzantine faulty replicas, including the Byzantine primary (b1); |G1| = 33
  2. G2 = {r1, r2, ... , r33}, a correct replica group; |G2| = 33
  3. G3 = {r34, r35, ... , r67}, a correct replica group; |G3| = 34

Because n = |G1| + |G2| + |G3| = 33 + 33 + 34 = 100, the partition above makes sense. And G1 is entirely controlled, in a coordinated way, by a super-genius hacker who is especially interested in destroying the protocol.

Now, I will demonstrate how the above setting violates the safety condition if the commit phase disappears from the protocol. (The safety condition means that the states of G2 and G3 should be the same.) For simplicity, the consensus value is a binary value rather than a request with a sequence number.

  1. [Pre-Prepare phase]: The primary (b1) sends value 0 to G2 and value 1 to G3. This situation is not a problem, because we assume a Byzantine primary.
  2. [Prepare phase]: Now replicas in G2 and G3 exchange the message from the primary to check whether they both have the same message. But in this phase, replicas from G1 send value 0 to G2 and value 1 to G3. After the message exchange, the situation is as follows:

    a. Replicas in G2 received the following results: 66 votes for value 0, 34 votes for value 1.

    b. Replicas in G3 received the following results: 33 votes for value 0, 33+34=67 votes for value 1.

Because the quorum size is 2f+1 = 67, replicas in G3 accept the value proposed by the Byzantine primary, who coordinates with the Byzantine replicas, while replicas in G2 don't.

So even though the system can tolerate up to 33 Byzantine faulty replicas, including the primary, it immediately fails under your assumption.
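
The tally in this counterexample is easy to check with a few lines of Python (a sketch of the vote counting only, not of the protocol):

```python
f = 33
quorum = 2 * f + 1            # 67
G1, G2, G3 = 33, 33, 34       # Byzantine, correct, correct

# Prepare votes seen by G2: G2 echoes 0, G1 sends it 0, G3 sends 1.
votes_seen_by_G2 = {0: G2 + G1, 1: G3}      # {0: 66, 1: 34}
# Prepare votes seen by G3: G3 echoes 1, G1 sends it 1, G2 sends 0.
votes_seen_by_G3 = {0: G2, 1: G3 + G1}      # {0: 33, 1: 67}

print(max(votes_seen_by_G2.values()) >= quorum)   # False: G2 never prepares
print(max(votes_seen_by_G3.values()) >= quorum)   # True: G3 prepares value 1
```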

Acromegaly answered 22/8, 2018 at 9:23 Comment(12)
can you give an example? I can't seem to find a way for their states to diverge.Eufemiaeugen
I added an example in the answer post.Acromegaly
thank you so much yongrae for your effort. However, I don't think this is correct. PBFT only makes sure that the system will tolerate up to f Byzantine nodes - this means if you get 2/3 of the nodes to agree, then at least 1 honest node has executed your request, and this is exactly what we want. There is no implication on the consistency of the internal state of all the nodes. Furthermore, even if we have a two-step commit, during the commit phase some nodes can also drop out, causing the diagram above to be true as well.Eufemiaeugen
I added yet another example to answer your question. It covers specific voting scenarios where Byzantine primary and Byzantine replicas cooperate to destroy the protocol.Acromegaly
Thank you again for the answer! I am loving this conversation - however I still think you are incorrect: in your example G3 had 2f+1 confirmations and they will perform the commit. G1 is Byzantine, so none of them will actually broadcast the commits to G2. However, G3 will, because they are honest replicas. In this case, G2 will hear at least "f" replicas broadcasting the commit for "1", and per PBFT, after a replica receives "f" commits, it will also commit. In addition, G3 can also construct proof that there have been more than 2f+1 prepares even if G1 is Byzantine.Eufemiaeugen
this is a really good conversation yongrae, do you have a contact info I can use? I am looking to hire top talents that understand PBFT.Eufemiaeugen
"G1 is byzantine so that none of them will actually broadcast the commits to G2" --> It is wrong statement. Byzantine nodes can do everything in coordinated way.Acromegaly
As explained earlier, G3 commits the value, and G2 doesn't. The safety condition states that G2 and G3 must have the same state. (We already have f Byzantine nodes: G1.)Acromegaly
Anyway, thanks for your interest about me, but I am currently not in job market though ;(.Acromegaly
Hi yongrae, when G3 has 2f+1 (=67) PREPAREs (value: 1) from G3 (34) and faulty G1 (33), all nodes in G3 enter the "COMMIT" status. Then G3 will broadcast COMMIT (value: 1) and wait for 2f+1 (=67) COMMITs (value: 1). This can happen because both G3 and faulty G1 can broadcast COMMITs (value: 1). So in the end, G3 still commits while G2 doesn't. So the commit phase does not prevent the consensus state from diverging; G3 and G2 are still in different states. How to prevent this situation?Nebo
I would also like to know the answer to @Nebo's question. My guess is that yongrae's example is not actually a violation of safety, since it's unrealistic to ask that all replicas are always perfectly in sync in a partially synchronous network; I think having a prefix state is also accepted. So to break safety, G1 would need to force G2 to commit 0, which I don't think they can, even in the prepare-execution example provided...?Flam
I agree it's not a safety violation, but for liveness we need to change the primary. Once we take into account the need for view changes, we need to make sure a prepared certificate from G3 survives the view change. Without the commit phase, none of the nodes in G3 can be sure the other nodes are prepared. If there is a partition between the new primary and the prepared node(s) in G3, the prepared certificate will not make it to the new primary, so it would be a mistake to execute. With commit you can execute, because the new primary always receives a prepared certificate from G3 (any two 2f+1 quorums intersect in at least f+1 nodes, at least one of them correct).Mourning

Suppose there is no commit phase:

Assume that replica A prepares (n, m) and then executes m immediately, while the other replicas have neither prepared nor executed it. At this moment a view change occurs, and the new primary cannot communicate with A. The new primary accepts 2f+1 view-change messages from the other replicas, which have neither prepared nor executed m, stating that their maximum sequence number is n-1.

In the new view, the primary may assign (n, m') to a subsequent request m', which diverges from the state of A.
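
A tiny sketch (hypothetical structures, with f = 1) of what the new primary sees in this scenario:

```python
f = 1
n = 10                                   # the sequence number A executed

# View-change messages from the 2f+1 reachable replicas (A is excluded):
view_changes = [{"max_seq": n - 1} for _ in range(2 * f + 1)]

max_s = max(vc["max_seq"] for vc in view_changes)
print(max_s)   # 9: nothing tells the new primary about (10, m),
               # so it may assign sequence 10 to a new request m'.
```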

Crossruff answered 19/11, 2019 at 9:56 Comment(0)

PBFT does need the commit phase to ensure that message m stays assigned to sequence number n even during a view change. This is because if a message m with sequence number n is committed at some replica after 2f+1 COMMITs are received, this (n,m) will be included in the NEW-VIEW message and thus rebroadcast to all other nodes by the new primary.

This question confused me for quite a long time, and today I finally got it. I will try to explain it clearly, but I suggest you be very, very familiar with the paper; at least you should know what "prepared" and "committed-local" mean.

Overview of PBFT

PBFT has three algorithms:

  • Request handling algorithm

    • The client sends the message to the primary with "id == currentView mod |R|"
    • Three messages
      • 1) Pre-prepare(n, m, v, i). A replica accepts this if, in view v, it has not accepted another pre-prepare whose sequence number n is assigned to a different request m'. The condition "in view v" is important, because it ensures that a faulty primary cannot block a replica by sending a faulty Pre-prepare message.
      • 2) Prepare(n, m, v, i). If a replica gathers enough (2f+1) valid Prepare messages, it is prepared. Two non-faulty prepared replicas will agree on (n,m), because any two 2f+1 quorums share at least one non-faulty node k, and k cannot send different messages.
      • 3) Commit(n, m, v, i). If a replica reaches "prepared", it broadcasts a Commit message and waits for another 2f Commits. Then it is committed locally, which ensures that at least f+1 non-faulty replicas are prepared. (These predicates are condensed into code in the sketch after this list.)
  • View change

    • To ensure liveness.
    • Messages:
      • 1) View-Change: If a node times out, it creates a View-Change message. This View-Change carries all prepared messages (with proofs) known to the node. For better understanding, I ignore CHECKPOINT messages.
      • 2) New-View: If the new primary gathers 2f+1 View-Change messages, it takes all the prepared messages in them. The highest prepared sequence number is marked as "max-s". The new primary broadcasts the New-View, and each acceptor redoes the protocol for each sequence number.
  • Garbage collection algorithm

    • A way to remove logs after they have been backed up. We don't discuss it in this post.
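
As referenced in the list above, here is a rough condensation of the three message predicates into Python. This is a sketch only: the class and field names are invented, and the quorum counting follows this answer's 2f+1 convention rather than the paper's exact message counts.

```python
# Hypothetical per-(view, seq) slot tracking the three PBFT phases.
F = 1                      # assumed number of tolerated faulty replicas
QUORUM = 2 * F + 1

class Slot:
    def __init__(self):
        self.pre_prepared = None   # request accepted from the primary
        self.prepares = set()      # ids of replicas with matching Prepare
        self.commits = set()       # ids of replicas with matching Commit

    def accept_pre_prepare(self, request):
        # Accept only if no *different* request already holds this slot
        # in this view; this is the condition item 1) above relies on.
        if self.pre_prepared is None or self.pre_prepared == request:
            self.pre_prepared = request
            return True
        return False

    def prepared(self):
        # Pre-prepare plus a quorum of matching Prepare messages.
        return self.pre_prepared is not None and len(self.prepares) >= QUORUM

    def committed_local(self):
        # Prepared plus a quorum of matching Commit messages.
        return self.prepared() and len(self.commits) >= QUORUM
```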

Why can't the commit phase be omitted?

The commit phase is key to safety during view changes

From a high-level view, the commit phase ensures this invariant:

If commit_local(n,m,v) is true for some replica i, then commit_local(n,m',v+1) is false for any m' ≠ m at any non-faulty replica i'.

Proof sketch:

Commit_local(n,m,v) being reached at some replica i means that 2f+1 replicas (call them Q1) declared prepared. This means that at least one non-faulty replica from Q1 contributes a View-Change message to the New-View. Hence, there is at least one prepared certificate for (n,m) in the New-View message, which is broadcast to all other replicas.

Any replica i' receiving the New-View may be in one of two states:

1) It has already committed (n,m).

2) It hasn't committed (n,m) yet. It may be waiting for COMMITs, or PREPAREs, or even the PRE-PREPARE, whatever.

In the second case, it will reprocess (n,m) from the pre-prepare phase in view v+1. So (n,m) is kept across the view change. In one word, the commit phase ensures that if (n,m) is committed at some replica, then (n,m) will be included in the NEW-VIEW and thus sent to all other replicas in the view-change phase.
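
The quorum-intersection step of this proof is just pigeonhole arithmetic, which a quick check confirms for any f:

```python
# Any two quorums of size 2f+1 out of n = 3f+1 replicas overlap in at
# least f+1 replicas, and at most f of those can be faulty, so at least
# one non-faulty member of Q1 reports (n,m) to the new primary.
for f in range(1, 100):
    n = 3 * f + 1
    quorum = 2 * f + 1
    min_overlap = quorum + quorum - n    # pigeonhole: |Q1| + |Q2| - n
    assert min_overlap == f + 1          # at least f+1 replicas in both
    assert min_overlap - f >= 1          # so at least one is non-faulty
```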

What if we omit the commit phase?

What if we omit the commit phase? Safety is destroyed, because m can be committed with sequence number n at some replicas while another m' with the same n is committed at other replicas.

Suppose a non-faulty replica commits the request as soon as (n,m) is prepared in view v. This means 2f+1 replicas (Q1) declared they received the pre-prepare for (n,m) in view v.

At the same time, it is possible that no other replica is prepared yet, since the network is only partially synchronous. After some timeout, a view change happens.

Now, since no other replica is prepared, it is possible that none of the View-Change messages collected by the new primary carries a prepared certificate with sequence number >= n, so for the new primary max-s < n. The new primary will then assign n to a new message m' from a client's new request. Although at least f+1 non-faulty replicas in Q1 received the pre-prepare for (n,m) in view v, these old pre-prepares cannot prevent the new pre-prepare messages (n, m') in view v+1 from being accepted. They are not in the same view! Recall the condition for accepting pre-prepare messages!

Now, (n,m) is committed at some replica while (n,m') is committed at other replicas after the view change.

Conclusion: the commit phase ensures that if (n,m) is committed at some replica, then (n,m) will be included in NEW-VIEW thus it is sent to all other replicas in view-change phase.

Nebo answered 11/5, 2020 at 18:42 Comment(1)
This did it for me: "Conclusion: the commit phase ensures that if (n,m) is committed at some replica, then (n,m) will be included in NEW-VIEW thus it is sent to all other replicas in view-change phase." Thank you!Jotter

Example:

  1. The leader (primary) sends a pre-prepare message
  2. Replicas accept the pre-prepare and send prepare messages
  3. Due to a network partition, only the leader receives 2f+1 prepare messages and enters the prepared state. All other replicas are still in the "pre-prepared" state (the network partition prevents them from receiving enough prepare messages)
  4. The leader goes down while the network between the other replicas recovers
  5. How should the new leader deal with the value which is prepared at the dead old leader but still "pre-prepared" at all other replicas?

If there were no commit phase, the prepared state would have to be treated as committed. Then this value should be considered committed, because the old leader has "committed" it. To keep safety, we would need to learn this value and re-propose it, just as we do in a view change.

But can we learn it? The only node that "committed" it has gone down. If we learn it from other replicas, the Byzantine nodes can lie. (This is why we only re-propose prepared values with 2f+1 proof messages in a view change.)

In the real PBFT (with the commit phase), the aforementioned value is simply ignored; it was only prepared, not committed.

And a similar question for the real PBFT: how do we deal with a value committed only by the dead old leader?

To be committed, this value must have been in the prepared state at 2f+1 replicas. The value can therefore be recovered in the view change by re-proposing the prepared values.
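
A sketch (invented helper names) of the recovery step described above: the new primary collects 2f+1 View-Change messages, each carrying the sender's prepared values together with their proofs, and re-proposes every proven value it finds.

```python
F = 1
QUORUM = 2 * F + 1

def build_new_view(view_changes):
    # view_changes: one dict per sender, mapping seq -> (request, proof);
    # the caller is assumed to have verified every 2f+1-strong proof.
    assert len(view_changes) >= QUORUM
    to_repropose = {}
    for vc in view_changes:
        for seq, (request, proof) in vc.items():
            # In the real protocol a certificate from a higher view wins;
            # that tie-break is omitted here for brevity.
            to_repropose[seq] = request
    max_s = max(to_repropose, default=0)   # highest recovered sequence
    return max_s, to_repropose

# Example: one replica proves it prepared (10, "m"); the value survives.
print(build_new_view([{10: ("m", "proof")}, {}, {}]))   # (10, {10: 'm'})
```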

PS: In Paxos there are no Byzantine nodes, so fewer phases are needed

Incommunicative answered 9/4, 2023 at 3:2 Comment(0)

The thing is: if sequence number n and message m are committed in view v by a non-faulty replica, there should not exist any v' (v' > v) that commits (n, m') with m' ≠ m at any non-faulty replica. If we remove the commit phase, this case can happen:

Consider this bad scenario if we omit the commit phase: (f=1)

  • In view v, a non-faulty replica receives 2f+1 prepare messages for (n, m), so it assumes it should execute the request and send the answer back to the client.
  • There are f malicious replicas, so they can convince the client that the request (n, m) was committed.
  • A new primary is elected, because the other non-faulty backups reached their timeout and have not committed yet.
  • The new primary in v+1 assigns (n, m') with m' ≠ m, which is eventually committed. A bad thing happened!
Safety is violated!

The solution: a replica needs to make sure that enough (2f+1) replicas have also received enough (2f+1) prepare messages. Only then can it execute and reply to the client. This is exactly what the commit phase provides.

Pictorial answered 12/9, 2024 at 7:55 Comment(0)