How does raft preserve safty when a leader commits a log entry and crashes before informing followers this commitment?

Asked 10/12, 2020 at 8:18 Answered 26/3, 2023 at 15:10

Solved distributed-computing distributed-system consensus raft

In my understanding, a leader sends AppendEntries RPC to the followers, and if majority of followers return success, the leader will commit this entry. It will commit this entry by applying it to its own state machine, and it will also return to the client to let the client know that the command is successful.

However, at this time, this commitment is not known to the followers yet. It will inform the followers in the next AppendEntries (or heartbeat) RPC calls.

In the simplest case, if the leader crashes after the commitment and before the next AppendEntries, raft will use the "only most up to date follower can win" strategy to ensure that the next leader must contain this log entry (although not committed), and the new leader will commit this entry and send AppendEntries to other followers. In this way, the log entry is safely kept.

However, consider the following complicated scenario (extracted from PHD thesis "CONSENSUS: BRIDGING THEORY AND PRACTICE" page 23).

At this point, the log entry from term 2 has been replicated on a majority of the servers, but it is not committed. If S1 crashes as in (d1), S5 could be elected leader (with votes from S2, S3, and S4) and overwrite the entry with its own entry from term 3.

How if at this point, it is committed in Server S1, but not committed in other servers yet? If S1 then crashes as in (d1), this log entry will be overwritten by S5?

In my understanding, a committed entry (applied to state machine and possibly informed the client about the result) shall never be overwritten?

Did I misunderstand anything of the raft protocol?

Thanks.

Carillo answered 10/12, 2020 at 8:18 Comment(0)

There are more conditions in Raft to commit an entry.

On page 4 of this paper (The 1-page summary of raft) it says

Leaders:

...

If there exists an N such that N > commitIndex, a majority of matchIndex[i] ≥ N, and log[N].term == currentTerm set commitIndex = N (§5.3, §5.4).

In other words, not only does the entry have to be replicated to a majority, its term has to be from the current term. This is why in a practical system a new leader will propose a no-op so that it can advance this commitIndex.

So now we know the leader won't commit until then, but what if it commits but doesn't send the commit.

Later in section 5.4.1 of the same paper it says (emphasis mine):

Raft...guarantees that all the committed entries from previous terms are present on each new leader from the moment of its election....Raft uses the voting process to prevent a candidate from winning an election unless its log contains all committed entries. A candidate must contact a majority of the cluster in order to be elected, which means that every committed entry must be present in at least one of those servers.

In short, the new leader by definition must have the entries that the old leader considered committed.

Portillo answered 11/12, 2020 at 5:0 Comment(0)

Yet it still could be the case where the new leader has the committed entry but does NOT know it SHOULD commit it.

Let's say S1 is the old leader, and Log1 is appended to S2 and S3. At this point, S1 knows it is time to commit Log1. S1 crashes right after it commits locally.

Now S2 comes to power and has Log1, but it will never commit it until a new log entry arrives at S2, bumping S2's commitIdx, which EVENTUALLY applies Log1.

Studdard answered 26/3, 2023 at 15:10 Comment(0)

Recommended topics

Hot tags