Google File System Consistency Model

I was reading about GFS and its consistency model, but I'm failing to grasp parts of it. In particular, can someone provide a specific example scenario (or an explanation of why it cannot happen) for each of the following:

  • concurrent record append that could result in record duplication
  • concurrent record append that could result in undefined regions
  • concurrent writes (on a single chunk) that could result in undefined regions
Tardigrade asked 9/1, 2015 at 16:1

I'm quoting from http://research.google.com/archive/gfs.html. Check out Table 1, which is a summary of the possible outcomes for writes/appends:

Table 1 from the GFS paper (file region state after mutation):

  Serial success        write: defined                    record append: defined interspersed with inconsistent
  Concurrent successes  write: consistent but undefined   record append: defined interspersed with inconsistent
  Failure               write: inconsistent               record append: inconsistent

  1. "If a record append fails at any replica, the client retries the operation. As a result, replicas of the same chunk may contain different data possibly including duplicates of the same record in whole or in part." So any failure on a replica (e.g. timeout) will cause a duplicate record at least on the other replicas. This can happen without concurrent writes.

  2. The same situation that causes a duplicate record also causes an inconsistent (and hence undefined) region. If a replica failed to acknowledge the mutation, it may not have performed it. In that case, when the client retries the append, this replica will have to add padding in place of the missing data so that the record can be written at the right offset. So one replica will have padding in this region while the others have the previously written record (the first sketch below shows this as well).

  3. A failed write can cause an inconsistent (hence undefined) region as well. More interestingly, successful concurrent writes can cause consistent but undefined regions: "If a write by the application is large or straddles a chunk boundary, GFS client code breaks it down into multiple write operations. They [...] may be interleaved with and overwritten by concurrent operations from other clients. Therefore, the shared file region may end up containing fragments from different clients, although the replicas will be identical because the individual operations are completed successfully in the same order on all replicas. This leaves the file region in consistent but undefined state [...]." (See the second sketch after this list.)
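
Here is a minimal sketch of points 1 and 2, in Python. It is not GFS code: the replica class, the padding behavior, and the failure injection are invented to mirror the paper's description, in particular the assumption that the primary advances the append offset even when a replica fails.

```python
# Toy model of GFS record append: three replicas of one chunk, modeled as
# byte buffers. The primary assigns the append offset and, per the paper's
# behavior, advances it even when a replica fails, so the retry lands at a
# fresh offset.
PAD = b"\x00"

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = bytearray()

    def apply(self, offset, record):
        # Pad up to the offset if this replica missed earlier mutations,
        # so the record still lands at the primary-chosen offset.
        if len(self.data) < offset:
            self.data.extend(PAD * (offset - len(self.data)))
        self.data[offset:offset + len(record)] = record

replicas = [Replica("A"), Replica("B"), Replica("C")]
next_offset = 0  # maintained by the primary

def record_append(record, fail_on=()):
    """Apply `record` at the primary-chosen offset on every replica that
    does not 'fail'; return True only if all replicas acknowledged."""
    global next_offset
    offset, ok = next_offset, True
    for r in replicas:
        if r.name in fail_on:
            ok = False  # e.g. a timeout: this replica never applies the record
        else:
            r.apply(offset, record)
    next_offset += len(record)  # advanced even on failure (key assumption)
    return ok

# First attempt fails on replica C, so the client retries. The offset is not
# rolled back, so A and B hold the record twice (duplicate), while C has
# padding where the first attempt should have been (inconsistent region).
if not record_append(b"REC1", fail_on={"C"}):
    record_append(b"REC1")  # retry succeeds on all replicas

for r in replicas:
    print(r.name, bytes(r.data))
# A b'REC1REC1'
# B b'REC1REC1'
# C b'\x00\x00\x00\x00REC1'
```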
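
And a sketch of point 3, the consistent-but-undefined case: two clients write "large" buffers that a hypothetical client library splits into fragments, and the primary interleaves the fragments. All replicas apply the same order, so they are identical (consistent), yet the region holds a mix neither client wrote (undefined).

```python
# Two clients concurrently write an 8-byte buffer at the same file offset.
# A (hypothetical) client library splits each write into 4-byte fragments,
# and the primary interleaves fragments from both clients. Every replica
# applies the same order, so replicas stay identical.
FRAG = 4

def split(offset, buf):
    """Break one application-level write into fragment-sized operations."""
    return [(offset + i, buf[i:i + FRAG]) for i in range(0, len(buf), FRAG)]

client1 = split(0, b"AAAAAAAA")  # wants the region to read AAAAAAAA
client2 = split(0, b"BBBBBBBB")  # wants the region to read BBBBBBBB

# One possible interleaving chosen at the primary; all replicas apply it
# in exactly this order.
order = [client1[0], client2[0], client2[1], client1[1]]

region = bytearray(8)  # the shared file region on every replica
for offset, frag in order:
    region[offset:offset + len(frag)] = frag

# Identical on all replicas (consistent), but neither client's full write
# survives (undefined): the region mixes fragments from both clients.
print(bytes(region))  # b'BBBBAAAA'
```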

Kimmy answered 21/1, 2015 at 23:27
Hi Daniel, thanks for your answer; it answers most of my questions. There is still one thing though: my primary source of confusion comes from the fact that the client only communicates with the primary replica, and it is this replica that waits for acknowledgements from the other ones. Therefore, if a secondary replica fails an append, the primary will know this has happened and has no reason to advance the pointer to the end of the file; new appends should then overwrite that inconsistent area later on. What am I missing? – Tardigrade
I think it would be possible to do as you say, but I think GFS does not do that; I figure they would mention it if they did. My uneducated guess is that this is to increase throughput for concurrent appends: if the primary wanted to be able to move the pointer back in the case of a failure, it would not be able to accept other record appends while one is in progress. If it advances the pointer even for failed mutations, it can accept other appends right away. It would be great if someone could confirm/correct this theory. – Kimmy
Once the primary receives a failure from a replica, the next record it inserts will necessarily be the same one, right? Since it says the client retries the operation, how will other appends be possible, like you say? – Ary
Why would they not be possible? – Kimmy

I don't think this really has to do with concurrent appends, but with the at-least-once semantics of their system.

Failure is a fundamental problem of large distributed systems. In the presence of failure, a sender may not know whether the computer on the other end of the network fully received its message.

For such occasions, distributed systems guarantee that a message is delivered either at most once or at least once.

In this case, it appears GFS decided upon at-least-once delivery to the storage nodes.
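
A toy illustration of why at-least-once delivery implies possible duplicates (the channel model here is made up, not GFS's actual RPC layer): when an acknowledgement is lost, the sender cannot distinguish "message lost" from "ack lost", so it must resend, and the receiver may apply the record twice.

```python
import random

applied = []  # records the storage node has actually applied

def send(record):
    """Unreliable channel: the record or its ack can be lost in transit."""
    if random.random() < 0.3:
        return False          # record lost: the node never saw it
    applied.append(record)    # node receives and applies the record
    if random.random() < 0.3:
        return False          # ack lost: sender can't tell this from the above
    return True

def send_at_least_once(record):
    # At-least-once: resend until an ack arrives. If an ack was lost after
    # a successful delivery, the node ends up applying the record twice.
    while not send(record):
        pass

send_at_least_once("record-1")
print(applied)  # may contain "record-1" more than once
```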

Lomax answered 10/1, 2015 at 16:22
Thanks for your answer! So this should happen only if the primary chunk server experiences an error AFTER it has already written the data and updated the end of the file, right? – Tardigrade