A Client Walks Into a Server And Asks "What's New?" – Problems With Timestamps

I'm looking for a solution to an edge case scenario where a client continually asking the server for what's new will fail due to timestamps.

In this example, I'm not using sequence numbers because of another edge case problem. You can see that problem here: A Client Walks Into a Server And Asks "What's New?" – Problems With Sequence Numbers

Assume we're using timestamps. Every row update adds a timestamp of the server time. Clients continually ask what's new since the timestamp of the last item they received. Simple? Yes, but...

Failure scenario:

The times below are arbitrary for readability. Assume milliseconds in the real world.

2:50 Client C checks for updates.
2:59 Client A starts update on a row. (Sets lastModified to 2:59)
2:59 Client B starts update on a row. (Sets lastModified to 2:59)
3:00 Client A Row update becomes visible on DB. (lastModified still at 2:59)
3:00 Client C checks for updates >2:50. Gets A's update. Good.
3:01 Client B Row update becomes visible on DB. (lastModified still at 2:59)
3:10 Client C checks for updates >2:59. Gets nothing. Misses B's update. Bad.

This assumes that lastModified can't be set atomically with the write, so there may be a delay between its being set and the row becoming visible in the database. If the database were sharded, this delay could be much larger.
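To make the failure concrete, here is a minimal simulation sketch of the timeline above. It is not tied to any real database: the `Row` class, the `visible_at` field and the `poll` function are hypothetical names introduced only for illustration, and times are written as minutes past 2:00 (so 59 means 2:59 and 60 means 3:00).

```python
from dataclasses import dataclass


@dataclass
class Row:
    row_id: str
    last_modified: int  # stamped when the write starts
    visible_at: int     # hypothetical: when the row becomes readable


def poll(rows, now, cursor):
    """Return rows readable at `now` with last_modified strictly after `cursor`."""
    return [r for r in rows if r.visible_at <= now and r.last_modified > cursor]


# Both writes are stamped 2:59, but B becomes visible one minute after A.
rows = [
    Row("A", last_modified=59, visible_at=60),
    Row("B", last_modified=59, visible_at=61),
]

cursor = 50  # Client C last checked at 2:50
seen = []
for now in (60, 70):  # Client C polls at 3:00 and 3:10
    batch = poll(rows, now, cursor)
    seen.extend(r.row_id for r in batch)
    if batch:
        # Advance the cursor to the newest timestamp received.
        cursor = max(r.last_modified for r in batch)

print(seen)  # ['A']: B's update is never delivered
```

After the first poll the cursor sits at 2:59, so the second poll's strict `>` comparison excludes B even though B is visible by then.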

We could have the update check ask for an arbitrarily earlier time, creating overlap. This is inefficient, since duplicate data may be retrieved, but not fatal. However, is it possible to know how much overlap is needed for all cases? Could a sharded database occasionally delay displaying an update by seconds? Minutes?
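As a rough sketch of the overlap idea, the client can query from `cursor - overlap` and drop duplicates it has already received. Everything here is an assumption made for illustration: rows are plain `(row_id, last_modified, visible_at)` tuples, `poll_with_overlap` is a hypothetical helper, and times are minutes past 2:00.

```python
def poll_with_overlap(rows, now, cursor, overlap, delivered):
    """Return (new row ids, new cursor), re-reading `overlap` minutes of history.

    `delivered` is a set of (row_id, last_modified) pairs already handed to the
    client; it is used to filter out the duplicates the overlap produces.
    """
    since = cursor - overlap
    fresh = []
    for row_id, last_modified, visible_at in rows:
        if visible_at > now or last_modified <= since:
            continue  # not readable yet, or older than the overlap window
        key = (row_id, last_modified)
        if key not in delivered:
            delivered.add(key)
            fresh.append(row_id)
        cursor = max(cursor, last_modified)
    return fresh, cursor


rows = [
    ("A", 59, 60),  # stamped 2:59, visible 3:00
    ("B", 59, 61),  # stamped 2:59, visible 3:01
    ("E", 75, 76),  # stamped 3:15, visible 3:16
    ("D", 59, 85),  # stamped 2:59, visible 3:25 (delay longer than the overlap)
]

cursor, delivered, received = 50, set(), []
for now in (60, 70, 80, 90):
    fresh, cursor = poll_with_overlap(rows, now, cursor, overlap=5, delivered=delivered)
    received.extend(fresh)

print(received)  # ['A', 'B', 'E']
```

With a 5-minute overlap B is now picked up, but D is still missed: its visibility delay exceeds the overlap and the cursor has already moved past it. The overlap only helps if it is at least as large as the worst-case delay, which is exactly the unknown in question. In a real client the `delivered` set would also need pruning.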

Having clients ask "what's new" repeatedly seems like a common use case, and I'm surprised not to find more established best practices for it.

Any ideas on solving this scenario, or recommendations for a better, preferably platform-agnostic, way of asking for changes?

Curling answered 5/2, 2014 at 22:48

Comments (4)
You say: "3:10 Client C checks for updates >2:59. Gets nothing. Misses B's update. Bad." Why does it miss B's update, since in the previous row you say: "3:01 Client B Row update becomes visible on DB. (lastModified still at 2:59)"? Shouldn't it now see B's update? – Havenot
Because it's looking for rows with a lastModified greater than 2:59. 2:59 is the lastModified the client last received. B was last updated at exactly 2:59. You could solve the above example with >=, but that's just one example. The delay could be longer, and we might not know the maximum overlap. – Curling
I think I understand now. Is it possible the problem is modeled wrong? To me, the idea of a row in a db table means that whoever reads it always reads the latest value without necessarily caring about who wrote it. If you care about all the changes (and, as it seems from the example, you want to get them in the right order), maybe your data shouldn't be a single row but a sequence of rows, something like the producer-consumer problem. So instead of always updating and reading a single row, you would just insert new rows and then read from that table, preferably with some kind of index. – Havenot
Yes, it could be modeled wrong. However, I can't see a solution to this which continues to use timestamps. Multiple rows are of course used; the above is a simplification. Unless you mean we should be using a change table to track changes. We do in fact do this. However, in the interest of not duplicating data, the change table merely tracks what has changed and links back to the master records. In the end, we went with a solution involving sequence numbers, as linked above. – Curling
