A Client Walks Into a Server And Asks "What's New?" – Problems With Sequence Numbers

Asked 5/2, 2014 at 19:34 Answered 7/2, 2014 at 10:1

Solved mongodb client-server synchronization

I'm looking for a solution to an edge case scenario where a client continually asking the server for what's new will fail.

In this example, I'm not using timestamps because of another edge case problem. That's handled in this question: A Client Walks Into a Server And Asks "What's New?" – Problems With Timestamps

Assume we're using sequence numbers. There's a single sequence number that is atomically updated every time the table is changed. When any row is updated it records the current sequence. Then it's just a matter of the client asking for what's new since the last sequence it asked for. Simple? Yes, but...

Failure scenario:

Sequence starts at 1
1) Client A starts update. Updates sequence to 2
2) Client B starts update. Updates sequence to 3
3) Client B updates rows with sequence 3
4) Client C requests changes >1.  Gets B's changes. Good.
5) Client A updates rows with sequence 2. 
6) Client C requests changes >3.  Gets nothing. Doesn’t get Client A’s changes.

Because we're using a MongoDB, I don't believe we can easily lock the sequence during an update. And, if we could, I worry about performance.

Having clients ask "what's new" repeatedly seems like a common use case and I find it surprising not to find a better wealth of best practices on this.

Any ideas on solving this scenario or recommending a better, preferably platform agnostic, solution for asking for changes?

Faustina answered 5/2, 2014 at 19:34 Comment(2)

What's the difference between Client A's changes and Client C's changes? The amount of time you wait? If Client C asks "What's new?" again, does he get Client A's changes? – Masthead 5/2, 2014 at 19:36

Client C does not in this example submit changes. Client C won't get Client A's changes in the above because he won't ask for a low enough sequence number. – Faustina 6/2, 2014 at 10:17

One thing you can do is maintain a heap of sequence numbers that are in use as well as the "next sequence number to assign". Try the following:

When you grab a sequence number, put it in an "in-use" map.
When you are done making changes with that sequence number, remove it from the "in-use" std::set
Keep track of the minimum in the set. Whenever the minimum changes from "x" to "y", have Client C request values from x to y, but no greater than y.

So, in your example, when you update the sequence to 2, 1 is put in the in-use set. Then, when you update to 3, 2 is put in there, and the set contains 1 and 2. When the work for 2 is done, 2 is removed from the set, but client C does not pick up any changes because the minimum, 1, is unchanged. When client A is done with 1, the minumum changes from 1 to 3, and client C may read the changes from 1 to 3.

For a more complex example, suppose you have 6 clients use sequence numbers 11, 12, 13, 14, 15, and 16, yet finish in the following order: 12, 13, 11, 15, 14, 16 (which is the order they are removed from the "in-use" set. In this example, after 11 is gone, client C may read 11 through 13, because the minumum changes from 11 to 14. Then, after 14 is gone, client C may read 14 and 15, as the minimum changes from 14 to 16. Then, when 16 is gone, client C can read 16.

This is basically the algorithm we use in TokuMX replication that decides what oplog entries may be replicated to secondaries. Clients A and B would be threads doing writes to the oplog and Client C would be a tailable cursor from the secondary pulling oplog data.

Sindysine answered 5/2, 2014 at 20:39 Comment(5)

My only thought is it seems the minimum should be the last sequence received by the client, not the minimum defined by the in use table. For example, in your more complex scenario, Client C coming into the middle of the update may read >1 and <14. 14 is the lowest number in the in use table. This allows it to potentially pick up earlier sequences it may not have received yet. Say, there was an earlier completed update to sequence 10 and Client C hasn't seen that yet. Then, when the in use table is empty, Client C next reads >13 with no maximum. 13 was the last sequence it received. – Faustina 6/2, 2014 at 10:15

I might be misunderstanding something, but in the middle of the update, 11 is the lowest in the in-use table, not 14. The idea is that even though 12 and 13 have completed, because 11 has not, 12 and 13 are not read by client C yet. Client C will pick them up once 11 is completed. There is also another implicit detail that when the in-use table is empty, client C can read everything that exists. – Sindysine 6/2, 2014 at 11:57

In finishing in the following order: 12, 13, -CLIENT READS-, 11, 15, 14, 16, you are correct. 11 is the lowest in the in-use table. – Faustina 6/2, 2014 at 12:8

However, your original example had the the client reading in the following order: 12, 13, 11, -CLIENT READS-, 15, 14, 16, you are correct. In this case 14 is the lowest. – Faustina 6/2, 2014 at 12:12

I was just clarifying that the client should ask for all entries where sequence > clientLastSequenceReceived and sequence < currentMinimumFromInUseTable. – Faustina 6/2, 2014 at 12:14

This is clarifying and providing some examples of Zardosht Kasheff's correct answer above.

At the start of an update we add the sequence number to an in-use list
Any client requesting “what’s new” can’t ask for anything >= the lowest number on the list.
Thus if a couple clients were updating with sequences 10, 11, 12, a client would ask for everything from > sequenceOfLastItemReceivedByClient and < 10 (the lowest number on the in use list.)

Here's my original example solved:

Sequence starts at 1
Client A starts update. Uses sequence 2.  In use table now has “2"
Client B starts update. Uses sequence 3. In use table now has “2,3"
Client B updates rows with sequence 3. In use table now has “2"
Client C wants to request >1 .  However, min in-use table is 2.  So it requests >1 and <2 which is none.
Client A updates rows with sequence 2. In use table now empty.
Client C wants to request changes >1. Gets A and B.

And Zardosht's example

Clients start updates with sequences 11, 12, 13, 14, 15, and 16. In use table is "11, 12, 13, 14, 15, 16”
Updates 12, 13, 11 finish.  In use table is “14, 15, 16”
Client C wants to request >1.  Is allowed >1 and <14.  Gets sequences 11, 12, 13
Updates 15, 14, 16 finish.  In use table is empty.
Client C requests >13. Gets 15, 14, 16.

And a final example:

Clients start updates with sequences 11, 12, 13, 14, 15, and 16. In use table is "11, 12, 13, 14, 15, 16”
Updates 12, 16, 11 finish.  In use table is “13, 14, 15”
Client C wants to request >1.  Is allowed >1 and <13.  Gets sequences 11, 12
Updates 13, 14, 15 finish.  In use table is empty.
Client C requests >12. Gets 13, 14, 15, 16

All above examples pass. This seems to be a workable solution.

Faustina answered 7/2, 2014 at 10:1 Comment(0)

Recommended topics

Hot tags