What is the best way to use a collection as a round robin in MongoDB?

I have a collection named items with three documents.

{
  _id: 1,
  item: "Pencil"
}
{
  _id: 2,
  item: "Pen"
}
{
  _id: 3,
  item: "Sharpener"
}

How could I query to get the documents in round-robin order? Consider that I get multiple user requests at the same time.

So one request should get Pencil, the next should get Pen, and the next should get Sharpener.

Then it should start again from the first one.

If changing the schema is an option, I am ready for that too.

Luganda answered 6/7, 2018 at 16:20 Comment(1)
Honestly I do not know if you can do that, but the simplest approach is changing your schema to store a different counter for each document (which could be your _id), then selecting documents by this counter. Obviously you have to keep the current counter in memory.Familiar
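
A minimal sketch of that comment's idea (mongo shell; the unique numeric _ids 1..3 and the application-held counter are assumptions, since MongoDB does not manage either for you):

// counter is held by the application, not by MongoDB
var counter = 0;
var total = db.items.count();                        // 3 documents
var doc = db.items.findOne({ _id: (counter % total) + 1 });
counter += 1;                                        // advance for the next request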

I think I found a way to do this without changing the schema. It is based on skip() and limit(). Moreover, you can ask to keep the internal ($natural) sorting order for the returned documents, but as the guide says you should not rely on this, especially because you lose performance since indexing is bypassed:

The $natural parameter returns items according to their natural order within the database. This ordering is an internal implementation feature, and you should not rely on any particular structure within it.

Anyway, this is the query:

db.getCollection('YourCollection').find().skip(counter).limit(1)

Where counter stores the current index for your documents.
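
For instance, a minimal wrap-around read might look like this (the count() call used for wrapping is my assumption; the answer leaves counter management entirely to the application):

var total = db.getCollection('YourCollection').count();
var doc = db.getCollection('YourCollection')
            .find().skip(counter % total).limit(1).next();
counter = (counter + 1) % total;   // advance and wrap for the next request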

Familiar answered 6/7, 2018 at 16:55 Comment(2)
Where does counter come from?Hobbie
@Hobbie it has to be declared by your application. At the very least you have to store the index of the next targeted document; MongoDB cannot keep it in memory for you (that would not make sense).Familiar

A few things to start:

  1. _id has to be unique across a collection, especially when the collection lives on a single replica set.
  2. This is a very stateful requirement and would not work well with a distributed set of services, for example.

With that said, assuming you really just want to iterate from the database, I would use cursors to accomplish this. For the record, this will do a collection scan and is very inefficient.

var myCursor = db.items.find().sort({_id:1});

while (myCursor.hasNext()) {
   printjson(myCursor.next());
}

My suggestion is that you should pull all results from the database at once and do your iteration in the application tier.

var myCursor = db.items.find().sort({_id:1});
var documentArray = myCursor.toArray();
documentArray.forEach(doSomething); // doSomething is your per-document handler
Prisca answered 25/3, 2022 at 19:0 Comment(7)
My use case (I don't know if it was the original question's, but I think it was) has concurrent access from multiple clients, so I don't think any of your suggestions would work. My workaround is writing a "last accessed" value to the document when it is used (using find_one_and_update) and sorting on that value, picking the least recently used one. But it doesn't really scale well...Showker
OK, if you need to fulfill this requirement from a distributed system, my question is: does it have to be consistent across the system? If so, why? The design you mention here will work, but you will find that it won't scale due to latches on the document you are trying to update/select from other services in the system.Prisca
My particular use case is that my DB contains test data for a performance test (a list of customers). Multiple load generators are reading from the DB concurrently. Strictly correct ordering is not important, though.Showker
If it's just performance testing and you have no requirement to be globally consistently ordered, then I would use the second answer I provided here, provided the dataset can be held in memory, and loop through it in memory. Otherwise the first works to grab only what you need for the next request. The upside to grabbing all of the data at once is that you aren't waiting for a response from a DB to produce load.Prisca
Sorry, I was unclear: ordering and (most importantly) not getting the same record twice is important, but strict ordering is not (it is OK if a client doesn't get the "right" one every time, as long as there are no "duplicates").Showker
Ah, if that's the case I suggest you partition your data dynamically, such that instance 1 gets one "chunk" and instance 2 gets a different "chunk" (see the sketch after this thread). Get all that data at once before the test starts, then kill the process when you have iterated through all the data you require. You could also use a cursor with that approach if you like.Prisca
Hmm, that's not a bad idea. But as I have a variable number of readers it might become a little hard, and I don't know how many customers I'll need beforehand. I guess I could explore it... it just feels like there should be a simpler way.Showker
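
A rough sketch of that partitioning idea (mongo shell; the reader count n and each reader's own index i are hypothetical names, and the chunking scheme is mine, not from the thread):

// reader i of n claims its own contiguous slice of the sorted collection
var total = db.items.count();
var chunk = Math.ceil(total / n);
var myDocs = db.items.find().sort({ _id: 1 })
               .skip(i * chunk).limit(chunk).toArray();
// iterate myDocs locally; no reader ever sees another reader's slice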

If this is about distribution, you may consider fetching random documents instead of round-robin via an aggregation with $sample:

db.collection.aggregate([
  {
    "$sample": {
      "size": 1
    }
  }
])

playground

Or there are options to randomize via $rand ...
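
As a sketch, $rand (MongoDB 4.4+) can be used inside a $match to keep each document with some probability; the 0.5 here is an arbitrary choice of mine:

db.collection.aggregate([
  { $match: { $expr: { $lt: [ { $rand: {} }, 0.5 ] } } }
])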

Barncard answered 28/3, 2022 at 20:29 Comment(0)

Use findOneAndUpdate after restructuring the data objects. Note that by default findOneAndUpdate returns the document as it was before the update:

db.counter.findOneAndUpdate( {}, pipeline)
    {
        "_id" : ObjectId("624317a681e72a1cfd7f2b7e"),
        "values" : [
            "Pencil",
            "Pen",
            "Sharpener"
        ],
        "selected" : "Pencil",
        "counter" : 1
    }

 db.counter.findOneAndUpdate( {}, pipeline)
    {
        "_id" : ObjectId("624317a681e72a1cfd7f2b7e"),
        "values" : [
            "Pencil",
            "Pen",
            "Sharpener"
        ],
        "selected" : "Pen",
        "counter" : 2
    }

where the restructured data object is:

    {
        "_id" : ObjectId("6242fe3bc1551d0f3562bcb2"),
        "values" : [
            "Pencil",
            "Pen",
            "Sharpener"
        ],
        "selected" : "Pencil",
        "counter" : 1
    }

and the pipeline is:

[{ $project: {
    values: 1,
    selected: {
      $arrayElemAt: ['$values', '$counter']
    },
    counter: {
      $mod: [
        { $add: ['$counter', 1] },
        { $size: '$values' }
      ]
    }
}}]

This has some merits:

  • Firstly, using findOneAndUpdate means that moving the pointer to the next item in the list and reading the object happen atomically.
  • Secondly, by using {$size: "$values"}, adding a value into the list doesn't change the logic.
  • And an object could be used instead of a string.

Problems: this method would be unwieldy with more than tens of entries.

It is hard to prove that this method works as advertised, so there is an accompanying Kotlin project. The project uses coroutines, so it calls find/update asynchronously.

GitHub

The alternative (assuming 50K items and not 3): set up a simple counter {counter: 0} and update it as follows:

db.counter.findOneAndUpdate({},
  [{ $project: {
      counter: {
        $mod: [
          { $add: ['$counter', 1] },
          50000
        ]
      }
  }}])

Then use a simple select query to find the right document.
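
For example, the two steps together might look like this (the idx field on the item documents is an assumption; the comments below suggest adding such an index field to the test documents):

// 1. Atomically advance the counter and read back the new value
var c = db.counter.findOneAndUpdate(
  {},
  [{ $project: { counter: { $mod: [{ $add: ['$counter', 1] }, 50000] } } }],
  { returnNewDocument: true }
);
// 2. Fetch the item whose (assumed) numeric index field matches
var doc = db.items.findOne({ idx: c.counter });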

I've updated the GitHub project to include this example.

Getty answered 29/3, 2022 at 20:9 Comment(4)
Nice solution! When you say it would become unwieldy, do you mean performance-wise? I have up to 50K entries...Showker
Oh. I misread your suggestion. Do you really mean I should put all the data in a single document? That would not work for me (for all kinds of reasons)Showker
It is too slow with 50K entries in the values array. So you are left with a solution where the counter is updated with a findOneAndUpdate [{$project: {counter: { $mod: [{ $add: ['$counter', 1]}, 50000]}}}], followed by a second query using that index (you can add it as a field to the test documents).Getty
Look at the alternative in the GitHub project.Getty
