MongoDB - paging

When using MongoDB, are there any special patterns for making e.g. a paged view? Say a blog that lists the 10 latest posts, where you can navigate backwards to older posts.

Or does one solve it with an index on e.g. blogpost.publishdate and just skip and limit the result?

Symptomatology answered 19/2, 2011 at 9:31 Comment(1)
I'm going to leave this one hanging since there seems to be some disagreement about what is the correct way to make this scale.Symptomatology

Using skip+limit is not a good way to do paging when performance is an issue, or with large collections; it will get slower and slower as you increase the page number. Using skip requires the server to walk through all the documents (or index values) from 0 to the offset (skip) value.

It is much better to use a range query (+ limit) where you pass in the last page's range value. For example, if you are sorting by "publishdate" you would simply pass the last "publishdate" value as the criterion for the query to get the next page of data.
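To make the idea concrete, here is a minimal sketch in plain JavaScript, with an in-memory array standing in for the collection (the field name publishdate follows the question; the equivalent MongoDB query is shown in the comment):

```javascript
// Range-based paging sketch. In the mongo shell the next page would be
// fetched with something like:
//   db.posts.find({ publishdate: { $lt: lastSeen } })
//           .sort({ publishdate: -1 }).limit(10)
const posts = Array.from({ length: 25 }, (_, i) => ({ _id: i, publishdate: 1000 - i }));

function nextPage(docs, lastSeen, limit) {
  // "range query + limit": only documents older than the last one shown
  return docs
    .filter(d => lastSeen === null || d.publishdate < lastSeen)
    .sort((a, b) => b.publishdate - a.publishdate)
    .slice(0, limit);
}

const page1 = nextPage(posts, null, 10);                              // newest 10
const page2 = nextPage(posts, page1[page1.length - 1].publishdate, 10); // next 10
```

Note that no offset is ever scanned: each page starts directly at the last value seen, which is why this stays fast regardless of page depth.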

Pehlevi answered 19/2, 2011 at 19:5 Comment(10)
Would be great to see some documentation that confirms that skip in MongoDB iterates through all documents.Alyce
Here you go: skip docs If there is any other place where the information should be updated please let me know.Pehlevi
@ScottHernandez: I have paging with links to multiple pages (like so: Page: First, 2, 3, 4, 5, Last) and sorting on all the fields. Only one of my fields is unique (and indexed), will a range query work for this use-case? I'm afraid not, I just wanted to confirm if it were at all possible. Thanks.Macgregor
user183037: Generally no, this method works for next/prev style navigation. You could play some tricks but they get complicated and are not good general solutions.Pehlevi
Here is the skip docs linkClose
Seems like this would not work if there were multiple documents with the same publishdate value.Naughty
Is this answer still current? One thing I'm confused about is the case where skip is relatively small -- that is, you expect to allow access to the first few pages. Is that a problem?Mack
What happens if you don't have a date but just a list of documents?Sly
Note that range is better only as long as people using your app only want to sort / search on unique fields - or don't care about the order of the results to search queries. It doesn't help for scenarios like searching and sorting on non-unique fields (e.g. name, email domain, page title, etc). At the point where using skip() becomes slow it might be time to consider indexing the data some other way - such as pushing it to an Elasticsearch service and running searches against that rather than against MongoDB.Mundell
Could you see this link guys #47568513Govern
  1. Range-based paging is hard to implement if you need to sort items in many ways.
  2. Remember that if the value of the sort field is not unique, range-based paging becomes unreliable.

Possible solution: try to simplify the design. Consider whether you can sort only by _id or some other unique value.

If you can, then range-based paging can be used.

The common way is to use sort(), skip() and limit() to implement paging, as described above.
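For non-unique sort fields there is a common workaround (not part of this answer, but worth sketching): break ties with _id and page on the pair (sortField, _id). A minimal in-memory sketch, with the corresponding MongoDB query in the comment:

```javascript
// Paging on a non-unique field (name) with _id as tie-breaker.
// In MongoDB this corresponds to something like:
//   db.items.find({ $or: [ { name: { $gt: last.name } },
//                          { name: last.name, _id: { $gt: last._id } } ] })
//           .sort({ name: 1, _id: 1 }).limit(n)
const items = [
  { _id: 1, name: 'a' }, { _id: 2, name: 'a' },
  { _id: 3, name: 'b' }, { _id: 4, name: 'b' }, { _id: 5, name: 'c' },
];

function nextPage(docs, last, limit) {
  return docs
    .filter(d => last === null ||
      d.name > last.name || (d.name === last.name && d._id > last._id))
    .sort((x, y) => x.name < y.name ? -1 : x.name > y.name ? 1 : x._id - y._id)
    .slice(0, limit);
}

const page1 = nextPage(items, null, 2);      // _ids 1, 2
const page2 = nextPage(items, page1[1], 2);  // _ids 3, 4
```

Because the compound value (name, _id) is unique, duplicate names can no longer cause skipped or repeated documents between pages.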

Horlacher answered 15/9, 2011 at 6:53 Comment(6)
a good article with Python code examples can be found here codementor.io/arpitbhayani/…Complicity
Thank you - great answer! I'm annoyed when people suggest pagination by using filters e.g. { _id: { $gt: ... } }... it simply doesn't work if using custom ordering - e.g. .sort(...).Frowsty
@NickGrealy I followed a tutorial to do just this, and I am now in a situation where paging 'looks' like it works but I get missing documents. I am paging on the Mongo ID, but as new data gets inserted into the db and the collection is sorted alphabetically, records starting with "AA" can have higher IDs than records starting with "A" (because they were inserted later), so the "AA" records are not returned by the paging. Are skip and limit suitable? I have in the region of 60 million documents to search.Estuarine
@Estuarine - this is worthy of conversation - you won't get your answer here in the comments. Question: what behaviour do you expect? You're working with a live system, with records being created and deleted all the time. If you re-request a live snapshot of your data for every new page load, then you should expect your underlying data to change. What should be the behaviour? If you work with a "point in time" data snapshot, you'll have "fixed pages", but you'll also have "out of date" data. How big is the problem you're describing, and how often do people encounter it?Frowsty
It's definitely worth a conversation. My issue is that I retrieved a one-off file in alphabetical order of licence plates, and every 15 minutes apply updates for changed (removed or added) plates. The issue is that if a new plate is added that starts with an A, for instance, and due to page size it is the last on the page, then when the next page is requested no records are returned, I believe (an assumption and a contrived example, but illustrative of my issue), because the ID is higher than any others in the set. I am now looking at using the full licence plate to drive the greater-than part of the query.Estuarine
The truth is that paging beyond just a limited resultset is a bit of a myth. Humans don't scale to 60-million rows, so your software doesn't need to. If someone is looking for something and you return more than, say, 50 rows, it could be better to just tell them there were more records and that they should consider providing more information to search by. Also, in terms of the eventual-consistency problem you're battling, with rows coming and going while you're paging, consider putting a "correct as of [date/time]" message somewhere near the results (a user could simply leave a page open).Edulcorate

This is the solution I used when my collection grew too large to return in a single query. It takes advantage of the inherent ordering of the _id field and allows you to loop through a collection in batches of a specified size.

Here it is as an npm module, mongoose-paging; the full code is below:

function promiseWhile(condition, action) {
  return new Promise(function(resolve, reject) {
    process.nextTick(function loop() {
      if(!condition()) {
        resolve();
      } else {
        action().then(loop).catch(reject);
      }
    });
  });
}

function findPaged(query, fields, options, iterator, cb) {
  var Model  = this,
    step     = options.step,
    cursor   = null,
    length   = null;

  promiseWhile(function() {
    return ( length===null || length > 0 );
  }, function() {
    return new Promise(function(resolve, reject) {

        if(cursor) query['_id'] = { $gt: cursor };

        Model.find(query, fields, options).sort({_id: 1}).limit(step).exec(function(err, items) {
          if(err) {
            reject(err);
          } else {
            length  = items.length;
            if(length > 0) {
              cursor  = items[length - 1]._id;
              iterator(items, function(err) {
                if(err) {
                  reject(err);
                } else {
                  resolve();
                }
              });
            } else {
              resolve();
            }
          }
        });
      });
  }).then(cb).catch(cb);

}

module.exports = function(schema) {
  schema.statics.findPaged = findPaged;
};

Attach it to your model like this:

MySchema.plugin(require('mongoose-paging'));

Then query like this:

MyModel.findPaged(
  // mongoose query object, leave blank for all
  {source: 'email'},
  // fields to return, leave blank for all
  ['subject', 'message'],
  // number of results per page
  {step: 100},
  // iterator to call on each set of results
  function(results, cb) {
    console.log(results);
    // this is called repeatedly until there are no more results.
    // results is an array of at most 100 documents matching your query

    // when this batch has been processed successfully:
    cb();

    // if your async processing fails, call cb(err) instead
  },
  // function to call when finished looping
  function(err) {
    if (err) throw err;
    // this is called once there are no more results (err is null),
    // or if there is an error (then err is set)
  }
);
Lucho answered 13/7, 2015 at 19:55 Comment(3)
don't know why this answer doesn't have more upvotes. this is a more efficient way to paginate than skip/limitRagtime
I've also come by this package, but how is the performance compared to skip/limit and the answer provided by @Scott Hernandez?Dodwell
How would this answer work, for sorting on any other field?Frowsty

Range based paging is doable, but you need to be smart about how you min/max the query.

If you can afford to, you should try caching the results of a query in a temporary file or collection. Thanks to TTL collections in MongoDB you can insert your results into two collections.

  1. Search+User+Parameters Query (TTL whatever)
  2. Results of query (TTL whatever + cleaning interval + 1)

Using both ensures you will not get partial results when the TTL is near the current time. You can use a simple counter when you store the results to do a VERY simple range query at that point.
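A minimal sketch of the counter idea (in plain JavaScript, with an in-memory array standing in for the results collection; names and TTL values are illustrative, not from the answer):

```javascript
// Cache query results with a sequential counter n, so a page becomes a
// trivial range query. In MongoDB the cache collection would carry a TTL
// index, e.g.:
//   db.resultCache.createIndex({ createdAt: 1 }, { expireAfterSeconds: 300 })
// and a page would be fetched with:
//   db.resultCache.find({ qid: ..., n: { $gte: lo, $lt: hi } })
function cacheResults(results) {
  return results.map((doc, n) => ({ n, doc })); // n is the counter
}

function getPage(cached, pageIndex, pageSize) {
  const lo = pageIndex * pageSize, hi = lo + pageSize;
  return cached.filter(r => r.n >= lo && r.n < hi).map(r => r.doc);
}

const cached = cacheResults(['a', 'b', 'c', 'd', 'e']);
const page = getPage(cached, 1, 2); // ['c', 'd']
```

Since the snapshot is frozen at cache time, pages are stable even while the underlying collection changes, at the cost of slightly stale data.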

Pantile answered 3/5, 2013 at 0:28 Comment(0)

Here is an example of retrieving a list of User documents ordered by CreatedDate (where pageIndex is zero-based) using the official C# driver.

public List<User> GetUsers(int pageIndex, int pageSize) 
{
  var connectionString = "<a connection string>";
  var client = new MongoClient(connectionString);
  var server = client.GetServer();
  var database = server.GetDatabase("<a database name>");

  var sortBy = SortBy<User>.Descending(u => u.CreatedDate);
  var collection = database.GetCollection<User>("Users");
  var cursor = collection.FindAll();
  cursor.SetSortOrder(sortBy);

  cursor.Skip = pageIndex * pageSize;
  cursor.Limit = pageSize;
  return cursor.ToList();
}

All the sorting and paging operations are done on server side. Although this is an example in C#, I guess the same can be applied to other language ports.

See http://docs.mongodb.org/ecosystem/tutorial/use-csharp-driver/#modifying-a-cursor-before-enumerating-it.

Ringsmuth answered 15/3, 2014 at 5:49 Comment(0)
    // file: ad-hoc.js
    // an example of using the less binary as a pager in the bash shell
    //
    // call it from the shell with:
    //   mongo localhost:27017/mydb ad-hoc.js | less
    //
    // note: ad-hoc.js must be in your current directory
    // replace 27017 with the port of your mongodb instance
    // replace mydb with the name of the db you want to query
    // replace products with the name of your collection

    // create the connection obj
    conn = new Mongo();

    // set the db of the connection
    db = conn.getDB("mydb");

    // populate the products collection with demo data
    // (you will probably have your data already)
    for (var i = 0; i < 1000; i++) {
        db.products.insert(
            { _id: i, item: "lamp", qty: 50, type: "desk" }
        );
    }

    // print the collection contents
    cursor = db.products.find();
    while (cursor.hasNext()) {
        printjson(cursor.next());
    }
    // eof file: ad-hoc.js
Drain answered 14/8, 2014 at 12:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.