Mongo aggregation cursor & counting
According to the MongoDB Node.js driver docs, the aggregate function now returns a cursor (as of 2.6).

I hoped I could use this to get a count of items before limiting and skipping, but there doesn't seem to be any count function on the created cursor. If I run the same queries in the mongo shell, the cursor has an itcount function that I can call to get what I want.

I saw that the created cursor emits a 'data' event (does that mean it's a CursorStream?), which seemed to fire the expected number of times, but if I use it in combination with cursor.get, no results get passed into the callback function.

Can the new cursor feature be used to count an aggregation query?

Edit for code:

In mongo shell:

> db.SentMessages.find({Type : 'Foo'})
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19dd9834184ad6d3675c"), "Name" : "789", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19d29834184ad6d3675b"), "Name" : "456", "Type" : "Foo" }

> db.SentMessages.find({Type : 'Foo'}).count()
3

> db.SentMessages.find({Type : 'Foo'}).limit(1)
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }

> db.SentMessages.find({Type : 'Foo'}).limit(1).count();
3

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ])
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19dd9834184ad6d3675c"), "Name" : "789", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19d29834184ad6d3675b"), "Name" : "456", "Type" : "Foo" }

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ]).count()
2014-08-12T14:47:12.488+0100 TypeError: Object #<Object> has no method 'count'

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ]).itcount()
3

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ])
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ]).itcount()
1

> exit
bye

In Node:

var cursor = collection.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ], { cursor : {}});

cursor.get(function(err, res){
  // res is as expected (1 doc)
});

cursor.count() does not exist.

cursor.itcount() does not exist.

The 'data' event exists:

var totalItems = 0;
cursor.on('data', function(){
    totalItems++;
});

but when it is used in combination with cursor.get, the .get callback receives 0 docs.

Edit 2: The returned cursor appears to be an aggregation cursor rather than one of the cursor types listed in the docs.

Burtie asked 11/8, 2014 at 8:58
Could you show us some code that isn't working as you expect it to? Is the count() function defined on cursors not working? Note that the aggregation framework in the Node driver only returns a cursor when you set the cursor option. – Touchdown
I've added some code examples in the mongo shell (which works the way I would like it to) and using the Node.js driver. – Burtie
I answered this duplicated question, but Martijn Pieters pointed out that it is bad etiquette to answer duplicated questions and proceeded to delete my answer, so just go to this link: https://mcmap.net/q/350099/-how-to-get-the-length-of-a-cursor-from-mongodb-using-python I don't have time for flagging duplicates and for darn drama. – Purple
Does this answer your question? How to get the length of a cursor from mongodb using python? – Purple
This possibly deserves a full explanation for those who might search for it, so I am adding one for posterity.

Specifically, what is returned is an event stream for Node.js that effectively wraps the stream.Readable interface with a couple of convenience methods. A .count() method is not one of them at present and, given the current interface, would not make much sense.

As with the result of the .stream() method available on cursor objects, a "count" makes little sense here given the implementation: the results are meant to be processed as a "stream", where you eventually reach an "end" and otherwise just keep processing until you get there.

If you consider the standard "Cursor" interface from the driver, there are some solid reasons why the aggregation cursor is not the same:

  1. Cursors allow "modifier" actions to be processed prior to execution. These fall into the categories of .sort(), .limit() and .skip(), all of which have counterpart stages ($sort, $limit and $skip) that are specified in the aggregation pipeline. Since those stages can appear "anywhere" in the pipeline and not just as post-processing of a simple query, it would not make much sense to offer the same "cursor" modifiers.

  2. Other cursor modifiers include specials such as .hint(), .min() and .max(), which alter "index selection" and processing. While these could be useful to the aggregation pipeline, there is currently no simple way to include them in query selection. In any case, the logic of the previous point largely outweighs any benefit of using the same type of interface for a "Cursor".
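To make the first point concrete, here is a minimal sketch (findToPipeline is a hypothetical helper, not a driver API) that rewrites a find(query).sort().skip().limit() chain as pipeline stages. Cursor modifiers are applied in a fixed order no matter how they are chained, whereas pipeline stages run in the order listed, so the helper emits $sort, then $skip, then $limit to match what find() does.

```javascript
// Hypothetical helper: express find(query).sort(sort).skip(skip).limit(limit)
// as aggregation pipeline stages. Stage order matters in a pipeline, so the
// stages are emitted in the order find() applies the modifiers: sort, skip, limit.
function findToPipeline(query, { sort, skip, limit } = {}) {
  const pipeline = [{ $match: query }];
  if (sort) pipeline.push({ $sort: sort });
  if (skip) pipeline.push({ $skip: skip });
  if (limit) pipeline.push({ $limit: limit });
  return pipeline;
}

console.log(JSON.stringify(findToPipeline({ Type: "Foo" }, { limit: 1, skip: 1, sort: { Name: 1 } })));
// [{"$match":{"Type":"Foo"}},{"$sort":{"Name":1}},{"$skip":1},{"$limit":1}]
```

Once the modifiers are ordinary stages, nothing stops further stages from following them, which is why a cursor-level .limit() or .count() has no fixed place to attach to.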

The other consideration is what you actually want to do with a cursor and why you "want" one returned. Since a cursor is usually a "one way trip", in the sense that it is processed in usable "batches" until the end is reached, it is reasonable to conclude that the "count" only really comes at the end, when that "queue" is finally depleted.

It is true that the standard "cursor" implementation holds some tricks, but the main reason is that it just extends a "meta" data concept: the query profiling engine must "scan" a certain number of documents in order to determine which items to return in the result.

The aggregation framework plays with this concept a little, though: not only are there the same results as would be processed through the standard query profiler, there are also additional stages, and any of these stages has the potential to "modify" the resulting "count" that would actually be returned in the "stream" to be processed.
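Because later stages can change how many documents come out, the pre-paging count the question asks for has to be computed from the part of the pipeline before the paging stages. A minimal sketch, assuming a hypothetical helper name and that $skip/$limit sit at the end of the pipeline: it keeps the stages before the first $skip or $limit and appends a $group that sums 1 per document (both $group and $sum are available in MongoDB 2.6).

```javascript
// Hypothetical helper: build a pipeline that counts the documents reaching
// the first $skip or $limit stage, i.e. the total "pre limit & skipping".
// Assumes paging stages come last; any stages after them are dropped.
function countBeforePaging(pipeline) {
  const cut = pipeline.findIndex((stage) => "$skip" in stage || "$limit" in stage);
  // slice() copies, so the caller's pipeline is left untouched.
  const prefix = cut === -1 ? pipeline.slice() : pipeline.slice(0, cut);
  // $group with _id: null collapses all remaining documents into one counter.
  prefix.push({ $group: { _id: null, total: { $sum: 1 } } });
  return prefix;
}

const paged = [{ $match: { Type: "Foo" } }, { $limit: 1 }];
console.log(JSON.stringify(countBeforePaging(paged)));
```

Running the resulting pipeline as a second aggregation should return a single { _id: null, total: ... } document. If nothing matches, $group emits no document at all, so an empty result should be treated as 0.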

Again, you could look at this from an academic point of view and say: "Sure, the query engine keeps the 'meta data' for the count, but can we not track what is modified afterwards?" This would be a fair argument. Pipeline operators such as $match, $group, $unwind, and possibly even $project and the new $redact could each reasonably keep their own track of the "documents processed" in that stage and update the "meta data" that might be returned to explain the full pipeline result count.
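Later MongoDB releases (3.4 and up) added $facet and $count, which let one aggregation feed the same documents into several sub-pipelines, so one branch can count what reaches a given point while another pages through it. A sketch under that version assumption, with pageWithTotal as a hypothetical helper name:

```javascript
// Hypothetical helper (assumes MongoDB 3.4+ for $facet and $count):
// return one page of results plus the pre-paging total in a single aggregation.
function pageWithTotal(match, skip, limit) {
  return [
    { $match: match },
    {
      $facet: {
        // Each facet receives the same input documents independently.
        page: [{ $skip: skip }, { $limit: limit }],
        total: [{ $count: "n" }],
      },
    },
  ];
}

console.log(JSON.stringify(pageWithTotal({ Type: "Foo" }, 1, 1)));
```

The single output document has the shape { page: [...], total: [{ n: ... }] }; total is an empty array when nothing matched.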

That argument is reasonable, but consider also that a "Cursor" for aggregation pipeline results is, at present, a new concept for MongoDB. It could fairly be argued that the "reasonable" expectation at the initial design point was that "most" results from combining documents would not be large enough to hit the BSON size limit (16 MB per document). But as usage expands, perceptions change and the implementation adapts.

So this "could" possibly be changed, but it is not how it is "currently" implemented. While .count() on a standard cursor implementation has access to the "meta data" where the scanned number is recorded, any method on the current implementation would result in retrieving all of the cursor results, just as .itcount() does in the shell.
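An itcount()-style count in Node.js therefore means walking the whole cursor. A minimal sketch, assuming a driver whose cursors are async iterable (current Node.js driver releases are; the 2014 driver discussed here predates for await). The itcount name is borrowed from the shell for illustration, and an async generator stands in for the cursor so the example runs without a server:

```javascript
// Sketch of an itcount()-style helper: exhaust any (async) iterable cursor
// and count what it yields.
async function itcount(cursor) {
  let n = 0;
  for await (const _doc of cursor) {
    n += 1; // every document is fetched just to be counted
  }
  return n;
}

// Stand-in for an aggregation cursor, so the sketch runs without a server.
async function* fakeCursor() {
  yield { Name: "123" };
  yield { Name: "789" };
  yield { Name: "456" };
}

itcount(fakeCursor()).then((n) => console.log(n)); // 3
```

The cost is the same as the shell's .itcount(): every result crosses the wire, which is exactly why it is not offered as a cheap .count().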

Process the "cursor" items by counting on the "data" event and emitting something (possibly a JSON stream generator) as the "count" at the end. Any use case that requires a count "up front" does not seem like a valid use for a cursor anyway, as surely the output would then be a whole document of a reasonable size.
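A minimal sketch of that approach using Node's stream module. Assumptions: the driver hands you an object-mode Readable (how you obtain it depends on the driver version), simulated here with Readable.from, and countingPassThrough is a hypothetical helper. Counting and collecting in the same single pass also avoids consuming the results twice, which is a likely explanation for the empty .get() callback described in the question.

```javascript
const { Readable, Transform } = require("stream");

// Pass documents through unchanged while counting them; the total is only
// known once the source ends, so it is reported from flush().
function countingPassThrough(onCount) {
  let n = 0;
  return new Transform({
    objectMode: true,
    transform(doc, _encoding, callback) {
      n += 1;
      callback(null, doc);
    },
    flush(callback) {
      onCount(n);
      callback();
    },
  });
}

// Stand-in for the driver's cursor stream (an object-mode Readable).
const source = Readable.from([{ Name: "123" }, { Name: "789" }, { Name: "456" }]);
const docs = [];
source
  .pipe(countingPassThrough((n) => console.log("count:", n)))
  .on("data", (doc) => docs.push(doc))
  .on("end", () => console.log("docs:", docs.length));
```

The count is emitted from flush(), i.e. only after the source has ended, which matches the point above that the "count" arrives at the end of the stream.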

Seda answered 19/8, 2014 at 6:42
Thanks for the detailed explanation as well! Ignoring implementation details for a moment, from my end the best interface would be for an aggregation cursor and a query cursor to have the same methods. Not that I mind streaming and counting manually, but it is a bit flow-altering to do that for some cursors and not others, especially when the count could potentially be cached while the cursor is computed. Seriously though, thanks for taking the time to answer this thoroughly. – Hypercorrect
You can use .toArray(), e.g. in the mongo shell:

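// resultCursor from an aggregation, e.g. db.SentMessages.aggregate([...]) in the mongo shell
// toArray() pulls every result into memory, so this suits small result sets
var resultArray = resultCursor.toArray();
print(resultArray.length);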
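In the Node.js driver, toArray() is asynchronous, so the shell snippet does not carry over directly. A minimal sketch, assuming a current driver where aggregate(pipeline) returns a cursor and toArray() returns a Promise; countByToArray and fakeCollection are hypothetical names, the latter a stand-in so the example runs without a server:

```javascript
// Hypothetical helper: count aggregation results by materialising them.
// Unlike the shell's toArray(), the driver's toArray() must be awaited.
async function countByToArray(collection, pipeline) {
  const docs = await collection.aggregate(pipeline).toArray();
  return docs.length;
}

// Stand-in for a driver Collection, so the sketch runs without a server.
const fakeCollection = {
  aggregate: (_pipeline) => ({
    toArray: async () => [{ Name: "123" }, { Name: "789" }, { Name: "456" }],
  }),
};

countByToArray(fakeCollection, [{ $match: { Type: "Foo" } }]).then((n) => console.log(n)); // 3
```

Like the shell version, this counts the documents left after any $skip/$limit in the pipeline and holds them all in memory at once.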
Herr answered 25/5, 2023 at 15:25
