I'm wondering how sorting with an index actually works in MongoDB. There are a couple articles in the MongoDB documentation, but they don't actually describe how the sort proceeds or the time complexity. Searches of SO and the interweb in general so far haven't turned up anything relevant.
Let's assume there are a documents in a collection, the find() clause matches b documents, there's a limit of c documents returned, a >> b >> c, and c is some suitably large number such that the returned set cannot fit in memory - let's say 1M documents, for example.
At the start of the operation, there exist b documents that needs to be sorted and a sorted tree index of size a for the feature the documents will be sorted by.
I can imagine:
A) traverse the index in order, and for each ObjectID traverse the list of b documents. Return matches until c is reached. This would be O(ab).
B) as A), but build a hashset of the ObjectIDs in the b documents first. This is O(a), but takes O(b) memory.
I've tried to consider sorts based on traversing the set of b documents, but can't seem to come up with anything faster than O(b log b), which is no better than sorting without an index.
I assume (but maybe I'm wrong) that every sort doesn't require an index scan, so how does the sort actually work?
Update:
Kevin's answer and provided link narrow down the question a lot, but I'd like to confirm/clarify a few points:
- As I understand it, you cannot use different indexes for the query and the sort if you want to avoid an in-memory sort. When I read this page it appeared as though you could (or at least, it didn't specify one way or the other), but that seems to be incorrect. Essentially, the documents are sorted because they're looked up in the order of the index during the query and therefore returned in the order of the index. Right?
- When querying on a compound index, the the sorting index must be the first index in the compound index, except for indexes where the query is an equality. If not, the sort is performed in memory. Right?
How does sorting work with
$in
or$or
queries? For example, assume the query is{a: {$in: [4, 6, 2, 1, 3, 10]}, b: {$gt: 1, $lt: 6}}
... and there's a compound index on a
and b
in that order. How would the sort work in the cases the sort is on a
or b
? $or
is even more complicated since, as I understand it, $or
queries are essentially split into multiple separate queries. Are $or
queries always an in-memory sort, at least for merging the results of the separate queries?