What can be done with Mongo Aggregation / Performance of Mongo Aggregation

I have set up a MongoDB database and want to aggregate documents by certain groupings. I found this document, which covers what I need. Everything works, but the documentation points out certain limitations:

  1. Output from the pipeline can only contain 16 megabytes. If your result set exceeds this limit, the aggregate command produces an error.

  2. If any single aggregation operation consumes more than 10 percent of system RAM the operation will produce an error.

  3. The aggregation system currently stores $group operations in memory, which may cause problems when processing a larger number of groups.

How many rows / documents can I process with MongoDB aggregation? Given these limits, I am hesitant to rely on it. Can anyone advise?

Gyrostat answered 10/1, 2013 at 10:33 Comment(1)
+1 for API link & summary of limitations. - Afb

I got a valid and helpful answer from Google Groups and would like to share it with you all.

The limitation is not on the number of documents: the limitation is on the amount of memory used by the final result (or an intermediate result).

So: if you aggregate 200 000 documents but the result fits into the 16MB result, then you're fine. If you aggregate 100 documents and the result does not fit into 16 MB, then you'll get an error.
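To make that concrete, here is a rough sketch in plain JavaScript (no MongoDB involved) that mimics a $group with $sum. The field names SO and net are borrowed from the example pipeline in the comments further down; the generated data and the helper names groupAndSum and makeDocs are made up for illustration. JSON length is used only as a crude stand-in for the BSON size the server would actually measure.

```javascript
// Plain-JavaScript stand-in for {$group: {_id: "$SO", total: {$sum: "$net"}}}.
// This is NOT how MongoDB implements $group; it only illustrates that the
// output size follows the number of groups, not the number of input documents.
function groupAndSum(docs, keyField, valueField) {
  const totals = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) || 0) + doc[valueField]);
  }
  return Array.from(totals, ([key, total]) => ({ _id: key, total: total }));
}

// Generator, so the made-up input is never held in memory all at once.
function* makeDocs(count, distinctKeys) {
  for (let i = 0; i < count; i++) {
    yield { SO: "SO-" + (i % distinctKeys), net: 1 };
  }
}

const MAX_RESULT_BYTES = 16 * 1024 * 1024; // the 16 MB limit discussed above

const result = groupAndSum(makeDocs(200000, 250), "SO", "net");
const approxBytes = JSON.stringify(result).length; // rough proxy for BSON size

console.log("groups:", result.length);
console.log("approx. result bytes:", approxBytes);
console.log("fits in 16 MB:", approxBytes < MAX_RESULT_BYTES);
```

With 200 000 input documents and 250 distinct keys the result holds 250 small documents, far below the limit. The same input with millions of distinct keys could exceed it.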

Similarly, if you do a $sort or a $group on an intermediate result, and that operation needs more than 10% of available RAM, then you'll get an error. This is only loosely related to how many documents you have: it's a function of how big the particular stage of the pipeline is.

Can I increase the 16 MB limit via any setting?

Does the 16 MB limit apply only to the end result, or to the whole aggregation (that is, intermediate results + any temporary holdings + end result)?

The 16MB limit is not adjustable. This is the maximum size of a document in MongoDB. Since the Aggregation framework is currently implemented as a command, the result from the aggregation must be returned in a single document: hence the 16 MB limit.
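As a comment on this answer points out, later MongoDB releases added a $out stage, which writes the pipeline output to a collection instead of returning it in one document. A minimal sketch, assuming MongoDB 2.6 or newer: the collection name store and the fields SO and net come from the example in the comments further down, while the target collection store_totals is a hypothetical name. The pipeline is written as plain data so it can be inspected without a server; the shell calls are shown only as comments.

```javascript
// Pipeline as plain data; nothing here talks to a server.
const pipeline = [
  { $group: { _id: "$SO", total: { $sum: "$net" } } },
  // $out has to be the last stage. It writes the results to the named
  // collection rather than returning them inline.
  // "store_totals" is a hypothetical collection name.
  { $out: "store_totals" }
];

// In the mongo shell (assuming MongoDB 2.6+), this would be run as:
//   db.store.aggregate(pipeline);
//   db.store_totals.find();   // read the results back as ordinary documents
console.log(JSON.stringify(pipeline, null, 2));
```

Each individual output document is still subject to the 16 MB document size limit; what goes away is the requirement that the whole result fit into one.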

see this post

Gyrostat answered 29/1, 2013 at 10:29 Comment(1)
Not sure whether $out was added to aggregation before or after this answer, but: docs.mongodb.org/manual/reference/operator/aggregation/out - Bose

The amount of processing that can occur with the aggregation framework depends on your schema.

The aggregation framework can currently only output the equivalent of one document (for larger output you will want to watch: https://jira.mongodb.org/browse/SERVER-3253 ), and the output takes this form:

{
    result: [ /* the result documents */ ],
    ok: 1   // 1 on success, 0 on failure
}

So you have to make sure that what comes out of your $group/$project is small enough to fit in that single result document; otherwise you get an error instead of the results you need. Most of the time this is not a problem: a simple $group, even over millions of documents, can produce a response smaller than 16 MB.

We have no idea of the size of your documents or the aggregation queries you wish to run, so we cannot advise anything more specific.

If any single aggregation operation consumes more than 10 percent of system RAM the operation will produce an error.

That is pretty self-explanatory, really. If the working set for an operation is so large that it takes more than 10 percent of RAM ($group, computed fields, $sort on computed or grouped fields), then it will not work.

Unless you try and misuse the aggregation framework to do your app logic for you then you should never really run into this problem.

The aggregation system currently stores $group operations in memory, which may cause problems when processing a larger number of groups.

Since $group is really hard to do anywhere but in memory (it "groups" by the field), operations on that group, e.g. a following $sort, are also done in memory. This is where you can start to use up that 10% if you're not careful.
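One way to keep that in-memory working set down is to filter before grouping, so $group and $sort only see the documents you actually need. A rough sketch, again with the pipeline as plain data: store, SO and net come from the example in the comments below, while the status field, its value and the stageNames helper are hypothetical. The allowDiskUse option is an assumption about newer servers (MongoDB 2.6+, which as far as I know postdates this thread); there it lets large stages spill to temporary files instead of failing.

```javascript
// Pipeline as plain data; nothing here talks to a server.
const pipeline = [
  // Filter first: fewer documents reach the memory-hungry stages.
  // "status" and "shipped" are hypothetical, purely for illustration.
  { $match: { status: "shipped" } },
  { $group: { _id: "$SO", total: { $sum: "$net" } } },
  // This $sort runs over the grouped (computed) result, so it is done in memory.
  { $sort: { total: -1 } }
];

// Assumption: MongoDB 2.6+ only. Not available on the version discussed above.
const options = { allowDiskUse: true };

// In the mongo shell this would be run as:
//   db.store.aggregate(pipeline, options);

// Small helper to list the stage operators in order.
function stageNames(stages) {
  return stages.map(function (stage) { return Object.keys(stage)[0]; });
}

console.log(stageNames(pipeline).join(" -> "));
```

Putting $match at the front also gives it a chance to use an index, which a $match placed after $group cannot do.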

Fireball answered 10/1, 2013 at 10:47 Comment(2)
If I have 4 million docs and run a simple aggregation like this: db.store.aggregate({$group: {_id: "$SO", total: {$sum: "$net"}}}); will it lead to any performance problem? It would probably produce 250 groups. - Gyrostat
@Gyrostat Hmm, hard to say. The aggregation framework will work on that set, but it is hard to say what benchmark you will get. - Fireball
