Get the size of all the documents in a query

Is there a way to get the total size of all the documents that meet a certain query in the MongoDB shell?

I'm creating a tool that will use mongodump (see here) with the --query option to dump specific data onto an external media device. However, before starting the dump, I would like to check whether all the documents will fit on that device. That's why I would like to get the total size of all the documents that match the query.
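A minimal sketch of the pre-dump check the question describes, assuming the total BSON size of the matching documents is already known. Without `--gzip`, mongodump's `.bson` files hold the documents' BSON encoding back to back, so that total is a reasonable estimate of their size. `fitsOnDevice` and its 5% safety margin are hypothetical choices, not mongodump features.

```javascript
// Hypothetical helper (not part of mongodump): does a dump of `totalBytes`
// fit on a device with `freeBytes` available? `marginRatio` reserves extra
// room for the metadata files mongodump writes and for filesystem overhead;
// the 5% default is an arbitrary assumption.
function fitsOnDevice(totalBytes, freeBytes, marginRatio = 0.05) {
  for (const n of [totalBytes, freeBytes, marginRatio]) {
    if (!Number.isFinite(n) || n < 0) {
      throw new RangeError("arguments must be non-negative finite numbers");
    }
  }
  // Require the estimated dump size plus the margin to fit in the free space.
  return totalBytes * (1 + marginRatio) <= freeBytes;
}

const MB = 1000 * 1000; // decimal megabytes

// An example total of ~108 MB checked against two devices:
console.log(fitsOnDevice(108 * MB, 120 * MB)); // true  (needs ~113.4 MB)
console.log(fitsOnDevice(108 * MB, 110 * MB)); // false
```

The margin is deliberately conservative: underestimating free space only costs a warning, while overestimating it leaves a half-written dump.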

I am aware of the Object.bsonsize method described here, but it seems to return the size of only one document.

Cyprus asked 17/11, 2015 at 21:35
Mongo has to scan all documents to get the size, so the only approach I see is an incremental one: iterate through your query result and add up the sizes. – Pixilated

Here's the answer that I've found:

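```javascript
var cursor = db.collection.find(...); // Add your query here.
var size = 0;
cursor.forEach(function (doc) {
    // Object.bsonsize returns the BSON-encoded size of one document, in bytes
    size += Object.bsonsize(doc);
});
print(size); // total size of all matching documents, in bytes
```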

This should output the total size of the matching documents in bytes, fairly accurately.

I've run the command twice. The first time, there were 141,215 documents which, once dumped, took up about 108 MB. The difference between the command's output and the size on disk was 787 bytes.

The second time, there were 35,914,179 documents which, once dumped, took up about 57.8 GB. This time, the command's output matched the real size on disk exactly.

Cyprus answered 19/11, 2015 at 14:45
Works flawlessly! Is it okay to use it in production? – Holbein

Starting in MongoDB 4.4, the $bsonSize aggregation operator returns the size in bytes of a given document when encoded as BSON.

So, to sum the BSON size of all documents matching your query:

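```javascript
// Sample documents (each also has an ObjectId _id):
// { d: [1, 2, 3, 4, 5] }
// { a: 1, b: "hello" }
// { c: 1000, a: "world" }
db.collection.aggregate([
  { $match: {} },  // put your query here; {} matches every document
  { $group: {
    _id: null,
    size: { $sum: { $bsonSize: "$$ROOT" } }
  }}
])
// { "_id" : null, "size" : 177 }
```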

The $group stage with _id: null puts all matching documents into a single group, and $sum adds up each document's $bsonSize.

$$ROOT refers to the current top-level document, so $bsonSize measures each whole document.
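To see where a figure like 177 comes from, here is a minimal sketch that computes BSON sizes by hand for a small subset of types, following the layout in the BSON specification. It assumes each sample document also carries a 12-byte ObjectId `_id` (modelled by the hypothetical `FakeObjectId` class) and that numbers are stored as 64-bit doubles, as the legacy `mongo` shell does. It is an illustration only, not a replacement for `Object.bsonsize` or `$bsonSize`.

```javascript
// Sketch of BSON size accounting for a subset of types (per the BSON spec):
//   document = int32 length (4) + elements + 0x00 terminator (1)
//   element  = type byte (1) + key as C string (UTF-8 bytes + 0x00) + value
class FakeObjectId {} // hypothetical stand-in for a 12-byte ObjectId

const utf8 = new TextEncoder();

function cstringSize(s) {
  return utf8.encode(s).length + 1; // UTF-8 bytes + 0x00
}

function valueSize(v) {
  if (v instanceof FakeObjectId) return 12;
  if (v === null) return 0;
  if (typeof v === "number") return 8; // assumed: stored as a 64-bit double
  if (typeof v === "boolean") return 1;
  if (typeof v === "string") return 4 + utf8.encode(v).length + 1; // int32 length + bytes + 0x00
  // Arrays are encoded as documents keyed "0", "1", ...
  if (Array.isArray(v) || Object.getPrototypeOf(v) === Object.prototype) {
    return documentSize(v);
  }
  throw new TypeError("type not covered by this sketch");
}

function documentSize(doc) {
  let size = 4 + 1; // length prefix + trailing 0x00
  for (const [key, value] of Object.entries(doc)) {
    size += 1 + cstringSize(key) + valueSize(value);
  }
  return size;
}

const docs = [
  { _id: new FakeObjectId(), d: [1, 2, 3, 4, 5] }, // 85 bytes
  { _id: new FakeObjectId(), a: 1, b: "hello" },   // 46 bytes
  { _id: new FakeObjectId(), c: 1000, a: "world" } // 46 bytes
];
const total = docs.reduce((sum, doc) => sum + documentSize(doc), 0);
console.log(total); // 177
```

Under these assumptions the sum matches the aggregation output; shells that store small integers as 32-bit ints would report a smaller total for the same documents.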

Hyperon answered 4/12, 2021 at 11:25