How to Speed Up Mongodump, Dump Not Finishing

I'm trying to run a database dump with a query against a collection of about 5 billion documents, and the progress output suggests the dump won't finish in any reasonable time (100+ days). The dump also appeared to freeze after roughly 22 hours, still showing 0% - the line after the object count is a metadata.json line.

The dump line is:

mongodump -h myHost -d myDatabase -c mycollection --query "{'cr' : {\$gte: new Date(1388534400000)}, \$or: [ { 'tln': { \$lte: 0., \$gte: -100.}, 'tlt': { \$lte: 100, \$gte: 0} }, { 'pln': { \$lte: 0., \$gte: -100.}, 'plt': { \$lte: 100, \$gte: 0} } ] }"

The last few lines of output were (typed out, as I can't post images yet):

[timestamp] Collection File Writing Progress: 10214400/5066505869 0% (objects)
[timestamp] Collection File Writing Progress: 10225100/5066505869 0% (objects)
[timestamp] 10228391 objects
[timestamp] Metadata for database.collection to dump/database/collection.metadata.json

Any thoughts on how to improve performance, or any idea why this is taking so long?

Gallegos answered 19/1, 2015 at 3:41 Comment(3)
Write this query in the mongo shell and use explain() to see what the query plan is - it might be that the query is slow by itself. – Mammon
In addition to the output of explain(), can you also confirm what version of MongoDB you are using? How many output results are you expecting from your 5 billion source documents? It's unclear if the ~10 million objects might actually be your full result set, as the last line referencing metadata.json is normally emitted when the dump completes for a given collection. – Relax
Hi all, so even the .explain() is taking a very long time (a couple of hours and counting) to run. Is this usual? I even tried simplifying the .explain() to just a location filter (whether the x and y coordinates are both within a range) and it still takes a while. Will continue to run and update. – Gallegos
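
For reference, checking the plan in the mongo shell could look roughly like the sketch below. The host, database, collection, field names, and date are taken from the question's command; on MongoDB 3.0+ you can pass "executionStats" for more detail, while on 2.6 a plain .explain() works.

use myDatabase
db.mycollection.find({
    cr: { $gte: new Date(1388534400000) },
    $or: [
        { tln: { $lte: 0, $gte: -100 }, tlt: { $lte: 100, $gte: 0 } },
        { pln: { $lte: 0, $gte: -100 }, plt: { $lte: 100, $gte: 0 } }
    ]
}).explain("executionStats")

A COLLSCAN stage in the winning plan (or "cursor" : "BasicCursor" in the older explain format) would confirm that no index covers this query.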

I've just faced this issue, and the problem is that mongodump is basically not very smart. It's traversing the _id index, which likely means lots and lots and lots of random disk access. For me, dumping several collections, mongodump was simply crashing due to cursor timeouts.

The issue is also described here: https://jira.mongodb.org/browse/TOOLS-845. However, that doesn't really provide a great resolution apart from "Works as Designed". It's possible there's something funny about the index, but I think in my case it was just a large enough collection that the amount of disk access was seriously hard work for my poor little Mac Mini.

One solution? Shut down writes and then use --forceTableScan, which makes a sequential pass through the data; that might well be faster than using the _id index if you are using a custom _id field (I was).
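
Applied to the command from the question, that looks something like the line below (host, database, and collection are the question's placeholders). Note that depending on your mongodump version, --forceTableScan may not be allowed together with --query, so check mongodump --help; and without stopping writes first, the dump may not be point-in-time consistent.

mongodump -h myHost -d myDatabase -c mycollection --forceTableScan -o dump/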

The docs are a bit sketchy, but it reads as if the normal mongodump behaviour might be to traverse the _id index using a snapshot and then filter by the query. In other words, it may be traversing all 5 billion records in _id order rather than in on-disk order - i.e. essentially randomly - just to apply the query. So you might be better off building a tool that reads from a real index and writes the documents out directly.
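
As a rough sketch of that idea (the script name is made up, and it assumes an index such as { cr: 1 } exists, which the question doesn't confirm), you could run a hinted query in the mongo shell and redirect the output to a file:

// export.js - run with: mongo --quiet myHost/myDatabase export.js > mycollection.json
// Walks an assumed { cr: 1 } index instead of _id, printing one JSON document per line.
var cursor = db.mycollection.find({
    cr: { $gte: new Date(1388534400000) },
    $or: [
        { tln: { $lte: 0, $gte: -100 }, tlt: { $lte: 100, $gte: 0 } },
        { pln: { $lte: 0, $gte: -100 }, plt: { $lte: 100, $gte: 0 } }
    ]
}).hint({ cr: 1 });

while (cursor.hasNext()) {
    print(tojson(cursor.next(), "", true));  // nolint = true keeps each document on one line
}

Depending on your server version you may also need to guard against cursor timeouts (for example by batching over ranges of cr), but the point is that the read order follows an index you chose rather than _id.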

For me, --forceTableScan was enough, and it meant (a) the dump actually completed successfully, and (b) it was an order of magnitude or more faster.

Leeway answered 26/6, 2017 at 20:42 Comment(0)
