I have a NodeJS application (using the node-mongodb-native driver version 2.1) which does the following:
- Opens a connection to MongoDB.
- Queries a collection (batchSize is set to 1000). This query returns about 1,300,000 documents, which I have verified myself.
- Since this is too many documents to fit in a single BSON response (about 16 MB, if I recall correctly), I stream my results using the stream() function on my cursor object.
- In batches of 1000 documents, I upload them to my Amazon CloudSearch index.
Everything works as expected - the documents are successfully uploaded to my AWS service and all is fine.
However, as soon as 85,000 documents have been streamed, the stream emits the end event. It does this consistently and no errors are thrown. Usually I'd chalk this up to something like a timeout being hit, but because the stream ends/closes immediately after 85,000 documents have been uploaded, every single time, I think something else is going on.
My code is as follows:
    var options = {
        socketTimeoutMS: 120000,
        connectTimeoutMS: 120000,
        keepAlive: 1
    };
    var url = "www.myMongoAddress.com";
    mongo.connect(url, options, function(err, db) {
        var collection = db.collection('myCollection');
        var stream = collection.find({mySearch: 'criteria'}).batchSize(1000).stream();
        var batch = [];
        var total = 0;
        stream.on('end', function() {
            console.log('stream ended!');
            console.log('processed ' + total + ' docs');
        });
        stream.on('data', function(doc) {
            doc = doc.map(function(d) {
                // Do some transforms on the data here.
            });
            batch.push(doc);
            if (batch.length == 1000 || !stream.hasNext()) {
                stream.pause();
                uploadDocsToAWS(function() {
                    stream.resume();
                    total += batch.length;
                    batch = [];
                });
            }
        });
    });
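For comparison, here is a minimal sketch of the same pause/upload/resume batching pattern, written so that the batch is handed to the uploader explicitly and any partial batch is flushed on end. It is not the driver's API: streamInBatches, upload and fakeCursor are hypothetical names, and fakeCursor is an in-memory object-mode Readable standing in for the MongoDB cursor stream.

```javascript
var Readable = require('stream').Readable;

// Hypothetical stand-in for the MongoDB cursor stream: emits n small documents.
function fakeCursor(n) {
    var i = 0;
    return new Readable({
        objectMode: true,
        read: function() {
            this.push(i < n ? { _id: i++ } : null);
        }
    });
}

// Collects documents from `stream` into batches of `batchSize`, pausing the
// stream while each batch is uploaded. `upload(batch, cb)` is a hypothetical
// uploader; `done(err, total)` is called once, after the final flush or on error.
function streamInBatches(stream, batchSize, upload, done) {
    var batch = [];
    var total = 0;
    var finished = false;

    function finish(err) {
        if (finished) return;
        finished = true;
        done(err, total);
    }

    function flush(next) {
        if (batch.length === 0) return next();
        var current = batch;   // hand this exact array to the uploader
        batch = [];            // start a fresh batch right away
        upload(current, function(err) {
            if (err) return finish(err);
            total += current.length;
            next();
        });
    }

    stream.on('data', function(doc) {
        batch.push(doc);
        if (batch.length >= batchSize) {
            stream.pause();
            flush(function() { stream.resume(); });
        }
    });
    stream.on('error', finish);
    stream.on('end', function() {
        // Upload whatever is left over, then report the total.
        flush(function() { finish(null); });
    });
}
```

Calling streamInBatches(fakeCursor(2500), 1000, upload, done) with an uploader that simply invokes its callback exercises the full-batch path and the final partial flush without touching MongoDB or AWS.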
Assuming my query returns more than 85,000 documents, the stream always ends at 85,000 documents and always takes about 5 minutes (measured with the Unix time utility; the average is 5 minutes).
Is there anything I can try to help diagnose this problem?
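One diagnostic that could be tried is to record exactly which terminal events the stream emits, in what order, and after how many documents and milliseconds. The sketch below assumes only the standard Node.js stream events (data, end, close, error). instrumentStream is a hypothetical helper, and which of these events the cursor stream actually emits is something to observe, not something assumed here.

```javascript
// Attaches listeners that record when and how a readable stream finishes.
// Returns a `stats` object that is updated as events fire.
// Note: adding a 'data' listener puts an unpaused stream into flowing mode,
// so attach this alongside an existing 'data' handler, not on an idle stream.
function instrumentStream(stream, label, log) {
    log = log || console.log;
    var started = Date.now();
    var stats = { docs: 0, events: [] };

    stream.on('data', function() {
        stats.docs += 1;
    });

    ['end', 'close', 'error'].forEach(function(name) {
        stream.on(name, function(err) {
            var elapsed = Date.now() - started;
            stats.events.push(name);
            log(label + ': ' + name + ' after ' + stats.docs + ' docs, ' +
                elapsed + ' ms' + (err ? ' (' + err.message + ')' : ''));
        });
    });

    return stats;
}
```

With the code above this would be called as instrumentStream(stream, 'cursor') right after the stream is created. An error event arriving just before end, or a close with no end, would narrow down where the early termination comes from.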
I have removed some edits as they are no longer applicable
Edit1: I've updated the original code to show the connection settings I am now using; they don't seem to be helping (connection settings were sourced from: http://mongodb.github.io/node-mongodb-native/2.1/reference/connecting/connection-settings/).
Edit2: I've simplified the problem a fair bit. Basically, the number of documents I have to process does not matter: my stream always ends prematurely and I'm not sure why.
find({mySearch: 'criteria'}).addCursorFlag('noCursorTimeout', true)... – Bronson
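A minimal sketch of how the flag from the comment above would slot into the cursor chain from the question. openStream is a hypothetical wrapper and collection is assumed to be a node-mongodb-native Collection. Whether the flag fixes the early end is not established here. Also, a cursor opened with noCursorTimeout is not reaped by the server's idle timeout, so it should be read to exhaustion or closed explicitly.

```javascript
// Builds the same streaming cursor as in the question, with the
// noCursorTimeout flag added before the stream is opened.
// `collection` is assumed to be a node-mongodb-native Collection.
function openStream(collection, query) {
    return collection
        .find(query)
        .addCursorFlag('noCursorTimeout', true)
        .batchSize(1000)
        .stream();
}
```

In the original code this would replace the line that creates the stream: var stream = openStream(collection, {mySearch: 'criteria'});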