Mongo connection stream closes unexpectedly in NodeJS application

I have a NodeJS application (using the node-mongodb-native driver version 2.1) which does the following:

  1. Opens a connection to MongoDB.
  2. Queries a collection (batchSize is set to 1000). This query returns about 1,300,000 documents which I have verified myself.
  3. Since this is far too many documents to fit in a single BSON response (the limit is about 16 MB, if I recall correctly), I stream my results using the stream() function on my cursor object.
  4. In batches of 1000 documents, I upload them to my Amazon CloudSearch index.

Everything works as expected - the documents are successfully uploaded to my AWS service and all is fine.

However, as soon as 85,000 documents have been streamed, the stream emits the end event. It does this consistently and no errors are thrown. Normally I'd chalk this up to a timeout being hit, but the fact that the stream ends/closes at exactly 85,000 documents every time, immediately after they are uploaded, makes me think something else is going on.

My code is as follows:

var options = {
    socketTimeoutMS: 120000,
    connectTimeoutMS: 120000,
    keepAlive: 1
};
var url = "mongodb://www.myMongoAddress.com";
mongo.connect(url, options, function(err, db) {
    var collection = db.collection('myCollection');
    var stream = collection.find({mySearch: 'criteria'}).batchSize(1000).stream();
    var batch = [];
    var total = 0;

    stream.on('error', function(err) {
        console.error('stream error:', err);
    });

    stream.on('end', function() {
        // Flush the final partial batch before reporting.
        if (batch.length > 0) {
            uploadDocsToAWS(batch, function() {
                total += batch.length;
                console.log('stream ended!');
                console.log('processed ' + total + ' docs');
            });
        } else {
            console.log('stream ended!');
            console.log('processed ' + total + ' docs');
        }
    });

    stream.on('data', function(doc) {
        // Do some transforms on the document here.
        batch.push(doc);

        if (batch.length === 1000) {
            stream.pause();
            uploadDocsToAWS(batch, function() {
                total += batch.length;
                batch = [];
                stream.resume();
            });
        }
    });
});

Assuming there are more than 85,000 documents that are returned by my query, the stream always ends at 85,000 documents and always takes about 5 minutes (using the Unix time utility, the average time is 5 minutes).

Is there anything I can try to help diagnose this problem?

I have removed some edits as they are no longer applicable

Edit1: I've updated the original code to show the connection settings I am now using too, and they don't seem to be helping (connection settings were sourced from: http://mongodb.github.io/node-mongodb-native/2.1/reference/connecting/connection-settings/)

Edit2: I've simplified the problem a fair bit. Basically, the number of documents I have to process does not matter - my stream will always end prematurely and I'm not sure why

Coypu answered 11/5, 2016 at 14:43 Comment(9)
Does changing the batch size up or down have any effect on the number of documents processed before end?Muddlehead
@Muddlehead Hah, yes! I just changed the batch size to 4000 and now it consistently stops at 92000 after about 5 minutes. I'm beginning to think this may be a timeout issue... Any ideas how to increase the timeout/remove it altogether?Coypu
I believe in the options of find you can do: {timeout: false}. But I'll be honest, I'm not convinced it's a timeout. I've seen weird things with CPU usage around resume calls in the past; unfortunately I never fully resolved those. The troubling thing is that increasing the batch size 3x didn't increase the docs scanned linearly. So a timeout is still the highest suspect (for me).Muddlehead
@Muddlehead Yeah I'll give it a go and let you know. Which timeout option will I be looking to change? mongodb.github.io/node-mongodb-native/driver-articles/… is it any of the ones mentioned on there? There are quite a few.Coypu
I think it's just: find({}, {timeout : false}) which is supposed to be false by default but given other db configs I've begged to differ more than once now.Muddlehead
I'll give that a go.Coypu
@Muddlehead still having problems. See the original question for my updates.Coypu
Can you try: find({mySearch: 'criteria'}).addCursorFlag('noCursorTimeout', true)...Bronson
@Bronson Still no luck, unfortunately :( It only processes a portion of the documents before the stream ends.Coypu

Unless I am missing something, the options for your MongoDB connection are

var options = { socketTimeoutMS: 120000, connectTimeoutMS: 120000, keepAlive: 1 };

That is only 2 minutes, and you say your process takes much longer than that. Have you tried increasing these values?

You may also have had some amount of data already buffered in the stream when the timeout hit, which would explain why the stream survives somewhat longer than the configured timeout before closing.
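For what it's worth, here is a sketch of what I would try: raise both timeouts well above the observed 5-minute mark and, as suggested in the comments above, disable the server-side cursor timeout. The numbers here are illustrative assumptions, not values from the driver docs:

```javascript
// Illustrative values only: well above the ~5 minutes observed.
var options = {
    socketTimeoutMS: 600000,   // allow 10 minutes of socket inactivity
    connectTimeoutMS: 600000,  // allow 10 minutes to establish the connection
    keepAlive: 1
};

// In the 2.1 driver, the server-side cursor timeout can be disabled
// on the cursor before streaming (collection/query names as in the question):
// var stream = collection.find({mySearch: 'criteria'})
//     .addCursorFlag('noCursorTimeout', true)
//     .batchSize(1000)
//     .stream();
```

If the stream then survives past 85,000 documents, the two-minute socket timeout was the culprit; if not, the timeouts can be ruled out.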

Pesthouse answered 11/5, 2016 at 23:17 Comment(1)
Yeah, I thought it was a timeout thing, but it doesn't matter what those values are, unfortunately :( When there are about 30,000 documents left to process, the stream ends almost immediately after it has opened and processes nothing. From about 40,000-60,000 it will process 4,000 before closing the stream. It's very weird and I can't quite figure it out.Coypu
