JavaScript leaking memory (Node.js/Restify/MongoDB)

Update 4: By instantiating the restify client (see controllers/messages.js) outside of the function and calling global.gc() after every request, the memory growth rate seems to have dropped a lot (to ~500KB per 10 seconds). Yet the memory usage is still constantly growing.

Update 3: Came across this post: https://journal.paul.querna.org/articles/2011/04/05/openssl-memory-use/

It might be worth noting that I'm using HTTPS with Restify.

Update 2: Updated the code below to the current state. I've tried swapping out Restify for Express. Sadly, this didn't make any difference. It seems that the API call at the end of the chain (restify -> mongodb -> external api) causes everything to be retained in memory.

Update 1: I have replaced Mongoose with the standard MongoDB driver. Memory usage seems to grow less quickly, yet the leak remains.


I've been working on trying to locate this leak for a couple of days now.

I'm running an API using Restify and Mongoose and for every API call I do at least one MongoDB lookup. I've got about 1-2k users that hit the API multiple times in a day.

What I have tried

  • I've isolated my code to just using Restify and used ApacheBench to fire a huge amount of requests (100k+). The memory usage stays around 60MB during the test.
  • I've isolated my code to just using Restify and Mongoose and tested it the same way as above. Memory usage stays around 80MB.
  • I've tested the full production code locally using ApacheBench. Memory usage stays around 80MB.
  • I've automatically dumped the heap at intervals. The biggest heap dump I had was 400MB. All I can see is that there are tons of Strings and Arrays, but I cannot clearly see a pattern in them.
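The measurements in the list above could be collected with something like the following. This is only a minimal sketch: the helper names (toMB, sampleMemory, startMemoryLog) are hypothetical and not part of the original code; only process.memoryUsage() is a Node.js built-in.

```javascript
// Hypothetical helpers for logging memory usage at a fixed interval.

// Converts bytes to megabytes, rounded to one decimal place.
function toMB(bytes) {
    return Math.round(bytes / 1024 / 1024 * 10) / 10;
}

// Takes one sample of the process memory counters.
function sampleMemory() {
    var usage = process.memoryUsage();
    return {
        rssMB: toMB(usage.rss),             // total resident memory of the process
        heapTotalMB: toMB(usage.heapTotal), // memory reserved for the V8 heap
        heapUsedMB: toMB(usage.heapUsed)    // memory in use by JavaScript objects
    };
}

// Calls log(sample) every intervalMs milliseconds.
function startMemoryLog(intervalMs, log) {
    var timer = setInterval(function() {
        log(sampleMemory());
    }, intervalMs);
    timer.unref(); // do not keep the process alive just for logging
    return timer;
}
```

Usage would be something like startMemoryLog(10000, console.log). If rss keeps growing while heapUsed stays flat, the growth is probably outside the JavaScript heap (Buffers or native memory), in which case heap dumps alone will not show it.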

So, what could be wrong?

I've done the above tests using just one API user. This means that Mongoose only grabs the same document over and over. The difference in production is that a lot of different users hit the API, meaning Mongoose gets a lot of different documents.

When I start the Node.js server, the memory quickly grows to 100MB-200MB. It eventually stabilizes around 500MB. Could this mean that it leaks memory for every user, and that it will stabilize once every user has visited the API?

I've included my code below which outlines the general structure of my API. I would love to know if there's a critical mistake in my code or any other approach to finding out what is causing the high memory usage.

Code

app.js

var restify = require('restify');
var MongoClient = require('mongodb').MongoClient;

// ... setup restify server and mongodb

require('./api/message')(server, db);

api/message.js

module.exports = function(server, db) {

    // Controllers used for retrieving accounts via MongoDB and communication with an external api
    var accountController = require('../controllers/accounts')(db);        
    var messageController = require('../controllers/messages')();

    // Restify bind to put
    server.put('/api/message', function(req, res, next) {
        // Token from body
        var token = req.body.token;

        // Get account by token
        accountController.getAccount(token, function(error, account) {

            // Send a message using external API
            messageController.sendMessage(token, account.email, function() {
                res.send(201, {});
                return next();
            });
        });
    });
};

controllers/accounts.js

module.exports = function(db) {

    // Gets account by a token
    function getAccount(token, callback) {
        var ObjectID = require('mongodb').ObjectID;

        var collection = db.collection('accounts');

        collection.findOne({
            token: token
        }, function(error, account) {

            if (error) {
                return callback(error);
            }

            if (account) {
                return callback('', account);
            }

            return callback('Account not found');
        });
    }
};

controllers/messages.js

module.exports = function() {

    function sendMessage(token, email, callback) {

        // Get a token used for external API
        getAccessToken(function() {

            // ... Setup client

            // Do POST
            client.post('/external_api', values, function(err, req, res, obj) {
                return callback();
            });

        });
    }

    return {
        sendMessage: sendMessage
    };
};
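For reference, the change described in Update 4 (creating the client once instead of on every call) might look roughly like this. It is a sketch under assumptions: createClient is a hypothetical stand-in for whatever builds the client, the values payload is made up, and the getAccessToken step is left out for brevity.

```javascript
// createClient is a hypothetical factory that returns an object with a
// post(path, body, callback) method, such as a restify JSON client.
function createMessageController(createClient) {

    // Created once when the controller is built and shared by all requests,
    // instead of being set up again inside sendMessage.
    var client = createClient();

    function sendMessage(token, email, callback) {
        // Hypothetical payload; the original code elides how values is built.
        var values = { token: token, email: email };

        client.post('/external_api', values, function(err, req, res, obj) {
            callback(err);
        });
    }

    return {
        sendMessage: sendMessage
    };
}
```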

Heap snapshot of suspected leak: [image not included]

Dissatisfy answered 22/3, 2014 at 10:29 Comment(7)
In api/message.js, is there a reason to call return next()? Not sure if this would help, but it seems a little redundant as you have already written the response. - Desolate
I'd recommend using a profiler to find out where the leak is coming from instead of manually searching by changing code. Take a look at: github.com/felixge/node-memory-leak-tutorial and / or github.com/node-inspector/node-inspector - Malady
@Vasil I'm calling next in order to trigger an event in Restify that does work after the request is handled. Don't worry, I've tried disabling the event and not calling next(). It doesn't seem to make a difference. - Dissatisfy
@TimothyStrimple Thanks for the info. I gathered some heap snapshots prior to posting, but I have trouble interpreting them. I've included a recent snapshot at the bottom of the question. I suspect the leak is located there. - Dissatisfy
Also, are you by any chance using the 'vm' module? There is a known memory leak in it: github.com/joyent/node/issues/6552 - Desolate
@Vasil I don't use the vm module, and it looks like restify and mongodb don't either. The leak looks surprisingly similar though. - Dissatisfy
I don't know the libraries you're using, so I can't tell how much the style of the code is dictated by their APIs. But the style of getAccount is asking for a leak. The callback that provides the account (and also the one for the error) looks like it could easily lead to an indefinitely long chain of activations, where, in this case, each will reference the entire account collection from the DB call. I'd look for a way to provide the account as a true return value to let the stack unwind. - Willow
S
2

This might be a bug in getters. I ran into it when using virtuals or getters in a Mongoose schema: https://github.com/LearnBoost/mongoose/issues/1565

Sleepwalk answered 24/3, 2014 at 8:48 Comment(1)
This is highly frustrating. I've nearly rewritten most of my code with no result at all. I have played around with manually calling the GC. This seems to have helped in the sense that the memory usage does not grow as fast, yet it still continues to grow. Any idea how I can pinpoint the objects being held? If I look at the dump, all I see is seemingly random objects from Restify/MongoDB and all kinds of Arrays and Strings. This makes me believe that every call just retains the whole stack every time. - Dissatisfy
V
2

It's actually normal to see only strings and arrays, as most programs are largely based on them. Profilers that only allow sorting by total object count are therefore not of much use, as they often give the same results for many different programs.

A better way to use Chrome's memory profiling is to take one heap snapshot after, for example, one user has called the API, and then a second heap snapshot after a second user has called the API.

The profiler makes it possible to compare two snapshots and see the difference between them (see this tutorial). This will help you understand why the memory grew in an unexpected way.

Objects are retained in memory because there is still a reference to them that prevents them from being garbage collected.

So another way to use the profiler to find memory leaks is to look for an object that you believe should not be there, see what its retaining paths are, and check whether any of those paths are unexpected.
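To produce the two snapshots for such a comparison from inside the Node.js process, something like the sketch below could be used. It assumes a Node.js version that provides v8.writeHeapSnapshot() (older versions need a separate module for this), and the snapshotPath and takeSnapshot helpers and the file naming are hypothetical.

```javascript
var v8 = require('v8');
var path = require('path');

var counter = 0;

// Builds a numbered file name such as "001-after-user-a.heapsnapshot",
// so that snapshots sort in the order they were taken.
function snapshotPath(dir, label) {
    counter += 1;
    var prefix = ('000' + counter).slice(-3);
    return path.join(dir, prefix + '-' + label + '.heapsnapshot');
}

// Writes a heap snapshot to disk and returns the file name.
// Note: this call is synchronous and blocks the process while it runs.
function takeSnapshot(dir, label) {
    if (global.gc) {
        global.gc(); // only available when node is started with --expose-gc
    }
    return v8.writeHeapSnapshot(snapshotPath(dir, label));
}
```

One possible workflow: call takeSnapshot('/tmp', 'after-user-a') after the first user's request and takeSnapshot('/tmp', 'after-user-b') after the second, then load both files in the Memory tab of Chrome DevTools and switch the second one to the Comparison view.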

Visigoth answered 24/3, 2014 at 21:23 Comment(1)
Thanks for your comment. I've checked out the video and I think I have found the leaking objects. I've attached a screenshot to the question. It looks like it's a whole chain of various objects being retained. I see references to Mongo and Restify. Almost all of them are "context in function *()" where the * is a function name related to the Mongo/Restify library. Are these related to the callbacks I'm using? I really have no idea what I'm doing wrong. - Dissatisfy
N
1

Not sure whether this helps, but could you try to remove unnecessary returns?

api/message.js

        // Send a message using external API
        messageController.sendMessage(token, account.email, function() {
            res.send(201, {});
            next(); // remove 'return'
        });

controllers/accounts.js

module.exports = function(db) {

    // Gets account by a token
    function getAccount(token, callback) {
        var ObjectID = require('mongodb').ObjectID;

        var collection = db.collection('accounts');

        collection.findOne({
            token: token
        }, function(error, account) {

            if (error) {
                callback(error); // remove 'return'
            } else if (account) {
                callback('', account); // remove 'return'
            } else {
                callback('Account not found'); // remove 'return'
            }
        });
    }

    return { // I guess you forgot to copy this into the question.
        getAccount: getAccount
    };
};

controllers/messages.js

            // Do POST
            client.post('/external_api', values, function(err, req, res, obj) {
                callback(); // remove 'return'
            });
Nauru answered 31/3, 2014 at 0:8 Comment(0)
O
0

Your issue is in getAccount, combined with how the GC works.

When you chain many functions, the GC only clears one at a time, and the longer something has been in memory, the lower its chance of being collected. For getAccount, I count at least 6 calls to global.gc() (or automatic GC runs) before it can be collected. By that time, the GC assumes it is something it probably won't collect, so it doesn't check it anyway.

collection {
   findOne {
      function(error, account) {
         callback('', account) {
            sendMessage(...) {
               getAccessToken() {
                  Post
               }
            }
         }
      }
   }
}

As suggested by Gene, remove this chaining.

PS: This is just a representation of how the GC works, and the details depend on the implementation, but you get the point.
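For illustration, a flattened version of the chain could look like the sketch below. It assumes a Node.js version with util.promisify and async/await; createHandler and the 500 error response are hypothetical, and the Restify next() call is left out. Whether this actually changes the memory behaviour has not been verified here; it only shows what the unnested control flow looks like.

```javascript
var util = require('util');

// accountController and messageController are assumed to expose the
// callback-style getAccount and sendMessage functions from the question.
function createHandler(accountController, messageController) {
    // Wrap the callback-style functions so they return Promises.
    var getAccount = util.promisify(accountController.getAccount);
    var sendMessage = util.promisify(messageController.sendMessage);

    return async function handlePut(req, res) {
        try {
            // Each step finishes before the next starts, without nesting.
            var account = await getAccount(req.body.token);
            await sendMessage(req.body.token, account.email);
            res.send(201, {});
        } catch (err) {
            // Hypothetical error handling; the original code has none.
            res.send(500, { error: String(err) });
        }
    };
}
```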

Ona answered 31/3, 2014 at 10:50 Comment(1)
I have a similar issue with my Node.js web server. Could you tell me how exactly the getAccount(token, callback) function should be written? I don't understand: if there are async calls (like to the DB), there must be a number of nested function calls... - Ecumenical
