Mongoose connection pooling creates connections to MongoDB every time a new Lambda is invoked

We are using Mongoose, Node.js, Serverless, and AWS Lambda. To reuse the same connection instead of opening and closing it every time it is needed, I created a connection pool of size 10 (which seems sufficient for our use case right now).

But when I look at the CloudWatch logs for the Lambda, it's not the same connection that is being used.

Every time a new Lambda instance is invoked, a new connection is created, while subsequent calls to that Lambda reuse the connection that was opened in the first call.

This increases the number of connections open at a time. In MongoDB Atlas, I can see that the number of open connections is very high.

Below is the code I use to create a connection when no cached connection is available. If one is available, the cached connection is used and no new connection is created.

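const mongoose = require('mongoose');

// Connection string (assumed here to come from an environment variable).
const connection_uri = process.env.MONGODB_URI;

// Module-scoped cache, reused by later calls in the same Node.js process.
let cached_db = null;

exports.createConnection = async () => {
  if (cached_db == null) {
    try {
      // Mongoose 5.x options; poolSize is the maximum number of sockets
      // the driver keeps open for this connection.
      cached_db = await mongoose.connect(connection_uri, {
        useUnifiedTopology: true,
        useNewUrlParser: true,
        useFindAndModify: false,
        useCreateIndex: true,
        socketTimeoutMS: 60000,
        connectTimeoutMS: 60000,
        poolSize: 10
      });
      return cached_db;
    } catch (err) {
      console.error('Something went wrong', err);
      throw err;
    }
  }
  console.log('Cached db in use.');
  return cached_db;
};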

Can the same connection be used across Lambdas? Is there a way to do it?

Clod asked 10/3, 2021 at 10:10
Since you set poolSize=10, up to 10 connections to MongoDB can be created when the Lambda is triggered. After the Lambda has executed, those connections expire after the idle timeout you set. So keep poolSize low, as a single connection can also do the job for you. You can't reuse the same connection on the next Lambda call. - Lemaceon

You should define the client to the MongoDB server outside the AWS Lambda handler function. Don't define a new MongoClient object each time you invoke your function. Doing so causes the driver to create a new database connection with each function call. This can be expensive and can result in your application exceeding database connection limits.

As an alternative, do the following:

  1. Create the MongoClient object once.
  2. Store the object so your function can reuse the MongoClient across function invocations.

Step 1

Isolate the call to the MongoClient.connect() function into its own module so that the connections can be reused across functions. Let's create a file mongo-client.js for that:

mongo-client.js:

const { MongoClient } = require('mongodb');

// Export a module-scoped MongoClient promise. By doing this in a separate
// module, the client can be shared across functions.
const client = new MongoClient(process.env.MONGODB_URI);

module.exports = client.connect();

Step 2

Import the new module and use it in function handlers to connect to the database.

some-file.js:

const clientPromise = require('./mongo-client');

// Handler
module.exports.handler = async function(event, context) {
  // Get the MongoClient by calling await on the connection promise. Because
  // this is a promise, it will only resolve once.
  const client = await clientPromise;
  
  // Use the connection to return the name of the connected database for example.
  return client.db().databaseName;
}
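The reuse in Steps 1 and 2 comes from module scope: code outside the handler runs once per Lambda execution environment, so warm invocations on that environment share the cached promise, while a newly started environment runs the module again and opens its own connection (which matches what the question observes). The sketch below simulates this without a database; `makeConnectionCache` and `fakeConnect` are hypothetical stand-ins for the module-level promise and `client.connect()`. It also forgets a failed attempt so the next call can retry, a small addition that Step 1 does not do (there, a rejected promise stays cached).

```javascript
// Hypothetical helper: caches the *promise* returned by a connect function.
// In a real Lambda, this state would live at module scope, so there is one
// cache per execution environment (container).
function makeConnectionCache(connect) {
  let cachedPromise = null;

  return async function getConnection() {
    if (cachedPromise === null) {
      // Store the pending promise immediately so concurrent calls share it.
      cachedPromise = connect().catch((err) => {
        // Forget the failed attempt so the next invocation can retry.
        cachedPromise = null;
        throw err;
      });
    }
    return cachedPromise;
  };
}

// Fake stand-in for client.connect(): counts how many connections are opened.
let opened = 0;
const fakeConnect = async () => ({ id: ++opened });

// Two "containers", each with its own module-scoped cache.
const containerA = makeConnectionCache(fakeConnect);
const containerB = makeConnectionCache(fakeConnect);

async function demo() {
  const a1 = await containerA(); // cold start in container A: opens connection 1
  const a2 = await containerA(); // warm invocation in A: reuses connection 1
  const b1 = await containerB(); // cold start in container B: opens connection 2
  return { a1: a1.id, a2: a2.id, b1: b1.id, opened };
}

demo().then((result) => console.log(result)); // { a1: 1, a2: 1, b1: 2, opened: 2 }
```

Caching therefore limits connections per execution environment, not overall: with several environments running concurrently, each one still holds its own client and pool.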

Pool Size

A connection pool is a cache of database connections that are kept open so they can be reused when future requests to the database need them; the pool size is the number of connections the pool may hold. Connection pools improve the performance of executing commands on a database because each command does not have to open a new connection first.
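To make the idea concrete, here is a toy in-memory pool (an illustration only, not how the MongoDB driver implements its pool): `acquire()` reuses an idle connection when one exists, opens a new one while fewer than `maxSize` are open, and otherwise waits until another caller releases one. `ToyPool` and the fake connection objects are hypothetical.

```javascript
// Toy connection pool for illustration; the driver's real pool also handles
// timeouts, errors, health checks, and keeps one pool per server.
class ToyPool {
  constructor(maxSize, openConnection) {
    this.maxSize = maxSize;
    this.openConnection = openConnection; // creates a new connection
    this.idle = [];    // connections ready for reuse
    this.total = 0;    // connections opened so far (idle + in use)
    this.waiters = []; // resolvers for callers waiting on a free connection
  }

  async acquire() {
    if (this.idle.length > 0) {
      return this.idle.pop(); // reuse an idle connection
    }
    if (this.total < this.maxSize) {
      this.total += 1;
      return this.openConnection(); // open a new one while under the cap
    }
    // Pool is exhausted: wait until release() hands a connection over.
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release(conn) {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(conn); // hand the connection straight to a waiting caller
    } else {
      this.idle.push(conn);
    }
  }
}

let nextId = 0;
const pool = new ToyPool(2, () => ({ id: ++nextId }));

async function demo() {
  const c1 = await pool.acquire();
  const c2 = await pool.acquire();
  const waiting = pool.acquire(); // pool is full: this caller waits
  pool.release(c1);               // c1 goes straight to the waiting caller
  const c3 = await waiting;
  return { reused: c3 === c1, opened: pool.total };
}

demo().then((result) => console.log(result)); // { reused: true, opened: 2 }
```

With `maxSize` set to 2, a third concurrent caller waits for a released connection instead of opening a third one, which is what keeps the number of open connections bounded.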

Note: maxPoolSize and poolSize refer to the same limit; which one applies depends on whether you use the useUnifiedTopology: true setting. With useUnifiedTopology: true, maxPoolSize is the spec-compliant setting that controls how large connection pools can be. With useUnifiedTopology: false (or if you omit it), poolSize is the equivalent setting from before the unified topology existed.
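Relatedly, in Mongoose 6 and later (built on the MongoDB Node.js driver 4.x), `poolSize` was replaced by `maxPoolSize`, and `useNewUrlParser`, `useUnifiedTopology`, `useFindAndModify`, and `useCreateIndex` are no longer supported options. The hypothetical helper below rewrites the question's Mongoose 5 options object accordingly; it is a sketch that only transforms a plain object and never calls Mongoose.

```javascript
// Options that Mongoose 6+ no longer accepts; it always behaves as if the
// first three are true and useFindAndModify is false.
const REMOVED_OPTIONS = ['useNewUrlParser', 'useUnifiedTopology', 'useFindAndModify', 'useCreateIndex'];

// Hypothetical helper: converts a Mongoose 5 options object to the newer names.
function toModernOptions(legacy) {
  const modern = { ...legacy }; // copy, so the input is not mutated
  for (const key of REMOVED_OPTIONS) {
    delete modern[key];
  }
  if ('poolSize' in modern) {
    // Driver 4+ uses maxPoolSize; keep an explicit maxPoolSize if one was given.
    if (!('maxPoolSize' in modern)) modern.maxPoolSize = modern.poolSize;
    delete modern.poolSize;
  }
  return modern;
}

const legacyOptions = {
  useUnifiedTopology: true,
  useNewUrlParser: true,
  useFindAndModify: false,
  useCreateIndex: true,
  socketTimeoutMS: 60000,
  connectTimeoutMS: 60000,
  poolSize: 10,
};

console.log(toModernOptions(legacyOptions));
// { socketTimeoutMS: 60000, connectTimeoutMS: 60000, maxPoolSize: 10 }
```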

Note: Each connection consumes about 1MB of RAM.

Value of the Pool Size

The connection pool is on a per-mongod/mongos basis, so when connecting to a 3-member replica set there will be three connection pools (one per mongod), each with a maxPoolSize. Additionally, there is a required monitoring connection for each node, so you end up with (maxPoolSize+1)*number_of_nodes TCP connections.
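The rule above written as a function (`totalTcpConnections` is a hypothetical name; the +1 per node is the monitoring connection described above):

```javascript
// Total TCP connections one client opens against a deployment, per the rule
// above: one pool of up to maxPoolSize per node, plus one monitoring
// connection per node.
function totalTcpConnections(maxPoolSize, numberOfNodes) {
  return (maxPoolSize + 1) * numberOfNodes;
}

// The question's poolSize of 10 against a 3-member replica set:
console.log(totalTcpConnections(10, 3)); // 33
```

This count is per client. In Lambda, each concurrently running execution environment creates its own client, so the cluster-wide total is roughly this number multiplied by the number of warm environments.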

In my opinion, if you don't care about CPU and RAM, you should use all available connections (why not if we already have them, right?).

For example: suppose you have an Atlas free cluster (a 3-member replica set) that supports a maximum of 500 connections, and only one application connects to it. Give all the connections to that one application. To set the value of poolSize, invert the connection calculation above (rounding down, since a pool can't hold a fraction of a connection):

poolSize = floor(maximum_connections / number_of_nodes) - 1

poolSize = floor(500 / 3) - 1

poolSize = 166 - 1 = 165

If 2 applications connect to the same cluster, give each application half of the connections.
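The same calculation as a sketch, with the split between applications folded in and each division rounded down; `poolSizeForBudget` is a hypothetical helper, and the inputs are the example's own figures (500 connections, 3 nodes).

```javascript
// Largest poolSize that keeps (poolSize + 1) * numberOfNodes within the
// connections allotted to one application.
function poolSizeForBudget(maxConnections, numberOfNodes, numberOfApps = 1) {
  const perApp = Math.floor(maxConnections / numberOfApps); // each app's share
  return Math.floor(perApp / numberOfNodes) - 1;            // minus 1 for monitoring
}

console.log(poolSizeForBudget(500, 3));    // 165: one application gets everything
console.log(poolSizeForBudget(500, 3, 2)); // 82: two applications split the 500
```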

If you have limited RAM, check how much you can spare and calculate poolSize from that (as noted above, you can assume that one connection consumes about 1MB of RAM).
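Under the rough 1MB-per-connection assumption, a RAM budget gives a second upper bound, and a reasonable choice is the smaller of the two bounds. This sketch treats the spare RAM as a budget for all (poolSize+1)*number_of_nodes connections the client opens; `poolSizeForRam` and the 64MB figure are hypothetical.

```javascript
// Rough figure from the note above: about 1MB of RAM per connection.
const MB_PER_CONNECTION = 1;

// Largest poolSize whose (poolSize + 1) * numberOfNodes connections fit in
// the RAM you can spare; clamped at 0 when the budget is too small.
function poolSizeForRam(spareMb, numberOfNodes) {
  const connections = Math.floor(spareMb / MB_PER_CONNECTION);
  return Math.max(0, Math.floor(connections / numberOfNodes) - 1);
}

// Example: 64MB to spare on a 3-member replica set, combined with the
// connection-limit bound of 165 from the example above.
const ramBound = poolSizeForRam(64, 3); // floor(64 / 3) - 1 = 20
const effective = Math.min(165, ramBound);
console.log(effective); // 20
```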

Resources

For more info, check the official MongoDB docs.

For connection pooling, check this and this.

Believe answered 13/12, 2021 at 6:45
Quick follow-up question: what would be an optimal value for poolSize? - Steamy
poolSize is the number of connections that Node.js will keep idle, so it doesn't need to connect on each HTTP call. The poolSize value should be set based on the user traffic to the website. If you're using the same database for different projects, keep its value around 10. - Lemaceon
I updated the answer with my opinion about the value of poolSize. @Steamy - Believe
@NenadMilosavljevic Your answer and the answer below are contradictory. The solutions I found in my search also involved both of these approaches. So which one is correct? Can a connection be shared across Lambdas, or is it mandatory to close and open the connection each time? And can this be achieved using poolSize and a handler function like in your answer? - Clod
I linked the official MongoDB best practice in the Resources section (first link). That is what they suggest: define the client to the MongoDB server outside the AWS Lambda handler function. - Believe
Can I just do const client = await client.connect() and export default client inside mongo-client.js? It would run only during the Lambda invocation, right? That way I could use the client directly without awaiting. Is this a bad approach? - Bevash

I found from this blog that Lambda may reuse the same connection if it restores the same snapshot, and creates a new connection when it generates a new snapshot.

So Lambda can't guarantee that the same connection will be used, even if we create it outside the handler function.

So, in my opinion, the best approach to minimise the number of connections to MongoDB is to close the connection before the Lambda completes, so your other services can use the freed connection.

Use the method below to close the connection after the database interaction finishes:

// Close Mongoose's default connection once the database work has finished
await mongoose.connection.close();
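A sketch of the open-and-close-per-invocation approach described in this answer, with the close placed in a `finally` block so it also runs when the database work throws. `withConnection`, `fakeConnect`, and `handler` are hypothetical; with Mongoose, the connect and close steps would correspond roughly to `mongoose.connect()` and `mongoose.connection.close()`.

```javascript
// Open a connection, run the handler's database work, and always close it,
// so no idle connection is left behind when the invocation ends.
async function withConnection(connect, work) {
  const conn = await connect();
  try {
    return await work(conn);
  } finally {
    await conn.close(); // runs on success and on error
  }
}

// Fake connection that records how many times it was opened and closed.
const stats = { opened: 0, closed: 0 };
const fakeConnect = async () => {
  stats.opened += 1;
  return { close: async () => { stats.closed += 1; } };
};

// Hypothetical Lambda handler using the pattern.
async function handler(event) {
  return withConnection(fakeConnect, async () => `processed ${event.id}`);
}

handler({ id: 1 }).then((result) => console.log(result, stats));
```

The cost of this design is a new connection handshake on every invocation. Also, if it is combined with a module-level cache like the question's `cached_db`, the cache has to be cleared after closing; otherwise the next warm invocation would get a closed connection.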
Lemaceon answered 20/12, 2021 at 6:23
Currently, we are using only this approach. But is it the correct way to open and close the connection each time? - Clod
Yes, it's the correct way. Mongoose's default approach of holding idle connections is better suited to a monolithic architecture, but in a microservice architecture you always need to close the connection. - Lemaceon
Your answer and the answer above are contradictory. The solutions I found in my search also involved both of these approaches. So which one is correct? Can a connection be shared across Lambdas, or is it mandatory to close and open the connection each time? - Clod
You mentioned the Mongoose library, while the answer above discusses the mongodb module; they are different. So closing the Mongoose connection is the better way, as I mentioned. - Lemaceon
@Lemaceon Either way, not sharing the connection between Lambda invocations seems like an anti-pattern. You could just as easily have put the createConnection call outside the handler. - Bevash
@Bevash Yes, calling createConnection outside the handler may work if Lambda restores a previous snapshot, but in some cases Lambda may create a new snapshot. So in Lambda we can't always be sure that the same connection will be used. - Lemaceon
@Lemaceon If Lambda creates a new snapshot, doesn't it run the code outside the handler first, thereby making sure that a new connection is created? - Bevash
@sayandcode On new snapshot creation, Lambda runs the code outside the handler, so it creates the database connection if you put it there. The problem is that the connection may expire after some time, so the better way is to use the connection inside the handler. - Lemaceon
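One way to combine the two answers, offered as a sketch rather than a verified recipe: keep the connection outside the handler as the first answer suggests, but check it before reuse and reconnect if it has dropped, which addresses the expiry concern in the last comment. Mongoose reports connection state via `mongoose.connection.readyState`, where `1` means connected; here `getLiveConnection` and `fakeConnect` are hypothetical, and the fake object mimics only that property.

```javascript
// readyState value Mongoose uses for "connected".
const CONNECTED = 1;

// Module-scoped cache, as in the question's code.
let cached = null;

// Reuse the cached connection only while it is still connected; otherwise
// open a fresh one (e.g. after an idle connection was dropped).
async function getLiveConnection(connect) {
  if (cached === null || cached.readyState !== CONNECTED) {
    cached = await connect();
  }
  return cached;
}

// Fake connect(): returns an object with a Mongoose-like readyState.
let opens = 0;
const fakeConnect = async () => {
  opens += 1;
  return { id: opens, readyState: CONNECTED };
};

async function demo() {
  const first = await getLiveConnection(fakeConnect); // opens connection 1
  const again = await getLiveConnection(fakeConnect); // reuses connection 1
  first.readyState = 0;                               // simulate a dropped connection
  const fresh = await getLiveConnection(fakeConnect); // opens connection 2
  return [first.id, again.id, fresh.id];
}

demo().then((ids) => console.log(ids)); // [ 1, 1, 2 ]
```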

© 2022 - 2024 — McMap. All rights reserved.