MongoDB connection problems on Azure
We have an ASP.NET MVC application deployed to an Azure Website that connects to MongoDB and performs both read and write operations. The application does this iteratively, a few thousand times per minute.

We initialize the C# driver using Autofac and we set the MaxConnectionIdleTime to 45 seconds as suggested in https://groups.google.com/forum/#!topic/mongodb-user/_Z8YepNHnbI and a few other places.
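
For context, the initialization being described presumably looks something like the sketch below. This is only an illustration, not the actual code: the host name and the registration style are assumptions.

// A rough sketch: a single MongoClient for the whole application,
// with MaxConnectionIdleTime kept below Azure's idle connection timeout.
// The connection string is a placeholder.
var settings = MongoClientSettings.FromUrl(new MongoUrl("mongodb://your-host:27017"));
settings.MaxConnectionIdleTime = TimeSpan.FromSeconds(45);
var client = new MongoClient(settings);

// Register it as a single instance so every request shares one connection
// pool instead of opening fresh connections.
var builder = new ContainerBuilder();
builder.RegisterInstance(client).AsSelf().SingleInstance();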

We are still getting a large number of the below error:

System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

We get this error while connecting to both a MongoDB instance deployed on a VM in the same datacenter/region on Azure and also while connecting to an external PaaS MongoDB provider.

I run the same code on my local computer and connect to the same DB, and I don't receive these errors. It's only when I deploy the code to an Azure Website. Any suggestions?

Stanfield answered 24/2, 2015 at 12:31 Comment(12)
Looks like your server is out of connections? – Smacker
I love bitcoin... but is it even allowed to put a BTC bounty on a question? – Malloy
You open a new connection to the MongoDB server a few thousand times a minute? Am I reading that right? – Melson
Could you post the entire stack trace of the exception? – Kerikeriann
Sorry, but bitcoin bounties are not allowed. See here and here. – Blandish
@MattJohnson No problem. I thought it would be deleted. But your references are about making this a feature on SO, not about SO rules. A declined feature request doesn't mean it isn't allowed to do this manually, does it? I couldn't see it in the Ts&Cs of the website, but I didn't look too deep either. Mind providing references? – Stanfield
Sorry, I linked to the same post twice. :) Here is the other reference. – Blandish
A full stack trace and the inner exceptions are required to find a solution for this problem. How is the C# driver initialized? Are you sure there's only one MongoClient per application instance, or is there one per request? Or, even worse, per method? – Kerikeriann
@AntonAnsgar Have you debugged it? thingsondotnet.blogspot.in/2014/09/… And come to think of it, what does the MongoDB log say? – Bk
@AntonAnsgar What is the Azure idle timeout? See azure.microsoft.com/blog/2014/08/14/… – Bk
A 200 rep bounty, but insufficient information and no responses to questions... great... – Kerikeriann
I wonder if other users running into this are using the new 2.0 driver. Since shifting to the newer driver, I know that we are running into this much more and seeing a lot more error logs pertaining to connection errors. – Binucleate

A few thousand requests per minute is a big load, and the only way to do it right is by controlling and limiting the maximum number of threads that could be running at any one time.

As there's not much information posted as to how you've implemented this, I'm going to cover a few possible circumstances.


Time to experiment...

The constants:

  • Items to process:
    • 50 per second, or in other words...
    • 3,000 per minute, and one more way to look at it...
    • 180,000 per hour

The variables:

  • Data transfer rates:

    • How much data you can transfer per second is going to play a role no matter what we do, and this will vary throughout the day depending on the time of day.

      The only thing we can do is fire off more requests from different CPUs to distribute the weight of the traffic we're sending back and forth.

  • Processing power:

    • I'm assuming you have this in a WebJob as opposed to having it coded inside the MVC site itself. That's highly inefficient and not fit for the purpose you're trying to achieve. By using a WebJob we can queue work items to be processed by other WebJobs. The queue in question is Azure Queue Storage (see the sketch after this list).

      Azure Queue storage is a service for storing large numbers of messages that can be accessed from anywhere in the world via authenticated calls using HTTP or HTTPS. A single queue message can be up to 64 KB in size, and a queue can contain millions of messages, up to the total capacity limit of a storage account. A storage account can contain up to 200 TB of blob, queue, and table data. See Azure Storage Scalability and Performance Targets for details about storage account capacity.

      Common uses of Queue storage include:

      • Creating a backlog of work to process asynchronously
      • Passing messages from an Azure Web role to an Azure Worker role
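
As a rough illustration of the enqueue side, here's a sketch using the classic Azure Storage client library; the connection string setting, queue name, and message payload are all placeholders:

// A rough sketch using the classic Microsoft.WindowsAzure.Storage library.
// "StorageConnectionString" and the queue name are placeholder values.
CloudStorageAccount account = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("StorageConnectionString"));

CloudQueueClient queueClient = account.CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("workitemqueue"); // queue names must be lowercase
queue.CreateIfNotExists();

// Messages are capped at 64 KB, so enqueue an id or a small payload,
// not the whole document.
queue.AddMessage(new CloudQueueMessage("work-item-id-123"));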

The issues:

  • We're attempting to complete 50 transactions per second, so each transaction should be done in under 1 second if we were utilising 50 threads. Our 45 second timeout serves no purpose at this point.
  • We're expecting 50 threads to run concurrently, and all to complete in under a second, every second, on a single CPU. (I'm exaggerating here, just to make a point... but imagine downloading 50 text files every single second, processing them, then trying to shoot them back over to a colleague in the hope that they'll even be ready to catch them.)
  • We need to have retry logic in place: if after 3 attempts an item isn't processed, it needs to be placed back into the queue. Ideally we should give the server more time to respond than just one second with each failure; let's say we give it a 2 second break on the first failure, then 4 seconds, then 10. This will greatly increase the odds of us persisting/retrieving the data that we needed (see the sketch after this list).
  • We're assuming that our MongoDB can handle this number of requests per second. If you haven't already, start looking at ways to scale it out. The issue isn't the fact that it's MongoDB, the data layer could have been anything; it's the fact that we're making this number of requests from a single source that is the most likely cause of your issues.
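
To make the retry point concrete, here's a minimal sketch of the escalating-delay retry described above; ProcessWorkItem and ReEnqueue are hypothetical placeholders for your actual data store call and queue logic:

// A simple escalating-delay retry: 2s, then 4s, then 10s before giving up.
TimeSpan[] delays =
{
    TimeSpan.FromSeconds(2),
    TimeSpan.FromSeconds(4),
    TimeSpan.FromSeconds(10)
};

bool succeeded = false;
for (int attempt = 0; attempt < delays.Length && !succeeded; attempt++)
{
    try
    {
        ProcessWorkItem(item); // hypothetical: the actual MongoDB read/write
        succeeded = true;
    }
    catch (IOException)
    {
        Thread.Sleep(delays[attempt]); // give the server breathing room
    }
}

if (!succeeded)
{
    ReEnqueue(item); // hypothetical: put the work item back on the queue
}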

The solution:

  1. Set up a WebJob and name it EnqueueJob. This WebJob will have one sole purpose: to queue items of work to be processed in the Queue Storage.
  2. Create a Queue Storage queue named WorkItemQueue; this queue will act as a trigger to the next step and kick off our scaling-out operations.
  3. Create another WebJob named DequeueJob. This WebJob will also have one sole purpose: to dequeue the work items from the WorkItemQueue and fire off the requests to your data store.
  4. Configure the DequeueJob to spin up once an item has been placed inside the WorkItemQueue, start 5 separate threads on each instance, and, while the queue is not empty, dequeue work items for each thread and attempt to execute the dequeued job (see the sketch after this list).
    1. Attempt 1: if it fails, wait & retry.
    2. Attempt 2: if it fails, wait & retry.
    3. Attempt 3: if it fails, enqueue the item back into the WorkItemQueue.
  5. Configure your website to autoscale out to x CPUs (note that your website and WebJobs share the same resources).
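
And a minimal sketch of what the DequeueJob side could look like with the Azure WebJobs SDK, which handles the triggering and poison-queue behaviour for you; ProcessWorkItem is again a placeholder:

// A rough sketch using the Azure WebJobs SDK.
public class Functions
{
    // Invoked automatically whenever a message appears in workitemqueue.
    // The SDK retries failed messages and moves them to a poison queue
    // once the maximum dequeue count is exceeded.
    public static void ProcessQueueMessage(
        [QueueTrigger("workitemqueue")] string message, TextWriter log)
    {
        log.WriteLine("Processing work item: " + message);
        ProcessWorkItem(message);
    }

    private static void ProcessWorkItem(string message)
    {
        // ... the actual MongoDB read/write goes here ...
    }
}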

Here's a short 10 minute video that gives an overview of how to utilise queue storage and WebJobs.


Edit:

You may also be getting those errors because of two other factors, both again caused by the work running inside an MVC app...

If you're compiling the application with the DEBUG attribute applied but pushing the RELEASE version instead, you could be running into issues due to the settings in your web.config. Without the DEBUG attribute, an ASP.NET web application will run a request for a maximum of 110 seconds by default; if the request takes longer than this, it will terminate the request.

To increase the timeout beyond the default you will need to change the httpRuntime element in your web.config...

<!-- executionTimeout is specified in seconds; 300 = five minutes -->
<httpRuntime executionTimeout="300" />

The other thing you need to be aware of is the request timeout between your browser and the web app. If you insist on keeping the code in MVC as opposed to extracting it and putting it into a WebJob, you can use the following code to fire a request off to your web app and extend the timeout of the request.

string html = string.Empty;
string uri = "http://google.com";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);

// Timeout is expressed in milliseconds, so convert from a TimeSpan.
request.Timeout = (int)TimeSpan.FromMinutes(5).TotalMilliseconds;

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
    html = reader.ReadToEnd();
}
Corody answered 3/3, 2015 at 2:51 Comment(2)
The issue is not about throttling. I run the same code on my local computer and connect to the same DB, and I don't receive these errors. It's only when I deploy the code to an Azure Website. – Stanfield
On your local machine, how are you executing the queries? Is it in a console app or in an ASP.NET MVC site? – Corody

Are you using MongoDB in a VM? It looks like a network problem. Transient faults like this are to be expected, so the best you can do is implement a retry pattern, or use a library such as Polly to do it for you:

// Build a policy that retries up to 3 times on IOException.
var retryPolicy = Policy
    .Handle<IOException>()
    .Retry(3, (exception, retryCount) =>
    {
        // log the failure, back off, etc.
    });

// Wrap each MongoDB call in the policy so transient IO failures are retried.
retryPolicy.Execute(() => collection.Save(document)); // placeholder call

https://github.com/michael-wolfenden/Polly

Houseleek answered 24/2, 2015 at 13:27 Comment(6)
We're getting these errors while connecting to both a MongoDB instance on a VM and also while connecting to a PaaS Mongo provider in another datacenter. – Stanfield
How would you use MongoDB 2.0 with Polly in an MVC app? Would you have to wrap it around every call, or does it inject something to handle all socket connection errors? – Binucleate
@runxc1BretFerrier he already uses Mongo and he is receiving those errors. Polly will only wrap the call to the MongoDB client. – Houseleek
Ahh I see. I use similar code, but a library that wouldn't have to be wrapped around every call would be very handy. – Binucleate
@runxc1BretFerrier then you just need to use a try/catch and a for loop. It's a retry pattern: msdn.microsoft.com/en-us/library/dn589788.aspx – Houseleek
@runxc1BretFerrier you could base-class your data source implementation and add a protected Save(Action<T>) method that wraps the retry for you. Then your implementation/method would just build the request inside of that action as it is calling Save(...). – Joy
