SqlClient connection pool maxed out when using async
Asked Answered
I have a busy ASP.NET Core 5 app (thousands of requests per second) that uses SQL Server. Recently we decided to switch some hot code paths to async database access and... the app didn't even start. I get this error:

The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

And I see the number of threads in the thread pool growing to 40... 50... 100...

The code pattern we use is fairly simple:

using (var cn = new SqlConnection(connectionString))
{
    cn.Open();
    var data = await cn.QueryAsync("SELECT x FROM Stuff WHERE id=@id", new { id }); // QueryAsync is from Dapper
}

I made a process dump and all threads are stuck on the cn.Open() line, just sitting there and waiting.

This mostly happens during application "recycles" on IIS, when the app process is restarted and HTTP requests are queued from the old process to the new one, resulting in tens of thousands of requests in the queue that need to be processed.

Well, yeah, I get it. I think I know what's happening: async makes the app scale more. While the database is busy responding to my query, control is returned to other threads, which try to open more, and more, and more connections in parallel, and the connection pool maxes out. But why aren't the closed connections returned to the pool immediately after the work is finished?

Switching from async to "traditional" code fixes the problem immediately.

What are my options?

  • Increasing max pool size from the default 100? Tried 200, didn't help. Should I try, like, 10000?
  • Using OpenAsync instead of Open? Didn't help.
  • I thought I was running into this problem https://github.com/dotnet/SqlClient/issues/18 but nope, I'm on a newer version of SqlClient, where it's said to be fixed. Supposedly.
  • Not use async with database access at all? Huh...
  • Do we really have to come up with our own throttling mechanisms when using async like this answer suggests? I'm surprised there's no built-in workaround...
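On the last point, a hand-rolled throttle is usually just a `SemaphoreSlim` gating the data-access layer so that in-flight queries never exceed the pool size. A minimal sketch (the `DbGate` name and the limit of 90 are my own illustration, not from the question; the idea is to queue callers in memory instead of letting them all race for pooled connections):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DbGate
{
    // Allow at most 90 concurrent queries, leaving some headroom
    // under the default Max Pool Size of 100.
    private static readonly SemaphoreSlim Throttle = new SemaphoreSlim(90);

    public static async Task<T> RunAsync<T>(Func<Task<T>> query)
    {
        await Throttle.WaitAsync(); // wait in memory instead of fighting for a pooled connection
        try
        {
            return await query();   // the connection is opened and disposed inside the delegate
        }
        finally
        {
            Throttle.Release();     // free the slot as soon as the query finishes
        }
    }
}
```

Callers would then wrap their Dapper calls, e.g. `var user = await DbGate.RunAsync(() => GetUserAsync(id));`, so bursts queue up cheaply instead of exhausting the pool.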

P.S. Taking a closer look at the process dump: I checked the Tasks report and discovered literally tens of thousands of blocked tasks in the waiting state. And there are exactly 200 db-querying tasks (which is the size of the connection pool) waiting for queries to finish.

Porta answered 15/9, 2021 at 8:0 Comment(10)
Something that strikes me immediately is that you're not awaiting the QueryAsync call.Coccyx
@Coccyx I actually do, just forgot to copy-paste properly. Fixed.Porta
async doesn't flood the connection pool. High traffic combined with slow queries do. What do the actual queries do? Loading an entire table into memory is slow. Using long transactions causes blocking and thus increases the need for connections. Careless UPDATEs will block other writers and readersSeurat
"Resulting in tens of thousands of requests in the queue, that need to be processed" - that means there's a need for tens of thousands of concurrent connections. That's not a SqlClient problem; that's the request-wave problem that causes trouble in all architectures, languages and platforms. Throttling can only go so far. BTW throttling is available at several levels, from the load balancer to IIS (at least). The best way to handle such spikes, though, is to use queueing instead of direct HTTP calls, changing the number of queue workers to handle spikes as neededSeurat
Do you use TransactionScope? This would keep any connections opened under it open until the TransactionScope itself is disposed. That's the only way changes in those connections could be committed or rolled backSeurat
The spike is caused by IIS: when the application is recycled, the requests that were supposed to run on the "old" process are queued for the "new" process, and pile up until the "new" process is fully started (takes 4-5 seconds, but enough to pile up a huge queue)Porta
@PanagiotisKanavos the actual queries are highly optimized and fairly simple - get user by ID, or get recent messages, etc. We use partial indexes and keep the DB response times at a very low level.Porta
None of these things have anything to do with SqlClient and async. They are common problems on all platforms. The same problem occurred long before async was introduced. If the app pool recycles, requests get queued. If the application can't handle all of them, it will get flooded, maybe start hitting 100% CPU, which leads to another recycle. Blocking causes more CPU usage, which can lead to more recycles.Seurat
You can limit the number of concurrent requests through IIS's web.config or the IIS manager. The same can be done at the server level. The appConcurrentRequestLimit attribute specifies the maximum number of concurrent requests that can be queued for an application.Seurat
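For reference, a minimal sketch of the IIS setting mentioned in the comment above. `appConcurrentRequestLimit` lives in the `serverRuntime` section (typically set in `applicationHost.config`, or in a site-level `web.config` if the section is unlocked); the value 2000 here is just an illustration, not a recommendation:

```xml
<!-- applicationHost.config (or an unlocked site-level web.config) -->
<system.webServer>
  <serverRuntime appConcurrentRequestLimit="2000" />
</system.webServer>
```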
"The same problem occurred long before async was introduced. If the app pool recycles, requests get queued. If the application can't handle all of them, it will get flooded" - not if I use sync code, it works just fine.Porta
Well, after a bit of digging, investigating the source code and tons of reading, it appears that async is not always a good idea for DB calls.

As Stephen Cleary (the god of async, who wrote many books about it) nailed it, and it really clicked with me:

If your backend is a single SQL server database, and every single request hits that database, then there isn't a benefit from making your web service asynchronous.

So, yes, async helps you free up some threads, but the first thing these threads do is rush back to the database.

Also this:

The old-style common scenario was client <-> API <-> DB, and in that architecture there's no need for asynchronous DB access

However, if your database is a cluster, a cloud, or some other "autoscaling" thing, then yes, async database access makes a lot of sense.

Here's also an old archive.org article by Rick Anderson that I found useful: https://web.archive.org/web/20140212064150/http://blogs.msdn.com/b/rickandy/archive/2009/11/14/should-my-database-calls-be-asynchronous.aspx

Porta answered 15/9, 2021 at 14:35 Comment(6)
Actually, no. Stephen Cleary changed that advice. The benefits remain the same. You avoid blocking the request threads, allowing ASP.NET Core to use far fewer threads to serve the same traffic. That translates to far less CPU usage, far smaller cloud/power/infrastructure billsSeurat
"is a cluster or a cloud or some other autoscaling thing" - not at all. These things don't affect how or whether request threads are blocked. It's overall database delays that do. If the data access code suffers from excessive blocking, more requests will result in more blocking, in turn using more connections until the connection pool is exhausted.Seurat
In fact, by allowing more requests to be handled by the same server, async may be causing exhaustion indirectly, not because of a bug but because too many slow queries are executed by the same serverSeurat
@PanagiotisKanavos answering your 1st comment - yes, exactly, you're freeing up threads, and the first thing these threads do is rush back to the databasePorta
You can limit the number of concurrent requests and threads using IIS settings. That's not SqlClient's job. If the default 5000 is too much, you can reduce it to what the application can handle. That's just a quick-fix though. If you use load balancing you can warm up a new app pool before bringing it online. If you recycle during deployment, you can actually deploy to a different virtual directory, warm it up, then switch between them. Octopus Deploy does just that.Seurat
In this case, async will help reduce CPU usage but won't make the database run any faster. If you still want that reduced CPU usage you'll have to configure throttling in front of the application. Or you should look at why the database can't keep up. Could this be solved using caching? Snapshot isolation to reduce blocking? Better queries? Perhaps the disks are too slow, or not enough RAM?Seurat

© 2022 - 2024 — McMap. All rights reserved.