I'm working on an ASP.NET Web API project (full .NET Framework 4.6.1) that uses Azure SQL Database; the API is deployed to an Azure App Service. Regarding the service tiers, the Azure SQL Database is an S2 (50 DTU) and the App Service hosting the API endpoint is a B1 (1 core, 1.75 GB of RAM). At the moment we are running 2 instances (2 VMs behind a load balancer).
Our QA team is trying to determine the performance capacity of the platform. They have configured a JMeter performance test that launches 4000 requests over an interval of 60 seconds.
After the first executions of the performance test the rate of HTTP 500 errors was very high. After taking a look at the logs, we found a lot of exceptions like this:
System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
at System.Data.Common.ADP.ExceptionWithStackTrace(Exception e)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Data.Entity.SqlServer.DefaultSqlExecutionStrategy.<>c__DisplayClass4.<<ExecuteAsync>b__3>d__6.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Data.Entity.SqlServer.DefaultSqlExecutionStrategy.<ExecuteAsyncImplementation>d__9`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Data.Entity.Core.EntityClient.EntityConnection.<OpenAsync>d__8.MoveNext()
The first thing I thought of was a connection leak. We reviewed the code and monitored the connections on SQL Server using sp_who2, but the connections were being disposed as expected.
We are using a dependency injection container that creates an Entity Framework context (queries are async) each time a new request is processed; the context is disposed automatically when the request ends (scoped dependencies).
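For context, the wiring is conceptually like the sketch below. We don't use these exact names; Autofac and the OrdersContext/OrdersController types are just illustrative, but the lifetime (one context per HTTP request, disposed with the request scope) and the async queries are the same:

using System.Data.Entity;
using System.Reflection;
using System.Threading.Tasks;
using System.Web.Http;
using Autofac;
using Autofac.Integration.WebApi;

// Illustrative EF6 context; the real one has more DbSets.
public class OrdersContext : DbContext
{
    public OrdersContext() : base("name=OrdersDb") { }
    public DbSet<Order> Orders { get; set; }
}

public class Order
{
    public int Id { get; set; }
}

public static class IocConfig
{
    public static void Register(HttpConfiguration config)
    {
        var builder = new ContainerBuilder();
        builder.RegisterApiControllers(Assembly.GetExecutingAssembly());

        // One context per HTTP request, disposed when the request scope ends.
        builder.RegisterType<OrdersContext>()
               .AsSelf()
               .InstancePerRequest();

        config.DependencyResolver = new AutofacWebApiDependencyResolver(builder.Build());
    }
}

public class OrdersController : ApiController
{
    private readonly OrdersContext _db;

    public OrdersController(OrdersContext db)
    {
        _db = db;
    }

    // Queries are async; EF opens a pooled connection for the duration of the
    // query and returns it to the pool when the operation completes.
    public async Task<IHttpActionResult> Get(int id)
    {
        var order = await _db.Orders.FindAsync(id);
        if (order == null)
            return NotFound();
        return Ok(order);
    }
}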
The conclusion we reached was that we needed to increase the size of the connection pool to mitigate the timeouts under heavy traffic.
A quick search on the internet showed that the default value of Max Pool Size is 100:
https://www.connectionstrings.com/all-sql-server-connection-string-keywords/
I decided to increase the value to 400:
Server=tcp:XXXX.database.windows.net,1433;Initial Catalog=XXXX;Persist Security Info=False;User ID=XXXX;Password=XXXXXXXXXXXX;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Max Pool Size=400;
After repeating the performance test, to our surprise we saw no improvement: we were receiving the same rate of HTTP 500 errors. We reached the conclusion that the Max Pool Size setting was being ignored.
The next thing we did was to monitor SQL Server during the performance test in order to find out how many sessions were being opened by each host process. We are using the following SQL statement for this purpose:
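To rule out the new value simply not reaching the provider (for instance, if a different connection string wins at deployment time), a quick check would be to log the connection string the Entity Framework context actually sees at runtime. A rough sketch, reusing the illustrative OrdersContext from the snippet above:

using System.Data.SqlClient;
using System.Diagnostics;

public static class ConnectionStringProbe
{
    public static void LogEffectivePoolSettings()
    {
        using (var ctx = new OrdersContext())
        {
            // With Persist Security Info=False the password is not returned once
            // the connection has been opened, but pooling keywords are preserved.
            var builder = new SqlConnectionStringBuilder(ctx.Database.Connection.ConnectionString);

            Trace.TraceInformation(
                "Pooling={0}, Max Pool Size={1}, Connect Timeout={2}",
                builder.Pooling, builder.MaxPoolSize, builder.ConnectTimeout);
        }
    }
}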
SELECT
    COUNT(*) AS sessions,
    host_name,
    host_process_id,
    program_name,
    DB_NAME(database_id) AS database_name
FROM
    sys.dm_exec_sessions AS s
WHERE
    (is_user_process = 1) AND
    (program_name = '.Net SqlClient Data Provider')
GROUP BY
    host_name, host_process_id, program_name, database_id
ORDER BY
    sessions DESC
After monitoring the sessions opened by each host process (the virtual machines where the API endpoint is deployed), we found out that only 128 database sessions were being created from each virtual machine.
At this point several options come to mind that could explain such weird behaviour:
- Bearing in mind that connection pooling is a client-side concept, the first thing I thought was that some parameter of the IIS application pool was responsible for this behaviour.
- Another option would be that only 128 sessions can be opened per host process and database login. I didn't find anything on the internet pointing to this, but in other databases such as Oracle this kind of constraint can be configured to limit the number of sessions opened by each login.
- The last option: in some blogs and Stack Overflow threads I have read that the exception we are receiving (The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached) can be misleading, and that some other problem may actually be causing it (for comparison, a minimal repro of a genuinely exhausted pool is sketched after this list).
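Regarding that last point, I would expect a genuinely exhausted pool to look like this minimal console sketch (placeholder credentials in the same style as the connection string above; the deliberately tiny Max Pool Size and short Connect Timeout are only there to make the failure quick). Holding more connections open than Max Pool Size allows should produce the same InvalidOperationException as in the stack trace above:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

class PoolExhaustionRepro
{
    static void Main()
    {
        // Placeholder credentials; tiny pool and short timeout so hitting the
        // limit fails fast instead of waiting the default 15 seconds.
        var cs = "Server=tcp:XXXX.database.windows.net,1433;Initial Catalog=XXXX;" +
                 "User ID=XXXX;Password=XXXXXXXXXXXX;Encrypt=True;" +
                 "Max Pool Size=5;Connect Timeout=5";

        var held = new List<SqlConnection>();
        try
        {
            // Open connections without disposing them; once Max Pool Size is
            // reached, the next Open() waits and then throws the pool timeout.
            for (var i = 1; i <= 10; i++)
            {
                var cn = new SqlConnection(cs);
                cn.Open();
                held.Add(cn);
                Console.WriteLine("Opened connection #" + i);
            }
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("Pool exhausted after " + held.Count + " connections: " + ex.Message);
        }
        finally
        {
            held.ForEach(c => c.Dispose());
        }
    }
}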
The quick solution would be to disable pooling in the connection string, but that is the last thing I would want to do.
Another solution would be to scale out the App Service and add more VM instances, but that is expensive in terms of money.
Does anyone know if there is some kind of limitation in Azure App Services that explains why only 128 sessions are opened per instance when connection pooling is enabled?