I have a strange situation on a production server. Connections for ASP.NET get queued, but the CPU is only at 40%. The database server also runs fine, at 30% CPU.
Some more history as requested in the comments:
- During peak hours the site gets around 20,000 visitors an hour.
- The site is an ASP.NET WebForms application with a lot of AJAX/POSTs.
- The site uses a lot of user-generated content.
- We measure the performance of the site with a test page which hits the database and the web services used by the site. This page gets served within a second under normal load. We define the application as slow when a request takes more than 4 seconds.
- From the measurements we can see that the connection time is fast, but the processing time is long.
- We can't pinpoint the slow response to a single request; the site runs fine during normal hours but gets slow during peak hours.
- We had a problem with the site being CPU bound (i.e. running at 100%); we fixed that.
- We also had problems with exceptions making the AppDomain restart; we fixed that too.
- During peak hours I look at the ASP.NET performance counters (see the typeperf sketch after this list). We can see 600 current connections with 500 queued connections.
- At peak times the CPU is around 40% (which makes me think it is not CPU bound).
- Physical memory is around 60% used.
- At peak times the database server CPU is around 30% (which makes me think it is not database bound).
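For reference, those counters can be watched from the command line as well; a minimal typeperf sketch (exact counter names can differ per ASP.NET version, so check what perfmon lists on the box):

    typeperf "\ASP.NET\Requests Current" "\ASP.NET\Requests Queued" -si 5 -o aspnet-queue.csv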
My conclusion is that something else is stopping the server from handling the requests faster. Possible suspects, with how I checked each (the lock check was done in WinDbg; see the command sketch after this list):
- Deadlocks (!syncblk gives only one lock)
- Disk I/O (checked via Sysinternals Process Explorer: 3.5 MB/s)
- Garbage collection (10-15% during peaks)
- Network I/O (connect time still low)
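The lock check comes from WinDbg with SOS loaded; a sketch of that command plus two related ones that are worth running in the same session (the other suspects were checked with Process Explorer and the performance counters above):

    !syncblk   - managed locks, with the owning and waiting threads
    !runaway   - user-mode CPU time per thread, to spot busy spinners
    !threads   - managed thread overview (GC and finalizer state, lock counts)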
To find out what the process is doing, I created two minidumps.
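For reference, hang dumps like these can be captured with adplus from the Debugging Tools for Windows; a sketch, assuming the IIS worker process name w3wp.exe:

    adplus -hang -pn w3wp.exe -o c:\dumps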
The two dumps were taken 20 seconds apart. This is the output of the first:
!threadpool
CPU utilization 6%
Worker Thread: Total: 95 Running: 72 Idle: 23 MaxLimit: 200 MinLimit: 100
Work Request in Queue: 1
--------------------------------------
Number of Timers: 64
and the output of the second:
!threadpool
CPU utilization 9%
Worker Thread: Total: 111 Running: 111 Idle: 0 MaxLimit: 200 MinLimit: 100
Work Request in Queue: 1589
As you can see, there are a lot of requests in the queue.
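For context, the MinLimit/MaxLimit values in the !threadpool output come from the processModel element in machine.config, where the worker thread attributes are per CPU. A sketch of where they live (placeholder values, not a recommendation):

    <!-- machine.config, inside <system.web>; thread attributes are per CPU -->
    <processModel autoConfig="false"
                  minWorkerThreads="50"
                  maxWorkerThreads="100" />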
Question 1: What does it mean that there are 1589 requests in the queue? Does it mean something is blocking?
The !threadpool list mostly contains entries like this one: Unknown Function: 6a2aa293 Context: 01cd1558 AsyncTimerCallbackCompletion TimerInfo@023a2cb0
If I go into more depth on the AsyncTimerCallbackCompletion entries with
!dumpheap -type TimerCallback
and then look at the objects behind the TimerCallback instances, most of them are of these types:
System.Web.SessionState.SessionStateModule
System.Web.Caching.CacheCommon
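Roughly how I get from a callback to its type, for anyone retracing this; <address> stands for each object address in the !dumpheap output, and _target is the delegate field that points at the object the timer fires on:

    !dumpheap -type TimerCallback
    !dumpobj <address>   - dump one TimerCallback; its _target field shows the receiving object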
Question 2: Does it make any sense that those objects have timers, and so many of them? Should I prevent this, and how?
Main question: am I missing any obvious problems that would explain why I'm queueing connections while not maxing out the CPU?
I succeeded in making a crash dump during a peak. Analyzing it with DebugDiag gave me this warning:
Detected possible blocking or leaked critical section at webengine!g_AppDomainLock owned by thread 65 in Hang Dump.dmp
Impact of this lock
25.00% of threads blocked
(Threads 11 20 29 30 31 32 33 39 40 41 42 74 75 76 77 78 79 80 81 82 83)
The following functions are trying to enter this critical section
webengine!GetAppDomain+c9
The following module(s) are involved with this critical section
\\?\C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\webengine.dll from Microsoft Corporation
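For reference, the follow-up I would run on this dump in WinDbg (thread 65 is the owner DebugDiag reported; these are standard WinDbg/SOS commands):

    ~65s        - switch to the thread that owns webengine!g_AppDomainLock
    kb          - its native stack: what is it doing while holding the lock?
    !clrstack   - its managed stack, if any
    !locks      - all locked critical sections in the process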
A quick Google search doesn't give me any results. Does somebody have a clue?