TL;DR: A Server GC enabled application shows dozens and dozens of special GC threads and hangs over time. What can explain that?
I've been stuck for a few days on a weird multi-threading/contention issue that occurs in a .NET service. The symptoms are the following:
- The program hangs for a long period of time (seconds to minutes)
- The number of threads is abnormally high
- There is a peak of contention right when the program stops responding (cf. the following graph)
- The same program is deployed on different servers, and some instances don't have the problem at all (same hardware/OS/CLR)
I immediately suspected an issue in our code that would have caused the managed thread pool to start a huge number of threads over time, all trying to share one or several common resources. It turned out that our usage of the ThreadPool was actually very small and very controlled.
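To double-check that at runtime (independently of the dump analysis below), a small periodic log of the thread-pool counters is enough; the ThreadPoolCheck helper below is only an illustrative sketch, not our actual code:

using System;
using System.Threading;

static class ThreadPoolCheck
{
    public static void Dump()
    {
        int minWorker, minIo, maxWorker, maxIo, freeWorker, freeIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIo);

        // "Busy" threads = max - available; this stays small if the
        // ThreadPool usage is really as controlled as we think.
        Console.WriteLine("Worker: busy={0} min={1} max={2}",
            maxWorker - freeWorker, minWorker, maxWorker);
        Console.WriteLine("IOCP:   busy={0} min={1} max={2}",
            maxIo - freeIo, minIo, maxIo);
    }
}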
I managed to get a dump file of a not-yet-hanging service that already had a very high number of threads (more than 100, when it should be around 20 in a normal state).
Using WinDbg + SOS, we established that the ThreadPool size was OK:
0:000> !threadpool
CPU utilization: 0%
Worker Thread: Total: 8 Running: 1 Idle: 7 MaxLimit: 32767 MinLimit: 32
Work Request in Queue: 0
--------------------------------------
Number of Timers: 1
--------------------------------------
Completion Port Thread:Total: 1 Free: 1 MaxFree: 64 CurrentLimit: 1 MaxLimit: 1000 MinLimit: 32
Only 8 worker threads... Then I listed all the managed thread stacks and found many that I couldn't recognize. Here is one example:
0:000> !eestack
(...)
Thread 94
Current frame: ntdll!NtWaitForSingleObject+0xa
Child-SP RetAddr Caller, Callee
0000008e25b2f770 000007f8f5a210ea KERNELBASE!WaitForSingleObjectEx+0x92, calling ntdll!NtWaitForSingleObject
0000008e25b2f810 000007f8ece549bf clr!CLREventBase::WaitEx+0x16c, calling kernel32!WaitForSingleObjectEx
0000008e25b2f820 000007f8f5a2152c KERNELBASE!SetEvent+0xc, calling ntdll!NtSetEvent
0000008e25b2f850 000007f8ece54977 clr!CLREventBase::WaitEx+0x103, calling clr!CLREventBase::WaitEx+0x134
0000008e25b2f8b0 000007f8ece548f8 clr!CLREventBase::WaitEx+0x70, calling clr!CLREventBase::WaitEx+0xe4
0000008e25b2f8e0 000007f8ed06526d clr!SVR::gc_heap::gc1+0x323, calling clr!SVR::GCStatistics::Enabled
0000008e25b2f940 000007f8ecfbe0b3 clr!SVR::gc_heap::bgc_thread_function+0x83, calling clr!CLREventBase::Wait
0000008e25b2f980 000007f8ecf3d5b6 clr!Thread::intermediateThreadProc+0x7d
0000008e25b2fd00 000007f8ecf3d59f clr!Thread::intermediateThreadProc+0x66, calling clr!_chkstk
0000008e25b2fd40 000007f8f8281832 kernel32!BaseThreadInitThunk+0x1a
0000008e25b2fd70 000007f8f8aad609 ntdll!RtlUserThreadStart+0x1d
(...)
Using the !threads -special command, I finally found that these threads were special GC threads:
0:000> !threads -special
ThreadCount: 81
UnstartedThread: 0
BackgroundThread: 49
PendingThread: 0
DeadThread: 21
Hosted Runtime: no
(...)
OSID Special thread type
1 804 DbgHelper
2 f48 GC
3 3f8 GC
4 1380 GC
5 af4 GC
6 1234 GC
7 fac GC
8 12e4 GC
9 17fc GC
10 644 GC
11 16e0 GC
12 6cc GC
13 9d4 GC
14 f7c GC
15 d5c GC
16 d74 GC
17 8d0 GC
18 1574 GC
19 8e0 GC
20 5bc GC
21 82c GC
22 e4c GC
23 129c GC
24 e28 GC
25 45c GC
26 340 GC
27 15c0 GC
28 16d4 GC
29 f4c GC
30 10e8 GC
31 1350 GC
32 164 GC
33 1620 GC
34 1444 Finalizer
35 c2c ProfilingAPIAttach
62 50 Timer
64 14a8 GC
65 145c GC
66 cdc GC
67 af8 GC
68 12e8 GC
69 1398 GC
70 e80 GC
71 a60 GC
72 834 GC
73 1b0 GC
74 2ac GC
75 eb8 GC
76 ec4 GC
77 ea8 GC
78 28 GC
79 11d0 GC
80 1700 GC
81 1434 GC
82 1510 GC
83 9c GC
84 c64 GC
85 11c0 GC
86 1714 GC
87 1360 GC
88 1610 GC
89 6c4 GC
90 cf0 GC
91 13d0 GC
92 1050 GC
93 1600 GC
94 16c4 GC
95 1558 GC
96 1b74 IOCompletion
97 ce4 ThreadpoolWorker
98 19a4 ThreadpoolWorker
99 1a00 ThreadpoolWorker
100 1b64 ThreadpoolWorker
101 1b38 ThreadpoolWorker
102 1844 ThreadpoolWorker
103 1b90 ThreadpoolWorker
104 1a10 ThreadpoolWorker
105 1894 Gate
More than 60 "GC" threads... So I checked the settings of my different service instances, and figured out that the problematic ones were configured with GC Server
, while the others were not.
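As a side note, the GC flavor can also be checked from inside the process (GCSettings.IsServerGC exists since .NET 4.5); a minimal sketch, with a helper name of my own:

using System;
using System.Runtime;

static class GcModeCheck
{
    public static void Dump()
    {
        // Reports which GC flavor the running CLR actually picked up,
        // regardless of what the .config file says.
        Console.WriteLine("Server GC:    {0}", GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
        // Server GC sizes its heaps and GC threads from the logical CPU count.
        Console.WriteLine("Logical CPUs: {0}", Environment.ProcessorCount);
    }
}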
Some more info:
- We use .NET 4.5
- We use Windows Server 2012 on all machines
- We run on dual octo-core servers (2 CPUs, 16 physical cores, 32 logical cores)
What I'm trying to do now:
- I'm trying to get more dumps (when the program has even more threads, when it actually hangs, etc.)
- I'll try to disable the Server GC setting on the problematic instances (see the config sketch below), but the issue can take some time to occur.
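For reference, this is the switch I mean, in the service's .config file (the problematic instances currently have it set to true):

<configuration>
  <runtime>
    <gcServer enabled="false" />
  </runtime>
</configuration>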
So here are my questions:
- Is it normal for a Server GC-configured .NET program to have so many GC threads? I thought Server GC only used one GC thread per processor.
- Can this be related to the problem I'm seeing on these services, i.e. hundreds of threads created over time and huge process freezes due to contention?