Very high GC thread count in a ServerGC app

TL;DR: A Server GC-enabled application shows dozens and dozens of special GC threads and hangs over time. What can explain that?


I'm stuck these days on a weird multi-threading/contention issue that occurs in a .NET service. The symptoms are the following:

  • The program hangs for a long period of time (seconds to minutes)
  • The number of threads is abnormally high
  • There is a peak of contention just when the program stops responding (cf. the following graph)
  • The same program is deployed on different servers, and some instances don't have the problem at all (same hardware/OS/CLR)

[Graph: peak of contention]

I immediately suspected an issue in our code that would have caused the managed thread pool to start a huge number of threads over time, all trying to share one or several common resources. It turned out that our ThreadPool usage was very small and very controlled.

I managed to get a dump file of a not-yet-hanging service that already had a very high number of threads (more than 100, when it should be around 20 in a normal state).

Using windbg + sos, we established that the ThreadPool size was OK:

0:000> !threadpool
CPU utilization: 0%
Worker Thread: Total: 8 Running: 1 Idle: 7 MaxLimit: 32767 MinLimit: 32
Work Request in Queue: 0
--------------------------------------
Number of Timers: 1
--------------------------------------
Completion Port Thread:Total: 1 Free: 1 MaxFree: 64 CurrentLimit: 1 MaxLimit: 1000 MinLimit: 32
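
For reference, the MinLimit/MaxLimit and worker counts above map onto what the ThreadPool API itself reports, so the same numbers can be sanity-checked from inside the service. A minimal sketch (the class name and console logging are illustrative only):

using System;
using System.Threading;

static class ThreadPoolDiagnostics
{
    // Reports the same counters that windbg's !threadpool shows:
    // min/max limits and how many worker / IO threads are currently busy.
    public static void LogThreadPoolState()
    {
        int minWorkers, minIo, maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMinThreads(out minWorkers, out minIo);
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);

        Console.WriteLine("Worker threads:          min={0} max={1} busy={2}",
                          minWorkers, maxWorkers, maxWorkers - freeWorkers);
        Console.WriteLine("Completion port threads: min={0} max={1} busy={2}",
                          minIo, maxIo, maxIo - freeIo);
    }
}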

Only 8 worker threads... I then listed all managed thread stacks and found many that I couldn't recognize. See below for one example:

0:000> !eestack
(...)
Thread  94
Current frame: ntdll!NtWaitForSingleObject+0xa
Child-SP         RetAddr          Caller, Callee
0000008e25b2f770 000007f8f5a210ea KERNELBASE!WaitForSingleObjectEx+0x92, calling ntdll!NtWaitForSingleObject
0000008e25b2f810 000007f8ece549bf clr!CLREventBase::WaitEx+0x16c, calling kernel32!WaitForSingleObjectEx
0000008e25b2f820 000007f8f5a2152c KERNELBASE!SetEvent+0xc, calling ntdll!NtSetEvent
0000008e25b2f850 000007f8ece54977 clr!CLREventBase::WaitEx+0x103, calling clr!CLREventBase::WaitEx+0x134
0000008e25b2f8b0 000007f8ece548f8 clr!CLREventBase::WaitEx+0x70, calling clr!CLREventBase::WaitEx+0xe4
0000008e25b2f8e0 000007f8ed06526d clr!SVR::gc_heap::gc1+0x323, calling clr!SVR::GCStatistics::Enabled
0000008e25b2f940 000007f8ecfbe0b3 clr!SVR::gc_heap::bgc_thread_function+0x83, calling clr!CLREventBase::Wait
0000008e25b2f980 000007f8ecf3d5b6 clr!Thread::intermediateThreadProc+0x7d
0000008e25b2fd00 000007f8ecf3d59f clr!Thread::intermediateThreadProc+0x66, calling clr!_chkstk
0000008e25b2fd40 000007f8f8281832 kernel32!BaseThreadInitThunk+0x1a
0000008e25b2fd70 000007f8f8aad609 ntdll!RtlUserThreadStart+0x1d
(...)

The clr!SVR::gc_heap::bgc_thread_function frame already hinted at GC work. Using the !threads -special command, I finally found that these threads were special GC threads:

0:000> !threads -special
ThreadCount:      81
UnstartedThread:  0
BackgroundThread: 49
PendingThread:    0
DeadThread:       21
Hosted Runtime:   no
(...)
 OSID Special thread type
  1  804 DbgHelper 
  2  f48 GC 
  3  3f8 GC 
  4 1380 GC 
  5  af4 GC 
  6 1234 GC 
  7  fac GC 
  8 12e4 GC 
  9 17fc GC 
 10  644 GC 
 11 16e0 GC 
 12  6cc GC 
 13  9d4 GC 
 14  f7c GC 
 15  d5c GC 
 16  d74 GC 
 17  8d0 GC 
 18 1574 GC 
 19  8e0 GC 
 20  5bc GC 
 21  82c GC 
 22  e4c GC 
 23 129c GC 
 24  e28 GC 
 25  45c GC 
 26  340 GC 
 27 15c0 GC 
 28 16d4 GC 
 29  f4c GC 
 30 10e8 GC 
 31 1350 GC 
 32  164 GC 
 33 1620 GC 
 34 1444 Finalizer 
 35  c2c ProfilingAPIAttach 
 62   50 Timer 
 64 14a8 GC 
 65 145c GC 
 66  cdc GC 
 67  af8 GC 
 68 12e8 GC 
 69 1398 GC 
 70  e80 GC 
 71  a60 GC 
 72  834 GC 
 73  1b0 GC 
 74  2ac GC 
 75  eb8 GC 
 76  ec4 GC 
 77  ea8 GC 
 78   28 GC 
 79 11d0 GC 
 80 1700 GC 
 81 1434 GC 
 82 1510 GC 
 83   9c GC 
 84  c64 GC 
 85 11c0 GC 
 86 1714 GC 
 87 1360 GC 
 88 1610 GC 
 89  6c4 GC 
 90  cf0 GC 
 91 13d0 GC 
 92 1050 GC 
 93 1600 GC 
 94 16c4 GC 
 95 1558 GC 
 96 1b74 IOCompletion 
 97  ce4 ThreadpoolWorker 
 98 19a4 ThreadpoolWorker 
 99 1a00 ThreadpoolWorker 
100 1b64 ThreadpoolWorker 
101 1b38 ThreadpoolWorker 
102 1844 ThreadpoolWorker 
103 1b90 ThreadpoolWorker 
104 1a10 ThreadpoolWorker 
105 1894 Gate 

More than 60 "GC" threads... So I checked the settings of my different service instances, and figured out that the problematic ones were configured with GC Server, while the others were not.
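
For anyone wanting to confirm at runtime which mode a given instance is actually using (Server GC and background GC are toggled with the <gcServer> and <gcConcurrent> elements under <runtime> in the application config), the CLR also exposes the setting through GCSettings. A minimal sketch; the console output is illustrative only:

using System;
using System.Runtime;

static class GcModeCheck
{
    // Prints whether this process runs Server GC, the current latency mode,
    // and the logical core count (Server GC creates one heap per core).
    public static void LogGcMode()
    {
        Console.WriteLine("IsServerGC    : {0}", GCSettings.IsServerGC);
        Console.WriteLine("LatencyMode   : {0}", GCSettings.LatencyMode);
        Console.WriteLine("Logical cores : {0}", Environment.ProcessorCount);
    }
}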

Some more info:

  • We use .NET 4.5
  • We use Windows 2012 Server on all machines
  • We run on dual octo-core servers (2 CPUs, 16 physical cores, 32 logical cores)

What I'm trying to do now:

  • I'm trying to get other dumps (when the program has even more threads, when it hangs, etc.)
  • I'll try to disable the GC Server setting on the problematic instances, but the issue can take some time to occur.

So here are my questions:

  • Is it normal for a Server GC-configured .NET program to have so many GC threads? I thought Server GC only used one GC thread per processor.
  • Can this be related to the problem I'm seeing with these services, i.e. hundreds of threads over time, with huge process freezes due to contention?
Bunco answered 12/9, 2013 at 10:8 Comment(2)
Your stack trace is unusable, you'll need to enable the symbol server to get better symbols. – Bourne
@HansPassant I just edited the stack trace. – Bunco

With Server GC, there will be one GC thread per logical core (affinitized to that core), so in your case there should be at least 32 GC threads. If you have background GC turned on, there can be additional worker threads processing the graph for each heap (reference); 32 server GC threads plus up to 32 background GC threads would account for the 60+ GC threads in your dump.
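
As a rough back-of-the-envelope check, assuming one dedicated GC thread per logical core under Server GC and roughly one extra background GC thread per heap when background GC is on (an assumption based on the behaviour described above, not an exact formula):

using System;
using System.Runtime;

static class GcThreadEstimate
{
    // Rough estimate only, under the assumptions stated above:
    // Server GC uses ~one GC thread (and one heap) per logical core,
    // and background GC adds roughly one more thread per heap.
    public static int EstimateGcThreads(bool backgroundGcEnabled)
    {
        int cores = Environment.ProcessorCount;

        if (GCSettings.IsServerGC)
            return cores + (backgroundGcEnabled ? cores : 0);

        // Workstation GC: at most one dedicated background GC thread.
        return backgroundGcEnabled ? 1 : 0;
    }
}

On a 32-logical-core box with background Server GC, that estimate lands at ~64, which is in line with the dump above.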

Also keep in mind that these GC threads run at THREAD_PRIORITY_HIGHEST, which can easily starve any threads that aren't already paused by the GC (reference).

Now, as for your other threads: 500+ threads in a process is going to create a lot of contention regardless of the garbage collector, so figuring out what those threads are will be important to your investigation.


Things to look at

  • See if Background GC is turned on, and if so, try running without it (this mode is supported in 4.5's Server GC).
  • Try reducing the max number of threads on your thread pools (32767 is an unhealthy maximum); a sketch of capping it from code follows below.
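
A minimal sketch of capping the pool from code; the cap values here are arbitrary illustrations, not recommendations (SetMaxThreads rejects values below the current minimums or the processor count):

using System;
using System.Threading;

static class ThreadPoolCap
{
    // The cap values below are purely illustrative.
    // SetMaxThreads returns false (and changes nothing) if a requested cap
    // is lower than the current minimum or the number of processors.
    public static void ApplyCaps()
    {
        int workerCap = Math.Max(Environment.ProcessorCount * 4, 100);
        int ioCap = 1000;

        if (!ThreadPool.SetMaxThreads(workerCap, ioCap))
        {
            Console.WriteLine("Thread-pool caps rejected; check the current minimums first.");
        }
    }
}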

You can also use procdump.exe to help capture minidumps when performance degrades.

Marinetti answered 17/9, 2013 at 15:55 Comment(0)

I had similar problems on a NUMA server. Things that helped me:

  • Limit the thread pool
  • Limit the processor affinity mask of the managed process that hangs. It looks odd, but reducing the number of processors available to a process sometimes makes it run faster under a highly concurrent load; I suspect spin locks (busy waits). A sketch of setting the mask from code follows below.
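
A minimal sketch of restricting the current process to the first eight logical cores from code; the 0xFF mask is purely illustrative (the same effect can be had externally, e.g. with start /affinity):

using System;
using System.Diagnostics;

static class AffinityLimiter
{
    // Restricts the current process to the first 8 logical cores.
    // ProcessorAffinity is a bit mask: bit n set => logical core n is allowed.
    // 0xFF is illustrative only; pick a mask matching your NUMA topology.
    public static void LimitToFirstEightCores()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            current.ProcessorAffinity = (IntPtr)0xFF;
        }
    }
}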
Alli answered 22/10, 2013 at 6:47 Comment(0)
