xperf WinDBG C# .NET 4.5.2 Application - Understanding process dump

Under heavy load, our application makes a beefy server go to 100% CPU usage. Reading the process dump and looking at the threads, some of them have been up for 10 minutes. None of them gives me any insight when inspected with !CLRStack.

The !runaway command gives me:

0:030> !runaway
 User Mode Time
  Thread       Time
  53:2e804      0 days 0:10:04.703
  30:31894      0 days 0:07:51.593
  33:47100      0 days 0:07:24.890
  42:11e54      0 days 0:06:45.875
  35:35e18      0 days 0:06:07.578
  41:54464      0 days 0:05:49.796
  47:57700      0 days 0:05:45.000
  44:3c2d4      0 days 0:05:44.265
  32:3898c      0 days 0:05:43.593
  50:54894      0 days 0:05:41.968
  51:5bc58      0 days 0:05:40.921
  43:14af4      0 days 0:05:40.734
  48:35074      0 days 0:05:40.406
  ...

Calling !DumpStack on one of those threads, I am getting:

0000001ab442f900 00007ff9ef4c1148 KERNELBASE!WaitForSingleObjectEx+0x94, calling ntdll!NtWaitForSingleObject
0000001ab442f980 00007ff9e920beb2 clr!SVR::gc_heap::compute_new_dynamic_data+0x17b, calling clr!SVR::gc_heap::desired_new_allocation
0000001ab442f9a0 00007ff9e90591eb clr!CLREventWaitHelper2+0x38, calling kernel32!WaitForSingleObjectEx
0000001ab442f9b0 00007ff9e90e0d2c clr!WriteBarrierManager::UpdateEphemeralBounds+0x1c, calling clr!WriteBarrierManager::NeedDifferentWriteBarrier
0000001ab442f9e0 00007ff9e9059197 clr!CLREventWaitHelper+0x1f, calling clr!CLREventWaitHelper2
0000001ab442fa40 00007ff9e9059120 clr!CLREventBase::WaitEx+0x70, calling clr!CLREventWaitHelper
0000001ab442fa70 00007ff9ef4c149c KERNELBASE!SetEvent+0xc, calling ntdll!NtSetEvent
0000001ab442faa0 00007ff9e90ef1e1 clr!SVR::gc_heap::set_gc_done+0x22, calling clr!CLREventBase::Set
0000001ab442fad0 00007ff9e90e9331 clr!SVR::gc_heap::gc_thread_function+0x8a, calling clr!CLREventBase::WaitEx
0000001ab442fb00 00007ff9e92048e7 clr!SVR::gc_heap::gc_thread_stub+0x7a, calling clr!SVR::gc_heap::gc_thread_function
0000001ab442fb60 00007ff9e91a0318 clr!Thread::CLRSetThreadStackGuarantee+0x48, calling kernel32!SetThreadStackGuaranteeStub
0000001ab442fb90 00007ff9e91a01ef clr!Thread::CommitThreadStack+0x10, calling clr!Thread::CLRSetThreadStackGuarantee
0000001ab442fbd0 00007ff9e910df0b clr!ClrFlsSetValue+0x57, calling kernel32!SetLastErrorStub
0000001ab442fc00 00007ff9e92048dc clr!SVR::gc_heap::gc_thread_stub+0x6f, calling clr!_chkstk
0000001ab442fc40 00007ff9f0d316ad kernel32!BaseThreadInitThunk+0xd
0000001ab442fc70 00007ff9f1e54409 ntdll!RtlUserThreadStart+0x1d

What is it telling me? I see a lot of calls into the CLR, but I can't understand where the problem would be. After the .reload (suggested by Thomas), I can now see the GC calls.
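
For reference, the symbol and SOS setup that gets readable stacks is roughly this (a sketch, not my exact session; thread 53 is just the top entry in !runaway, and .loadby sos clr assumes the dump came from the 64-bit .NET 4.x CLR):

$$ point the symbol path at the Microsoft symbol server and reload
.symfix
.reload
$$ load SOS from the same CLR that produced the dump
.loadby sos clr
$$ find the busiest thread, switch to it, and dump managed and full stacks
!runaway
~53s
!clrstack
!dumpstack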

Update 1

After running xperf, each w3wp.exe is consuming about 45% of the CPU. Filtering by one of them and grouping by Function, there is a function labeled "?" that is responsible for 13.62%; the others account for 2.67% or less. How can I find out what this "?" is?

Update 2

Ran xperf again, and the function JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel is responsible for 12.31% of CPU usage. The "?" function is still there.

Grouping by Stack:

Line #, Stack, Count, Weight (in view), TimeStamp, % Weight
2,   |- ?!?, 501191, 501222.365294, , 35.51
3,   |    |- clr.dll!JITutil_MonContention, 215749, 215752.552227, , 15.28
4,   |    |- clr.dll!JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel, 170804, 170777.100191, , 12.10

As you can see, those two are responsible for more than 27% of CPU usage (for each process, so it is significant).
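
JITutil_MonContention and JIT_MonEnterWorker_* are the CLR helpers behind Monitor.Enter (a C# lock statement); seeing them this high in the profile means threads are spending their time acquiring and spinning on contended locks rather than doing useful work. Below is a minimal sketch of the kind of code that produces this signature, and one common way to relieve it (the cache class and names are hypothetical, not from our code base):

using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical: a single lock guarding a hot shared cache. Under load every
// request serializes here, and the contended Monitor.Enter shows up in xperf
// as JITutil_MonContention / JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel.
public class ContendedCache
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

    public string GetOrAdd(string key)
    {
        lock (_gate)                  // compiles down to Monitor.Enter/Monitor.Exit
        {
            string value;
            if (!_items.TryGetValue(key, out value))
            {
                value = Compute(key); // work done while holding the lock
                _items[key] = value;
            }
            return value;
        }
    }

    private static string Compute(string key) { return key.ToUpperInvariant(); }
}

// One way to take the pressure off: ConcurrentDictionary uses lock striping,
// so callers no longer contend on a single monitor.
public class RelievedCache
{
    private readonly ConcurrentDictionary<string, string> _items =
        new ConcurrentDictionary<string, string>();

    public string GetOrAdd(string key)
    {
        return _items.GetOrAdd(key, k => k.ToUpperInvariant());
    }
}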

Update 3

After using wpr.exe (suggestion by @magicandre1981):

wpr.exe -start cpu and wpr -stop result.etl

I found out that FormsAuthentication and some unnecessary calls to Ninject on the critical path were contributing around 16% of CPU usage. I still don't understand the threads running for 10 minutes or more.
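
A common source of this kind of Ninject overhead is resolving heavyweight, stateless services on every request. A hedged sketch of moving such services to singleton scope so they are built once instead of on the critical path (the module and service names are hypothetical, not from our code):

using Ninject.Modules;

// Hypothetical module; IPriceCalculator / PriceCalculator are placeholders.
public class ServiceModule : NinjectModule
{
    public override void Load()
    {
        // Before: implicit transient scope, so the object graph (and any
        // constructor work) is rebuilt on every resolution along the request path.
        // Bind<IPriceCalculator>().To<PriceCalculator>();

        // After: stateless, thread-safe services are created once and reused.
        Bind<IPriceCalculator>().To<PriceCalculator>().InSingletonScope();
    }
}

public interface IPriceCalculator { decimal PriceFor(string sku); }

public class PriceCalculator : IPriceCalculator
{
    public decimal PriceFor(string sku) { return 0m; }  // stand-in implementation
}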

Update 4

Tried DebugDiag (suggestion from @leppie) and it just confirmed that the hanging threads are all similar to:

Thread ID: 53     Total CPU Time: 00:09:11.406     Entry Point for Thread: clr!Thread::intermediateThreadProc 
Thread ID: 35     Total CPU Time: 00:07:26.046     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub 
Thread ID: 50     Total CPU Time: 00:07:01.515     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub 
Thread ID: 29     Total CPU Time: 00:06:02.264     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub 
Thread ID: 31     Total CPU Time: 00:06:41.281     Entry Point for Thread: clr!SVR::gc_heap::gc_thread_stub 

or due to StackExchange.Redis:

DomainBoundILStubClass.IL_STUB_PInvoke(Int32, IntPtr[], IntPtr[], IntPtr[], TimeValue ByRef)+e1 
[[InlinedCallFrame] (StackExchange.Redis.SocketManager.select)] StackExchange.Redis.SocketManager.select(Int32, IntPtr[], IntPtr[], IntPtr[], TimeValueByRef) 
StackExchange.Redis.SocketManager.ReadImpl()+889 
StackExchange.Redis.SocketManager.Read()+66 

or

[[GCFrame]] 
[[HelperMethodFrame_1OBJ] (System.Threading.Monitor.ObjWait)] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object) 
mscorlib_ni!System.Threading.Monitor.Wait(System.Object, Int32)+19 
StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[[System.__Canon, mscorlib]](StackExchange.Redis.Message, StackExchange.Redis.ResultProcessor`1, StackExchange.Redis.ServerEndPoint)+24f 
StackExchange.Redis.RedisBase.ExecuteSync[[System.__Canon, mscorlib]](StackExchange.Redis.Message, StackExchange.Redis.ResultProcessor`1, StackExchange.Redis.ServerEndPoint)+77 
[[StubHelperFrame]] 
StackExchange.Redis.RedisDatabase.SetMembers(StackExchange.Redis.RedisKey, StackExchange.Redis.CommandFlags)+ee 
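
The Monitor.Wait in that last stack is the synchronous SetMembers call parking the thread while the multiplexer waits for the Redis reply; under load that pins a lot of threads. A hedged sketch of the usual mitigations: a single shared ConnectionMultiplexer plus the async API (the connection string and class/method names here are placeholders, not our configuration):

using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class RedisAccess
{
    // One multiplexer per app domain; creating connections per call is expensive.
    // "localhost:6379" is a placeholder for the real connection string.
    private static readonly Lazy<ConnectionMultiplexer> Connection =
        new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect("localhost:6379"));

    public static async Task<RedisValue[]> GetMembersAsync(RedisKey key)
    {
        IDatabase db = Connection.Value.GetDatabase();

        // SetMembersAsync releases the calling thread while the reply is pending,
        // instead of blocking it in Monitor.Wait like the synchronous SetMembers.
        return await db.SetMembersAsync(key);
    }
}
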
Gyatt answered 28/9, 2015 at 14:2 Comment(33)
Something that is waiting in WaitForSingleObjectEx() will not consume CPU, because it is waiting and not doing work. Threads involved in a deadlock will also not consume CPU, since they will all be waiting for resources.Disclose
@Thomas Any idea how I can get anything useful from this dump? Regarding the number of digits for the thread number, generally I have managed code running on threads with 3 digits. I don't believe it is a problem.Gyatt
@Thomas After the .symfix I still have the same values. I was able to see managed code before, so I don't think it would be a symbol problem.Gyatt
Let us continue this discussion in chat.Gyatt
Are you able to run the program in Visual Studio and profile it?Ade
@Ade No, on the production server (no VS installed there)Gyatt
use ETW/xperf/WPA to trace CPU usage: channel9.msdn.com/Shows/Defrag-Tools/…Mcwilliams
@Thomas: after running xperf, DPCs are at an almost constant duration of 2 (y-axis). I don't think that is good. By module name, tcpip.sys and ntoskrnl.exe are the champions.Gyatt
use WPR, this also generates the PDBs for ngened DLLs: wpr.exe -start cpu and wpr -stop result.etl. Do you now see a better stack and not only "?" ?Mcwilliams
I would advise using WPA with the windows symbol server and your own symbols. Then you should do wait analysis, and see both the relevant function name in w3wp and which function in your module called it. Then you have some sort of idea what are you waiting for that w3wp is wasting time on. I'm not proficient enough in .NET but I imagine you could also look at FILE IO and Network IO activity, to see if this is because of some file / network call (which is likely the initiator of the CPU problem)Solicit
@Mcwilliams i will take a look at wpr results now.Gyatt
@Solicit I already tried to load my own symbols in WPA (added them to the Load Symbols path in WPA), with no success.Gyatt
@Mcwilliams still getting '?' as the name of the function responsible for most part of the processing.Gyatt
@Mcwilliams found useful stuff on stacks! I will update question later.Gyatt
if you still see ?, this means that it is dynamically generated code.Mcwilliams
@Mcwilliams any advice on how to understand what those threads are doing to be running for more than 10 minutes?Gyatt
analyze your GC data: channel9.msdn.com/Shows/Defrag-Tools/…, channel9.msdn.com/Shows/Defrag-Tools/…, channel9.msdn.com/Shows/Defrag-Tools/…, channel9.msdn.com/Shows/Defrag-Tools/…Mcwilliams
@Mcwilliams as soon as I get some time, I will perform this analysis and update here.Gyatt
If you can, you should be profiling in this situation and not analyzing memory dumps. #4427Archeozoic
My 2 cents, use DebugDiag to check for locks.Admeasurement
@Admeasurement the results of DebugDiag are above. Later I will try to look at the GC, but it seems that StackExchange.Redis has a known issue.Gyatt
Sorry for no feedback, but I am currently busy with other things here. Later I will return to this issue.Gyatt
add some logging to check that it isn't a deadlock which occurs under heavy load only (concurrent access, etc).Aoristic
@Dexion, I will give it a try, but all the logging that already exists does not point to anything beyond what the dumps and process analysis already showed.Gyatt
I would use a profiling tool to identify the issue. You can get a 14-day free trial of Redgate's ANTS, which should give you enough time to put out the fire.... red-gate.com/products/dotnet-development/… Another good option is the EQATEC Profiler eqatec.com/tools/profilerOglethorpe
Typically "?" In WPA is because Symbols are not loaded. if using memory dumps for perf analysis you need to take multiple dmps, ProcDump SysInternals tool can auto trigger dmp on specified high CPU, ensure -ma option is used. In WinDbg You can use !eestack to check all CLR threads; you can also dmp all exceptions on heap, exceptions often lead to high CPU in ASP.NET blogs.msdn.com/b/tess/archive/2009/04/16/…Martyrology
In addition, the PAL tool pal.codeplex.com using the ASP.NET template can give good clues about where to investigateMartyrology
@MalcolmMcCaffery the ? is dynamically generated code. I asked Microsoft about this some time ago.Mcwilliams
Thanks for the tips. As soon as I refactor the authentication part (which seems to be related), I will run some load tests and investigate again what is going on. We are updating StackExchange.Redis, as it is always timing out under this load.Gyatt
If you see gc_thread_stub in the call stack, it is a GC thread. GC threads are created per virtual CPU, so it is expected to have many. !runaway gives you the total CPU time spent by the threads, not the duration since the thread was started. Something is probably causing too many GCs to happen, and you should find it. I would look for threads with a GCFrame (hijacked by the GC) and check their stack tracesWittie
We had a very similar issue recently, which turned out to be a DataTable.Select statement executed on a data table with 30,000 records in a loop 200 times (sketched after these comments). We detected the issue by finding the hijacked threads, and also by profiling with the Red Gate performance profiler (because that code block was naturally also consuming too much CPU)Wittie
So is this only happening in production then? This is the point where I sit down with the developers and go through the code path to see what the heck is happening that might cause high CPU. And if it's environment-specific, it's usually a data problem. In my case they were using Func LINQ predicates, not Expression predicates, so they were pulling prod data into memory and filtering client-side. DebugDiag hinted at that with tons of domain objects on the heap. If it's Ninject, check all the dependencies being injected and their ctors for non-trivial logic.Ludmilla
PerfView.exe is a great tool for understanding where all your CPU is going. It's particularly good for analyzing .net apps. single-exe deploy and suitable for running on prod servers. It looks a bit weird but if you watch a couple of tutorial videos you can perform simple CPU or memory analysis very easily and without having to wade through process dumps, symbol resolution problems, etc etc. channel9.msdn.com/Series/PerfView-Tutorial. I really can't recommend it highly enough!Anomalistic
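
Picking up Wittie's DataTable example, here is a minimal sketch of that pattern and the cheaper alternative (hypothetical table and column names, assuming the lookups are by a single key):

using System.Collections.Generic;
using System.Data;

public static class LookupExample
{
    // Slow variant from the comment above (DataTable.Select in a loop): each
    // call scans all ~30,000 rows and allocates a result array, and doing that
    // 200 times generates enough garbage to keep the GC threads busy.
    public static DataRow FindOrderSlow(DataTable orders, int orderId)
    {
        DataRow[] hits = orders.Select("OrderId = " + orderId);
        return hits.Length > 0 ? hits[0] : null;
    }

    // Cheaper variant: index the rows once, then do O(1) lookups inside the loop.
    public static Dictionary<int, DataRow> BuildIndex(DataTable orders)
    {
        var index = new Dictionary<int, DataRow>(orders.Rows.Count);
        foreach (DataRow row in orders.Rows)
        {
            index[(int)row["OrderId"]] = row;
        }
        return index;
    }
}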

Doing this by hand takes bravery ;) Please check the official MS DebugDiag 2.2 (https://www.microsoft.com/en-us/download/details.aspx?id=49924); it comes with an analyzer, so you don't have to do it by hand. With DebugDiag, I think you will find your problem faster than ever...

Cythera answered 24/1, 2016 at 21:39 Comment(0)
