How to analyze <unclassified> memory usage in WinDbg

This is a .NET 4 Windows service application running on an x64 machine. At some point, after days of running steadily, the service's memory consumption spikes like crazy until it crashes. I was able to catch it at 1.2 GB and capture a memory dump.

If I run !address -summary in WinDbg on my dump file, I get the following result:

!address -summary

--- Usage Summary ------ RgnCount ------- Total Size -------- %ofBusy  %ofTotal
Free                     821      7ff`7e834000 (   7.998 Tb)           99.98%
<unclassified>           3696       0`6eece000 (   1.733 Gb)  85.67%   0.02%
Image                    1851       0`0ea6f000 ( 234.434 Mb)  11.32%   0.00%
Stack                    1881       0`03968000 (  57.406 Mb)  2.77%    0.00%
TEB                      628        0`004e8000 (   4.906 Mb)  0.24%    0.00%
NlsTables                1          0`00023000 ( 140.000 kb)  0.01%    0.00%
ActivationContextData    3          0`00006000 (  24.000 kb)  0.00%    0.00%
CsrSharedMemory          1          0`00005000 (  20.000 kb)  0.00%    0.00%
PEB                      1          0`00001000 (   4.000 kb)  0.00%    0.00%
-
-
-
--- Type Summary (for busy) -- RgnCount ----- Total Size ----- %ofBusy %ofTotal
MEM_PRIVATE                        5837 0`7115a000 (  1.767 Gb)  87.34%  0.02%
MEM_IMAGE                          2185 0`0f131000 (241.191 Mb)  11.64%  0.00%
MEM_MAPPED                           40 0`01531000 ( 21.191 Mb)   1.02%  0.00%
-
-
--- State Summary ------------ RgnCount ------ Total Size ---- %ofBusy %ofTotal
MEM_FREE                            821 7ff`7e834000 (  7.998 Tb)        99.98%
MEM_COMMIT                         6127   0`4fd5e000 (  1.247 Gb) 61.66%  0.02%
MEM_RESERVE                        1935   0`31a5e000 (794.367 Mb) 38.34%  0.01%
-
-
--Protect Summary(for commit)- RgnCount ------ Total Size --- %ofBusy %ofTotal
PAGE_READWRITE                     3412 0`3e862000 (1000.383 Mb) 48.29%   0.01%
PAGE_EXECUTE_READ                   220 0`0b12f000 ( 177.184 Mb)  8.55%   0.00%
PAGE_READONLY                       646 0`02fd0000 (  47.813 Mb)  2.31%   0.00%
PAGE_WRITECOPY                      410 0`01781000 (  23.504 Mb)  1.13%   0.00%
PAGE_READWRITE|PAGE_GUARD          1224 0`012f2000 (  18.945 Mb)  0.91%   0.00%
PAGE_EXECUTE_READWRITE              144 0`007b9000 (   7.723 Mb)  0.37%   0.00%
PAGE_EXECUTE_WRITECOPY               70 0`001cd000 (   1.801 Mb)  0.09%   0.00%
PAGE_EXECUTE                          1 0`00004000 (  16.000 kb)  0.00%   0.00%
-
-
--- Largest Region by Usage ----Base Address -------- Region Size ----------
Free                            0`8fff0000        7fe`59050000 (   7.994 Tb)
<unclassified>                  0`80d92000        0`0f25e000 ( 242.367 Mb)
Image                           fe`f6255000       0`0125a000 (  18.352 Mb)
Stack                           0`014d0000        0`000fc000 (1008.000 kb)
TEB                             0`7ffde000        0`00002000 (   8.000 kb)
NlsTables                       7ff`fffb0000      0`00023000 ( 140.000 kb)
ActivationContextData           0`00030000        0`00004000 (  16.000 kb)
CsrSharedMemory                 0`7efe0000        0`00005000 (  20.000 kb)
PEB                             7ff`fffdd000      0`00001000 (   4.000 kb)

First, why would <unclassified> show up once as 1.73 GB and another time as 242 MB? (This has been answered. Thank you.)

Second, I understand that <unclassified> can mean managed code; however, my heap size according to !eeheap is only 248 MB, which roughly matches the 242 MB but is nowhere near the 1.73 GB. The dump file size is 1.2 GB, which is much higher than normal. Where do I go from here to find out what's using all the memory? Everything in the managed heap world is under 248 MB, yet the process is using 1.2 GB.

Thanks

EDIT

If I do !heap -s I get the following:

LFH Key                   : 0x000000171fab7f20
Termination on corruption : ENABLED
          Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast 
                            (k)     (k)    (k)     (k) length      blocks cont. heap 
-------------------------------------------------------------------------------------
Virtual block: 00000000017e0000 - 00000000017e0000 (size 0000000000000000)
Virtual block: 0000000045bd0000 - 0000000045bd0000 (size 0000000000000000)
Virtual block: 000000006fff0000 - 000000006fff0000 (size 0000000000000000)
0000000000060000 00000002  113024 102028 113024  27343  1542    11    3    1c LFH
    External fragmentation  26 % (1542 free blocks)
0000000000010000 00008000      64      4     64      1     1     1    0    0      
0000000000480000 00001002    3136   1380   3136     20     8     3    0    0  LFH
0000000000640000 00041002     512      8    512      3     1     1    0    0      
0000000000800000 00001002    3136   1412   3136     15     7     3    0    0  LFH
00000000009d0000 00001002    3136   1380   3136     19     7     3    0    0  LFH
00000000008a0000 00041002     512     16    512      3     1     1    0    0      
0000000000630000 00001002    7232   3628   7232     18    53     4    0    0  LFH
0000000000da0000 00041002    1536    856   1536      1     1     2    0    0  LFH
0000000000ef0000 00041002    1536    944   1536      4    12     2    0    0  LFH
00000000034b0000 00001002    1536   1452   1536      6    17     2    0    0  LFH
00000000019c0000 00001002    3136   1396   3136     16     6     3    0    0  LFH
0000000003be0000 00001002    1536   1072   1536      5     7     2    0    3  LFH
0000000003dc0000 00011002     512    220    512    100    60     1    0    2      
0000000002520000 00001002     512      8    512      3     2     1    0    0      
0000000003b60000 00001002  339712 168996 339712 151494   976   116    0   18  LFH
    External fragmentation  89 % (976 free blocks)
    Virtual address fragmentation  50 % (116 uncommited ranges)
0000000003f20000 00001002      64      8     64      3     1     1    0      0      
0000000003d90000 00001002      64      8     64      3     1     1    0      0      
0000000003ee0000 00001002      64     16     64     11     1     1    0      0      
-------------------------------------------------------------------------------------
Ostrich answered 26/1, 2012 at 18:57 Comment(10)
Use the SOS DLL and look at the heap that way. Since this is a .NET program, the heap allocations made by the .NET framework don't show up in the unmanaged heap. – Sweep
My heap size according to !eeheap (sos.dll) is only 248 MB, so I'm not sure that's the cause of the 1.2 GB process size, nor the cause of the 1.7 GB in <unclassified>, unless I'm missing something. – Ostrich
What does your service do? Does your service contain unmanaged or C++/CLI code? It looks like an unmanaged memory leak. What do the GDI, User Objects, and Handles counts say? In which call stacks are your threads stuck? ~*e!ClrStack, ~*e!DumpStack and ~*ekv are your friends to see what your threads were doing. Is one thread in the middle of allocating something? – Adjutant
@AloisKraus thanks for your response. There are no direct unmanaged calls. It's tough to figure things out from the threads side of things because I've got so many (about 600). None specifically allocate any large amounts of memory. Many are doing network-related things (WMI, etc.), which is expected. I'm not sure I know how to query for GDI or User Objects, or what numbers for those are good or bad. – Ostrich
600 threads? The CLR uses 1 MB of stack space per thread by default, which is committed by default. In that case you would already use 600 MB of memory just for the thread stacks. But your memory dump only shows 57.406 MB, which would be about 58 threads' worth. Your largest thread stack was exactly 1 MB, which could indicate a stack overflow. WMI uses COM like crazy. It could very well be that you query WMI very often with some form of "WITHIN 0.1", which will produce large amounts of garbage COM objects. – Adjutant
@AloisKraus !threads says I have 605 threads, 599 background threads and 5 dead threads. Are you saying that since the stack size is only 57 MB and my largest stack is 1 MB, this proves it's a stack overflow somehow? That would be VERY VERY useful to know! – Ostrich
At least one thread is at the brink of a stack overflow because of its 1 MB of private stack memory. Whether there was already a StackOverflowException I cannot tell. The !threads command displays the last thrown exception for each thread. But if you have unmanaged code running as well, the problem could also be there. With managed code you will not see a StackOverflowException, since the CLR will terminate your process immediately, unless you host the CLR yourself and choose a different escalation policy. – Adjutant
@AloisKraus is there any way I can find the stack size of each thread? Once I have the thread I can maybe see what it's up to? – Ostrich
Did you already run !analyze -v (before that, do a .symfix and .reload to load the MS symbols)? This will find a stack overflow that has already happened. – Adjutant
@AloisKraus I did, and I do not see a stack overflow exception. – Ostrich
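
Pulling the commands suggested in this comment thread together, a minimal WinDbg sequence for inspecting the threads, their stacks, and any pending exceptions might look like the following. This is only a sketch; it assumes sos.dll is already loaded for this dump, and the $$ lines are just comments:

.symfix
.reload
$$ reports an already-occurred stack overflow or other last-event exception
!analyze -v
$$ managed thread list, including the last thrown exception per thread (SOS)
!threads
$$ StackBase/StackLimit in each TEB show how large each thread's stack is
~*e !teb
$$ managed and native call stacks for every thread
~*e !clrstack
~*e kv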

I've recently had a very similar situation and found a couple techniques useful in the investigation. None is a silver bullet, but each sheds a little more light on the problem.

1) vmmap.exe from Sysinternals (http://technet.microsoft.com/en-us/sysinternals/dd535533) does a good job of correlating information on native and managed memory and presenting it in a nice UI. The same information can be gathered using the techniques below, but this is way easier and a nice place to start. Sadly, it doesn't work on dump files; you need a live process.

2) The "!address -summary" output is a rollup of the more detailed "!address" output. I found it useful to drop the detailed output into Excel and run some pivots. Using this technique I discovered that a large number of bytes that were listed as "" were actually MEM_IMAGE pages, likely copies of data pages that were loaded when the DLLs were loaded but then copied when the data was changed. I could also filter to large regions and drill in on specific addresses. Poking around in the memory dump with a toothpick and lots of praying is painful, but can be revealing.

3) Finally, I did a poor man's version of the vmmap.exe technique above. I loaded up the dump file, opened a log, and ran !address, !eeheap, !heap, and !threads. I also targeted the thread environment blocks listed in ~*k with !teb. I closed the log file and loaded it up in my favorite editor. I could then find an unclassified block and search to see if it popped up in the output from one of the more detailed commands. You can pretty quickly correlate native and managed heaps to weed those out of your suspect unclassified regions.
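
A sketch of that session, again assuming SOS is loaded and using a placeholder log path:

.logopen c:\temp\mem_correlation.txt
$$ full region list: base address, size, and classification for every region
!address
$$ managed GC heap and loader heaps (SOS)
!eeheap
$$ native NT heaps
!heap -s
$$ per-thread native stacks; the header line for each thread shows its Teb address
~*k
$$ for each Teb address listed above (placeholder), dump its stack base/limit
!teb <teb-address>
!threads
.logclose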

These are all way too manual. I'd love to write a script that would take output similar to what I generated in technique 3 above and produce an .mmp file suitable for viewing in vmmap.exe. Some day.

One last note: I correlated vmmap.exe's output with the !address output and noted the types of regions that vmmap could identify from various sources (similar to what !heap and !eeheap use) but that !address didn't know about. That is, these are things that vmmap.exe labeled but !address didn't:

.data
.pdata
.rdata
.text
64-bit thread stack
Domain 1
Domain 1 High Frequency Heap
Domain 1 JIT Code Heap
Domain 1 Low Frequency Heap
Domain 1 Virtual Call Stub
Domain 1 Virtual Call Stub Lookup Heap
Domain 1 Virtual Call Stub Resolve Heap
GC
Large Object Heap
Native heaps
Thread Environment Blocks

There were still a lot of "private" bytes unaccounted for, but again, I'm able to narrow the problem if I can weed these out.

Hope this gives you some ideas on how to investigate. I'm in the same boat so I'd appreciate what you find, too. Thanks!

Savoirvivre answered 6/2, 2012 at 19:13 Comment(1)
"I'd love to write a script ... for viewing the vmmap": have you had the chance to do so?Pacifistic

The "Usage Summary" tells you that you have 3,696 unclassified regions totalling 1.733 GB.

The "Largest Region" tells you that the largest of the unclassified regions is 242 MB. The rest of the unclassified regions (3,695 of them) together make up the difference to the 1.733 GB total.

Try doing a !heap -s and summing up the Virt column to see the size of the native heaps; I think these also fall into the unclassified bucket. (NB: earlier versions of the debugger show the native heap explicitly in !address -summary.)

Schoenfelder answered 26/1, 2012 at 20:14 Comment(3)
Thanks for the explanation of Usage Summary vs. Largest Region. I've done a !heap -s and totaled up the Virt column and I get 359.12 MB. Does that say anything? I've added the results of !heap -s. – Ostrich
No, sorry, and since this is a 64-bit dump I have no practical experience. By the way, what version of .NET is the dump from? That might be of interest to others who might know more. – Schoenfelder
I've added that info in the question description. It's .NET 4.0. – Ostrich

I keep a copy of Debugging Tools for Windows 6.11.1.404, which seems to be able to display something more meaningful for "unclassified".

With that version, I see a list of TEB addresses and then this:

0:000> !address -summary
 --------- PEB fffde000 not found ----
 TEB fffdd000 in range fffdb000 fffde000
 TEB fffda000 in range fffd8000 fffdb000
...snip...
 TEB fe01c000 in range fe01a000 fe01d000
 ProcessParametrs 002c15e0 in range 002c0000 003c0000
 Environment 002c0810 in range 002c0000 003c0000
-------------------- Usage SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots) Pct(Busy)   Usage
   41f08000 ( 1080352) : 25.76%    34.88%    : RegionUsageIsVAD
   42ecf000 ( 1096508) : 26.14%    00.00%    : RegionUsageFree
    5c21000 (   94340) : 02.25%    03.05%    : RegionUsageImage
    c900000 (  205824) : 04.91%    06.64%    : RegionUsageStack
          0 (       0) : 00.00%    00.00%    : RegionUsageTeb
   68cf8000 ( 1717216) : 40.94%    55.43%    : RegionUsageHeap
          0 (       0) : 00.00%    00.00%    : RegionUsagePageHeap
          0 (       0) : 00.00%    00.00%    : RegionUsagePeb
          0 (       0) : 00.00%    00.00%    : RegionUsageProcessParametrs
          0 (       0) : 00.00%    00.00%    : RegionUsageEnvironmentBlock
       Tot: ffff0000 (4194240 KB) Busy: bd121000 (3097732 KB)

-------------------- Type SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots)  Usage
   42ecf000 ( 1096508) : 26.14%   : <free>
    5e6e000 (   96696) : 02.31%   : MEM_IMAGE
    28ed000 (   41908) : 01.00%   : MEM_MAPPED
   b49c6000 ( 2959128) : 70.55%   : MEM_PRIVATE

-------------------- State SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots)  Usage
   9b4d1000 ( 2544452) : 60.67%   : MEM_COMMIT
   42ecf000 ( 1096508) : 26.14%   : MEM_FREE
   21c50000 (  553280) : 13.19%   : MEM_RESERVE

Largest free region: Base bc480000 - Size 38e10000 (931904 KB)

With my "current" version (6.12.2.633) I get this from the same dump. Two things I note:

The <unclassified> data seems to be the sum of the HeapAlloc (RegionUsageHeap) and VirtualAlloc (RegionUsageIsVAD) regions: 1,717,216 KB + 1,080,352 KB is about 2.67 GB, within roughly a megabyte of the 2.667 GB reported as <unclassified> below.

The lovely E_FAIL error ("Failed to map Heaps", 80004005), which is no doubt in part responsible for the missing data!

I'm not sure how that'll help you with your managed code, but I think it actually answers the original question ;-)

0:000> !address -summary


Failed to map Heaps (error 80004005)

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
<unclassified>                         7171          aab21000 (   2.667 Gb)  90.28%   66.68%
Free                                    637          42ecf000 (   1.046 Gb)           26.14%
Stack                                   603           c900000 ( 201.000 Mb)   6.64%    4.91%
Image                                   636           5c21000 (  92.129 Mb)   3.05%    2.25%
TEB                                     201             c9000 ( 804.000 kb)   0.03%    0.02%
ActivationContextData                    14             11000 (  68.000 kb)   0.00%    0.00%
CsrSharedMemory                           1              5000 (  20.000 kb)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE                            7921          b49c6000 (   2.822 Gb)  95.53%   70.55%
MEM_IMAGE                               665           5e6e000 (  94.430 Mb)   3.12%    2.31%
MEM_MAPPED                               40           28ed000 (  40.926 Mb)   1.35%    1.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_COMMIT                             5734          9b4d1000 (   2.427 Gb)  82.14%   60.67%
MEM_FREE                                637          42ecf000 (   1.046 Gb)           26.14%
MEM_RESERVE                            2892          21c50000 ( 540.313 Mb)  17.86%   13.19%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE                         4805          942bd000 (   2.315 Gb)  78.37%   57.88%
PAGE_READONLY                           215           3cbb000 (  60.730 Mb)   2.01%    1.48%
PAGE_EXECUTE_READ                        78           2477000 (  36.465 Mb)   1.21%    0.89%
PAGE_WRITECOPY                           74            75b000 (   7.355 Mb)   0.24%    0.18%
PAGE_READWRITE|PAGE_GUARD               402            3d6000 (   3.836 Mb)   0.13%    0.09%
PAGE_EXECUTE_READWRITE                   80            3b0000 (   3.688 Mb)   0.12%    0.09%
PAGE_EXECUTE_WRITECOPY                   80            201000 (   2.004 Mb)   0.07%    0.05%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
<unclassified>                                786000           17d9000 (  23.848 Mb)
Free                                        bc480000          38e10000 ( 910.063 Mb)
Stack                                        6f90000             fd000 (1012.000 kb)
Image                                        3c3c000            ebe000 (  14.742 Mb)
TEB                                         fdf8f000              1000 (   4.000 kb)
ActivationContextData                         190000              4000 (  16.000 kb)
CsrSharedMemory                             7efe0000              5000 (  20.000 kb)
Jap answered 9/2, 2012 at 14:51 Comment(0)

Your best bet would be to use the !eeheap and !gchandles commands (from SOS) in WinDbg (http://msdn.microsoft.com/en-us/library/bb190764.aspx) and try to see if you can find what might be leaking or wrong that way.
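
A minimal SOS sequence along those lines, assuming sos.dll loads cleanly against this .NET 4 dump (a sketch, not a full recipe):

$$ load SOS from the same directory as the CLR in the dump (.NET 4: clr.dll)
.loadby sos clr
$$ GC heap segments and total managed heap size
!eeheap -gc
$$ managed objects grouped by type, with counts and total bytes
!dumpheap -stat
$$ GC handles; leaked pinned/strong handles can keep large object graphs alive
!gchandles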

Unfortunately, you probably won't be able to get the exact help you're looking for, because diagnosing these types of issues is almost always very time-intensive and, outside of the simplest cases, requires someone to do a full analysis on the dump. Basically, it's unlikely that someone will be able to point you to a direct answer on Stack Overflow. Mostly, people will be able to point you to commands that might be helpful. You're going to have to do a lot of digging to find out more about what is happening.

Sweep answered 31/1, 2012 at 18:35 Comment(1)
It would be good enough to be pointed in the right direction. Like I said, !eeheap seems to only describe 248 MB of the whole thing, so I'm not sure the answer can be in there. I'll take a look at !gchandles. – Ostrich

I recently spent some time diagnosing a customer's issue where their app was using 70 GB before terminating (likely due to hitting an IIS App Pool recycling limit, but still unconfirmed). They sent me a 35 GB memory dump. Based on my recent experience, here are some observations I can make about what you've provided:

In the !heap -s output, 284 MB of the 1.247 GB is shown in the Commit column. If you were to open this dump in DebugDiag, it would tell you that heap 0x60000 has 1 GB of committed memory. You'll add up the commit size of the 11 segments reported and find that they only add up to about 102 MB, not 1 GB. So annoying.

The "missing" memory isn't missing. It's actually hinted at in the !heap -s output as "Virtual block:" lines. Unfortunately, !heap -s sucks and doesn't show the end address properly and therefore reports size as 0. Check the output of the following commands:

!address 17e0000
!address 45bd0000
!address 6fff0000

It will report the proper end address and therefore an accurate "Region Size". Even better, it gives a succinct version of the region size. If you add the size of those 3 regions to 102 MB, you should be pretty close to 1 GB.

So what's in them? Well, you can look using dq. By spelunking you might find a hint at why they were allocated. Perhaps your managed code calls some 3rd party code which has a native side.
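
For example, taking one of the virtual block base addresses from the !heap -s output in the question (the L200 range length here is arbitrary, and the pointer in the last command is a placeholder):

$$ confirm the real end address / region size first
!address 45bd0000
$$ dump raw quadwords from the start of the region
dq 0000000045bd0000 L200
$$ if a value looks like a pointer to text, try du/da on it
du <suspicious-pointer>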

You might be able to find references to your heap by using !heap 6fff0000 -x -v. If there are references, you can see what memory regions they live in by using !address again. In my customer's issue, I found a reference that lived in a region with "Usage: Stack". A "More info:" hint referenced the stack's thread, which happened to have some large basic_string append/copy calls at the top.
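
And if the committed portion of a regular heap (such as 0x60000 here) ever turns out to be the culprit rather than the virtual blocks, the usual heap-statistics commands are worth a try. Note that !heap -p -a only shows allocation call stacks if user-mode stack traces were enabled when the process was running (gflags /i yourapp.exe +ust), and that <size> and <address> below are placeholders:

$$ busy-block statistics by allocation size for this heap
!heap -stat -h 0000000000060000
$$ list all heap blocks of the most common size found above
!heap -flt s <size>
$$ details (and, with +ust, the allocating call stack) for one of those blocks
!heap -p -a <address>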

Hoskinson answered 28/9, 2016 at 19:41 Comment(0)
