Detailed memory usage analysis of a Windows crash dump file?

Asked 19/1, 2011 at 12:17 Answered 26/1, 2011 at 14:55

Solved c++ debugging heap-memory windbg crash-dumps

We have received a native (full) crash dump file from a customer. Opening it in the Visual Studio (2005) debugger shows that we had a crash caused by a realloc call that tried to allocate a ~10MB block. The dump file was unusually large (1,5 GB -- normally they are more like 500 MB).

We therefore conclude that we have a memory "leak" or runaway allocations that either fully exhausted the memory of the process or at least fragmented it badly enough for the realloc to fail. (Note that this realloc was for an operation that allocated a logging buffer, and we are not surprised it failed here: 10MB in one go is one of the larger allocations we make, apart from some very large, essentially fixed buffers. The problem itself likely has nothing to do with this specific allocation.)

Edit: After the comment exchange with Lex Li below, I should add: This is not reproducible for us (at the moment). It's just one customer dump clearly showing runaway memory consumption.

Main Question:

Now we have a dump file, but how can we locate what caused the excessive memory usage?

What we've done so far:

We have used the DebugDiag tool to analyze the dump file (the so called Memory Pressure Analyzer), and here's what we got:

Report for DumpFM...dmp

Virtual Memory Summary
----------------------
Size of largest free VM block   62,23 MBytes 
Free memory fragmentation       81,30% 
Free Memory                     332,87 MBytes   (16,25% of Total Memory) 
Reserved Memory                 0 Bytes   (0,00% of Total Memory) 
Committed Memory                1,67 GBytes   (83,75% of Total Memory) 
Total Memory                    2,00 GBytes 
Largest free block at           0x00000000`04bc4000 

Loaded Module Summary
---------------------
Number of Modules       114 Modules 
Total reserved memory   0 Bytes 
Total committed memory  3,33 MBytes 

Thread Summary
--------------
Number of Threads       56 Thread(s) 
Total reserved memory   0 Bytes 
Total committed memory  652,00 KBytes

This was just to give a bit of context. What's more interesting, I believe, is:

Heap Summary
------------
Number of heaps         26 Heaps 
Total reserved memory   1,64 GBytes 
Total committed memory  1,61 GBytes 

Top 10 heaps by reserved memory
-------------------------------
0x01040000           1,55 GBytes        
0x00150000           64,06 MBytes        
0x010d0000           15,31 MBytes        
...

Top 10 heaps by committed memory
--------------------------------                              
0x01040000       1,54 GBytes 
0x00150000       55,17 MBytes 
0x010d0000       6,25 MBytes  
...

Now, looking at heap 0x01040000 (1,5 GB) we see:

Heap 5 - 0x01040000 
-------------------
Heap Name          msvcr80!_crtheap 
Heap Description   This heap is used by msvcr80 
Reserved memory      1,55 GBytes 
Committed memory     1,54 GBytes (99,46% of reserved)  
Uncommitted memory   8,61 MBytes (0,54% of reserved)  
Number of heap segments             39 segments 
Number of uncommitted ranges        41 range(s) 
Size of largest uncommitted range   8,33 MBytes 
Calculated heap fragmentation       3,27% 

Segment Information
-------------------
Base Address | Reserved Size   | Committed Size  | Uncommitted Size | Number of uncommitted ranges | Largest uncommitted block | Calculated heap fragmentation 
0x01040640        64,00 KBytes      64,00 KBytes   0 Bytes            0                              0 Bytes                     0,00% 
0x01350000     1.024,00 KBytes   1.024,00 KBytes   0 Bytes            0                              0 Bytes                     0,00% 
0x02850000     2,00 MBytes       2,00 MBytes       0 Bytes            0                              0 Bytes                     0,00% 
...

What is this Segment Information anyway?

Looking at the allocations that are listed:

Top 5 allocations by size
-------------------------
Allocation Size - 336          1,18 GBytes     
Allocation Size - 1120004      121,77 MBytes    
...

Top 5 allocations by count
--------------------------
Allocation Size - 336    3760923 allocation(s) 
Allocation Size - 32     1223794 allocation(s)  
...

We can see that the MSVCR80 heap apparently holds 3.760.923 allocations of 336 bytes each. This makes it pretty clear that we used up our memory with lots of small allocations, but how can we get more info about where these allocations came from?
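As a quick sanity check of the report, the count and size of the 336-byte allocations can be multiplied out. The sketch below assumes that DebugDiag's "GBytes" means 2^30 bytes and ignores per-block heap overhead; the constant and function names are illustrative only.

```python
# Figures copied from the DebugDiag report above.
ALLOC_SIZE = 336          # bytes per allocation
ALLOC_COUNT = 3_760_923   # number of allocations of that size
GIB = 1024 ** 3           # assumption: DebugDiag's "GBytes" are 2**30 bytes


def total_gib(size_bytes, count):
    """Total user bytes for `count` blocks of `size_bytes`, in GiB.

    Per-block heap overhead is ignored.
    """
    return size_bytes * count / GIB


print(f"{total_gib(ALLOC_SIZE, ALLOC_COUNT):.2f} GiB in {ALLOC_SIZE}-byte blocks")
print(f"block size in hex: {hex(ALLOC_SIZE)}")
```

The product comes to roughly 1.18 GiB, which matches the "Top 5 allocations by size" line, so these blocks alone account for most of the 1,54 GBytes committed in heap 0x01040000. The hex form of the size (0x150) is the one that shows up in the WinDbg heap output mentioned in the comments further down.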

If we somehow could sample some of these allocation addresses and then check where in the process image these addresses are in use, then -- assuming that a large portion of these allocations are responsible for our "leak" -- we could maybe find out where these runaway allocations came from.

Unfortunately, I have really no idea how to get more info out of the dump at the moment.

How could I inspect this heap to see some of the "336" allocation addresses?

How can I search the dump for these addresses, and how do I then find out which pointer variable (if any) in the dump holds on to these addresses?

Any tips regarding usage of DebugDiag, WinDbg or any other tool could really help! Also, if you disagree with any of my analysis above, let us know! Thanks!

Stalk answered 19/1, 2011 at 12:17 Comment(3)

great question, thanks for the information and walkthrough, had a similar problem. BTW, DebugDiag is now at microsoft.com/en-us/download/details.aspx?id=40336 – Bung 20/2, 2014 at 16:25

I just noticed that the above-mentioned version 2.0 doesn't support the analysis, so one should get the 1.2 version: microsoft.com/en-us/download/details.aspx?id=26798 - if the installation fails, create a user group called "Users" – Bung 20/2, 2014 at 16:48

Updating once again: the 2.x versions do support analysis; they just split DebugDiag into multiple applications, namely DebugDiag.Analysis.exe. Furthermore, version 2.1 is now available at microsoft.com/en-us/download/details.aspx?id=42933 – Bung 28/1, 2015 at 14:38

You could:

- Look into these blocks of 336 bytes to see if the content tells you anything about what allocated them. To do that, I usually use WinDbg. First run the command !heap -stat -h 0x01040000, which will give you the size of the block, then pass this size to !heap -flt s size, which will list all blocks of that size. You can then look into a block with any command that displays memory (like dc).
- You cannot reproduce the problem, but you can look in another dump to see what allocates blocks of that size. First activate the stack backtrace feature using the gflags.exe utility (gflags -i your.exe +ust). Then run your application, get a dump, and use !heap -flt s to list the blocks. The command !heap -p -a blockaddress will then dump the stack of the functions that allocated the block.
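Eyeballing block contents, as the first suggestion describes, can be backed up with a small script once the bytes of a few blocks have been copied out of the dump. The following is only a sketch of one possible heuristic: it interprets a block as little-endian 32-bit floats and checks whether every value is finite and of an "ordinary" magnitude. The function names, the thresholds and the synthetic sample blocks are all assumptions made for illustration, not part of any tool.

```python
import math
import struct


def as_floats(block):
    """Interpret raw bytes as little-endian 32-bit floats (trailing bytes dropped)."""
    n = len(block) // 4
    return struct.unpack(f"<{n}f", block[: n * 4])


def looks_like_floats(block, min_abs=1e-6, max_abs=1e6):
    """Heuristic: every value is finite and either zero or of ordinary magnitude.

    The thresholds are arbitrary assumptions; tune them to the data you expect.
    """
    values = as_floats(block)
    return bool(values) and all(
        math.isfinite(v) and (v == 0.0 or min_abs <= abs(v) <= max_abs)
        for v in values
    )


# Synthetic 336-byte blocks standing in for bytes copied out of the dump.
float_block = struct.pack("<84f", *[0.5 * i for i in range(84)])
text_block = (b"some log line " * 24)[:336]

print(looks_like_floats(float_block))
print(looks_like_floats(text_block))
```

A block that passes for most sampled addresses hints at float arrays; one that fails is more likely text, pointers or mixed structures. Random data can pass by chance, so it is worth sampling several blocks before drawing conclusions.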

Selectee answered 26/1, 2011 at 14:55 Comment(2)

Your first bullet got us on the right track: !heap -stat showed the 0x150 blocks I already mentioned in the question. !heap -flt s 150 then dumped out a long list of addresses. Inspecting some of these user block addresses then showed that df ... displayed valid and consistent float values. So we knew it was some float arrays leaking and were able to track down the leak. – Stalk 27/1, 2011 at 13:13

I should have seen your answer sooner. I was always stuck with !heap -flt s, as it would not give any stack backtraces. Until I saw your answer, I did not know that the !heap -flt s command will display stack backtraces only for dumps taken after activating the stack backtrace feature with gflags.exe (gflags -i your.exe +ust). Thanks a lot. It was really helpful. – Blanketing 24/8, 2020 at 8:41

In WinDbg, you can try using !heap -l, which should crawl the heaps (this takes a while; there may be a way to restrict the search to a specific heap to speed it up) and find all the busy blocks that are not referenced anywhere. From there, open the memory window (Alt+5) and take a look at some of the entries that match the allocation size you suspect to be your leak. With some luck there could be common patterns that help you identify what the data is or, better yet, some ASCII strings that you can place right away.

Unfortunately, I don't really know any other good ways except trying to reproduce it while turning on user mode stack traces with gflags and using umdh to take memory snapshots.

Haemostatic answered 19/1, 2011 at 15:40 Comment(0)

How many dumps do you have now?

The proper way to track a memory leak is to make good use of DebugDiag's Memory and Handle Leak rule.

Then when DebugDiag works on the new dumps, it can tell more about the memory usage.

Gale answered 23/1, 2011 at 1:43 Comment(4)

We have exactly one dump. // What is "good use of DebugDiag's Memory and Handle Leak rule" supposed to mean? What further options do I have given more dumps? – Stalk 23/1, 2011 at 10:52

blogs.msdn.com/b/tess/archive/2010/01/14/… – Gale 24/1, 2011 at 3:32

Thanks, the article you link to, "Debugging Native memory leaks with Debug Diag 1.1" (blogs.msdn.com/b/tess/archive/2010/01/14/…), is very useful. Unfortunately it doesn't help much when we have only one dump where no diagnostics were enabled. – Stalk 24/1, 2011 at 8:35

@Martin, recapture dumps with a leak rule in DebugDiag. You can carefully configure the leak rule so that it can generate several dumps for you. Try it out, and you will see. – Gale 26/1, 2011 at 3:28
