Realloc() does not correctly free memory in Windows

I am attempting to use realloc() in a Windows application. I am allocating a large block of memory, then using realloc() to shrink it down later once I know the correct size.

I am finding that although realloc() appears to work correctly (memory usage in Task Manager reflects what you would expect), the application eventually runs out of memory. From what I can tell, it's as though realloc() frees the memory but does not free the virtual address space associated with it. As a result, malloc() will eventually fail.

Here's a small console app that demonstrates the problem:

#include <windows.h>   // DWORD, BYTE
#include <tchar.h>     // _tmain, _TCHAR
#include <stdio.h>     // printf
#include <stdlib.h>    // malloc, realloc, free

int _tmain(int argc, _TCHAR* argv[])
{
    static const DWORD dwAllocSize = (50 * 1024 * 1024);
    static const DWORD dwReallocSize = 10240;
    static const DWORD dwMaxIterations = 200;

    BYTE* arpMemory[dwMaxIterations];
    memset( arpMemory, 0, sizeof(arpMemory) );

    for( DWORD i = 0; i < dwMaxIterations; i++ )
    {
        // Allocate the worst-case buffer up front (50 MB)
        arpMemory[i] = (BYTE*) malloc( dwAllocSize );
        if( !arpMemory[i] )
        {
            printf("OUT OF MEMORY after %d iterations!\n", i);
            return -1;
        }

        // Shrink it to the "real" size (10 KB) once the data is known
        BYTE* pRealloc = (BYTE*) realloc( arpMemory[i], dwReallocSize );
        if( !pRealloc )
        {
            printf("Realloc FAILED after %d iterations!\n", i);
            return -1;
        }
        else if( pRealloc != arpMemory[i] )
        {
            printf("Warning: Pointer changed: 0x%08X -> 0x%08X\n", arpMemory[i], pRealloc);
            arpMemory[i] = pRealloc;
        }
    }

    printf("Success!\n");

    for( DWORD i = 0; i < dwMaxIterations; i++ )
        free( arpMemory[i] );

    return 0;
}

The app repeatedly allocates 50 MB of memory and then immediately resizes it down to only 10 KB. If you run it, you will find that it fails with an OUT OF MEMORY error after only 38 iterations. That is roughly 1.9 GB of originally allocated memory, which is essentially the 2 GB user-mode address space limit for a 32-bit Windows application.

Interestingly, if you look in Task Manager, you will see the application taking almost no memory at all. Yet malloc() is failing. This is what leads me to believe the virtual address space is being exhausted.

(Another experiment to try is to comment out the reallocation, so no memory is freed or reallocated. The app fails in exactly the same place: after 38 iterations. The only difference is that this time Task Manager reflects the full 2GB in use.)

One final point of information: This same application works under Linux. So this realloc() problem is strictly Windows-only.

Any thoughts?

Extrabold answered 6/2, 2012 at 18:0 Comment(15)
My question? I guess "Is this behavior a known bug in Windows?" And obviously, is there any work-around for this problem that doesn't involve copying memory?Extrabold
Anyway, you probably have a fundamental design problem, certainly for processes that will stress the 32 bit address space. You say "I am allocating a large block of memory, then using realloc() to shrink it down later once I know the correct size." That is just asking for trouble. It's very demanding to require very large contiguous blocks of address space. You are much better off allocating small chunks of memory and then piecing them together.Johny
Good point, you're right. I am using Visual Studio 2008. (I've done some googling and it does not appear that anyone has mentioned this as an issue with this compiler -- Maybe I'm the first?)Extrabold
fwiw, I took the code and put it into a Studio .NET 2010 C++ Console Application project with build config set to "Win32" and saw the same result. So compiler = VC++ that ships with Studio .NET 2010 for me. Had to try this as I've not done anything with C++ in 15 years. :)Remaremain
I'm not an expert on the low level activities of memory allocation but I would guess that you may be fragmenting the heap. After your 38 iterations, there is still lots of heap space left, just no contiguous blocks 50MB in size.Biondo
@ArnoldSpence is probably right. I would not expect malloc implementations to be resilient to such abuse. Why would the malloc implementors attempt to support this usage? You are expected not to attack the heap this way. You'll almost certainly be able to see the same effect on a Linux or Mac C compiler under 32 bits, although you may need to go to 4GB rather than 2GB. Bottom line is that you will need to re-work your code to avoid fragmenting the virtual address space like this.Johny
Perhaps a bit of context would be useful. My code is part of a large-scale application that does video processing. We deal with a frame of data at a time, which could be very small (compressed at a low data rate) or very large (uncompressed 4K resolution). At the point in my code where I must allocate the memory, I do not know the size of the frame. So I must allocate the "worst case scenario". Later, once the entire frame data is received, I wish to shrink the memory region to the actual frame size, which is typically much smaller than the originally allocated size. ...Extrabold
... This frame is then passed on to other modules for processing. There may be many frames, and it may be a while before the memory is released. The problem is we are seeing situations where we run out of memory while processing small-sized frames. (These out-of-memory errors would be reasonable for 4K frames, but not for small frames.) Since speed is important, we wish to avoid unnecessary memory copying. So, with this in mind, do you see any elegant solution? Or will my module require a "frame size hint" before allocating a memory buffer, something I was hoping to avoid?Extrabold
Sounds to me like you need to allocate small blocks on demand and piece them together, or wait until you know how big the frame is before you do the allocation. Or switch to 64 bit process and then you can pretty much abuse the heap to your heart's content.Johny
Thank you for the excellent answers! @ArnoldSpence is correct: It is indeed a case of memory fragmentation. I confirmed this by modifying my example above to allocate a series of 49 MB buffers after reaching the "OUT OF MEMORY" condition. Sure enough, these all succeeded (up to the 2GB limit).Extrabold
(FYI to anyone reading this thread: When I say "4K frames" I mean "4K resolution frames", not 4 kilobyte.)Extrabold
@David Heffernan: Allocating small blocks and piecing them together would not work, as ultimately we need the frame memory to be contiguous. So at this point I see three solutions: 1) Do a one-time "maximum frame size" allocation for incoming video frame data, and suffer the performance hit of a memory copy into a smaller buffer when we are ready to pass the data on. 2) Come up with a reasonable "frame hint" method so that we don't over-allocate a ridiculous amount of memory. 3) As David Heffernan suggested, see if we can move our entire project to run in the 64-bit address space.Extrabold
How does the above code manage to succeed on Linux? (Presumably gcc and glibc runtime) The only guess I have is that he's running 64-bit Linux and getting a 64-bit compile. (But his Visual Studio is defaulting to 32-bit EXEs).Mirabel
@selbie: I'm not in a position to test it, but I don't see why that code shouldn't be expected to work on any implementation that doesn't explicitly put large requests in distinct address ranges. You'd get fragmentation if there were other allocations taking place between when the large allocation is made and when it is shrunk, but this isn't the case here (at least not in the sample code).Cnemis
@Mirabel My guess is that the heap manager in Linux is simply more intelligently implemented. Notice in my example how I allocate 50 MB but then immediately shrink it down to 10K, with no other allocations between these two operations. A smart heap manager should place the next allocation after that 10K block. But what's happening under Windows is that the new allocation is placed at the end of the original 50 MB allocation. It may also be possible this is happening because there were other allocations occurring in the C runtime between my own allocations, causing the fragmentation.Extrabold

You are fragmenting the heap by doing this. Whatever you release with realloc() gets added to the list of free blocks, never to be used again because you always ask for a new block that's larger than that. These free blocks just accumulate, taking up virtual memory until no more is left. That happens pretty quickly when you throw away almost 50 megabytes at a time.

You'll need to rethink your approach.
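
For illustration, here is a minimal sketch of the grow-on-demand alternative suggested in the comments below, assuming a hypothetical FrameBuffer type and FrameBufferAppend helper: start small and let realloc() double the capacity as data arrives, so no oversized block is ever handed back to the heap.

#include <windows.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable frame buffer (zero-initialize before first use):
// grow with realloc() instead of shrinking a worst-case allocation later.
typedef struct
{
    BYTE*  pData;
    size_t cbUsed;
    size_t cbCapacity;
} FrameBuffer;

BOOL FrameBufferAppend( FrameBuffer* pBuf, const BYTE* pChunk, size_t cbChunk )
{
    if( pBuf->cbUsed + cbChunk > pBuf->cbCapacity )
    {
        // Standard doubling strategy: amortizes the copies realloc()
        // may have to make when it moves the block.
        size_t cbNew = pBuf->cbCapacity ? pBuf->cbCapacity : 4096;
        while( cbNew < pBuf->cbUsed + cbChunk )
            cbNew *= 2;

        BYTE* pNew = (BYTE*) realloc( pBuf->pData, cbNew );
        if( !pNew )
            return FALSE;   // the old block is still valid on failure

        pBuf->pData      = pNew;
        pBuf->cbCapacity = cbNew;
    }

    memcpy( pBuf->pData + pBuf->cbUsed, pChunk, cbChunk );
    pBuf->cbUsed += cbChunk;
    return TRUE;
}

The trade-off is that realloc() may move (and therefore copy) the buffer while it grows, but the process never holds on to a 50 MB reservation for a 10 KB frame.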

Messere answered 6/2, 2012 at 19:26 Comment(7)
Thanks for the excellent help! As I mentioned in the comment thread above, at this point I see three solutions: 1) Do a one-time "maximum size" allocation for incoming data, and suffer the performance hit of a memory copy into a smaller buffer (once we know the actual size) when we are ready to pass the data on. 2) Come up with a reasonable "buffer size hint" method so that we don't over-allocate a ridiculous amount of memory. 3) See if we can move our entire project to run in the 64-bit address space.Extrabold
Just do it the other way around. Start small, realloc() when you need more. Doubling the allocation is the standard approach.Messere
Hans, I don't think this issue is related to fragmentation. I tried reducing the size of each successive request, so that it would fit into the space freed from the previous one, and it made no difference.Cnemis
@Harry - Well, by how much? There's a bunch of overhead associated with each block, you'll have to account for that.Messere
@Hans: my current code reduces the allocation size by 1MB per iteration. I also noticed that when you walk the heap the separately allocated address blocks don't include any memory ranges marked as free (PROCESS_HEAP_UNCOMMITTED_RANGE) the way the main address block does.Cnemis
@Harry: I am surprised you didn't get the same result as me when I did something similar. I modified my example to allocate a series of 49 MB buffers immediately after reaching the "OUT OF MEMORY" condition. These all succeeded (up to the 2GB limit) because they all fit nicely into the "holes" of the current fragmented memory.Extrabold
@asheffie, yes, that's odd. It may depend on the Windows version and/or architecture. I've posted my own test code and results for comparison if you're interested. At any rate, the upshot is the same: regardless of whether free space can be reused within virtual address reservations, the reservations themselves remain the same size which has the effect of artificially fragmenting the address space, because memory allocations never overlap multiple address reservations. Bottom line: you can't do that. :-)Cnemis

After some experimentation, and reading between the lines of the documentation, I've reached the conclusion that large memory allocations (slightly less than 512k for 32-bit, slightly less than 1MB for 64-bit) use address space allocated using VirtualAlloc and reserved for that particular memory block.

If it is not practical to increase your buffer size on the fly as Hans suggests, and you don't want to copy the data, I think your only other option is to reserve a block of address space big enough for all your buffers and allocate space from it yourself. How complicated this would be depends on factors such as how your application uses and frees the buffers, whether more than one buffer is being written to at a time, and on how much data you will be processing over the lifetime of the application.
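
As a rough illustration of that last idea, here is a sketch of a deliberately simplified scheme (the names are hypothetical, and free/reuse of slices is omitted, which is exactly where the complexity mentioned above comes in): reserve one large region up front, hand out worst-case slices from it, and decommit the unused tail of each slice once the real frame size is known.

#include <windows.h>

#define POOL_RESERVE    (1024u * 1024 * 1024)   // address space for all buffers
#define MAX_FRAME_SIZE  (50u * 1024 * 1024)     // worst-case frame size
#define PAGE_ROUND(cb)  (((cb) + 0xFFF) & ~(SIZE_T)0xFFF)

static BYTE*  g_pPool;   // base of the reserved (not yet committed) region
static SIZE_T g_cbNext;  // offset of the next unused slice

BOOL PoolInit(void)
{
    // Reserve address space only; no physical memory is committed yet.
    g_pPool = (BYTE*) VirtualAlloc( NULL, POOL_RESERVE, MEM_RESERVE, PAGE_NOACCESS );
    g_cbNext = 0;
    return g_pPool != NULL;
}

BYTE* FrameBegin(void)
{
    // Hand out a worst-case slice and commit it so data can be written.
    if( g_cbNext + MAX_FRAME_SIZE > POOL_RESERVE )
        return NULL;    // pool exhausted

    BYTE* p = (BYTE*) VirtualAlloc( g_pPool + g_cbNext, MAX_FRAME_SIZE,
                                    MEM_COMMIT, PAGE_READWRITE );
    if( p )
        g_cbNext += MAX_FRAME_SIZE;
    return p;
}

void FrameFinish( BYTE* pFrame, SIZE_T cbActual )
{
    // Once the real size is known, decommit the unused tail.  The pointer
    // and the address range stay put, so nothing is copied, but the
    // physical pages behind the tail are returned to the system.
    SIZE_T cbKeep = PAGE_ROUND( cbActual );
    if( cbKeep < MAX_FRAME_SIZE )
        VirtualFree( pFrame + cbKeep, MAX_FRAME_SIZE - cbKeep, MEM_DECOMMIT );
}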

Additional: here is some test code showing what I see happening:

#include <windows.h>

#include <stdio.h>

DWORD reallocSize = 0x01000;    // 4K

void walkheap(HANDLE heap)
{
    PROCESS_HEAP_ENTRY phe;
    MEMORY_BASIC_INFORMATION mbi;

    phe.lpData = NULL;

    for (;;)
    {
        if (!HeapWalk(heap, &phe))
        {
            printf("HeapWalk: %u\n", GetLastError());
            return;
        }
        printf("%08x %08x %08x %08x %08x ", phe.lpData, phe.cbData, phe.cbOverhead, phe.iRegionIndex, phe.wFlags);
        if (VirtualQuery(phe.lpData, &mbi, sizeof(mbi)) != 0)
        {
            printf("--> %08x\n",mbi.AllocationBase);
        }
        else
        {
            printf("--> (error %u)\n", GetLastError());
        }
    }
}

void alloc(HANDLE heap, DWORD count, DWORD size)
{
    BYTE* ptr;
    BYTE* pRealloc;

    ptr = (BYTE *)HeapAlloc(heap, 0, size);
    printf("Pointer %u is %08x (%08x)\n", count, ptr, size);
    if (ptr == NULL)
        return;     // don't pass NULL to HeapReAlloc

    pRealloc = (BYTE*) HeapReAlloc( heap, 0, ptr, reallocSize);
    if( pRealloc != ptr)
    {
        printf("Pointer %u changed to %08x\n", count, pRealloc);
    }
}

int main(int argc, char ** argv)
{
    HANDLE heap;

    heap = HeapCreate(0, 0, 0);
    if (heap == NULL)
    {
        printf("HeapCreate: %u\n", GetLastError());
        return 1;
    }

    walkheap(heap);

    // Four small (32 KB) allocations: these stay in the main heap region
    alloc(heap, 1, 0x08000);
    alloc(heap, 2, 0x08000);
    alloc(heap, 3, 0x08000);
    alloc(heap, 4, 0x08000);

    // Four large (512 MB) allocations: each gets its own VirtualAlloc region
    alloc(heap, 10, 0x20000000);
    alloc(heap, 11, 0x20000000);
    alloc(heap, 12, 0x20000000);
    alloc(heap, 13, 0x20000000);

    // Four more large (256 MB) allocations
    alloc(heap, 20, 0x10000000);
    alloc(heap, 21, 0x10000000);
    alloc(heap, 22, 0x10000000);
    alloc(heap, 23, 0x10000000);

    walkheap(heap);

    return 0;
}

and my results (see the PROCESS_HEAP_ENTRY structure):

Address  Alloc    Overhead Region   Flags        Virtual Address Range Base

00420000 00000588 00000000 00000000 00000001 --> 00420000
004207e8 000007f8 00000010 00000000 00000000 --> 00420000
00421000 0003f000 00000000 00000000 00000002 --> 00420000
HeapWalk: 259
Pointer 1 is 004207e0 (00008000)
Pointer 2 is 004217f8 (00008000)
Pointer 3 is 00422810 (00008000)
Pointer 4 is 00423828 (00008000)
Pointer 10 is 00740020 (20000000)
Pointer 11 is 20750020 (20000000)
Pointer 12 is 52580020 (20000000)
Pointer 13 is 00000000 (20000000)
Pointer 20 is 40760020 (10000000)
Pointer 21 is 00000000 (10000000)
Pointer 22 is 00000000 (10000000)
Pointer 23 is 00000000 (10000000)
00420000 00000588 00000000 00000000 00000001 --> 00420000
004207e0 00001000 00000018 00000000 00000004 --> 00420000
004217f8 00001000 00000018 00000000 00000004 --> 00420000
00422810 00001000 00000018 00000000 00000004 --> 00420000
00423828 00001000 00000018 00000000 00000004 --> 00420000
00424848 0000e798 00000010 00000000 00000000 --> 00420000
00433000 0002d000 00000000 00000000 00000002 --> 00420000
00740020 00001000 00000020 00000040 00000024 --> 00740000
20750020 00001000 00000020 00000040 00000024 --> 20750000
52580020 00001000 00000020 00000040 00000024 --> 52580000
40760020 00001000 00000020 00000040 00000024 --> 40760000
HeapWalk: 259

It can be seen that the small allocations are packed tightly, but the big allocations are all in separate virtual address allocations. The free space in the separate allocations isn't being used. Also, only the main virtual address allocation has any heap blocks that are flagged as free space (flag equal to 0 or 2).

Cnemis answered 6/2, 2012 at 20:32 Comment(0)
