Does malloc allocate memory from HD also?
The implementation of `malloc()` depends on the libc implementation and the operating system (OS). Typically `malloc()` doesn't request RAM from the OS for every call but instead returns a pointer into a previously allocated memory block "owned" by libc.
On POSIX-compatible systems, this libc-controlled memory area is usually grown with the `brk()` syscall. `brk()` only moves the end of the heap, so memory between two still-live allocations can never be released: after allocating areas A, B and C in sequence and freeing B, the process still appears to use all that RAM, because areas A and C surround the hole left by B and the memory obtained from the OS cannot be returned.
Many modern `malloc()` implementations use a heuristic where small allocations come from the memory area reserved via `brk()` and "big" allocations use anonymous virtual memory blocks reserved via `mmap()` with the `MAP_ANONYMOUS` flag. Such big allocations can be returned to the OS immediately when `free()` is later called. The runtime performance of `mmap()` is typically slightly worse than reusing previously reserved memory, which is why `malloc()` implements this heuristic in the first place.
Both `brk()` and `mmap()` allocate virtual memory from the OS, and virtual memory can always be backed by swap, which may live on any storage the OS supports, including an HDD.
In case you run Windows, the syscalls have different names but the underlying behavior is probably about the same.
What was the reason for the above behaviour?
Since your example code never touched the memory, I'd guess you're seeing the OS implement copy-on-write for virtual RAM: every freshly allocated page is initially mapped to a single shared page filled with zeroes. Modern operating systems do this because many programs allocate more RAM than they actually need, and mapping all fresh allocations to the shared zero page avoids spending real RAM on them.
If you want to test how the OS handles your loop and actually reserve real storage, you need to write something to the memory you allocated. On x86-compatible hardware you only need to write one byte per 4096-byte block, because the page size is 4096 bytes and the hardware cannot implement copy-on-write for smaller units; once one byte is modified, the whole 4096-byte unit, called a page, must be backed by real memory for your process. I'm not aware of any modern CPU that supports pages smaller than 4096 bytes.

Modern Intel CPUs also support 2 MB and 1 GB pages in addition to 4096-byte pages, but 1 GB pages are rarely used because the overhead of 2 MB pages is small enough for any sensible amount of RAM. 1 GB pages might make sense if your system has hundreds of terabytes of RAM.
So basically your program only tested reserving virtual memory without ever using it. Your OS probably has a special optimization for this that avoids needing more than 4 KB of real RAM to support it.
Unless your objective is to measure the overhead caused by your `malloc()` implementation, you should avoid allocating memory blocks smaller than 16–32 bytes. For `mmap()` allocations, the minimum possible overhead is 8 bytes per allocation on x86-64 hardware, due to the data needed to return the memory to the operating system, so it really doesn't make sense for `malloc()` to use the `mmap()` syscall for a single 4-byte allocation.
This overhead is needed to keep track of allocations: memory is freed via `void free(void*)`, so the allocation routines must record the size of each allocated segment somewhere. Many `malloc()` implementations also need additional metadata, and if they keep track of any memory addresses, each address needs 8 bytes.
If you truly want to find the limits of your system, you should probably binary-search for the size at which `malloc()` fails. In practice, you try to allocate ..., 1 KB, 2 KB, 4 KB, 8 KB, ..., until, say, 32 GB fails, and you then know the real-world limit is between 16 GB and 32 GB. You can then split this range in half and narrow down the exact limit with additional testing. For this kind of search, it's easier to always release any successful allocation and reserve each test block with a single `malloc()` call; that also avoids accidentally accounting for `malloc()` overhead, because at most one allocation exists at any time.
Update: As pointed out by Peter Cordes in the comments, your `malloc()` implementation may write bookkeeping data about your allocations into the reserved RAM. That uses real memory and can cause the system to start swapping so heavily that you cannot recover in any sensible timescale without shutting down the computer. If you're running Linux and have enabled the "Magic SysRq" keys, you can press Alt+SysRq+f to kill the offending process taking all the RAM, and the system will run fine again. It is possible to write a `malloc()` implementation that doesn't usually touch the RAM allocated via `brk()`, and I assumed you would be using one. (Such an implementation allocates memory in 2^n-sized segments and reserves all similarly sized segments in the same range of addresses; when `free()` is later called, the implementation knows the size of the allocation from the address alone, and the bookkeeping about free segments is kept in a separate bitmap in a single location.) On Linux, a `malloc()` implementation touching the reserved pages for internal bookkeeping is said to be *dirtying* the memory, which prevents sharing memory pages via copy-on-write.
Why didn't loop break at any point of time?
If your OS implements the special behavior described above and you're running a 64-bit system, you're not going to run out of virtual memory in any sensible timescale, so your loop appears infinite.
Why wasn't there any allocation failure?
You didn't actually use the memory, so you were allocating virtual memory only. You're basically increasing the maximum pointer value allowed for your process, but since you never access the memory, the OS never bothers to reserve any physical memory for it.
If you're running Linux and want the system to enforce that virtual memory usage matches actually available memory, you have to write `2` to the kernel setting `/proc/sys/vm/overcommit_memory` and maybe adjust `overcommit_ratio`, too. See https://unix.stackexchange.com/q/441364/20336 for details about memory overcommit on Linux. As far as I know, Windows implements overcommit, too, but I don't know how to adjust its behavior.
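On Linux the strict accounting mode can be enabled like this (requires root; the `80` ratio is an arbitrary example value, and the right setting depends on your workload):

```shell
# Mode 2 = strict accounting: commit limit = swap + overcommit_ratio% of RAM
echo 2 > /proc/sys/vm/overcommit_memory

# Optional: allow committing up to 80% of physical RAM (default is 50)
echo 80 > /proc/sys/vm/overcommit_ratio

# Inspect the resulting commit limit and the currently committed amount
grep -i commit /proc/meminfo
```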
`malloc(1ULL<<30)` vs. many tiny mallocs: you'll run out of virtual address space the first way before you run out of RAM+swap to store the bookkeeping info, instead of swap-thrashing as you use up all the physical RAM. When your allocations span many pages, most of the pages stay untouched even if malloc stores bookkeeping info at the start of each allocation. And tiny allocations use more space for bookkeeping and alignment than for the actual 4-byte payload, so if you were counting total allocated size, the overhead is huge. – Heliotropin