Does malloc allocate memory from HD also?
The implementation of `malloc()` depends on the libc implementation and the operating system (OS). Typically `malloc()` doesn't request RAM from the OS for every call but instead returns a pointer into a previously allocated memory block "owned" by libc.
On POSIX-compatible systems, this libc-controlled memory area is usually grown with the `brk()` syscall. `brk()` only moves the end of the heap, so memory between two still-live allocations can never be released: after allocating areas A, B and C in sequence and freeing B, the process still appears to use all that RAM, because areas A and C surround the hole left by B and the memory obtained from the OS cannot be returned.
Many modern `malloc()` implementations use a heuristic where small allocations come from the memory area reserved via `brk()` and "big" allocations use anonymous virtual memory blocks reserved via `mmap()` with the `MAP_ANONYMOUS` flag. Such big allocations can be returned to the OS immediately when `free()` is later called. The runtime performance of `mmap()` is typically slightly worse than reusing previously reserved memory, which is why `malloc()` implements this heuristic in the first place.
Both `brk()` and `mmap()` allocate virtual memory from the OS, and virtual memory can always be backed by swap, which may live on any storage the OS supports, including an HDD.
In case you run Windows, the syscalls have different names but the underlying behavior is probably about the same.
What was the reason for the above behaviour?
Since your example code never touched the memory, I'd guess you're seeing the OS implement copy-on-write for virtual RAM: every freshly allocated page is initially mapped to a single shared page filled with zeroes. Modern operating systems do this because many programs allocate more RAM than they actually need, and mapping all fresh allocations to the shared zero page avoids spending real RAM on them.
If you want to test how the OS handles your loop and actually reserve real storage, you need to write something to the memory you allocated. On x86-compatible hardware you only need to write one byte per 4096-byte block, because the page size is 4096 bytes and the hardware cannot implement copy-on-write for smaller units; once one byte is modified, the whole 4096-byte unit, called a page, must be backed by real memory for your process. I'm not aware of any modern CPU that supports pages smaller than 4096 bytes.

Modern Intel CPUs also support 2 MB and 1 GB pages in addition to 4096-byte pages, but 1 GB pages are rarely used because the overhead of 2 MB pages is small enough for any sensible amount of RAM. 1 GB pages might make sense if your system has hundreds of terabytes of RAM.
So basically your program only tested reserving virtual memory without ever using it. Your OS probably has a special optimization for this that avoids needing more than 4 KB of real RAM to support it.
Unless your objective is to measure the overhead caused by your `malloc()` implementation, you should avoid allocating memory blocks smaller than 16–32 bytes. For `mmap()` allocations, the minimum possible overhead is 8 bytes per allocation on x86-64 hardware, due to the data needed to return the memory to the operating system, so it really doesn't make sense for `malloc()` to use the `mmap()` syscall for a single 4-byte allocation.
This overhead is needed to keep track of allocations: memory is freed via `void free(void*)`, so the allocation routines must record the size of each allocated segment somewhere. Many `malloc()` implementations also need additional metadata, and if they keep track of any memory addresses, each address needs 8 bytes.
If you truly want to find the limits of your system, you should probably binary-search for the size at which `malloc()` fails. In practice, you try to allocate ..., 1 KB, 2 KB, 4 KB, 8 KB, ..., until, say, 32 GB fails, and you then know the real-world limit is between 16 GB and 32 GB. You can then split this range in half and narrow down the exact limit with additional testing. For this kind of search, it's easier to always release any successful allocation and reserve each test block with a single `malloc()` call; that also avoids accidentally accounting for `malloc()` overhead, because at most one allocation exists at any time.
Update: As pointed out by Peter Cordes in the comments, your `malloc()` implementation may write bookkeeping data about your allocations into the reserved RAM. That uses real memory and can cause the system to start swapping so heavily that you cannot recover in any sensible timescale without shutting down the computer. If you're running Linux and have enabled the "Magic SysRq" keys, you can press Alt+SysRq+f to kill the offending process taking all the RAM, and the system will run fine again. It is possible to write a `malloc()` implementation that doesn't usually touch the RAM allocated via `brk()`, and I assumed you would be using one. (Such an implementation allocates memory in 2^n-sized segments and reserves all similarly sized segments in the same range of addresses; when `free()` is later called, the implementation knows the size of the allocation from the address alone, and the bookkeeping about free segments is kept in a separate bitmap in a single location.) On Linux, a `malloc()` implementation touching the reserved pages for internal bookkeeping is said to be *dirtying* the memory, which prevents sharing memory pages via copy-on-write.
Why didn't loop break at any point of time?
If your OS implements the special behavior described above and you're running a 64-bit system, you're not going to run out of virtual memory in any sensible timescale, so your loop appears infinite.
Why wasn't there any allocation failure?
You didn't actually use the memory, so you were allocating virtual memory only. You're basically increasing the maximum pointer value allowed for your process, but since you never access the memory, the OS never bothers to reserve any physical memory for it.
If you're running Linux and want the system to enforce that virtual memory usage matches actually available memory, you have to write `2` to the kernel setting `/proc/sys/vm/overcommit_memory` and maybe adjust `overcommit_ratio`, too. See https://unix.stackexchange.com/q/441364/20336 for details about memory overcommit on Linux. As far as I know, Windows implements overcommit, too, but I don't know how to adjust its behavior.
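On Linux the strict accounting mode can be enabled like this (requires root; the `80` ratio is an arbitrary example value, and the right setting depends on your workload):

```shell
# Mode 2 = strict accounting: commit limit = swap + overcommit_ratio% of RAM
echo 2 > /proc/sys/vm/overcommit_memory

# Optional: allow committing up to 80% of physical RAM (default is 50)
echo 80 > /proc/sys/vm/overcommit_ratio

# Inspect the resulting commit limit and the currently committed amount
grep -i commit /proc/meminfo
```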
`malloc(1ULL<<30)` vs. many tiny mallocs: you'll run out of virtual address space the first way before you run out of RAM+swap to store the bookkeeping info, instead of swap-thrashing as you use up all the physical RAM. When your allocations span many pages, most of the pages stay untouched even if malloc stores bookkeeping info at the start of each allocation. And tiny allocations use more space for bookkeeping and alignment than for the actual 4-byte payload, so if you were counting total allocated size, the overhead is huge. – Heliotropin