Typical implementations of malloc use brk/sbrk as the primary means of claiming memory from the OS. However, they also use mmap to get chunks for large allocations. Is there a real benefit to using brk instead of mmap, or is it just tradition? Wouldn't it work just as well to do it all with mmap?
(Note: I use sbrk and brk interchangeably here because they are interfaces to the same Linux system call, brk.)
For reference, here are a couple of documents describing the glibc malloc:
GNU C Library Reference Manual: The GNU Allocator
https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html
glibc wiki: Overview of Malloc
https://sourceware.org/glibc/wiki/MallocInternals
What these documents describe is that sbrk is used to claim a primary arena for small allocations, mmap is used to claim secondary arenas, and mmap is also used to claim space for large objects ("much larger than a page").
The use of both the application heap (claimed with sbrk) and mmap introduces some additional complexity that might be unnecessary:
Allocated Arena - the main arena uses the application's heap. Other arenas use mmap'd heaps. To map a chunk to a heap, you need to know which case applies. If this bit is 0, the chunk comes from the main arena and the main heap. If this bit is 1, the chunk comes from mmap'd memory and the location of the heap can be computed from the chunk's address.
[Glibc malloc is derived from ptmalloc, which was derived from dlmalloc, which was started in 1987.]
The jemalloc manpage (http://jemalloc.net/jemalloc.3.html) has this to say:
Traditionally, allocators have used sbrk(2) to obtain memory, which is suboptimal for several reasons, including race conditions, increased fragmentation, and artificial limitations on maximum usable memory. If sbrk(2) is supported by the operating system, this allocator uses both mmap(2) and sbrk(2), in that order of preference; otherwise only mmap(2) is used.
So, they even say here that sbrk is suboptimal but they use it anyway, even though they've already gone to the trouble of writing their code so that it works without it.
[Writing of jemalloc started in 2005.]
UPDATE: Thinking about this more, that bit about "in that order of preference" gives me a line of inquiry. Why the order of preference? Are they just using sbrk as a fallback in case mmap is not supported (or lacks necessary features), or is it possible for the process to get into some state where it can use sbrk but not mmap? I'll look at their code and see if I can figure out what it's doing.
I'm asking because I'm implementing a garbage collection system in C, and so far I see no reason to use anything besides mmap. I'm wondering if there's something I'm missing, though.
(In my case I have an additional reason to avoid brk, which is that I might need to use malloc at some point.)
mmap to allocate a pool for thousands of smaller allocations, right? Not one mmap per allocation like you'd do for large ones – Hardship
malloc() that use mmap(). – Pyrography
mmap for large allocations. The jemalloc negative comments about brk apply most strongly to using it for everything, like ancient Unix history malloc implementations (especially fragmentation: inability to give back memory to the kernel if there's a long-term small allocation after a short-term large allocation). – Nuzzle