About sbrk() and malloc()

Asked 13/12, 2015 at 7:18 Answered 13/12, 2015 at 7:32

I've read the Linux manual page for sbrk() thoroughly:

sbrk() changes the location of the program break, which defines the end of the process's data segment (i.e., the program break is the first location after the end of the uninitialized data segment).

And I know that user-space memory is organized like this: [diagram of the process memory layout, not preserved]

The problem is: when I call sbrk(1), why is it said that I am increasing the size of the heap? As the manual says, I am changing the end position of "data segment & bss". So what grows should be the data segment & bss, right?

Majormajordomo asked 13/12, 2015 at 7:18

Hint: Which end of your diagram corresponds to higher (numerically greater) addresses, and which end to lower addresses? Which way does the stack grow? Which way does the heap grow? – Linnealinnean 13/12, 2015 at 7:40

In practice, your picture is very naive and does not really show what is happening. See my answer. – Allonge 13/12, 2015 at 7:47

The data and bss segments have a fixed size, set when the program is linked. The space allocated to the process after the end of those segments is therefore not part of those segments; it is merely contiguous with them. That space is called the heap and is used for dynamic memory allocation.

If you want to regard it as 'extending the data/bss segment', that's fine too. It won't make any difference to the behaviour of the program, or the space that's allocated, or anything.
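On Linux you can watch this directly by comparing the address just past bss with the current break. The sketch below assumes Linux with glibc and a GNU-style linker, which defines the symbol `end` described in end(3); `gap_after_bss`, `grow_break` and `show_break` are hypothetical helper names, not part of any library.

```c
#define _DEFAULT_SOURCE /* exposes sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Defined by the linker (see end(3)); only its address matters:
   it is the first location past the end of the bss segment. */
extern char end;

/* Bytes between the end of bss and the current program break. With
   address-space randomization the heap usually starts some distance
   above bss, so this is often larger than zero. */
intptr_t gap_after_bss(void) {
    uintptr_t brk_now = (uintptr_t)sbrk(0); /* sbrk(0) only reports the break */
    /* Holds unless something has shrunk the break below its starting point. */
    assert(brk_now >= (uintptr_t)&end);
    return (intptr_t)(brk_now - (uintptr_t)&end);
}

/* Moves the break up by `increment` bytes. sbrk() returns the OLD break,
   i.e. the start of the newly usable region; NULL signals failure. */
void *grow_break(intptr_t increment) {
    void *old = sbrk(increment);
    return old == (void *)-1 ? NULL : old;
}

/* Prints both addresses, e.g. when called from main(). */
void show_break(void) {
    intptr_t gap = gap_after_bss();
    printf("end of bss: %p\nbreak:      %p (gap: %ld bytes)\n",
           (void *)&end, sbrk(0), (long)gap);
}
```

Calling `grow_break(1)` is the sbrk(1) from the question: the break moves one byte further away from `&end`, while `&end` itself stays put, which is why the grown region is described as heap rather than as bss.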

The Mac OS X manual page indicates that you really shouldn't be using brk() and sbrk() much:

The brk and sbrk functions are historical curiosities left over from earlier days before the advent of virtual memory management. The brk() function sets the break or lowest address of a process's data segment (uninitialized data) to addr (immediately above bss). Data addressing is restricted between addr and the lowest stack pointer to the stack segment. Memory is allocated by brk in page size pieces; if addr is not evenly divisible by the system page size, it is increased to the next page boundary.

The current value of the program break is reliably returned by sbrk(0) (see also end(3)). The getrlimit(2) system call may be used to determine the maximum permissible size of the data segment; it will not be possible to set the break beyond the rlim_max value returned from a call to getrlimit, e.g. etext + rlp->rlim_max (see end(3) for the definition of etext).
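The two limits the quoted page describes can be explored with POSIX calls. In the sketch below (hypothetical helper names, not Apple's implementation), `round_up_to_page` does the "next page boundary" arithmetic using `sysconf(_SC_PAGESIZE)`, and `data_segment_limits` reads RLIMIT_DATA with getrlimit(2). On Linux, brk() records the exact address requested while the kernel still maps whole pages, and it is the soft limit (`rlim_cur`) that is checked when the break grows.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Rounds addr up to the next multiple of the page size; addresses that
   are already page-aligned are returned unchanged. */
uintptr_t round_up_to_page(uintptr_t addr) {
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    /* Page sizes are powers of two, so masking does the rounding. */
    assert(page != 0 && (page & (page - 1)) == 0);
    return (addr + page - 1) & ~(page - 1);
}

/* Reads the soft (rlim_cur) and hard (rlim_max) RLIMIT_DATA limits;
   either may be RLIM_INFINITY ("unlimited"). Returns 0 on success and
   -1 on failure, like getrlimit() itself. */
int data_segment_limits(rlim_t *soft, rlim_t *hard) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_DATA, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}
```

With 4096-byte pages, `round_up_to_page(0x1001)` gives `0x2000`.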

It is mildly exasperating that I can't find a manual page for end(3), despite the pointers to look at it. Even this (slightly old) manual page for sbrk() does not have a link for it.
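On Linux, end(3) is part of the man-pages project and documents three linker-defined symbols. A sketch of inspecting them, assuming Linux with a GNU-style linker (other platforms may not provide these symbols); `print_segment_ends` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Linker-defined symbols (see end(3)); only their addresses are meaningful. */
extern char etext; /* first address past the text (code) segment */
extern char edata; /* first address past the initialized data segment */
extern char end;   /* first address past the bss segment */

void print_segment_ends(void) {
    /* Compare as integers: ordering unrelated pointers is undefined in ISO C. */
    assert((uintptr_t)&edata <= (uintptr_t)&end);
    printf("etext: %p\nedata: %p\nend:   %p\n",
           (void *)&etext, (void *)&edata, (void *)&end);
}
```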

Quach answered 13/12, 2015 at 7:23

Then the manual's phrase "the program break is the first location after the end of the uninitialized data segment" is quite confusing, because "the end of the uninitialized data segment" refers to the end of bss, right? – Majormajordomo 13/12, 2015 at 7:26

Here is the one from FreeBSD: end(3). – Quattlebaum 13/12, 2015 at 9:37

Notice that today sbrk(2) is rarely used. Most malloc implementations use mmap(2), at least for large allocations, to acquire a memory segment (and munmap(2) to release it). Quite often, free simply marks a memory zone as reusable by some future malloc (and does not release any memory to the Linux kernel).

(So in practice, the heap of a modern Linux process is made of several segments and is more subtle than your picture; also, multi-threaded processes have one stack per thread.)
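The acquire/release pattern mentioned above comes down to two system calls. A minimal sketch assuming Linux or another system with `MAP_ANONYMOUS`; `region_acquire`, `region_release` and `region_demo` are hypothetical names, and real allocators add headers, size classes and caching on top of this.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and sbrk() in glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Gets `size` bytes of zero-filled memory straight from the kernel,
   without touching the program break. Returns NULL on failure. */
void *region_acquire(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns the pages to the kernel; afterwards they are gone from the
   address space, unlike memory passed to free(), which an allocator
   may simply keep for reuse. */
int region_release(void *p, size_t size) {
    return munmap(p, size);
}

/* Shows that a 1 MiB region comes and goes without moving the break. */
void region_demo(void) {
    const size_t n = (size_t)1 << 20;
    void *brk_before = sbrk(0);
    unsigned char *r = region_acquire(n);
    assert(r != NULL);
    assert(r[0] == 0 && r[n - 1] == 0); /* anonymous mappings start zero-filled */
    memset(r, 0xAB, n);                 /* and they are writable */
    assert(sbrk(0) == brk_before);      /* the break never moved */
    int rc = region_release(r, n);
    assert(rc == 0);
    (void)rc; /* avoids an "unused" warning when NDEBUG disables assert */
}
```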

Use proc(5), notably /proc/self/maps and /proc/$pid/maps, to understand the virtual address space of a process. First try to understand the output of cat /proc/self/maps (which shows the address space of that cat command) and of cat /proc/$$/maps (which shows the address space of your shell). Also look at the maps pseudo-file of your web browser (e.g. cat /proc/$(pidof firefox)/maps or cat /proc/$(pidof iceweasel)/maps, etc.); mine has more than a thousand lines, i.e. more than a thousand process segments.
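The same information is available from inside a program. A Linux-only sketch that scans /proc/self/maps for a line containing a given tag such as `[heap]` or `[stack]` (pseudo-pathnames documented in proc(5)); `find_maps_entry` is a hypothetical helper.

```c
#define _DEFAULT_SOURCE /* exposes getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Looks for the first line of /proc/self/maps that contains `tag`.
   On a match, stores the line's start and end addresses in *lo and *hi
   and returns 1; returns 0 if nothing matches, -1 if the file can't be read. */
int find_maps_entry(const char *tag, unsigned long *lo, unsigned long *hi) {
    assert(tag != NULL && lo != NULL && hi != NULL);
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;
    char *line = NULL; /* buffer grown by getline() as needed */
    size_t cap = 0;
    int found = 0;
    while (getline(&line, &cap, f) != -1) {
        /* Each line looks like "start-end perms offset dev inode pathname". */
        if (strstr(line, tag) != NULL && sscanf(line, "%lx-%lx", lo, hi) == 2) {
            found = 1;
            break;
        }
    }
    free(line);
    fclose(f);
    return found;
}
```

For example, `find_maps_entry("[heap]", &lo, &hi)` reports the range that sbrk() grows; that line typically appears only once the break has moved past its initial value.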

Use strace(1) to understand the system calls done by a given command or process.
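To have something concrete to trace, the hypothetical helper below makes one small and one large allocation and reports whether each moved the program break; running the program under `strace -e trace=brk,mmap,munmap` should show the matching system calls. The outcome depends on the C library: glibc by default serves requests of 128 KiB or more with mmap, but that threshold is tunable and rises dynamically after such a block is freed (see mallopt(3), M_MMAP_THRESHOLD). The first call into malloc may also set up the allocator's own bookkeeping on the heap, which is why the small request comes first.

```c
#define _DEFAULT_SOURCE /* exposes sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocates `size` bytes and reports whether that moved the program break:
   1 if it moved, 0 if not, -1 if malloc() failed. */
int malloc_moves_break(size_t size) {
    assert(size > 0);
    void *before = sbrk(0);
    char *p = malloc(size);
    if (p == NULL)
        return -1;
    /* A volatile write keeps the compiler from optimizing the allocation away. */
    ((volatile char *)p)[size - 1] = 1;
    void *after = sbrk(0);
    free(p);
    return after != before;
}

void trace_demo(void) {
    printf("malloc(64) moved the break: %d\n", malloc_moves_break(64));
    printf("malloc(1 MiB) moved the break: %d\n",
           malloc_moves_break((size_t)1 << 20));
}
```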

Take advantage of the fact that on Linux most (and probably all) C standard library implementations are free software, so you can study their source code. The source code of musl-libc is quite easy to read.
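When reading an allocator's source, it helps to keep the simplest possible sbrk()-based design in mind. The toy below is far simpler than what glibc or musl do: it only bumps the break forward, never frees, and is not thread-safe. `bump_alloc` is a hypothetical name, and the 16-byte alignment is an assumption matching common 64-bit ABIs.

```c
#define _DEFAULT_SOURCE /* exposes sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define BUMP_ALIGN 16 /* assumed alignment; must be a power of two */

/* Toy allocator: every request simply moves the program break forward.
   Memory is never returned, and concurrent callers would race. */
void *bump_alloc(size_t n) {
    uintptr_t cur = (uintptr_t)sbrk(0);
    /* Padding that lifts the current break to the next BUMP_ALIGN boundary. */
    size_t pad = (size_t)((BUMP_ALIGN - cur % BUMP_ALIGN) % BUMP_ALIGN);
    if (n > (size_t)INTPTR_MAX - pad)
        return NULL; /* too large to express as an sbrk() increment */
    void *old = sbrk((intptr_t)(pad + n));
    if (old == (void *)-1)
        return NULL;
    assert((uintptr_t)old == cur); /* nothing else moved the break meanwhile */
    return (char *)old + pad;
}
```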

Also read about ELF, ASLR, dynamic linking and ld-linux(8), then the Advanced Linux Programming book, then syscalls(2).

Allonge answered 13/12, 2015 at 7:32
