How does copy-on-write work in fork()?

Asked 27/11, 2014 at 0:52 Answered 27/3, 2015 at 20:54

I want to know how copy-on-write happens in fork().

Assume we have a process A that has a dynamically allocated int array:

int *array = malloc(1000000*sizeof(int));

The elements of the array are initialized to some meaningful values. Then we use fork() to create a child process, B. B will iterate over the array and do some calculations:

for(a in array){
    a = a+1;
}

I know B will not copy the entire array immediately, but when does the child B allocate memory for the array? During fork()?
Does it allocate the entire array all at once, or only a single integer for a = a+1?
How does a = a+1; happen? Does B read data from A and write new data to its own array?

I wrote some code to explore how COW works. My environment: Ubuntu 14.04, gcc 4.8.2

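#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>      /* fork(), sleep() */
#include <sys/sysinfo.h>

/* Print system-wide total and free RAM as reported by sysinfo(2). */
void printMemStat(){
    struct sysinfo si;
    sysinfo(&si);
    printf("===\n");
    /* totalram and freeram are unsigned long, counted in units of si.mem_unit bytes */
    printf("Total: %lu\n", si.totalram);
    printf("Free: %lu\n", si.freeram);
}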

int main(){
    long len = 200000000;
    long *array = malloc(len*sizeof(long));
    long i = 0;
    for(; i<len; i++){
        array[i] = i;
    }

    printMemStat();
    if(fork()==0){
        /*child*/
        printMemStat();

        i = 0;
        for(; i<len/2; i++){
            array[i] = i+1;
        }

        printMemStat();

        i = 0;
        for(; i<len; i++){
            array[i] = i+1;
        }

        printMemStat();

    }else{
        /*parent*/
        int times=10;
        while(times-- > 0){
            sleep(1);
        }
    }
    return 0;
}

After fork(), the child process modifies half of the numbers in the array, and then modifies the entire array. The output is:

===
Total: 16694571008
Free: 2129162240
===
Total: 16694571008
Free: 2126106624
===
Total: 16694571008
Free: 1325101056
===
Total: 16694571008
Free: 533794816

It seems that the array is not allocated as a whole. If I slightly change the first modification phase to:

i = 0;
for(; i<len/2; i++){
    array[i*2] = i+1;
}

The output becomes:

===
Total: 16694571008
Free: 2129924096
===
Total: 16694571008
Free: 2126868480
===
Total: 16694571008
Free: 526987264
===
Total: 16694571008
Free: 526987264

Llano answered 27/11, 2014 at 0:52 Comment(5)

The child doesn't "allocate" anything. The child starts out as an exact, complete copy of the parent process and just continues executing from there. – Charlsiecharlton 27/11, 2014 at 0:54

So, A and B share the array? – Llano 27/11, 2014 at 0:55

@KerrekSB: But it will get its own separate process memory, which should be copy-on-write. – These 27/11, 2014 at 0:56

COW is an implementation detail, it doesn't affect your program. This question would probably be better for unix.stackexchange.com or superuser.com. – Judgment 27/11, 2014 at 1:0

It depends on the operating system, hardware architecture, and libc. But yes, on a recent Linux with an MMU, fork(2) works with copy-on-write. It only (allocates and) copies a few system structures and the page table; the heap pages still point to the parent's pages until they are written to.

More control over this can be exercised with the clone(2) call. vfork(2) is a special variant that does not expect the pages to be used; it is typically used right before exec().

As for the allocation: malloc() keeps meta information about the requested memory blocks (address and size), and the C variable is a pointer (these live in the process's heap and stack). Both look the same to the child (same values, because the same underlying memory pages are seen in the address space of both processes). So from the C program's point of view, the array is already allocated and the variable is already initialized when the child process comes into existence. The underlying memory pages, however, still point to the original physical pages of the parent process, so no extra memory pages are needed until they are modified.

If the child allocates a new array, it depends on whether the array fits into the already existing heap pages or whether the brk of the process needs to be increased. In both cases only the modified pages get copied, and the new pages get allocated only for the child.

This also means that physical memory might run out when a page is first written, long after malloc() has succeeded. (This is bad because the program cannot check an error return code for "an operation in a random code line".) Some operating systems do not allow this form of overcommit: if you fork a process, the pages are not allocated, but they must be available at that moment (they are, in effect, reserved) just in case. In Linux this is configurable and is called overcommit accounting.

Crosscut answered 27/11, 2014 at 0:57 Comment(5)

But when will the child process allocate memory for array? (only allocate memory not copy) – Llano 27/11, 2014 at 1:4

@MinFu Depends on what you mean by "memory" and "allocate" :) (I added some additional explanation to the answer). – Crosscut 27/11, 2014 at 1:5

What happens if the child detaches itself from the parent? I guess all pages of the parent get copied to the child's space before the child gets detached. If not, what happens when the parent dies/exits after the fork? – Tourney 15/5, 2017 at 14:53

A page shared between multiple processes has no owner, the pages do not have to be copied on detach. The page is alive as long as it is mapped to any process. – Crosscut 15/5, 2017 at 15:44

For those like me who didn't know what brk means: it's the end of the data section of the program. Increasing it effectively allocates more memory to the process. – Maurreen 5/10, 2017 at 21:42

Some systems have a system call vfork(), which was originally designed as a lower-overhead version of fork(). Since fork() involved copying the entire address space of the process, and was therefore quite expensive, the vfork() function was introduced (in 3.0BSD).

However, since vfork() was introduced, the implementation of fork() has improved drastically, most notably with the introduction of 'copy-on-write', where the copying of the process address space is transparently faked by allowing both processes to refer to the same physical memory until either of them modifies it. This largely removes the justification for vfork(); indeed, a large proportion of systems now lack the original functionality of vfork() completely. For compatibility, though, there may still be a vfork() call present that simply calls fork() without attempting to emulate all of the vfork() semantics.

As a result, it is very unwise to actually make use of any of the differences between fork() and vfork(). Indeed, it is probably unwise to use vfork() at all, unless you know exactly why you want to.

The basic difference between the two is that when a new process is created with vfork(), the parent process is temporarily suspended, and the child process might borrow the parent's address space. This strange state of affairs continues until the child process either exits, or calls execve(), at which point the parent process continues.

This means that the child process of a vfork() must be careful to avoid unexpectedly modifying variables of the parent process. In particular, the child process must not return from the function containing the vfork() call, and it must not call exit() (if it needs to exit, it should use _exit(); actually, this is also true for the child of a normal fork()).

Krona answered 27/3, 2015 at 20:54 Comment(0)
