Say we have a certain parent process with some arbitrary amount of data stored in memory and we use fork
to spawn a child process. I understand that in order for the OS to perform copy on write, the certain page in memory that contains the data that we are modifying will have its Read-only bit set, and the OS will use the exception that will result when the child tries to modify the data to copy the entire page into another area in memory so that the child gets it's own copy. What I don't understand is, if that specific section in memory is marked as Read-only, then the parent process, to whom the data originally belonged, would not be able to modify the data neither. So how can this whole scheme work? Does the parent lose ownership of its data and copy on write will have to be performed even when the parent itself tries to modify the data?
Right, if either process writes a COW page, it triggers a page fault.
In the page fault handler, if the page is supposed to be writeable, it allocates a new physical page and does a memcpy(newpage, shared_page, pagesize)
, then updates the page table of whichever process faulted to map the newpage to that virtual address. Then returns to user-space for the store instruction to re-run.
This is a win for something like fork
, because one process typically makes an execve
system call right away, after touching typically one page (of stack memory). execve
destroys all memory mappings for that process, effectively replacing it with a new process. The parent once again has the only copy of every page. (Except pages that were already copy-on-write, e.g. memory allocated with mmap
is typically COW-mapped to a single physical page of zeros, so reads can hit in L1d cache).
A smart optimization would be for fork
to actually copy the page containing the top of the stack, but still do lazy COW for all the other pages, on the assumption that the child process will normally execve
right away and thus drop its references to all the other pages. It still costs a TLB invalidation in the parent to temporarily flip all the pages to read-only and back, though.
execve
runs in the child process, all memory mappings are wiped out, so the reference count on all the parent's pages drops back to 1. (Unless they were already shared). The child's copy of any pages that COW actually copied are freed on execve
, whether it was the parent or child that cause the copy to happen. –
Fill bash
creates a subshell for a command like foo; (echo 1234 > file)& bar
, the subshell that runs echo > file
doesn't modify most of its pages, so it's worth a few extra pagefaults to avoid a bunch of copying work. (This was especially true back when Unix was designed, when mem copying was slow compared to pagefault) –
Fill Some UNIX implementations share the program text between the two since that cannot be modified. Alternatively, the child may share all of the parent’s memory, but in that case the memory is shared
copy-on-write
, which means that whenever either of the two wants to modify part of the memory, that chunk of memory is explicitly copied first to make sure the modification occurs in a private memory area.Excerpted from: Modern Operating Systems (4th Edition), Tanenbaum
© 2022 - 2025 — McMap. All rights reserved.