Is it possible to fork a process without inheriting the virtual memory space of the parent process?
As the parent process is using a huge amount of memory, fork may fail with errno ENOMEM under some configurations of the kernel overcommit policy, even though the child process may only exec a low-memory program like ls.

To clarify the problem: when /proc/sys/vm/overcommit_memory is configured to be 2, allocation of (virtual) memory is limited to SWAP + MEMORY * ratio (default 50%). When a process forks, physical memory is not copied thanks to COW, but the kernel still needs to account for the virtual memory space. As an analogy, fork is like a malloc(size of the virtual memory space) that allocates no physical memory; writing to a shared page later triggers the copy, and only then is physical memory allocated. When overcommit_memory is 2, fork may fail because of this virtual memory accounting.
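A minimal C sketch of the failure mode described above (the function name, the 64 MiB reservation in the test, and the use of /bin/true in place of ls are all illustrative; under overcommit_memory = 2 with tight swap, the fork call below is the one that returns ENOMEM):

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reserve address space, then fork and exec a tiny program.
 * Returns the child's exit status, or -errno on failure. */
int fork_exec_tiny(size_t reserve) {
    void *p = malloc(reserve);      /* counts against the commit limit,
                                       even though no pages are touched */
    if (!p)
        return -ENOMEM;

    pid_t pid = fork();             /* may fail here when overcommit = 2 */
    if (pid == -1) {
        int e = errno;
        free(p);
        return -e;
    }
    if (pid == 0) {
        /* The child never needed the big reservation... */
        execl("/bin/true", "true", (char *)NULL);
        _exit(127);                 /* exec failed */
    }

    int status = 0;
    waitpid(pid, &status, 0);
    free(p);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```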

Is it possible to fork a process without inheriting the virtual memory space of the parent process under the following conditions?

  1. if the child process calls exec after fork

  2. if the child process doesn't call exec and will not use any global or static variables from the parent process. For example, the child process just does some logging and then quits.

Cannular answered 24/7, 2015 at 7:10 Comment(2)
I don't really understand; isn't this shared virtual memory copy-on-write? Therefore any additional memory is actually private to the child process. By not sharing virtual memory won't you exacerbate the problem?Kweilin
@Kweilin when a process forks, physical memory is not copied thanks to COW, but the kernel still needs to account for the virtual memory space. As an analogy, fork is like a malloc(size of the virtual memory space) that allocates no physical memory; writing to a shared page later triggers the copy, and only then is physical memory allocated. When /proc/sys/vm/overcommit_memory is 2, allocation of memory is limited to SWAP+MEMORY*ratio. As a result, fork may fail with ENOMEM.Cannular

As Basile Starynkevitch answered, it's not possible.

There is, however, a very simple and common solution used for this that does not rely on Linux-specific behaviour or memory overcommit control: use an early-forked slave process to do the fork and exec.

Have the large parent process create a Unix domain socket and fork a slave process as early as possible, closing all other descriptors in the slave (reopening STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO to /dev/null). I prefer a datagram socket for its simplicity and guarantees, although a stream socket will also work.
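A hedged sketch of that early-fork setup, using socketpair(2) for brevity instead of a named Unix domain socket; spawn_slave and slave_loop are illustrative names, and the actual command handling is left as a stub:

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* The slave blocks reading datagrams; a read of 0 means the parent
 * closed its end, so the slave exits. Command handling is a stub. */
static void slave_loop(int fd) {
    char buf[4096];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n <= 0)
            break;   /* peer closed the socket (or error): exit */
        /* ... parse the datagram and fork+exec the command here ... */
    }
}

/* Fork the slave early, while the parent is still small.
 * Returns the parent's end of the socket, or -1 on error. */
int spawn_slave(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]); close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        int nul = open("/dev/null", O_RDWR);
        if (nul != -1) {
            dup2(nul, STDIN_FILENO);
            dup2(nul, STDOUT_FILENO);
            dup2(nul, STDERR_FILENO);
            if (nul > STDERR_FILENO) close(nul);
        }
        /* A real slave would also close every other inherited descriptor. */
        slave_loop(sv[1]);
        _exit(0);
    }
    close(sv[1]);
    return sv[0];
}
```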

In some rare cases it is useful to have the slave process execute a separate dedicated small helper program. In most instances this is not necessary, and it makes security design much easier. (In Linux, you can include SCM_CREDENTIALS ancillary messages when passing data over a Unix domain socket, and use the process ID therein to verify the identity/executable of the peer via the /proc/PID/exe pseudo-file.)

In any case, the slave process will block in reading from the socket. When the other end closes the socket, the read/receive will return 0, and the slave process will exit.

Each datagram the slave process receives describes a command to execute. (Using a datagram allows using C strings, delimited with NUL characters, without any escaping; using a Unix stream socket typically requires you to delimit the "command" somehow, which in turn means escaping the delimiters in the command component strings.)
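As a sketch of that convention, a helper (hypothetical name datagram_to_argv) that splits one NUL-delimited datagram into an execvp-style argument vector in place:

```c
#include <stddef.h>

/* One datagram = one command, as consecutive NUL-terminated C strings.
 * Builds argv[] pointing into buf; returns argc, or -1 if the datagram
 * is malformed (not NUL-terminated) or has too many arguments. */
int datagram_to_argv(char *buf, size_t len, char **argv, size_t maxargs) {
    size_t argc = 0, i = 0;
    if (len == 0 || buf[len - 1] != '\0')
        return -1;                   /* must end with a terminator */
    while (i < len) {
        if (argc + 1 >= maxargs)     /* keep room for the NULL sentinel */
            return -1;
        argv[argc++] = &buf[i];
        while (buf[i] != '\0')
            i++;                     /* skip to end of this string */
        i++;                         /* step over the NUL */
    }
    argv[argc] = NULL;               /* execvp-style termination */
    return (int)argc;
}
```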

The slave process creates one or more pipes, and forks a child process. This child process closes the original Unix socket, replaces the standard streams with the respective pipe ends (closing the other ends), and executes the desired command. I personally prefer to use an extra close-on-exec socket in Linux to detect successful execution; in the error case, the errno code is written to the socket, so that the slave-parent can reliably detect the failure and the exact reason, too. On success, the slave-parent closes the unnecessary pipe ends and replies to the original process about the success, with the other pipe ends as SCM_RIGHTS ancillary data. After sending the message, it closes the rest of the pipe ends, and waits for a new message.
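The close-on-exec error-reporting trick can be sketched as follows (spawn_report is an illustrative name; pipe2 with O_CLOEXEC is Linux-specific, matching the answer's context, and a plain pipe is used here instead of the extra socket):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec argv, reporting exec failure through a close-on-exec
 * pipe. Returns the child's pid on success, or -errno on failure. */
long spawn_report(char *const argv[]) {
    int ep[2];
    if (pipe2(ep, O_CLOEXEC) == -1)
        return -(long)errno;
    pid_t pid = fork();
    if (pid == -1) {
        int e = errno;
        close(ep[0]); close(ep[1]);
        return -(long)e;
    }
    if (pid == 0) {
        close(ep[0]);
        execvp(argv[0], argv);
        int e = errno;                    /* exec failed: report why */
        (void)!write(ep[1], &e, sizeof e);
        _exit(127);
    }
    close(ep[1]);
    int e = 0;
    /* On successful exec the write end closes (O_CLOEXEC), so this
     * read returns 0; otherwise we receive the child's errno. */
    ssize_t n = read(ep[0], &e, sizeof e);
    close(ep[0]);
    if (n > 0) {
        waitpid(pid, NULL, 0);            /* reap the failed child */
        return -(long)e;
    }
    return (long)pid;
}
```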

On the original process side, the above procedure is sequential; only one thread may start executing an external process at a time. (You simply serialize the access with a mutex.) Several executed processes can run at the same time; it is only the request to and response from the slave helper that is serialized.

If that is an issue -- it should not be in typical cases -- you can for example multiplex the connections, by prefixing each message with an ID number (assigned by the parent process, monotonically increasing). In that case, you'll probably use a dedicated thread on the parent end to manage the communications with the slave, as you certainly cannot have multiple threads reading from the same socket at the same time, and expect deterministic results.

Further improvements to the scheme include things like using a dedicated process group for the executed processes, setting limits to them (by setting limits to the slave process), and executing the commands as dedicated users and groups by using a privileged slave.

The privileged slave case is where it is most useful to have the parent execute a separate helper process for it. In Linux, both sides can use SCM_CREDENTIALS ancillary messages via Unix domain sockets to verify the identity (PID, and via the PID, the executable) of the peer, making it rather straightforward to implement robust security. (But note that /proc/PID/exe has to be checked more than once, to catch the attacks where a message is sent by a nefarious program, quickly executing the appropriate program but with command-line arguments that cause it to exit soon, making it occasionally look like the correct executable made the request, while a copy of the descriptor -- and thus the entire communications channel -- was in control of a nefarious user.)

In summary, the original problem can be solved, although the answer to the posed question is no. If the executions are security-sensitive, for example changing privileges (user accounts) or capabilities (in Linux), then the design has to be carefully considered, but in normal cases the implementation is quite straightforward.

I'd be happy to elaborate if necessary.

Autogenesis answered 31/7, 2015 at 0:29 Comment(0)

No, it is not possible. You might be interested by vfork(2) which I don't recommend. Look also into mmap(2) and its MAP_NORESERVE flag. But copy-on-write techniques are used by the kernel, so you practically won't double the RAM consumption.

My suggestion is to have enough swap space so that you are not concerned by such issues. So set up your computer to have more available swap space than the largest running process. You can always create a temporary swap file (e.g. with dd if=/dev/zero of=/var/tmp/swapfile bs=1M count=32768 then mkswap /var/tmp/swapfile), add it as a temporary swap zone (swapon /var/tmp/swapfile), and remove it (swapoff /var/tmp/swapfile and rm /var/tmp/swapfile) when you don't need it anymore.

You probably don't want to swap on a tmpfs file system like /tmp/ often is, since tmpfs file systems are backed by swap space!

I dislike memory overcommitment and I disable it (through proc(5)). YMMV.

Make answered 24/7, 2015 at 7:19 Comment(4)
Don't use vfork(). It has some serious problems - serious enough that it's fundamentally broken. On Linux, it only blocks the calling thread, so in a multithreaded process there will be two different processes running in the same address space. And if the child process receives a signal while running, it can corrupt the parent process. Some of the Linux implementation problems listed at ewontfix.com/7 have been addressed, such as race conditions with setuid processes, but the fundamental issues created by having another process run in the same address space remain.Delagarza
Page table overhead is 4kb per 2mb assuming heap is contiguous, and there's nothing wrong with vfork(), which is what allows /bin/sh and a lot of important stuff to go really fast. I understand not everyone agrees with how Linux is designed; they may want to consider Windows NT instead because Microsoft shares their views.Courtund
swap file and more free RAM won't help e.g. when a 32-bit app crashes with VSS=4G.Joanajoane
In such a case that 32-bit Linux application (crashing with VSS=4G) needs to be recompiled from source code, then debugged, as a 64-bit application. Read the documentation of GCC and of GDB.Make

I'm not aware of any way to do (2), but for (1) you could try to use vfork which will fork a new process without copying the page tables of the parent process. But this generally isn't recommended for a number of reasons, including because it causes the parent to block until the child performs an execve or terminates.
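For illustration only, the one pattern vfork is suited to, with the caveats above (run_true is a made-up name, and /bin/true stands in for any small program; the child must do nothing but exec or _exit, since it borrows the parent's memory):

```c
#include <sys/wait.h>
#include <unistd.h>

/* vfork + immediate exec. Returns the child's exit status, or -1. */
int run_true(void) {
    pid_t pid = vfork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Running in the parent's address space: touch nothing,
         * just exec (or _exit on failure, never plain return). */
        execl("/bin/true", "true", (char *)NULL);
        _exit(127);              /* exec failed */
    }
    /* The parent was suspended until the exec (or _exit) above. */
    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```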

Your answered 24/7, 2015 at 7:22 Comment(2)
it causes the parent to block until the child performs an execve or terminates. That's not always true - on Linux only the calling thread is blocked. Which might even be worse as it means two different processes are running simultaneously in the same address space.Delagarza
You're right @AndrewHenle. Note that combining fork with threads is dangerous and is best avoided.Your

This is possible on Linux. Use the clone syscall without the flag CLONE_THREAD and with the flag CLONE_VM. The parent and child processes will use the same mappings, much like a thread would; there is no COW or page table copying.
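A minimal sketch of that clone usage (run_shared_vm_child and the 64 KiB stack size are illustrative; note the child must be given its own stack precisely because the rest of the address space is shared):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in the new process, but in the SAME address space. */
static int child_fn(void *arg) {
    int *shared = arg;
    *shared = 42;            /* visible to the parent: no COW, no copy */
    return 0;
}

/* clone with CLONE_VM but without CLONE_THREAD: a separate process
 * sharing the parent's memory. Returns the value the child wrote. */
int run_shared_vm_child(void) {
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return -1;
    int value = 0;
    /* Stacks grow down on most architectures: pass the top. */
    pid_t pid = clone(child_fn, stack + stack_size,
                      CLONE_VM | SIGCHLD, &value);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);   /* SIGCHLD flag makes it waitable */
    free(stack);
    return value;
}
```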

Oppression answered 30/3, 2019 at 7:39 Comment(0)
madvise(addr, size, MADV_DONTFORK)

Alternatively, you can call munmap() after fork() to remove the virtual addresses inherited from the parent process.
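A sketch of the madvise approach (fork_without_region is an illustrative name): mark a large anonymous mapping MADV_DONTFORK so the subsequent fork simply omits it from the child.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a big region, exclude it from fork, then fork.
 * Returns the child's exit status, or -1 on error. */
int fork_without_region(size_t big) {
    char *p = mmap(NULL, big, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (madvise(p, big, MADV_DONTFORK) == -1) {
        munmap(p, big);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        munmap(p, big);
        return -1;
    }
    if (pid == 0) {
        /* Touching p here would fault: the mapping was not inherited. */
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    munmap(p, big);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```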

Courtund answered 9/7, 2019 at 11:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.