My experience of fork()
'ing within threads is really bad. The software generally fails pretty quickly.
I've found several solutions to the matter, although you may not like them much, I think these are generally the best way to avoid close to undebuggable errors.
Fork first
Assuming you know the number of external processes you need at the start, you can create them upfront and just have them sit there waiting for an event (i.e. read from a blocking pipe, wait on a semaphore, etc.)
Once you forked enough children you are free to use threads and communicate with those forked processes via your pipes, semaphores, etc. From the time you create a first thread, you cannot call fork anymore. Keep in mind that if you're using 3rd party libraries which may create threads, those have to be used/initialized after the fork()
calls happened.
Note that you can then start using threads within the main and fork()
'ed processes.
Know your state
In some circumstances, it may be possible for you to stop all of your threads to start a process and then restart your threads. This is somewhat similar to point (1) in the sense that you do not want threads running at the time you call fork()
, although it requires a way for you to know about all the threads currently running in your software (something not always possible with 3rd party libraries).
Remember that "stopping a thread" using a wait is not going to work. You have to join with the thread so it is fully exited, because a wait require a mutex and those need to be unlocked when you call fork()
. You just cannot know when the wait is going to unlock/re-lock the mutex and that's usually where you get stuck.
Choose one or the other
The other obvious possibility is to choose one or the other and not bother with whether you're going to interfere with one or the other. This is by far the simplest method if at all possible in your software.
Create Threads only when Necessary
In some software, one creates one or more threads in a function, use said threads, then joins all of them when exiting the function. This is somewhat equivalent to point (2) above, only you (micro-)manage threads as required instead of creating threads that sit around and get used when necessary. This will work too, just keep in mind that creating a thread is a costly call. It has to allocate a new task with a stack and its own set of registers... it is a complex function. However, this makes it easy to know when you have threads running and except from within those functions, you are free to call fork()
.
In my programming, I used all of these solutions. I used Point (2) because the threaded version of log4cplus
and I needed to use fork()
for some parts of my software.
As mentioned by others, if you are using a fork()
to then call execve()
then the idea is to use as little as possible between the two calls. That is likely to work 99.999% of the time (many people use system()
or popen()
with fairly good successes too and these do similar things). The fact is that if you do not hit any of the mutexes held by the other threads, then this will work without issue.
On the other hand, if, like me, you want to do a fork()
and never call execve()
, then it's not likely to work right while any thread is running.
What is actually happening?
The issue is that fork()
create a separate copy of only the current task (a process under Linux is called a task in the kernel).
Each time you create a new thread (pthread_create()
), you also create a new task, but within the same process (i.e. the new task shares the process space: memory, file descriptors, ownership, etc.). However, a fork()
ignores those extra tasks when duplicating the currently running task.
+-----------------------------------------------+
| Process A |
| |
| +----------+ +----------+ +----------+ |
| | thread 1 | | thread 2 | | thread 3 | |
| +----------+ +----+-----+ +----------+ |
| | |
+----------------------|------------------------+
| fork()
|
+----------------------|------------------------+
| v Process B |
| +----------+ |
| | thread 1 | |
| +----------+ |
| |
+-----------------------------------------------+
So in Process B, we lose thread 1 & thread 3 from Process A. This means that if either or both have a lock on mutexes or something similar, then Process B is going to lock up quickly. The locks are the worst, but any resources that either thread still has at the time the fork()
happens are lost (socket connection, memory allocations, device handle, etc.) This is where point (2) above comes in. You need to know your state before the fork()
. If you have a very small number of threads or worker threads defined in one place and can easily stop all of them, then it will be easy enough.