I have a Docker container that runs bash as PID1, which in turn runs a long-running (complex) service that sometimes produces zombie processes parented to the bash at PID1. These zombies are seemingly never reaped.
I'm trying to reproduce this issue in a minimal container so that I can test mitigations, such as using a proper init as PID1 rather than bash.
However, I have been unable to reproduce the zombie processes. The bash at PID1 seems to reap children, even those it inherited from another process.
Here is what I tried:
docker run -d ubuntu:14.04 bash -c \
'bash -c "start-stop-daemon --background --start --pidfile /tmp/sleep.pid --exec /bin/sleep -- 30; sleep 300"'
My expectation was that start-stop-daemon would double-fork to create a process parented to the bash at PID1, then exec into sleep 30, and that when the sleep exits the process would remain as a zombie. The sleep 300 simulates a long-running service.
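The double-fork expectation above can be modelled outside Docker, which makes it quicker to experiment with. The Python sketch below is an approximation, not what start-stop-daemon actually does internally: it assumes Linux and uses prctl's PR_SET_CHILD_SUBREAPER (value 36 from <linux/prctl.h>) so that the script itself adopts orphaned descendants the way PID 1 does in a container. It then forks twice and lets the middle process exit, orphaning the grandchild. Because the script never waits on the adopted grandchild, the grandchild should show state Z in /proc once it exits. The helper names become_subreaper, proc_state and double_fork are hypothetical.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # constant from <linux/prctl.h>, Linux >= 3.4


def become_subreaper():
    """Adopt orphaned descendants, as PID 1 does inside a container."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def proc_state(pid):
    """Return (state letter, parent pid) parsed from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name may contain spaces, so parse after the last ')'.
    fields = data[data.rindex(")") + 2:].split()
    return fields[0], int(fields[1])


def double_fork(lifetime):
    """Fork twice and let the middle process exit, orphaning the grandchild.

    Loosely mimics start-stop-daemon --background; returns the grandchild pid.
    """
    r, w = os.pipe()
    middle = os.fork()
    if middle == 0:
        os.close(r)
        grandchild = os.fork()
        if grandchild == 0:
            os.close(w)
            time.sleep(lifetime)  # stands in for `sleep 30`
            os._exit(0)
        os.write(w, str(grandchild).encode())
        os._exit(0)  # middle exits; grandchild is reparented to the subreaper
    os.close(w)
    grandchild = int(os.read(r, 32))
    os.close(r)
    os.waitpid(middle, 0)  # reap only the middle process
    return grandchild


if __name__ == "__main__":
    become_subreaper()
    pid = double_fork(0.2)
    time.sleep(1.0)  # the grandchild has exited by now, but nobody waited on it
    state, ppid = proc_state(pid)
    print(f"pid {pid}: state={state} ppid={ppid} (me={os.getpid()})")
    os.waitpid(pid, 0)  # an explicit wait is what finally removes the zombie
```

If state prints as Z while the script sleeps, the zombie was reproduced; the final waitpid shows that any wait by the adoptive parent clears it.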
However, bash reaps the process, which I can observe by running strace on the bash process from the host machine running Docker:
$ sudo strace -p 2051
strace: Process 2051 attached
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 9
wait4(-1,
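The wait4(-1, ...) calls in the strace output are a parent asking for any child at all, which is why a waiting bash at PID1 also collects children it adopted. As an illustration of the syscall only (not of bash's internals), the Python sketch below uses os.wait4, which wraps the same system call: it forks one child that exits normally and one that is killed with SIGTERM, then loops on wait4(-1, 0) and renders each status in the notation strace uses. describe_status and reap_all are hypothetical helper names.

```python
import os
import signal
import time


def describe_status(status):
    """Render a wait status like strace's {WIFEXITED(s) && ...} notation."""
    if os.WIFEXITED(status):
        return f"WIFEXITED(s) && WEXITSTATUS(s) == {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        name = signal.Signals(os.WTERMSIG(status)).name
        return f"WIFSIGNALED(s) && WTERMSIG(s) == {name}"
    return f"other status {status:#x}"


def reap_all():
    """Call wait4(-1, 0) until no children remain; return (pid, text) pairs."""
    results = []
    while True:
        try:
            pid, status, _rusage = os.wait4(-1, 0)  # -1: any child, 0: block
        except ChildProcessError:  # ECHILD: nothing left to wait for
            return results
        results.append((pid, describe_status(status)))


if __name__ == "__main__":
    exiter = os.fork()
    if exiter == 0:
        os._exit(0)  # exits normally with status 0
    victim = os.fork()
    if victim == 0:
        time.sleep(60)
        os._exit(0)
    os.kill(victim, signal.SIGTERM)  # default action terminates the child
    for pid, desc in reap_all():
        print(f"wait4(-1, [{{{desc}}}], 0, NULL) = {pid}")
```

Each printed line should have the same shape as the strace lines above; a process that stays a zombie is one for which no such wait4 ever returns its PID.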
I am running Docker 1.11.1-rc1, though I see the same behavior with Docker 1.9.
$ docker --version
Docker version 1.11.1-rc1, build c90c70c
$ uname -r
4.4.8-boot2docker
Given that strace shows bash reaping (orphaned) children, is bash a suitable PID1 in a Docker container? What else might be causing the zombies I'm seeing in the more complex container? How can I reproduce them?
Edit:
I managed to attach strace to a bash PID1 on one of the live containers exhibiting the problem:
Process 20381 attached
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11185
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11191
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11203
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11155
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11151
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11152
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 11154
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 11332
...
I'm not sure exactly what all those exiting processes are, but none of the PIDs match those of the few defunct (zombie) processes shown by docker exec $id ps aux | grep defunct.
Maybe the trick is to catch it in action and see what wait4() returns for a process that remains a zombie...
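To catch the zombies in action, it may help to record which process is supposed to reap each one: a zombie persists only while its current parent has not waited on it, so the PPid of each defunct process shows whether the bash at PID1 is really the one failing to reap. The sketch below does roughly what ps aux | grep defunct does, but reads /proc/<pid>/stat directly and also reports the parent's command name. It assumes it runs inside the container (for example via docker exec with a Python interpreter available); read_stat and find_zombies are hypothetical helper names.

```python
import os


def read_stat(pid):
    """Return (comm, state, ppid) from /proc/<pid>/stat, or None if unreadable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:  # process vanished or access denied
        return None
    comm = data[data.index("(") + 1:data.rindex(")")]
    fields = data[data.rindex(")") + 2:].split()
    return comm, fields[0], int(fields[1])


def find_zombies():
    """List (pid, comm, ppid, parent_comm) for every process in state Z."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        info = read_stat(int(entry))
        if info is None or info[1] != "Z":
            continue
        comm, _state, ppid = info
        parent = read_stat(ppid)
        parent_comm = parent[0] if parent else "?"
        zombies.append((int(entry), comm, ppid, parent_comm))
    return zombies


if __name__ == "__main__":
    for pid, comm, ppid, parent_comm in find_zombies():
        print(f"zombie {pid} ({comm}) awaits reaping by {ppid} ({parent_comm})")
```

Running this repeatedly alongside strace on PID1 could show whether a zombie's parent is bash itself or some intermediate process that later exits and hands the zombie over.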