Why do processes spawned by cron end up defunct?
M

6

15

I have some processes showing up as <defunct> in top (and ps). I've boiled things down from the real scripts and programs.

In my crontab:

* * * * * /tmp/launcher.sh /tmp/tester.sh

The contents of launcher.sh (which is of course marked executable):

#!/bin/bash
# the real script does a little argument processing here
"$@"

The contents of tester.sh (which is of course marked executable):

#!/bin/bash
sleep 27 & # the real script launches a compiled C program in the background

ps shows the following:

user       24257 24256  0 18:32 ?        00:00:00 [launcher.sh] <defunct>
user       24259     1  0 18:32 ?        00:00:00 sleep 27

Note that tester.sh does not appear--it has exited after launching the background job.

Why does launcher.sh stick around, marked <defunct>? It only seems to do this when launched by cron--not when I run it myself.

Additional note: launcher.sh is a common script in the system this runs on, which is not easily modified. The other things (crontab, tester.sh, even the program that I run instead of sleep) can be modified much more easily.

Matthias answered 1/10, 2009 at 22:45 Comment(2)
By the way, processes marked "<defunct>" are called "zombies".Houk
A possible solution is given in this thread: #3748932Pharmacopoeia
I
15

Because they haven't been the subject of a wait(2) system call.

Someone may still wait for these processes in the future, so the kernel can't get rid of them completely: without the exit status and a record that the process existed, it couldn't satisfy a later wait system call.

When you start one from the shell, your shell is trapping SIGCHLD and doing various wait operations anyway, so nothing stays defunct for long.

But cron isn't in a wait state, it is sleeping, so the defunct child may stick around for a while until cron wakes up.
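The lifecycle described above can be reproduced by hand. This is a minimal sketch, assuming a ps that accepts `-o stat=` and `-p` (procps and BSD ps both do); the temporary file and the one-second pause are choices made only for this illustration. The inner shell replaces itself with sleep, which never calls wait, so its finished child stays defunct for as long as that parent lives.

```bash
#!/bin/bash
pidfile=$(mktemp)

# The inner shell starts a short-lived child, records the child's PID, then
# replaces itself with sleep. sleep never calls wait(), so the exited child
# cannot be reaped while this parent is alive.
bash -c 'true & echo $! > "$1"; exec sleep 3' demo "$pidfile" &

sleep 1                                   # give the child time to exit
child=$(cat "$pidfile")
state_while_parent_alive=$(ps -o stat= -p "$child")
echo "child $child state: $state_while_parent_alive"

wait                                      # the non-waiting parent exits here
rm -f "$pidfile"
```

While the parent is alive, the reported state should contain Z. Once the parent exits, the orphaned zombie is handed to init (or another reaper), which normally collects it.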


Update:   Responding to comment... Hmm. I did manage to duplicate the issue:

 PPID   PID  PGID  SESS COMMAND
    1  3562  3562  3562 cron
 3562  1629  3562  3562  \_ cron
 1629  1636  1636  1636      \_ sh <defunct>
    1  1639  1636  1636 sleep

So, what happened was, I think:

  • cron forks and cron child starts shell
  • shell (1636) becomes the leader of a new session and process group (SID and PGID 1636) and starts sleep
  • shell exits, SIGCHLD sent to cron 3562
  • signal is ignored or mishandled
  • shell turns zombie. Note that sleep is reparented to init, so when the sleep exits init will get the signal and clean up. I'm still trying to figure out when the zombie gets reaped. Probably with no active children cron 1629 figures out it can exit, at that point the zombie will be reparented to init and get reaped. So now we wonder about the missing SIGCHLD that cron should have processed.
    • It isn't necessarily vixie cron's fault. As you can see here, libdaemon installs a SIGCHLD handler during daemon_fork(), and this could interfere with signal delivery on a quick exit by intermediate 1629

      Now, I don't even know if vixie cron on my Ubuntu system is even built with libdaemon, but at least I have a new theory. :-)

Ignoramus answered 1/10, 2009 at 22:48 Comment(4)
It actually will stick around all day, not just until cron wakes up. Can you comment on that? The real program I run (not sleep) runs for hours and hours.Matthias
..and is there a proper solution to this? can the script do something to make sure it won't turn into a zombie when it finishes?Ondometer
Hi, can you tell me how to reproduce this problem?Immanent
Maybe the command which will produce output will cause the "cron" being a zombie? Just a guess.Hellenist
P
9

In my opinion, it's caused by the CROND process (spawned by crond for every task) waiting for input on stdin, which is piped to the stdout/stderr of the command in the crontab. This is done because cron is able to send the resulting output to the user by mail.

So CROND waits for EOF until the user command and all its spawned child processes have closed the pipe. Once that happens, CROND continues with the wait call, and the defunct user command disappears.

So I think you have to explicitly disconnect every spawned subprocess in your script from the pipe (e.g. by redirecting its output to a file or /dev/null).
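The pipe behaviour described above can be observed without cron. In this sketch, command substitution stands in for the process reading the pipe; that is an assumption for illustration, not cron's actual code. The reader only sees EOF once every process holding the write end has closed it, so a background sleep that inherits stdout keeps the reader blocked even though the script itself has already exited.

```bash
#!/bin/bash
# Case 1: the background sleep inherits the pipe's write end, so the
# reader blocks until sleep exits, even though sh has already finished.
start=$SECONDS
out=$(sh -c 'sleep 2 & echo started')
held=$((SECONDS - start))

# Case 2: the background sleep is detached from the pipe, so the reader
# sees EOF as soon as sh exits.
start=$SECONDS
out=$(sh -c 'sleep 2 >/dev/null 2>&1 & echo started')
released=$((SECONDS - start))

echo "inherited pipe: ${held}s, redirected: ${released}s"
```

The first case should take about as long as the sleep, and the second should return almost immediately.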

So the following line should work in the crontab:

* * * * * ( /tmp/launcher.sh /tmp/tester.sh >/dev/null 2>&1 & )
Perpetua answered 25/3, 2014 at 16:29 Comment(1)
Thanks, this post gave me happiness in the middle of the night.Irritate
T
4

I suspect that cron is waiting for all subprocesses in the session to terminate. See wait(2) with respect to negative pid arguments. You can see the SESS with:

ps faxo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm

Here's what I see (edited):

STAT  EUID  RUID TT       TPGID  SESS  PGRP  PPID   PID %CPU COMMAND
Ss       0     0 ?           -1  3197  3197     1  3197  0.0 cron
S        0     0 ?           -1  3197  3197  3197 18825  0.0  \_ cron
Zs    1000  1000 ?           -1 18832 18832 18825 18832  0.0      \_ sh <defunct>
S     1000  1000 ?           -1 18832 18832     1 18836  0.0 sleep

Notice that the sh and the sleep are in the same SESS.

Use the command setsid(1). Here's tester.sh:

#!/bin/bash
setsid sleep 27 # the real script launches a compiled C program in the background

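Notice that you may not need &. setsid only forks into the background when it is itself a process group leader; when started from a non-interactive script it usually is not, and the script then waits for the command, so keep the & if tester.sh must return immediately.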

Toxicity answered 1/10, 2009 at 23:29 Comment(5)
Doing this causes launcher.sh and tester.sh to stick around. I'd like them both to terminate (at least with my original situation, tester.sh does terminate--with setsid it doesn't, which I don't want).Matthias
That's odd, both launcher and tester terminate when I run it here. (Almost immediately -- I have yet to take a ps snapshot where I see them running.)Toxicity
I am using Ubuntu Hardy 64-bit. What about you?Matthias
Oh, and I have SHELL=/bin/bash at the top of my crontab.Matthias
Ubuntu jaunty 32. No bash in my crontab. cron 3.0pl1-105ubuntu1.1Toxicity
H
3

I’d recommend that you solve the problem by simply not having two separate processes: Have launcher.sh do this on its last line:

exec "$@"

This will eliminate the superfluous process.
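A quick way to see the difference is to compare PIDs. This is only a sketch using nested `bash -c` calls in place of launcher.sh; the variable names are made up for the demonstration.

```bash
#!/bin/bash
# Without exec: the outer shell forks a second process for the command.
# The trailing ":" keeps the inner command from being the last one, so a
# shell that optimizes the final command cannot skip the fork.
plain=$(bash -c 'echo $$; bash -c "echo \$\$"; :')

# With exec: the outer shell is replaced by the command, so the PID
# printed by the inner shell is the same as the outer one.
execd=$(bash -c 'echo $$; exec bash -c "echo \$\$"')

printf 'without exec:\n%s\nwith exec:\n%s\n' "$plain" "$execd"
```

The two PIDs printed under "with exec" should be identical, which means there is no intermediate shell left behind to become a zombie.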

Houk answered 1/10, 2009 at 23:0 Comment(4)
I think you're right, but I can't easily do that because launcher.sh is used by many things, some of which would break if I made this change. I might consider making a new launcher script that does exec and leaving the other version intact, but this is rather distasteful.Matthias
@John Zwinck: I cannot imagine in what circumstances things would break if you made this change. It's effectively the same thing with one less process.Houk
@Teddy: the thing that would break is that some people do this in an interactive shell: . launcher.sh foo bar If the launcher did exec, the user's shell would terminate upon completion of the launched program. I know it's a strange use case, but that's how it is in the existing system.Matthias
@John Zwinck: The script could be rewritten to detect if it was started or sourced, and act accordingly.Houk
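One possible way to do that detection in bash is to compare `${BASH_SOURCE[0]}` with `$0`. The sketch below writes a hypothetical variant of launcher.sh to a temporary file and runs it both ways; the `mode=` lines are there only to make the demo visible and are not part of the original script.

```bash
#!/bin/bash
# Hypothetical launcher variant, written to a temp file for the demo.
launcher=$(mktemp)
cat > "$launcher" <<'EOF'
#!/bin/bash
if [[ "${BASH_SOURCE[0]}" != "$0" ]]; then
    echo "mode=sourced"      # sourced: run the command, keep the caller's shell
    "$@"
else
    echo "mode=executed"     # executed: replace this process with the command
    exec "$@"
fi
EOF

# Run it as a program, then source it from another shell.
executed=$(bash "$launcher" echo hello)
sourced=$(bash -c '. "$1" echo hello; echo "shell survived"' demo "$launcher")
rm -f "$launcher"

printf '%s\n' "$executed" "$sourced"
```

When sourced, the calling shell keeps running after the command finishes, which covers the interactive `. launcher.sh foo bar` use case, while cron invocations get the exec path.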
P
2

I found this question while I was looking for a solution to a similar issue. Unfortunately, the answers to this question didn't solve my problem.

Killing a defunct process directly is not an option; you need to find and kill its parent process. I ended up killing the parents of the defunct processes in the following way:

ps -ef | grep '<defunct>' | grep -v grep | awk '{print "kill -9 ",$3}' | sh

In the grep '<defunct>' pattern you can narrow down the search to the specific defunct process you are after.
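A more cautious variant is to list the parent PIDs first and decide what to kill afterwards. The following is a sketch, not the pipeline above: `zombie_parents` is a hypothetical helper name, and it assumes a ps that accepts `-e` and repeated `-o` options with the stat, pid, ppid, and comm columns.

```bash
#!/bin/bash
# Print the unique parent PIDs of zombie processes.
# Expects lines of "STAT PID PPID COMMAND" on stdin.
zombie_parents() {
    awk '$1 ~ /^Z/ { print $3 }' | sort -u
}

# Typical use: feed it header-less ps output and review the result.
ps -e -o stat= -o pid= -o ppid= -o comm= | zombie_parents
```

Reviewing the list before sending any signal matters here, because the parent may be a cron child that is still collecting output from a long-running job.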

Peebles answered 22/10, 2011 at 0:42 Comment(0)
J
-3

I have tested the same problem so many times. And finally I've got the solution. Just specify the '/bin/bash' before the bash script as shown below.

* * * * * /bin/bash /tmp/launcher.sh /tmp/tester.sh
Jempty answered 21/3, 2012 at 2:52 Comment(0)
