Not receiving SIGCHLD for processes executed with sudo

I'm currently in the process of writing a shell. I execute processes and utilize a SIGCHLD signal handler to clean up (wait on them) when they are complete.

Everything has been working -- except when I execute processes which escalate privileges with sudo. In these cases, I never get a SIGCHLD signal -- so I never know that the process has completed executing.

When I receive a command such as sudo ls, I execute the program sudo and then provide ls as a parameter. I perform this execution with execvp.

If I take a look at ps -aux after my shell has executed sudo ls, I see the following:

root      4795  0.0  0.0   4496  1160 pts/29   S+   16:51   0:00 sudo ls
root      4796  0.0  0.0      0     0 pts/29   Z+   16:51   0:00 [ls] <defunct>

So, sudo ran and got assigned pid = 4795, with the child (ls) being assigned 4796. The child has completed its task and is now sitting in a zombie state. sudo doesn't seem to want to reap the zombie process and just sits there.

I would like to know what is causing this behavior -- I've tried different techniques to clean up these zombie processes, such as running my shell under sudo and waiting directly on sudo and the PID which sudo executes (4796 in the above example). None of these techniques have worked.

As always, any advice is appreciated.

Gahan answered 22/11, 2011 at 1:5 Comment(5)
I'd suggest checking strace(1) output of sudo(8) when run by your shell and when run by a standard system shell. Because tracing fiddles with the setuid permissions on executables, you'll need to attach strace after sudo has started but before it does much work; sudo -k first will force sudo to re-prompt for a password, and while it is waiting, you can find its pid and run strace -o /tmp/out -f -p <pid>.Burette
Does this happen under other shells as well?Kurtiskurtosis
@bdonlan, by other shells, do you mean if I run sudo vi via bash? It does not happen there.Gahan
You could try to read the source of bash (or another shell) and see if there is something different/some special behavior in place for sudo. Also, if sudo changes user id, could this prevent your process from "doing stuff" with sudo?Rein
Request: Show us the smallest compilable code that reproduces the issue, and tell us the version of sudo since there is a bug on point. Question: Are you blocking SIGCHLD when you exec? sudo doesn't rely on CHLD exclusively, IIRC, but that certainly wouldn't help things.Prototype

My first thought is incorrect signal processing but there is not enough information in your post to write test code to replicate your failure. But I can give you some places to look. Pardon me if I cover a few signal basics you already know for future readers.

First of all I do not know if you are using the legacy signal() or the newer POSIX sigaction() signal routines to catch signals. sigset(), a System V interface that glibc also provides, is a useful in-between.

Legacy Signals -- signal()
It's near impossible, if not impossible, to guarantee an air-tight signal processor using the original signal processor in all environments.

  • On some UNIX systems entering the signal handler resets the disposition to the default. Subsequent signals will be lost unless the handler explicitly reinstalls itself.
  • signal() handlers must not assume they get called once for each signal.
    • Handlers must run a while( ( pid = waitpid( -1, &status, WNOHANG ) ) > 0 ) loop until no more exited children are found, because legacy signals only set a boolean flag indicating that at least one signal is outstanding. The actual count is unknown.
    • Handlers must allow for no signals being found if a prior while() loop processed the signal.
  • Allow for signals from unknown processes... if the program you start also starts a grandchild process you may inherit that process if your child exits quickly.

Advice, hold your nose and flee from legacy signals.

A likely failure mode is a legacy handler without a while() loop receiving multiple SIGCHLDs: one from your sudo and one or more from unexpected grandchildren fired off by sudo. If only one SIGCHLD is handled and a grandchild's signal comes in first, the expected program's signal will not be caught.

POSIX Signals -- sigaction()
POSIX signals can clean up all of the failures of legacy signals.

  • Set a handler, without a restore (restore is NOT part of POSIX signals and is often, at least in my mind, evil when you might get more than one signal to handle in the same way).
  • sigaction() signals are sticky... they live until expressly changed (wonderful!). None of this troublesome requirement of having to reset the signal handler again in the handler.
  • Set a mask to mask out the current signal when processing the signal. Paranoids will also mask any other signal passed to the same handler.

Lack of a mask can cause weird stuff like losing track of a signal if you get a SIGCHLD while in a SIGCHLD handler.

GNU -- sigset()
GNU (glibc) provides a useful in-between, inherited from System V, that has the same calling signature as signal() but removes most of the problems. Some additional control functions are also available. Using sigset() is an easy fix for many signal problems.

Reminders
Think of signal handlers as threads in your program, even if you are not otherwise using threads in the code.

In days of old you needed to do absolutely minimal processing in signal handlers... no calling of library code, such as printf, that has side effects. I still follow this when having to use legacy signal handlers and always use multithread cautions in newer handlers.

Fibrilla answered 25/11, 2011 at 19:41 Comment(4)
Yea, you can nit-pick me on the behavior of legacy signal processing, but I have used many environments over decades and still write code to defend against them all when I must use legacy signals.Fibrilla
Not germane. The primary problem is that sudo is not noticing that ls has exited, and so in turn does not exit to signal the OP's code. Whether or not the OP's code would correctly detect and handle sudo's termination is a separate issue.Prototype
I still would like to see the signal processing... if sudo exits without picking up ls's SIGCHLD his custom shell will get both SIGCHLDS and in a single call to the signal handler. If the handler does not reap both processes on a single call of the signal handler you get a zombie. I've seen this MANY times. Not with sudo, but so many others.Fibrilla
I read your comment as suggesting that a process can get a SIGCHLD for a grandchild ("…his custom shell will get both SIGCHLDS…"), and that a process can reap (a-wait(2)) a grandchild ("If the handler does not reap both…"). Neither is correct: a process gets a SIGCHLD for, and can reap only, its immediate children. It may appear that reaping a terminated child also dismisses zombie grandchildren, but in truth it is init(1) that automatically adopts and reaps the newly "orphaned" grandchildren.Prototype
