Run command in golang and detach it from process

Problem:

I'm writing a program in Go on Linux that needs to execute a long-running process so that:

  1. I redirect stdout of running process to file.
  2. I control the user of process.
  3. Process doesn't die when my program exits.
  4. The process doesn't become a zombie when it crashes.
  5. I get PID of running process.

I'm running my program with root permissions.

Attempted solution:

func Run(pathToBin string, args []string, uid uint32, stdLogFile *os.File) (int, error) {
    cmd := exec.Command(pathToBin, args...)

    cmd.SysProcAttr = &syscall.SysProcAttr{
        Credential: &syscall.Credential{
            Uid: uid,
        },
    }

    cmd.Stdout = stdLogFile

    if err := cmd.Start(); err != nil {
        return -1, err
    }

    go func() {
        cmd.Wait() //Wait is necessary so cmd doesn't become a zombie
    }()


    return cmd.Process.Pid, nil
}

This solution seems to satisfy almost all of my requirements except that when I send SIGTERM/SIGKILL to my program the underlying process crashes. In fact I want my background process to be as separate as possible: it has different parent pid, group pid etc. from my program. I want to run it as daemon.

Other solutions on stackoverflow suggested to use cmd.Process.Release() for similar use cases, but it doesn't seem to work.

Solutions which are not applicable in my case:

  1. I have no control over code of process I'm running. My solution has to work for any process.
  2. I can't use external commands to run it, just pure go. So using systemd or something similar is not applicable.

I can in fact use library that is easily importable using import from github etc.

Unhandy answered 19/12, 2019 at 12:48 Comment(6)
Is it enough to set Setpgid: true? I seem to remember that this is necessary at least when running under bash (presumably you do this in development), because bash sends signals to all processes in the process group, not just your Go program. Plus you say you want an independent group anyway. - Furuncle
I also tried this. But for some reason when I run it, I'm getting permission error. It's quite weird as I run it as root. Even running this command as root gives me same error. I tried workaround and used link syscall.Setpgid to modify the process after I've run it but I have the same problems. - Unhandy
Do you have any info why it crashes? I believe you need to consider who has ownership of the log file's file descriptor. I believe cmd.ExtraFiles is for this purpose. - Brutus
Also you should just call cmd.Process.Release() instead of waiting in a goroutine. The documentation of Wait() fails to mention this. - Brutus
I'm quite surprised that many answers related to this question propose to use cmd.Process.Release(). On unix systems it seems to be... doing nothing. - Unhandy
I know this is old, but possibly any value from this? #42471849 - Watchful

TL;DR

Just use https://github.com/hashicorp/go-reap

There is a great Russian expression which reads "don't try to give birth to a bicycle" and it means don't reinvent the wheel and keep it simple. I think it applies here. If I were you, I'd reconsider using one of:

This issue has already been solved ;)


Your question is imprecise or you are asking for non-standard features.

"In fact I want my background process to be as separate as possible: it has different parent pid, group pid etc. from my program. I want to run it as daemon."

That is not how process inheritance works. You can not have process A start Process B and somehow change the parent of B to C. To the best of my knowledge this is not possible in Linux.

In other words, if process A (pid 55) starts process B (100), then B must have parent pid 55.

The only way to avoid that is have something else start the B process such as atd, crond, or something else - which is not what you are asking for.

If parent 55 dies, then 100 is re-parented to PID 1 (or, on Linux 3.4 and later, to the nearest ancestor that registered itself as a "child subreaper"), not to some arbitrary process.

Your statement "it has different parent pid" does not make sense.

"I want to run it as daemon."

That's excellent. However, in a GNU / Linux system, all daemons have a parent pid and those parents have a parent pid going all the way up to pid 1, strictly according to the parent -> child rule.
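What can be changed is the session and process group, not the parent. The following is a minimal sketch, assuming Linux: setting Setsid in syscall.SysProcAttr makes the child the leader of a new session and process group, so signals aimed at the parent's group (for example Ctrl-C in a terminal) do not reach it. The helper name startDetached, the /dev/null log path, and the sleep command are illustrative assumptions, not part of the question's code.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// startDetached (hypothetical helper) starts path in a new session with its
// stdout appended to logPath. It returns the child's PID and process group ID.
func startDetached(logPath, path string, args ...string) (pid, pgid int, err error) {
	logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return 0, 0, err
	}
	// The child inherits its own copy of the descriptor, so ours can be closed.
	defer logFile.Close()

	cmd := exec.Command(path, args...)
	cmd.Stdout = logFile
	// Setsid puts the child in a new session and a new process group.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}
	if err := cmd.Start(); err != nil {
		return 0, 0, err
	}

	pid = cmd.Process.Pid
	pgid, err = syscall.Getpgid(pid)

	// Reap the child if it exits while this program is still alive.
	go cmd.Wait()

	return pid, pgid, err
}

func main() {
	pid, pgid, err := startDetached("/dev/null", "sleep", "1")
	if err != nil {
		fmt.Fprintln(os.Stderr, "start failed:", err)
		os.Exit(1)
	}
	// The child's parent is still this process; only the session and group differ.
	fmt.Printf("parent=%d child=%d child pgid=%d\n", os.Getpid(), pid, pgid)
}
```

After a successful start the child's process group ID equals its own PID, which is what distinguishes it from a child started without Setsid. If this program later exits, the child is re-parented as described above and keeps running.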

"when I send SIGTERM/SIGKILL to my program the underlying process crashes."

I can not reproduce that behavior. See case8 and case7 from the proof-of-concept repo.

make case8
export NOSIGN=1; make build case7 
unset NOSIGN; make build case7

$ make case8
{ sleep 6 && killall -s KILL zignal; } &
./bin/ctrl-c &
sleep 2; killall -s TERM ctrl-c
kill with:
    { pidof ctrl-c; pidof signal ; } | xargs -r -t kill  -9 
main() 2476074
bashed 2476083 (2476081)
bashed 2476084 (2476081)
bashed 2476085 (2476081)
zignal 2476088 (2476090)
go main() got 23 urgent I/O condition
go main() got 23 urgent I/O condition
zignal 2476098 (2476097)
go main() got 23 urgent I/O condition
zignal 2476108 (2476099)
main() wait...
p  2476088
p  2476098
p  2476108
p  2476088
go main() got 15 terminated
sleep 1; killall -s TERM ctrl-c
p  2476098
p  2476108
p  2476088
go main() got 15 terminated
sleep 1; killall -s TERM ctrl-c
p  2476098
p  2476108
p  2476088
Bash c 2476085 EXITs ELAPSED 4
go main() got 17 child exited
go main() got 23 urgent I/O condition
main() children done: 1 %!s(<nil>)
main() wait...
go main() got 15 terminated
go main() got 23 urgent I/O condition
sleep 1; killall -s KILL ctrl-c
p  2476098
p  2476108
p  2476088
balmora: ~/src/my/go/doodles/sub-process [main]
$ p  2476098
p  2476108
Bash _ 2476083 EXITs ELAPSED 6
Bash q 2476084 EXITs ELAPSED 8


The bash processes keep running after the parent is killed.

killall -s KILL ctrl-c;

All 3 "zignal" sub-processes are running until killed by

killall -s KILL zignal;

In both cases the sub-processes continue to run despite main process being signaled with TERM, HUP, INT. This behavior is different in a shell environment because of convenience reasons. See the related questions about signals. This particular answer illustrates a key difference for SIGINT. Note that SIGSTOP and SIGKILL cannot be caught by an application.
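To make the point about catchable signals concrete, here is a small sketch (not taken from the proof-of-concept repo) that registers a handler with os/signal, sends SIGTERM to itself, and reports what arrived. The function name catchOwnSIGTERM and the two-second timeout are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// catchOwnSIGTERM registers a handler, sends SIGTERM to this process, and
// returns the description of the signal that the handler received.
func catchOwnSIGTERM() (string, error) {
	sigs := make(chan os.Signal, 1)
	// SIGKILL and SIGSTOP are deliberately absent: the kernel never
	// delivers them to a handler, so registering them has no effect.
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGHUP, syscall.SIGINT)
	defer signal.Stop(sigs)

	if err := syscall.Kill(os.Getpid(), syscall.SIGTERM); err != nil {
		return "", err
	}

	select {
	case s := <-sigs:
		return s.String(), nil
	case <-time.After(2 * time.Second):
		return "", fmt.Errorf("timed out waiting for SIGTERM")
	}
}

func main() {
	name, err := catchOwnSIGTERM()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("got", name)
}
```

Because the handler is installed before the signal is sent, the program survives SIGTERM and can decide what to do with its children. The same cannot be done for SIGKILL, so any cleanup of sub-processes must not rely on the parent getting a chance to run.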


It was necessary to clarify the above before proceeding with the other parts of the question.

So far you have already solved the following:

  • redirect stdout of sub-process to a file
  • set owner UID of sub-process
  • sub-process survives death of parent (my program exits)
  • the PID of sub-process can be seen by the main program

The next one depends on whether the children are "attached" to a shell or not

  • sub-process survives the parent being killed

The last one is hard to reproduce, but I have heard about this problem in the docker world, so the rest of this answer is focused on addressing this issue.

  • sub-process survives if the parent crashes and does not become a zombie

As you have noted, the Cmd.Wait() is necessary to avoid creating zombies. After some experimentation I was able to consistently produce zombies in a docker environment using an intentionally simple replacement for /bin/sh. This "shell" implemented in go will only run a single command and not much else in terms of reaping children. You can study the code over at github.
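The effect of Cmd.Wait() can be observed directly. The sketch below, which assumes Linux with /proc mounted and a true binary on the PATH, starts a short-lived child, polls /proc/<pid>/stat until the child shows state Z, and then calls Wait to reap it. The helper names procState and zombieThenReap are made up for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// procState returns the one-letter state from /proc/<pid>/stat ('S', 'R', 'Z', ...).
func procState(pid int) (byte, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return 0, err
	}
	// The state field follows the parenthesised command name.
	i := bytes.LastIndexByte(data, ')')
	if i < 0 || i+2 >= len(data) {
		return 0, fmt.Errorf("unexpected stat format for pid %d", pid)
	}
	return data[i+2], nil
}

// zombieThenReap starts a short-lived child, reports whether it was seen as a
// zombie before cmd.Wait, and whether its /proc entry is gone afterwards.
func zombieThenReap() (sawZombie, goneAfterWait bool, err error) {
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		return false, false, err
	}
	pid := cmd.Process.Pid

	// Poll for up to about two seconds; the child exits almost immediately.
	for i := 0; i < 200 && !sawZombie; i++ {
		if st, err := procState(pid); err == nil && st == 'Z' {
			sawZombie = true
		} else {
			time.Sleep(10 * time.Millisecond)
		}
	}

	// Wait collects the exit status and removes the process-table entry.
	_ = cmd.Wait()
	_, statErr := procState(pid)
	return sawZombie, os.IsNotExist(statErr), nil
}

func main() {
	saw, gone, err := zombieThenReap()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("seen as zombie before Wait: %v, gone after Wait: %v\n", saw, gone)
}
```

This only covers children that the program started itself. Orphans that get re-parented to a PID 1 process are a different matter, because that process has no exec.Cmd handle to call Wait on.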

The zombie solution

The simple wrapper, which causes zombies

package main

func main() {
    Sh()
}

The reaper wrapper


package main

import (
    "fmt"
    "sync"

    "github.com/fatih/color"
    "github.com/hashicorp/go-reap"
)

func main() {

    if reap.IsSupported() {
        done := make(chan struct{})
        var reapLock sync.RWMutex
        pids := make(reap.PidCh, 1)

        errors := make(reap.ErrorCh, 1)
        go reap.ReapChildren(pids, errors, done, &reapLock)
        go report(pids, errors, done)

        Sh()

        close(done)
    } else {
        fmt.Println("Sorry, go-reap isn't supported on your platform.")
    }
}

func report(pids reap.PidCh, errors reap.ErrorCh, done chan struct{}) {

    sprintf := color.New(color.FgWhite, color.Bold).SprintfFunc()

    for {
        select {
        case pid := <-pids:
            println(sprintf("raeper pid %d", pid))
        case err := <-errors:
            println(sprintf("raeper er %s", err))
        case <-done:
            return
        }
    }
}

The init / sh (pid 1) process which runs other commands


package main

import (
    "os"
    "os/exec"
    "strings"
    "time"

    "github.com/google/shlex"
    "github.com/tox2ik/go-poc-reaper/fn"
)


func Sh() {

    args := os.Args[1:]
    script := args[0:0]
    if len(args) >= 1 {
        if args[0] == "-c" {
            script = args[1:]
        }
    }
    if len(script) == 0 {
        fn.CyanBold("cmd: expecting sh -c 'foobar'")
        os.Exit(111)
    }

    var cmd *exec.Cmd
    parts, _ := shlex.Split(strings.Join(script, " "))
    if len(parts) >= 2 {
        cmd = fn.Merge(exec.Command(parts[0], parts[1:]...), nil)
    }
    if len(parts) == 1 {
        cmd = fn.Merge(exec.Command(parts[0]), nil)
    }

    if fn.IfEnv("HANG") {
        fn.CyanBold("cmd: %v\n      start", parts)
        ex := cmd.Start()
        if ex != nil {
            fn.CyanBold("cmd %v err: %s", parts, ex)
        }
        go func() {
            time.Sleep(time.Millisecond * 100)
            errw := cmd.Wait()
            if errw != nil {
                fn.CyanBold("cmd %v err: %s", parts, errw)
            } else {
                fn.CyanBold("cmd %v all done.", parts)
            }
        }()

        fn.CyanBold("cmd: %v\n      dispatched, hanging forever (i.e. to keep docker running)", parts)
        for {
            time.Sleep(time.Millisecond * time.Duration(fn.EnvInt("HANG", 2888)))
            fn.SystemCyan("/bin/ps", "-e", "-o", "stat,comm,user,etime,pid,ppid")
        }

    } else {

        if fn.IfEnv("NOWAIT") {
            ex := cmd.Start()
            if ex != nil {
                fn.CyanBold("cmd %v start err: %s", parts, ex)
            }
        } else {

            ex := cmd.Run()
            if ex != nil {
                fn.CyanBold("cmd %v run err: %s", parts, ex)
            }
        }
        fn.CyanBold("cmd %v\n      dispatched, exit docker.", parts)
    }
}

The Dockerfile


FROM scratch

# for sh.go
ENV HANG ""

# for sub-process.go
ENV ABORT ""
ENV CRASH ""
ENV KILL ""

# for ctrl-c.go, signal.go
ENV NOSIGN ""

COPY bin/sh          /bin/sh ## <---- wrapped or simple /bin/sh or "init"
COPY bin/sub-process /bin/sub-process
COPY bin/zleep       /bin/zleep
COPY bin/fork-if     /bin/fork-if


COPY --from=busybox:latest /bin/find    /bin/find
COPY --from=busybox:latest /bin/ls      /bin/ls
COPY --from=busybox:latest /bin/ps      /bin/ps
COPY --from=busybox:latest /bin/killall /bin/killall

Remaining code / setup can be seen here:

Case 5 (simple /bin/sh)

The gist of it is we start two sub-processes from go, using the "parent" sub-process binary. The first child is zleep and the second fork-if. The second one starts a "daemon" that runs a forever-loop in addition to a few short-lived threads. After a while, we kill the sub-process parent, forcing sh to take over the parenting for these children.

Since this simple implementation of sh does not know how to deal with abandoned children, the children become zombies. This is standard behavior. To avoid this, init systems are usually responsible for cleaning up any such children.
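The cleanup an init-like process performs can be sketched as a non-blocking wait on "any child". The code below is an illustration of that idea, not go-reap's actual implementation: reapExited calls syscall.Wait4 with pid -1 and WNOHANG until nothing is left to collect. The names reapExited and spawnAndReap, and the use of the true binary, are assumptions for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// reapExited collects every child that has already exited, without blocking,
// and returns the PIDs it reaped.
func reapExited() []int {
	var pids []int
	for {
		var status syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue
		}
		// err != nil: typically ECHILD, there are no children at all.
		// pid == 0: children exist, but none has exited yet.
		if err != nil || pid <= 0 {
			return pids
		}
		pids = append(pids, pid)
	}
}

// spawnAndReap starts a short-lived child without ever calling cmd.Wait and
// relies on reapExited to clean it up.
func spawnAndReap() (pid int, reaped bool, err error) {
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		return 0, false, err
	}
	pid = cmd.Process.Pid

	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		for _, p := range reapExited() {
			if p == pid {
				return pid, true, nil
			}
		}
		time.Sleep(10 * time.Millisecond)
	}
	return pid, false, nil
}

func main() {
	pid, reaped, err := spawnAndReap()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("child %d reaped by the catch-all loop: %v\n", pid, reaped)
}
```

A catch-all loop like this competes with any cmd.Wait() running in the same process, since whichever call gets there first takes the exit status. That is presumably the reason the reaper wrapper above passes a lock (reapLock) to reap.ReapChildren.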

Check out this repo and run the cases:

$ make prep build
$ make prep build2

The first one will use the simple /bin/sh in the docker container, and the second one will use the same code wrapped in a reaper.

With zombies:

$ make prep build case5
(…)
main() Daemon away! 16 (/bin/zleep)
main() Daemon away! 22 (/bin/fork-if)
(…)
main() CRASH imminent
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x49e45c]
goroutine 1 [running]:
main.main()
    /home/jaroslav/src/my/go/doodles/sub-process/sub-process.go:137 +0xfc
cmd [/bin/sub-process /log/case5 3 /bin/zleep 111 2 -- /dev/stderr 3 /bin/fork-if --] err: exit status 2
Child '1' done
thread done
STAT COMMAND          USER     ELAPSED PID   PPID
R    sh               0         0:02       1     0
S    zleep            3         0:02      16     1
Z    fork-if          3         0:02      22     1
R    fork-child-A     3         0:02      25     1
R    fork-child-B     3         0:02      26    25
S    fork-child-C     3         0:02      27    26
S    fork-daemon      3         0:02      28    27
R    ps               0         0:01      30     1
Child '2' done
thread done
daemon
(…)
STAT COMMAND          USER     ELAPSED PID   PPID
R    sh               0         0:04       1     0
Z    zleep            3         0:04      16     1
Z    fork-if          3         0:04      22     1
Z    fork-child-A     3         0:04      25     1
R    fork-child-B     3         0:04      26     1
S    fork-child-C     3         0:04      27    26
S    fork-daemon      3         0:04      28    27
R    ps               0         0:01      33     1
(…)

With reaper:

$ make -C ~/src/my/go/doodles/sub-process case5
(…)
main() CRASH imminent
(…)
Child '1' done
thread done
raeper pid 24
STAT COMMAND          USER     ELAPSED PID   PPID
S    sh               0         0:02       1     0
S    zleep            3         0:01      18     1
R    fork-child-A     3         0:01      27     1
R    fork-child-B     3         0:01      28    27
S    fork-child-C     3         0:01      30    28
S    fork-daemon      3         0:01      31    30
R    ps               0         0:01      32     1
Child '2' done
thread done
raeper pid 27
daemon
STAT COMMAND          USER     ELAPSED PID   PPID
S    sh               0         0:03       1     0
S    zleep            3         0:02      18     1
R    fork-child-B     3         0:02      28     1
S    fork-child-C     3         0:02      30    28
S    fork-daemon      3         0:02      31    30
R    ps               0         0:01      33     1
STAT COMMAND          USER     ELAPSED PID   PPID
S    sh               0         0:03       1     0
S    zleep            3         0:02      18     1
R    fork-child-B     3         0:02      28     1
S    fork-child-C     3         0:02      30    28
S    fork-daemon      3         0:02      31    30
R    ps               0         0:01      34     1
raeper pid 18
daemon
STAT COMMAND          USER     ELAPSED PID   PPID
S    sh               0         0:04       1     0
R    fork-child-B     3         0:03      28     1
S    fork-child-C     3         0:03      30    28
S    fork-daemon      3         0:03      31    30
R    ps               0         0:01      35     1
(…)

Here is a picture of the same output, which may be less confusing to read.

Zombies

[image: Case5 - zombies]

Reaper

[image: Case5 - reaper]

How to run the cases in the poc repo

Get the code

git clone https://github.com/tox2ik/go-poc-reaper.git

One terminal:

make tail-cases

Another terminal:

make prep
make build    # or: make build2
make case0 case1
...

Related questions:

go

signals

Related discussions:

Relevant projects:

Relevant prose:

A zombie process is a process whose execution is completed but it still has an entry in the process table. Zombie processes usually occur for child processes, as the parent process still needs to read its child’s exit status. Once this is done using the wait system call, the zombie process is eliminated from the process table. This is known as reaping the zombie process.

from https://www.tutorialspoint.com/what-is-zombie-process-in-linux


Articular answered 24/10, 2021 at 14:36 Comment(0)
