Bash: Capture output of command run in background

I'm trying to write a bash script that captures the output of a command run in the background. Unfortunately I can't get it to work: the variable I assign the output to is empty. If I replace the assignment with an echo command, everything works as expected, though.

#!/bin/bash

function test {
    echo "$1"
}

echo $(test "echo") &
wait

a=$(test "assignment") &
wait

echo $a

echo done

This code produces the output:

echo

done

Changing the assignment to

a=`echo $(test "assignment") &`

works, but it seems like there should be a better way of doing this.

Branle answered 16/11, 2013 at 11:20 Comment(0)

Bash does indeed have a feature to accomplish this, called process substitution.

$ echo <(yes)
/dev/fd/63

Here, the expression <(yes) is replaced with the pathname of a (pseudo-device) file that is connected to the standard output of an asynchronous job yes (which prints the string y in an endless loop).

Now let's try to read from it:

$ cat /dev/fd/63
cat: /dev/fd/63: No such file or directory

The problem here is that the yes process terminated in the meantime, because it received a SIGPIPE (it had no readers on its stdout).

The solution is the following construct:

$ exec 3< <(yes)  # Save stdout of the 'yes' job as (input) fd 3.

This opens the pipe for reading as fd 3 in the current shell, so the background job always has a reader on its stdout.

You can now read from the background job whenever you like. As a trivial example:

$ for i in 1 2 3; do read <&3 line; echo "$line"; done
y
y
y

Note that this has slightly different semantics than having the background job write to a disk-backed file: the background job is blocked when the pipe buffer is full (you empty the buffer by reading from the fd). By contrast, writing to a disk-backed file only blocks when the disk doesn't respond.
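
To see the blocking behavior in action, a small sketch (the pipe buffer size is platform dependent; around 64 KiB is a common Linux default):

$ exec 3< <(yes)   # 'yes' fills the pipe buffer, then blocks in write()
$ head -c 16 <&3   # reading drains 16 bytes, letting 'yes' continue
y
y
y
y
y
y
y
y
$ exec 3<&-        # close the last reader; 'yes' gets SIGPIPE on its next write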

Process substitution is not a POSIX sh feature.

Here's a quick hack to give an asynchronous job disk backing, (almost) without assigning a filename to it:

$ yes > backingfile &  # Start the job in the background, writing to a new file. See also mktemp(1) and the shell option 'set -o noclobber'.
$ exec 3< backingfile  # Open the file for reading in the current shell, as fd 3.
$ rm backingfile       # Remove the file. It disappears from the filesystem, but the attached reader and writer can still use it.

$ for i in 1 2 3; do read <&3 line; echo "$line"; done
y
y
y

Linux also recently gained the O_TMPFILE open flag, which makes such hacks possible without the file ever being visible at all. I don't know whether bash supports it yet.

UPDATE:

@rthur, if you want to capture the whole output from fd 3, then use

output=$(cat <&3)

But note that you can't capture binary data in general: it's only a defined operation if the output is text in the POSIX sense. The implementations I know of simply filter out all NUL bytes. Furthermore, POSIX specifies that all trailing newlines must be removed.
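
If trailing newlines matter, a common workaround (a sketch, not part of the original answer) is to print a sentinel character after the data and strip it again outside the substitution:

output=$(cat <&3; printf x)  # the sentinel 'x' protects trailing newlines from $(...)
output=${output%x}           # strip the sentinel; the newlines survive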

(Please note also that capturing the output will run the shell out of memory if the writer never stops (yes never stops). Naturally, the same problem applies even to read, if the line separator is never written.)
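
Putting the pieces together, a minimal end-to-end sketch (with a finite job in place of yes so that cat sees EOF, and closing fd 3 afterwards, as one of the comments below asks about):

#!/bin/bash

exec 3< <(printf '%s\n' one two three)  # start the job; keep its stdout open as fd 3
# ... do other work here while the job runs ...
output=$(cat <&3)                       # blocks until the job closes its stdout (EOF)
exec 3<&-                               # close fd 3 now that we're done with it
echo "$output"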

Recriminate answered 16/11, 2013 at 11:54 Comment(11)
Thanks, that looks great! Is there any way to get all the output from a reader without looping through every line? – Branle
Am I correct in assuming that calling read line also acts as a call to "wait"? – Branle
Caution: if you want to use this in real-life scripts, you'll need to create the file backingfile with mktemp and even use a trap, as the >/exec/rm combo is not atomic! – Hierolatry
@Hierolatry Using it without the -d switch seemed to make it wait too – could it just be an impression caused by read line waiting for there to be at least one line in the buffer? – Branle
@gniourf_gniourf: This is all irrelevant to the actual question. – Recriminate
@gniourf_gniourf: No, -d '' is not equivalent to wait. It reads until it encounters the first NUL byte after at least one non-NUL byte has been read. Only wait is equivalent to wait. – Recriminate
@gniourf_gniourf: No, there is no problem with atomicity. And no, a trap will not make anything more atomic. – Recriminate
@rthur: I've updated the answer to show how to get the output. – Recriminate
@rthur: read has nothing to do with wait. It's entirely process agnostic. It simply reads from an open fd into a shell variable. – Recriminate
@JoSo trap will not render anything atomic, it will just help you remove the ugly backingfile in case something goes wrong before reaching the rm. Regarding wait vs read -d '' you're right, I'll delete my comments. – Hierolatry
Not entirely clear on the conclusion of this discussion. Is there a missing call to wait? Also, after cat <&3, is exec 3<&- needed to close the file descriptor? – Cocaine

One very robust way to deal with coprocesses in Bash is to use... the coproc builtin.

Suppose you have a script or function called banana that you wish to run in the background, capturing all of its output while doing some other stuff, and waiting until it's done. I'll simulate it with this:

banana() {
    for i in {1..4}; do
        echo "gorilla eats banana $i"
        sleep 1
    done
    echo "gorilla says thank you for the delicious bananas"
}

stuff() {
    echo "I'm doing this stuff"
    sleep 1
    echo "I'm doing that stuff"
    sleep 1
    echo "I'm done doing my stuff."
}

You will then run banana with coproc like so:

coproc bananafd { banana; }

This is like running banana & but with the following extras: it creates two file descriptors, stored in the array bananafd (index 0 for reading banana's output and index 1 for writing to its input). You'll capture the output of banana with the read builtin:

IFS= read -r -d '' -u "${bananafd[0]}" banana_output

Try it:

#!/bin/bash

banana() {
    for i in {1..4}; do
        echo "gorilla eats banana $i"
        sleep 1
    done
    echo "gorilla says thank you for the delicious bananas"
}

stuff() {
    echo "I'm doing this stuff"
    sleep 1
    echo "I'm doing that stuff"
    sleep 1
    echo "I'm done doing my stuff."
}

coproc bananafd { banana; }

stuff

IFS= read -r -d '' -u "${bananafd[0]}" banana_output

echo "$banana_output"

Caveat: you must be done with stuff before banana ends! If the gorilla is quicker than you:

#!/bin/bash

banana() {
    for i in {1..4}; do
        echo "gorilla eats banana $i"
    done
    echo "gorilla says thank you for the delicious bananas"
}

stuff() {
    echo "I'm doing this stuff"
    sleep 1
    echo "I'm doing that stuff"
    sleep 1
    echo "I'm done doing my stuff."
}

coproc bananafd { banana; }

stuff

IFS= read -r -d '' -u "${bananafd[0]}" banana_output

echo "$banana_output"

In this case, you'll obtain an error like this one:

./banana: line 22: read: : invalid file descriptor specification

You can check whether it's too late (i.e., whether you've taken too long doing your stuff), because after the coproc is done, bash removes the values from the array bananafd; that's why we got the previous error.

#!/bin/bash

banana() {
    for i in {1..4}; do
        echo "gorilla eats banana $i"
    done
    echo "gorilla says thank you for the delicious bananas"
}

stuff() {
    echo "I'm doing this stuff"
    sleep 1
    echo "I'm doing that stuff"
    sleep 1
    echo "I'm done doing my stuff."
}

coproc bananafd { banana; }

stuff

if [[ -n ${bananafd[@]} ]]; then
    IFS= read -r -d '' -u "${bananafd[0]}" banana_output
    echo "$banana_output"
else
    echo "oh no, I took too long doing my stuff..."
fi

Finally, if you really don't want to miss any of the gorilla's moves, even if you take too long with your stuff, you can copy banana's file descriptor to another fd, 3 for example, do your stuff, and then read from fd 3:

#!/bin/bash

banana() {
    for i in {1..4}; do
        echo "gorilla eats banana $i"
        sleep 1
    done
    echo "gorilla says thank you for the delicious bananas"
}

stuff() {
    echo "I'm doing this stuff"
    sleep 1
    echo "I'm doing that stuff"
    sleep 1
    echo "I'm done doing my stuff."
}

coproc bananafd { banana; }

# Copy the coproc's output file descriptor bananafd[0] to fd 3
exec 3<&"${bananafd[0]}"

stuff

IFS= read -r -d '' -u 3 output
echo "$output"

This works very well! The last read also plays the role of wait, so output will contain the complete output of banana.
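
As a comment below points out, coproc also sets a bananafd_PID variable when it starts; if you want to reap the coprocess explicitly and collect its exit status, the tail of the previous script could become (a sketch; waiting on an already-reaped coproc PID may fail on some bash versions, hence the redirection):

IFS= read -r -d '' -u 3 output
wait "$bananafd_PID" 2>/dev/null  # reap the coprocess; $? would be banana's exit status
echo "$output"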

That was great: no temp files to deal with (bash handles everything silently) and 100% pure bash!

Hope this helps!

Hierolatry answered 16/11, 2013 at 12:34 Comment(11)
Thanks for the detailed reply. I ended up using Jo So's answer as it seemed like a simpler way of passing the output of a command to another fd – coproc does seem useful though. – Branle
@user2352030 JoSo's answer is not simpler at all! It's even more complicated, as it forces you to create a file and remove it. And if you want to use his answer in a secure way, you'll need to use mktemp... and probably even with a trap! Don't get fooled by what seems simpler! – Hierolatry
Maybe I'm wrong, but from what I understood his portable sh shell method forces me to do that, but using the bash construct exec 3< <(command) does not. – Branle
@user2352030 Another advantage of coproc is that you'll be able to determine whether the backgrounded process has finished by checking if the array bananafd is set. – Hierolatry
@Branle Yes, you're right, I was talking about the last method. – Hierolatry
coproc does seem to have its uses, and it's probably a better solution for a more complex script – but in my case I'm only running two commands, and all that matters is that they start at the same time (or as close as I can get them), and that I wait for both of them to complete before running them again with different input. – Branle
You can actually wait for the process to be finished by accessing the $NAME_PID variable created when coproc NAME ... gets initiated. – Barela
Does coproc allow the banana output to be buffered up before being consumed? I was looking for a way to run something in the background, have its output saved in memory, and then consume its output after it was done (so I'm not synchronously stream piping). – Cram
@CMCDragonkai: in this case, I would use a temporary file. – Hierolatry
Hmm... This looked like the best solution for a problem I had, but the fact that officially only one coproc may run at once limits it. It happened to work for four coprocs in my case (I was running four expensive commands to initialize four different bash variables in my .bashrc, and didn't like spending 2.5 seconds waiting for them to run sequentially when parallel execution could get it down to around 0.9 seconds), but it dumped warnings about the three coprocs launched after the first one (because the first one hadn't finished yet). – Microsporophyll
While it's unlikely to happen: note that there's a window between coproc bananafd { banana; } and exec 3<&${bananafd[0]}. If the gorilla is VERY quick, you lose its output! – Amalamalbena

One way to capture a background command's output is to redirect it to a file, and read the output from the file after the background process has ended:

test "assignment" > /tmp/_out &
wait
a=$(</tmp/_out)
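
If the script may run more than once concurrently, or you want automatic cleanup, a hardened sketch of the same idea using mktemp and a trap (as suggested in comments elsewhere on this page):

tmp=$(mktemp) || exit 1   # unique temporary file, no clobbering
trap 'rm -f "$tmp"' EXIT  # clean up when the script exits
test "assignment" > "$tmp" &
wait
a=$(<"$tmp")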
Equine answered 16/11, 2013 at 11:24 Comment(3)
Is there any way to do this without using files? – Branle
Yes, there is a (bash-only) way. See my answer. – Recriminate
This was the only method that worked for me, thank you! – Nagano

I also use file redirections, like:

exec 3< <({ sleep 2; echo 12; })  # Launch as a background job; its stdout becomes fd 3
cat <&3                           # Blocking read of fd 3

A more realistic case: if I want the output of 4 parallel workers (toto, titi, tata and tutu), I redirect each one to a different file descriptor (computed in the fd variable). Reading these file descriptors will then block until EOF, i.e. until the pipe is closed, i.e. until the command has completed:

#!/usr/bin/env bash

# Declare data to be forked
a_value=(toto titi tata tutu)
msg=""

# Spawn child sub-processes, one file descriptor per worker
for i in {0..3}; do
  ((fd=50+i))
  echo "1/ Launching command: ${a_value[i]} with file descriptor: $fd!"
  eval "exec $fd< <({ sleep $i; echo ${a_value[i]}; })"
done

# Join children: wait for them all and collect their stdout
for i in {0..3}; do
  ((fd=50+i))
  echo "2/ Getting result of: ${a_value[i]} with file descriptor: $fd!"
  msg+="$(cat <&$fd)\n"
done

# Print result
echo -e "===========================\nResult:"
echo -e "$msg"

Should output:

1/ Launching command: toto with file descriptor: 50!
1/ Launching command: titi with file descriptor: 51!
1/ Launching command: tata with file descriptor: 52!
1/ Launching command: tutu with file descriptor: 53!
2/ Getting result of: toto with file descriptor: 50!
2/ Getting result of: titi with file descriptor: 51!
2/ Getting result of: tata with file descriptor: 52!
2/ Getting result of: tutu with file descriptor: 53!
===========================
Result:
toto
titi
tata
tutu

Note 1: coproc supports only one coprocess, not multiple.

Note 2: the wait command is buggy in old bash versions (4.2) and cannot retrieve the status of the jobs launched this way. It works well in bash 5, but the file-redirection approach works with all versions.

Fiorenza answered 7/7, 2021 at 1:55 Comment(0)

Just group the commands when you run them in the background, and wait for both.

{ echo a & echo b & wait; } | nl

Output will be:

     1  a
     2  b

But note that the output can be out of order if the second task runs faster than the first.

{ { sleep 1; echo a; } & echo b & wait; } | nl

Reversed output:

     1  b
     2  a

If the output of the two background jobs must be kept separate, it is necessary to buffer the output somewhere, typically in a file. Example:

#! /bin/bash

t0=$(date +%s)                               # Get start time

trap 'rm -f "$ta" "$tb"' EXIT                # Remove temp files on exit.

ta=$(mktemp)                                 # Create temp file for job a.
tb=$(mktemp)                                 # Create temp file for job b.

{ exec >$ta; echo a1; sleep 2; echo a2; } &  # Run job a.
{ exec >$tb; echo b1; sleep 3; echo b2; } &  # Run job b.

wait                                         # Wait for the jobs to finish.

cat "$ta"                                    # Print output of job a.
cat "$tb"                                    # Print output of job b.

t1=$(date +%s)                               # Get end time

echo "t1 - t0: $((t1-t0))"                   # Display execution time.

The overall runtime of the script is three seconds, although the combined sleeping time of both background jobs is five seconds. And the output of the background jobs is in order.

a1
a2
b1
b2
t1 - t0: 3

You can also use a memory buffer to store the output of your jobs. But this only works if your buffer is big enough to hold the whole output of your jobs.

#! /bin/bash

t0=$(date +%s)

trap 'rm -f /tmp/{a,b}' EXIT
mkfifo /tmp/{a,b}

buffer() { dd of="$1" status=none iflag=fullblock bs=1K; }

pids=()
{ echo a1; sleep 2; echo a2; } > >(buffer /tmp/a) &
pids+=($!)
{ echo b1; sleep 3; echo b2; } > >(buffer /tmp/b) &
pids+=($!)

# Wait only for the jobs but not for the buffering `dd`.
wait "${pids[@]}" 

# This will wait for `dd`.
cat /tmp/{a,b}

t1=$(date +%s)

echo "t1 - t0: $((t1-t0))"

The above also works with cat instead of dd, but then you cannot control the buffer size.
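
For reference, the cat variant of the buffer helper (same role, default buffering, no block-size control) would simply be:

buffer() { cat > "$1"; }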

Lindeman answered 9/7, 2021 at 12:12 Comment(0)

If you have GNU Parallel you can probably use parset:

myfunc() {
  sleep 3
  echo "The input was"
  echo "$@"
}
export -f myfunc
parset a,b,c myfunc ::: myarg-a "myarg  b" myarg-c
echo "$a"
echo "$b"
echo "$c"

See: https://www.gnu.org/software/parallel/parset.html
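
parset can also assign all results into a single bash array instead of separate variables (per the parset documentation linked above):

parset myarr myfunc ::: myarg-a "myarg  b" myarg-c
echo "${myarr[0]}"  # output for myarg-a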

Lenee answered 16/7, 2021 at 5:52 Comment(0)
