Incorrect results with bash process substitution and tail?

Using bash process substitution, I want to run two different commands on a file simultaneously. It is not necessary in this example, but imagine that "cat /usr/share/dict/words" was a very expensive operation, such as uncompressing a 50 GB file.

cat /usr/share/dict/words | tee >(head -1 > h.txt) >(tail -1 > t.txt) > /dev/null

After this command I would expect h.txt to contain the first line of the words file "A", and t.txt to contain the last line of the file "Zyzzogeton".

However what actually happens is that h.txt contains "A" but t.txt contains "argillaceo" which is about 5% into the file.

Why does this happen? It seems like either the "tail" process is terminating early or the streams are getting mixed up.

Running another similar command like this behaves as expected:

cat /usr/share/dict/words | tee >(grep ^a > a.txt) >(grep ^z > z.txt) > /dev/null

After this command I'd expect a.txt to contain all the words that begin with "a", while z.txt contains all of the words that begin with "z", which is exactly what happened.

So why doesn't this work with "tail", and with what other commands will this not work?

Ivon answered 17/12, 2015 at 17:27 Comment(1)
I think this is related to #4489639, which suggests that the processes within the substitution exit as soon as the outer command finishes, but frankly I haven't been able to demonstrate that this is the problem here with any of the commands I've tried so far. - Tarnetgaronne

OK, what seems to happen is this: once head -1 has read its line, it exits. The next time tee writes to the pipe that the process substitution set up, the write fails with EPIPE and, according to man 2 write, the writing process is also sent SIGPIPE. That kills tee, which forces tail -1 to exit immediately, and the cat on the left gets a SIGPIPE as well.

We can see this a little better if we add a bit more to the process with head and make the output both more predictable and also written to stderr without relying on the tee:

for i in {1..30}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done") >(tail -1 > t.txt) >/dev/null

which when I run it gave me the output:

1
Head done
2

So the loop got just one more iteration before everything exited (and t.txt only has 1 in it). If we then run

echo "${PIPESTATUS[@]}"

we see

141 141

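Exit status 141 is 128 + 13, which is how bash reports a process killed by signal 13 (SIGPIPE on Linux). Another question ties 141 to SIGPIPE in a very similar fashion to what we're seeing here.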
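As a minimal sketch of how that status decodes, the snippet below reproduces a SIGPIPE death in a two-stage pipeline and converts the exit status back into a signal name. The `yes | head` pipeline is only a stand-in chosen for illustration (it is not from the question), and the sketch assumes bash with SIGPIPE left at its default disposition.

```bash
#!/usr/bin/env bash
# Stand-in pipeline (an assumption for illustration): `yes` writes forever,
# `head -n 1` exits after one line, so the next write by `yes` hits a closed
# pipe and `yes` is killed by SIGPIPE.
yes | head -n 1 > /dev/null

# Copy PIPESTATUS right away; the next command would overwrite it.
status=("${PIPESTATUS[@]}")
echo "yes exited with ${status[0]}, head exited with ${status[1]}"

# An exit status above 128 means "killed by signal (status - 128)".
sig=$(( status[0] - 128 ))
echo "signal number: $sig, signal name: $(kill -l "$sig")"
```

The same arithmetic applies to the `141 141` seen above: both the loop and tee were terminated by SIGPIPE.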

The coreutils maintainers have added this as an example to their tee "gotchas" for posterity.

For a discussion with the devs about how this fits into POSIX compliance you can see the (closed notabug) report at http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22195

If you have access to GNU coreutils 8.24 or later, tee has some options (not in POSIX) that can help, such as -p or --output-error=warn. Without that, you can take a bit of a risk but get the functionality the question asks for by trapping and ignoring SIGPIPE:

trap '' PIPE
for i in {1..30}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done") >(tail -1 > t.txt) >/dev/null
trap - PIPE

will have the expected results in both h.txt and t.txt, but if something else happened that wanted SIGPIPE to be handled correctly you'd be out of luck with this approach.
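One way to limit that risk is to ignore SIGPIPE only inside a subshell, so the rest of the script keeps the default handling. The following is a sketch under assumptions, not a tested recipe for every platform: it assumes GNU tee (which keeps writing to its other outputs after a failed write), uses `seq` as a stand-in for the expensive producer, and the `workdir` variable and the polling loop are hypothetical helpers added for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory for the two output files.
workdir=$(mktemp -d)

(
  # SIGPIPE is ignored only in this subshell and the processes it starts.
  trap '' PIPE
  # `seq` stands in for the expensive producer from the question.
  seq 1 100000 |
    tee >(head -n 1 > "$workdir/h.txt") >(tail -n 1 > "$workdir/t.txt") > /dev/null
) 2> /dev/null   # hide tee's "Broken pipe" diagnostic for the head branch

# Process substitutions run asynchronously, so poll (bounded) until both
# files have content before reading them.
for _ in {1..50}; do
  [ -s "$workdir/h.txt" ] && [ -s "$workdir/t.txt" ] && break
  sleep 0.1
done

echo "first: $(cat "$workdir/h.txt"), last: $(cat "$workdir/t.txt")"
```

Once the subshell exits, the parent shell's SIGPIPE disposition is unchanged, so no `trap - PIPE` is needed afterwards.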

Another hacky option would be to zero out t.txt before starting then not let the head process list finish until it is non-zero length:

> t.txt; for i in {1..10}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done"; while [ ! -s t.txt ]; do sleep 1; done) >(tail -1 > t.txt; date) >/dev/null
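For comparison with the workarounds above, here is a sketch of the -p route. It assumes GNU coreutils 8.24 or later; `seq` is a stand-in for the expensive producer, and the `workdir` variable and the bounded polling loop are hypothetical additions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory for the two output files.
workdir=$(mktemp -d)

# With -p, GNU tee keeps feeding its remaining outputs after `head` exits
# and closes its end of the pipe, so `tail` still sees the whole stream.
seq 1 100000 |
  tee -p >(head -n 1 > "$workdir/h.txt") >(tail -n 1 > "$workdir/t.txt") > /dev/null

# Process substitutions run asynchronously, so poll (bounded) until both
# files have content before reading them.
for _ in {1..50}; do
  [ -s "$workdir/h.txt" ] && [ -s "$workdir/t.txt" ] && break
  sleep 0.1
done

echo "first: $(cat "$workdir/h.txt"), last: $(cat "$workdir/t.txt")"
```

Unlike the trap approach, this leaves SIGPIPE handling untouched for every other process in the pipeline.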
Tarnetgaronne answered 17/12, 2015 at 19:17 Comment(8)
POSIX-specified behavior for tee is for it to continue operating even if one of its readers exits, so if you're seeing something contrary, that's actually a bug. - Stovepipe
"If a write to any successfully opened file operand fails, writes to other successfully opened file operands and standard output shall continue, but the exit status shall be non-zero. Otherwise, the default actions specified in Utility Description Defaults apply." (pubs.opengroup.org/onlinepubs/9699919799/utilities/tee.html) - Stovepipe
@CharlesDuffy Well, the above results are for an older version, I guess (8.5); I can try again. Also, I haven't delved deep enough to know whether the process substitution presents as a closed file handle or whether it actually raises SIGPIPE when that process ends. I'd have more work to do before submitting a bug report, I guess. - Tarnetgaronne
@CharlesDuffy I dug a little more: since the process substitution is using named pipes, when tee writes to a closed pipe it gets an EPIPE and receives a SIGPIPE. Perhaps this does violate the standard as you quoted. - Tarnetgaronne
@CharlesDuffy OK, submitted a bug report to GNU over this. - Tarnetgaronne
The response from that bug report (re: arranging for SIGPIPE to be ignored) might be worth propagating into the answer here. - Stovepipe
Thank you for the explanation and discussion. The "trap" workaround works for me. @EricRenouf can you provide a link to the bug report? - Ivon
@UncleLongHair It's in the answer: debbugs.gnu.org/cgi/bugreport.cgi?bug=22195 - Tarnetgaronne
