Variables getting reset after the while read loop that reads from a pipeline

initiate () {
read -p "Location(s) to look for .bsp files in? " loc
find $loc -name "*.bsp" | while read
do
    if [ -f "$loc.bz2" ]
    then
        continue
    else
        filcount=$[$filcount+1]
        bzip $loc
    fi
    if [ "$scan" == "1" ]; then bzipint $loc
    fi
    echo $filcount    #Correct counting
    echo $zipcount    #Correct counting
    echo $scacount    #Correct counting
    echo $valid       #Equal to 1
done

echo $filcount    #Reset to 0
echo $zipcount    #Reset to 0
echo $scacount    #Reset to 0
echo $valid       #Still equal to 1
}

I'm writing a bash shell script to use bzip2 to compress all .bsp files inside a directory. In this script I have several variables for counting totals (files, successful zips, successful integrity scans); however, I seem to have run into a problem.

When find $loc -name "*.bsp" runs out of files to feed the while read loop and the loop exits, $filcount, $zipcount and $scacount are all zeroed out. All of these are incremented inside initiate(), bzip() (which is called from initiate()) or bzipint() (which is also called from initiate()).

To test whether the problem has something to do with variables being changed inside initiate() or the functions it calls, I added echo $valid. $valid is defined outside initiate() (like $filcount, $zipcount, etc.), but is never modified inside initiate() or any function it calls.

Interestingly enough, $valid does not get reset to 0 like the other variables inside initiate.

Can anyone tell me why my variables magically get reset when while read exits?

Estrade answered 5/9, 2011 at 23:13 Comment(0)

I ran into this problem yesterday.

The trouble is that you're doing find $loc -name "*.bsp" | while read. Because this involves a pipe, the while read loop can't actually run in the same bash process as the rest of your script; bash has to spawn a subprocess so that it can connect the stdout of find to the stdin of the while loop.

This is all very clever, but it means that any variables set in the loop can't be seen after the loop, which totally defeated the whole purpose of the while loop I was writing.
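
Here is a minimal, self-contained illustration of the effect (the counter is just for demonstration):

count=0
printf '%s\n' a b c | while read -r line; do
    count=$((count+1))
    echo "inside: $count"   # prints 1, 2, 3
done
echo "after: $count"        # still 0 - the loop ran in a subshell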

You can either try to feed input to the loop without using a pipe, or get output from the loop without using variables. I ended up with a horrifying abomination involving both writing to a temporary file AND wrapping the whole loop in $(...), like so:

var="$(producer | while read line; do
    ...
    echo "${something}"
done)"

That got var set to everything that had been echoed from the loop. I probably messed up the syntax of that example; I don't have the code I wrote handy at the moment.
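
For completeness, the temporary-file variant of the same idea looks roughly like this (a sketch; mktemp and the "processed:" prefix are purely illustrative):

tmp=$(mktemp)
printf '%s\n' one two three | while read -r line; do
    echo "processed: $line" >> "$tmp"
done
var="$(cat "$tmp")"
rm -f "$tmp"
echo "$var"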

Vacla answered 5/9, 2011 at 23:28 Comment(2)
Excellent answer, thanks for the clarification. Yeah, this is definitely going to cause some complications... I might have to end up using a temporary file or two to get around this, like you did. :PEstrade
Hello, people reading this in the future, ten years or more after this question was posted! Please use the process substitution version instead. Once you've grappled with the basic notion that "pipes are subshells" and understand what process substitutions do, it is much clearer when written that way. Even the ancient version of Bash that comes with macOS (3.2.something) supports this.Dialectic

If you use bash, you can replace the pipe with process substitution, so the while loop runs in the current shell and the counters survive:

while read
do
    if [ -f "$REPLY.bz2" ]
    then
        continue
    else
        filcount=$((filcount+1))
        bzip "$REPLY"
    fi
    if [ "$scan" == "1" ]; then bzipint "$REPLY"
    fi
    echo $filcount    #Correct counting
    echo $zipcount    #Correct counting
    echo $scacount    #Correct counting
    echo $valid       #Equal to 1
done < <(find "$loc" -name "*.bsp")
Incondite answered 6/9, 2011 at 2:45 Comment(1)
+1 much cleaner than passing variables as output; this does essentially the same thing as the pipe version, but with the while loop in the main process.Musaceous

To summarize options for using read at the end of [the conceptual equivalent of] a pipeline in POSIX-like shells:

To recap: in bash by default and in strictly POSIX-compliant shells always, all commands in a pipeline run in a subshell, so variables they create or modify won't be visible to the current shell (won't exist after the pipeline ends).

The following covers bash, ksh, zsh, and sh ([mostly] POSIX-features-only shells such as dash) and shows ways of avoiding the creation of a subshell so as to preserve the variables created / modified by read.

If no minimum version number is given, assume that even "pretty old" versions support it (the features in question have been around for a long time, but I don't know specifically when they were introduced).

Note that as a [POSIX-compliant] alternative to the solutions below you can always capture a command's output in a [temporary] file, and then feed it to read as < file, which also avoids subshells.
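
A sketch of that approach (mktemp is widely available, though not strictly required by POSIX):

tmpfile=$(mktemp)
printf '%s\n' one two three > "$tmpfile"

out=
while read -r var; do
  out="$out$var/"
done < "$tmpfile"   # redirection instead of a pipe: no subshell

rm -f "$tmpfile"
echo "$out" # -> 'one/two/three/'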


ksh, and zsh: NO workaround/configuration change needed at all:

The read builtin by default runs in the current shell when used as the last command in a pipeline.

Seemingly, ksh and zsh by default run any command in the last stage of a pipeline in the current shell.
Observed in ksh 93u+ and zsh 5.0.5.
If you know specifically in what version this feature was introduced, let me know.

#!/usr/bin/env ksh
#!/usr/bin/env zsh

out= # initialize output variable

# Pipe multiple lines to the `while` loop and collect the values in the output variable.
printf '%s\n' one two three | 
 while read -r var; do
   out+="$var/"
 done

echo "$out" # -> 'one/two/three/'

bash 4.2+: use the lastpipe shell option

In bash version 4.2 or higher, turning on the lastpipe shell option causes the last pipeline segment to run in the current shell, allowing read to create variables visible to the current shell. (lastpipe only takes effect when job control is off, which is the default in scripts; in an interactive shell you would additionally need set +m.)

#!/usr/bin/env bash

shopt -s lastpipe # bash 4.2+: make the last pipeline command run in *current* shell

out=
printf '%s\n' one two three | 
 while read -r var; do
   out+="$var/"
 done

echo "$out" # -> 'one/two/three/'

bash, ksh, zsh: use process substitution

Loosely speaking, a process substitution is a way to have a command's output act like a temporary file.

out=
while read -r var; do
  out+="$var/"
done < <(printf '%s\n' one two three) # <(...) is the process substitution

echo "$out" # -> 'one/two/three'

bash, ksh, zsh: use a here-string with a command substitution

out=
while read -r var; do
  out+="$var/"
done <<< "$(printf '%s\n' one two three)" # <<< is the here-string operator

echo "$out" # -> 'one/two/three'

Note the need to double-quote the command substitution to protect its output from shell expansions.


POSIX-compliant solution (sh): use a here-document with a command substitution

#!/bin/sh

out=
while read -r var; do
  out="$out$var/"
done <<EOF # <<EOF ... EOF is the here-doc
$(printf '%s\n' one two three)
EOF

echo "$out" # -> 'one/two/three'

Note that, by default, the closing delimiter - EOF, in this case - must be placed at the very beginning of the line, with no characters after it. (The <<-EOF form strips leading tab characters, which lets you indent the here-document body and the closing delimiter with tabs.)

Dunstable answered 7/3, 2015 at 17:31 Comment(5)
So, only shell builtins can affect the shell environment, and only shell builtins running in the current shell at that -- and shells have historically only run builtins directly from the first pipe stage. So some hypothetical marvelous shell could decide that if there's a single stage running a builtin, that's the stage that it should run directly, but for now you either run the builtin in the first stage or use bash's lastpipe option to have it use the last instead. Cool. Thanks! I kinda knew that before, but only kinda.Thalassography
@jthill: Turns out that ksh and zsh already have that magic built in for the last pipeline segment - see my update. The first stage of a pipeline is always run in a subshell, even in ksh and zsh.Dunstable
@jthill: Note that wanting the last stage to be in the current shell is the typical use case: you want to capture the results of a pipeline in variables visible to the current shell. Can you tell me what you mean by "if there's a single stage running a builtin"?Dunstable
To tap into the middle of the pipeline, say sort data | while read; do whatever here;: $((subtotal+=someresult)); echo $((someresult)); done | work on the munged stuff; doneThalassography
@jthill: Got it - that would be handy, though I suspect that the last stage is usually the one where you re-enter the "shell world", conceptually speaking.Dunstable

I landed here because I frequently use this paradigm:

find "${yourPath}" -type f -print0 | while IFS= read -rd '' eachFile; do
    echo "${eachFile}" #show all the files, add more code to do real work
done

and found out the hard way that variables set inside the while loop live in a subshell, so the echo after the loop here prints nothing:

find "${yourPath}" -type f -print0 | while IFS= read -rd '' eachFile; do
    echo "${eachFile}" #show all the files, add more code to do real work
done
echo "${eachFile}" #we cannot get the last value of eachFile!

This still doesn't work:

shopt -s lastpipe 
find "${yourPath}" -type f -print0 | while IFS= read -rd '' eachFile; do
    echo "${eachFile}" #show all the files, add more code to do real work
done
echo "${eachFile}" #we cannot get the last value of eachFile; even with lastPipe!

This is what works:

shopt -s lastpipe 
find "${yourPath}" -type f -print0 | while IFS= read -rd '' eachFile; do
    echo "${eachFile}" #show all the files, add more code to do real work
    latestFile="${eachFile}"
done
echo "${eachFile}" #we cannot get the last value of eachFile; even with lastPipe!
echo "${latestFile}" #but we can get this!

My first guess was that the eachFile variable was still being created within a subshell despite the lastpipe option, but a more likely explanation is that, with lastpipe in effect, the loop does run in the current shell and the final read call simply fails at end of input, leaving eachFile empty; a variable assigned inside the loop body (latestFile), before that last read, keeps the final real value.
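
A minimal sketch of that behavior, meant to be run as a script (so job control is off and lastpipe actually takes effect); the file names are purely illustrative:

#!/usr/bin/env bash

shopt -s lastpipe
printf '%s\0' one.bsp two.bsp | while IFS= read -rd '' eachFile; do
    latestFile="${eachFile}"
done
echo "[${eachFile}]"    # prints [] - the final, failing read left eachFile empty
echo "[${latestFile}]"  # prints [two.bsp] - assigned inside the loop, before that last read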

Humming answered 27/10, 2023 at 14:19 Comment(0)
