While-loop subshell dilemma in Bash

I want to process all *.bin files inside a given directory. Initially I was working with a for loop:

var=0
for i in *ls *bin
do
   perform computations on $i ....
   var+=1
done
echo $var

However, some directories contain so many files that this fails with the error: Argument list too long

Therefore, I was trying it with a piped while-loop:

var=0
ls *.bin | while read i;
do
  perform computations on $i
  var+=1
done
echo $var

The problem now is that the pipe creates subshells, so echo $var prints 0.
How can I deal with this problem?
The original code:

#!/bin/bash

function entropyImpl {
    if [[ -n "$1" ]]
    then
        if [[ -e "$1" ]]
        then
            echo "scale = 4; $(gzip -c ${1} | wc -c) / $(cat ${1} | wc -c)" | bc
        else
            echo "file ($1) not found"
        fi
    else
        datafile="$(mktemp entropy.XXXXX)"
        cat - > "$datafile"
        entropy "$datafile"
        rm "$datafile"
    fi

    return 1
}
declare acc_entropy=0
declare count=0

ls *.bin | while read i ;
do  
    echo "Computing $i"  | tee -a entropy.txt
    curr_entropy=`entropyImpl $i`
    curr_entropy=`echo $curr_entropy | bc`  
    echo -e "\tEntropy: $curr_entropy"  | tee -a entropy.txt
    acc_entropy=`echo $acc_entropy + $curr_entropy | bc`
    let count+=1
done

echo "Out of function: $count | $acc_entropy"
acc_entropy=`echo "scale=4; $acc_entropy / $count" | bc`

echo -e "===================================================\n" | tee -a entropy.txt
echo -e "Accumulated Entropy:\t$acc_entropy ($count files processed)\n" | tee -a entropy.txt
Acicula answered 5/12, 2012 at 15:34 Comment(1)
A for loop is evaluated in the shell, and thus by itself does not produce "argument list too long". Perhaps you can go back to that code and fix whatever else was wrong there. (The *ls looks misplaced there; perhaps the original problem was a useless use of ls?)Disorganization
H
92

The problem is that the while loop is part of a pipeline. In a bash pipeline, every element of the pipeline is executed in its own subshell [ref]. So after the while loop terminates, the while loop subshell's copy of var is discarded, and the original var of the parent (whose value is unchanged) is echoed.
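
A minimal sketch of that behaviour, for illustration only (the variable name count and the three input lines are made up):

```shell
#!/bin/bash
# The loop is the last element of a pipeline, so bash (with default
# settings, i.e. lastpipe off) runs it in a subshell.
count=0
printf '%s\n' a b c | while IFS= read -r line; do
    count=$((count + 1))
    echo "inside the loop: count=$count"
done
# The parent shell never saw the increments made in the subshell.
echo "after the loop: count=$count"
```

Inside the loop the counter goes up as expected, but the final echo in the parent shell still reports count=0.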

One way to fix this is by using Process Substitution as shown below:

var=0
while read i;
do
  # perform computations on $i
  ((var++))
done < <(find . -maxdepth 1 -type f -name "*.bin")

Take a look at BashFAQ/024 for other workarounds.

Notice that I have also replaced ls with find because it is not good practice to parse ls.
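
A more defensive variant of the same idea, as a sketch: it assumes bash plus a find that supports -maxdepth and -print0 (GNU and BSD find do; neither option is in POSIX). The temporary directory and the file names are hypothetical and only there to make the example self-contained.

```shell
#!/bin/bash
# Hypothetical test data: two .bin files (one with a space) and one other file.
workdir=$(mktemp -d)
touch "$workdir/a.bin" "$workdir/b c.bin" "$workdir/notes.txt"

var=0
# NUL-delimited names survive spaces and newlines in file names;
# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r -d '' file; do
    # perform computations on "$file"
    var=$((var + 1))
done < <(find "$workdir" -maxdepth 1 -type f -name '*.bin' -print0)

echo "$var"
rm -rf "$workdir"
```

Because the loop reads from a process substitution rather than a pipe, var keeps its value after done.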

Herby answered 5/12, 2012 at 15:51 Comment(6)
This is correct for bash, zsh and ksh, but is not POSIX compliant. For example, put bash in POSIX mode with set -o posix and try such a command. You get: syntax error near unexpected token `<'Basidiospore
To say "the while loop is executed in a subshell" is correct but somewhat misleading in this context -- one might assume loops are generally in a subshell which is not the case. The issue here is that any command on the right hand side of a pipe would be in a subshell. That we have a compound command, namely a loop, is just a coincidence. (Although that coincidence is where the subshell issue manifests itself because only a compound command provides the opportunity to assign values to variables which are not persisted.)Iatrochemistry
Great answer, saved my life. While my code is little bit more complex I added an answer myself here: #55240479Perspire
I think ";" is not needed at the end of the "while read i"Composed
@Peter-ReinstateMonica: You are correct. But your comment applies not just for "right hand side of a pipe". In bash, every element of a pipeline, including the first element (i.e. not on the right hand side), is executed in its own subshell. For example: ideone.com/X41sYJ.Gates
This is black magic. Thank you for sharing.Taught
B
22

A POSIX compliant solution would be to use a named pipe (FIFO, shown as file type p). This solution is very nice, portable, and POSIX, but it creates an entry on the filesystem.

mkfifo mypipe
find . -maxdepth 1 -type f -name "*.bin" > mypipe &
while read line
do
    # action
done < mypipe
rm mypipe

The named pipe is an entry in the filesystem. To avoid leaving useless files behind, do not forget to remove it.
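
One way to make the cleanup automatic is an EXIT trap, as a rough sketch. It assumes mktemp -d is available (common, but not required by POSIX); the fifo_dir name and the three sample lines are invented for the example, with printf standing in for the find command.

```shell
#!/bin/sh
# Create the FIFO inside a private temporary directory and remove
# that directory when the script exits, however it exits.
fifo_dir=$(mktemp -d)
trap 'rm -rf "$fifo_dir"' EXIT
mkfifo "$fifo_dir/mypipe"

# Writer runs in the background; it blocks until the reader opens the FIFO.
printf '%s\n' one.bin two.bin three.bin > "$fifo_dir/mypipe" &

count=0
while IFS= read -r line; do
    # action
    count=$((count + 1))
done < "$fifo_dir/mypipe"
wait    # reap the background writer

echo "$count"
```

The loop runs in the current shell, so count is still set after done.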

Basidiospore answered 18/9, 2016 at 15:45 Comment(2)
trap 'rm -rf $TMPFIFODIR' EXIT; TMPFIFODIR=$(mktemp -d); mkfifo $TMPFIFODIR/mypipe at the beginning of the script, and reading/writing that fifo would take care of the "do not forget to remove it" issue.Frontwards
Alternatively, you can use an actual file, if you don't mind forcing everything to run sequentially, and capturing all the output in a file.Indreetloire
L
4

While researching the general issue of passing variables from a subshelled while loop to the parent, I found one solution that is missing here: using a here-string. As that is bash-specific and I preferred a POSIX solution, I found that a here-string is really just a shortcut for a here-document. With that knowledge at hand, I came up with the following, which avoids the subshell and thus allows variables to be set in the loop.

#!/bin/sh

set -eu

passwd="username,password,uid,gid
root,admin,0,0
john,appleseed,1,1
jane,doe,2,2"

main()
{
    while IFS="," read -r _user _pass _uid _gid; do
        if [ "${_user}" = "${1:-}" ]; then
            password="${_pass}"
        fi
    done <<-EOT
        ${passwd}
    EOT

    if [ -z "${password:-}" ]; then
        echo "No password found."
        exit 1
    fi

    echo "The password is '${password}'."
}

main "${@}"

exit 0

One important note to all copy-pasters: the here-document is set up with a hyphen (<<-), which means leading tabs are stripped. This is needed to keep the layout somewhat nice. It matters because stackoverflow doesn't render tabs in 'code' and replaces them with spaces. Grmbl. SO, don't mangle my code just because you guys favor spaces over tabs; it's irrelevant in this case!

This probably breaks with different editor settings and whatnot, so the alternative would be to have it as:

    done <<-EOT
${passwd}
EOT
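
For comparison, the bash-only here-string form mentioned at the start of this answer could look roughly like this. It is a sketch, with the sample data shortened and the user name to look up hard-coded as john for illustration.

```shell
#!/bin/bash
passwd="username,password,uid,gid
root,admin,0,0
john,appleseed,1,1"

password=""
# <<< feeds the variable's contents to the loop on stdin.
# No pipe is involved, so the loop runs in the current shell.
while IFS="," read -r _user _pass _uid _gid; do
    if [ "$_user" = "john" ]; then
        password="$_pass"
    fi
done <<< "$passwd"

echo "The password is '$password'."
```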
Lozengy answered 21/8, 2019 at 18:25 Comment(0)
H
-2

This could be done with a for loop, too:

var=0;
for file in `find . -maxdepth 1 -type f -name "*.bin"`; do 
    # perform computations on "$file"
    ((var++))
done 
echo $var
Hybris answered 12/5, 2016 at 9:21 Comment(5)
No, this would produce the "argument list too long" error they were trying to avoid in the first place, and additionally break (possibly with serious security implications) on file names with whitespace.Disorganization
with find you won't get an "argument list too long" error, only with ls.Hybris
I stand corrected, this seems to avoid the "argument list too long" error at least with reasonably recent versions of Bash. The whitespace error is still a problem.Disorganization
for file in *.bin; do won't produce an "argument list too long" error in bash either. So the find command doesn't bring anything extra to the party. The argument list too long error is from the kernel. Since for is a shell builtin, there is no exec'ing a command with a long argument list. See #19355370Frontwards
To clarify, my "shell builtin" sentence is better like this: Since for is a shell builtin, the shell will not exec a command, so the argument list limitation does not apply.Frontwards
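
The plain glob loop described in the last two comments could be sketched like this. The temporary directory and file names are hypothetical, and nullglob is an addition not mentioned in the thread: it makes the loop run zero times when nothing matches.

```shell
#!/bin/bash
# Hypothetical test data: two .bin files (one with a space) and one other file.
workdir=$(mktemp -d)
touch "$workdir/a.bin" "$workdir/b c.bin" "$workdir/notes.txt"

shopt -s nullglob   # an unmatched pattern expands to nothing
var=0
# The glob is expanded by the shell itself; no external command is
# exec'ed with the file list, so the kernel's argument limit is not hit.
for file in "$workdir"/*.bin; do
    # perform computations on "$file"
    var=$((var + 1))
done
shopt -u nullglob

echo "$var"
rm -rf "$workdir"
```

Quoting "$file" inside the loop keeps names with whitespace intact, and var survives because no subshell is involved.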
