Is there any difference in bash between `while read -r line; do ...; done < file` and `cat file | while IFS= read -r line; do ...; done`?

I'm learning bash and I found a tutorial on the internet that says these are the same:

while read -r line;
do
    ...
done < file

cat file | while IFS= read -r line;
do
    ...
done

Are there any subtle differences between these two loops, or are they really the same?

Admetus answered 20/10, 2014 at 17:39 Comment(5)
my suggestion: Don't parse the output of cat command.Wheelhouse
To amplify what chepner has already touched on: in general, cat foo | bar, as opposed to bar <foo, (1) is less efficient for two reasons: (1a) creating a pipeline requires an extra fork(); (1b) having foo read by cat, written into a pipe, and then read from the pipe by bar is less efficient than simply letting bar read the content of foo directly; and (2) in cases where bar is a program that has access to the seek() call (not typically available for bash unless you've written a C extension), giving it a pipe rather than a file prevents its use.Blackmon
Were both loops supposed to have IFS=?Parenteau
@thatotherguy no, just the second loop.Admetus
What does the IFS= do in the second loop anyway?Admetus
I
10

The biggest difference is that in the pipeline, the while loop executes in a subshell, so if you change the values of any variables in the body of the while, those will be lost after the pipeline completes.

$ foo=5
$ cat file | while IFS= read -r line; do
>    foo=$line  # assume $line is not 5
> done
$ echo $foo
5
$ while IFS= read -r line; do
>  foo=$line
> done < file  # Assume one line with the word foo
$ echo $foo
foo

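In bash 4.2 and later, this can be mitigated by enabling the lastpipe option (shopt -s lastpipe), which allows the last command in a pipeline to be executed in the current shell instead of a subshell. The option only takes effect when job control is not active, which is the default in scripts but not in interactive shells.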
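
As a rough sketch of how lastpipe changes the outcome, the script below pipes two made-up lines (standing in for cat file) into the loop. It assumes bash 4.2 or later and that it runs as a script, so job control is off; on older versions or in an interactive shell the assignment would still be lost.

```shell
#!/usr/bin/env bash
# Assumption: bash >= 4.2, run non-interactively (job control off).
shopt -s lastpipe

foo=5
# printf stands in for "cat file"; the two lines are illustrative input.
printf 'first\nsecond\n' | while IFS= read -r line; do
    foo=$line   # with lastpipe in effect, this runs in the current shell
done

# When lastpipe is in effect, foo holds the last line read ("second")
# rather than the original 5.
echo "$foo"
```

Without the shopt line, the same script would print 5, matching the first transcript above.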

Aside from that, the version using input redirection is more efficient, since it does not need to start an extra cat process or set up a pipe.

Integrated answered 20/10, 2014 at 17:44 Comment(3)
Good answer! I often forget about that variable scope thing when I just mention "useless use of cat" :)Bravo
But they both have while loops?Admetus
Sorry, I wasn't explicit about which construct required a subshell; edited.Integrated
P
1

In addition to chepner's observation about subshells, one of the loops uses IFS= and one does not.

read uses IFS to split its input into words. When only one variable is given, the visible effect is on leading and trailing whitespace.

With IFS=, it's preserved:

$ IFS= read -r line <<< "   test   "
$ printf "<%s>\n" "$line"
<   test   >

Otherwise, it's stripped:

$ read -r line <<< "   test   "
$ printf "<%s>\n" "$line"
<test>

You can imagine how much havoc the first loop, the one without IFS=, would wreak on e.g. a Python file.
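
To make the indentation problem concrete, here is a small sketch that runs both loop styles over the same input. The two-line Python-like snippet and the variable names with_ifs and without_ifs are made up for illustration; a here-string is used instead of a file so the example is self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical input: two lines, the second indented by four spaces.
input=$'if x:\n    print(x)'

# Loop with IFS= : each line is kept exactly as read.
with_ifs=""
while IFS= read -r line; do
    with_ifs+="<$line>"
done <<< "$input"

# Loop without IFS= : leading and trailing whitespace is stripped.
without_ifs=""
while read -r line; do
    without_ifs+="<$line>"
done <<< "$input"

printf '%s\n' "$with_ifs"      # <if x:><    print(x)>
printf '%s\n' "$without_ifs"   # <if x:><print(x)>
```

The second result has lost the indentation, which in Python would change the meaning of the code.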

Parenteau answered 21/10, 2014 at 17:27 Comment(0)
