How to use while read line with tail -n

Problem: I have a CSV dump file with more than 250,000 lines. When I process it with while read, it takes a while (no pun intended). I would like to work on only the last 10,000 lines instead of all 250,000.

Code Snippet: My current code is this:

IFS=","
while read line
do

    awk_var=`echo "$line" | awk -F" " '{print $0}'`

    var_array=($awk_var)

    read -a var_array <<< "${awk_var}"

    echo "${var_array[1]}"


done </some_directory/directory/file_in_question.csv

Question: How can I use tail -n10000 with while read line when reading file_in_question.csv in a bash script?

Scarface asked 5/11, 2015 at 2:16 Comment(3)
{print $0} is the same as {print} to awk, which is the same as not using awk at all. What were you trying to do there? And the time here likely comes from the 250,000 calls to awk (one per loop iteration). Avoid those if you can.Guinna
@EtanReisner Well, the first field is a Unix timestamp in seconds, so I calculate a boundary (two dates/timestamps) and extract the data between them. What alternative approach could make my code faster?Scarface
My point was that awk_var=$(echo "$line" | awk -F " " '{print $0}') is exactly the same as awk_var=$(echo "$line"), which is exactly the same as awk_var=$line, which is the same as just using $line in the first place, only with two fewer external commands, one less sub-shell and a few fewer lines of code. Also, var_array=($awk_var) is wrong, and you overwrite var_array with the read a line later.Guinna

Replace:

done </some_directory/directory/file_in_question.csv

with:

done < <(tail -n10000 /some_directory/directory/file_in_question.csv)

The <(...) construct is called process substitution. It creates a file-like object that bash can read from. Thus, this replaces reading from some_directory/directory/file_in_question.csv directly with reading from tail -n10000 /some_directory/directory/file_in_question.csv.
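
A quick way to see what bash substitutes for the construct (the exact path varies by system):

echo <(true)      # bash substitutes a readable path, typically something like /dev/fd/63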

Using process substitution like this allows you to keep your while loop in the main shell, not a subshell. Because of this, variables that you create in the while loop will retain their value after the loop exits.
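
As a minimal sketch of that difference (count is a throwaway variable used only for illustration): piping tail into while runs the loop in a subshell, so changes to count are lost, while the process-substitution form keeps them:

count=0
tail -n10000 /some_directory/directory/file_in_question.csv | while read -r line; do count=$((count+1)); done
echo "$count"     # prints 0 - the loop ran in a pipeline subshell

count=0
while read -r line; do count=$((count+1)); done < <(tail -n10000 /some_directory/directory/file_in_question.csv)
echo "$count"     # prints the number of lines read - the loop ran in the current shell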

Speeding up the original code

The code as shown prints the second column of a CSV file. If that is all that the code is supposed to do, then it can be replaced with:

awk -F, '{print $2}' /some_directory/directory/file_in_question.csv
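
Per the comments, the first field is a Unix timestamp and the real goal is to extract rows between two boundaries. A rough sketch combining tail with that filter follows; start and end are made-up epoch values, not from the original post:

start=1446681600   # hypothetical lower bound, epoch seconds
end=1446768000     # hypothetical upper bound
tail -n10000 /some_directory/directory/file_in_question.csv |
awk -F, -v s="$start" -v e="$end" '$1 >= s && $1 <= e {print $2}'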
Proceeding answered 5/11, 2015 at 2:28 Comment(4)
Thanks John, just a quick question: is using MyVar=$(some_command), e.g. GetTodaysDate=$(date +%F), also considered process substitution?Scarface
@Scarface No, that is command substitution. The difference is that command substitution captures the stdout of a process while process substitution makes a file-like object.Proceeding
Oh, that's correct, my apologies - I got my substitutions mixed up :S !Scarface
How can I exit from while read in this case? Edit: nvm, just add "break" command inside the loop.Ethiop

Something like:

IFS=","
tail /var/log/httpd/error_log | while read -r foo bar
do
    echo "$foo"
done

I recommend you do the splitting in bash with read instead of calling awk inefficiently there. Obviously rewriting the whole thing as an awk script would be faster than shell, but awk is a harder, less common language.
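
For instance, a minimal sketch of the read-based splitting (timestamp and value are assumed names for the first two CSV columns):

tail -n10000 /some_directory/directory/file_in_question.csv |
while IFS=, read -r timestamp value rest
do
    echo "$value"    # second comma-separated field, no per-line awk call
done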

Cthrine answered 5/11, 2015 at 2:27 Comment(0)

Or this one, which keeps re-reading the file and, when it hits end-of-file, waits a second and tries again (a crude follow mode):

while :
do read -r l || { sleep 1 ; continue; }   # on end-of-file, wait and retry
   echo "==> $l"
done < /var/log/httpd/error_log
Florafloral answered 5/4, 2020 at 6:12 Comment(0)
