How to parse a CSV file in Bash?
Asked Answered
F

6

155

I'm working on a long Bash script. I want to read cells from a CSV file into Bash variables. I can parse lines and the first column, but not any other column. Here's my code so far:


  cat myfile.csv|while read line
  do
    read -d, col1 col2 < <(echo $line)
    echo "I got:$col1|$col2"
  done

It's only printing the first column. As an additional test, I tried the following:

read -d, x y < <(echo a,b,)

And $y is empty. So I tried:

read x y < <(echo a b)

And $y is b. Why?

Fatima answered 26/11, 2010 at 15:20 Comment(7)
have you considered awk to use $1, $2, etc?Conto
as a sidenote: command < <(echo "string") ---> command <<< "string"Fourflush
The 'cut' command line program was designed for that: ss64.com/bash/cut.htmlSwiger
Possible duplicate of #36288482Sublet
You want to lose the useless use of catSublet
I’ll suggest awk if that helpsBignoniaceous
See whats-the-most-robust-way-to-efficiently-parse-csv-using-awk.Isolated
R
265

You need to use IFS instead of -d:

while IFS=, read -r col1 col2
do
    echo "I got:$col1|$col2"
done < myfile.csv
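Note that when there are more fields than variable names, read assigns the remainder of the line to the last named variable. A quick demonstration (the variable name rest is illustrative):

```shell
#!/bin/bash
# With more fields than variables, the last variable gets the rest of the line:
IFS=, read -r col1 col2 rest <<< 'a,b,c,d'
echo "$col1 | $col2 | $rest"    # prints: a | b | c,d
```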

To skip a given number of header lines:

skip_headers=3
while IFS=, read -r col1 col2
do
    if ((skip_headers))
    then
        ((skip_headers--))
    else
        echo "I got:$col1|$col2"
    fi
done < myfile.csv

Note that for general-purpose CSV parsing you should use a specialized tool which can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are csvtool and csvkit.
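To see why, here is a minimal sketch of the limitation described above: IFS splitting treats a comma inside a quoted field like any other delimiter:

```shell
#!/bin/bash
# RFC 4180 allows commas inside quoted fields, but IFS splitting is quote-blind:
IFS=, read -r a b c <<< '1,"Hello, world",3'
echo "$b"    # prints: "Hello    -- the quoted field was cut at its internal comma
```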

Rennin answered 26/11, 2010 at 16:9 Comment(13)
The proposed solution is fine for very simple CSV files, that is, if the headers and values are free of commas and embedded quotation marks. It is actually quite tricky to write a generic CSV parser (especially since there are several CSV "standards"). One approach to making CSV files more amenable to *nix tools is to convert them to TSV (tab-separated values), e.g. using Excel.Relent
It is interesting that I cannot do mkdir in the body. I'm getting command not found. Only the echo works.Bluing
@Zsolt: There's no reason that should be the case. You must have a typo or a stray non-printing character.Rennin
I figured it out. I called one of the variables PATH. Rookie mistakeBluing
@Bluing I recommend always using lowercase or mixed case variable names for that very reason.Rennin
@DennisWilliamson You should enclose the separator, e.g. when using ;: while IFS=";" read col1 col2; do ...Capello
@thomas.mc.work: That's true in the case of semicolons and other characters that are special to the shell. In the case of a comma, it's not necessary and I tend to prefer to omit characters that are unnecessary. For example, you could always specify variables for expansion using curly braces (e.g. ${var}), but I omit them when they're not necessary. To me, it looks cleaner.Rennin
I extended the case for the long row here https://mcmap.net/q/41347/-how-to-parse-csv-file-for-long-row-in-bash/54964Toname
@DennisWilliamson, From some time, bash source tree offer a loadable builtin csv parser! Have a look at my answer! Of course there are some limitations...Strut
Is there any option to skip header?Thigmotaxis
@AlwinJose: I edited my answer to show a way to do that.Rennin
@DennisWilliamson Can you please explain what does the line done < myfile.csv do?Rosebud
@MehdiCharife: It redirects the contents of the file into the while loop (where the fields are split and processed). It's equivalent to something like cat myfile.csv | while but it doesn't set up a subshell and so variables set within the loop retain their values when the loop is finished. See BashFAQ/024 for more information.Rennin
S
20

How to parse a CSV file in Bash?

Coming late to this question: bash now offers new features, this question is still relevant, and none of the already posted answers shows this powerful and compliant way of doing precisely this.

Parsing CSV files under bash, using a loadable module

Conforming to RFC 4180, a string like this sample CSV row:

12,22.45,"Hello, ""man"".","A, b.",42

should be split as

1  12
2  22.45
3  Hello, "man".
4  A, b.
5  42

Bash loadable C compiled modules

Under bash, you can create, edit, and use loadable compiled modules. Once loaded, they work like any other builtin!! ( You can find more information in the bash source tree. ;)

The current source tree (Oct 15 2021, bash V5.1-rc3) contains a bunch of samples:

accept        listen for and accept a remote network connection on a given port
asort         Sort arrays in-place
basename      Return non-directory portion of pathname.
cat           cat(1) replacement with no options - the way cat was intended.
csv           process one line of csv data and populate an indexed array.
dirname       Return directory portion of pathname.
fdflags       Change the flag associated with one of bash's open file descriptors.
finfo         Print file info.
head          Copy first part of files.
hello         Obligatory "Hello World" / sample loadable.
...
tee           Duplicate standard input.
template      Example template for loadable builtin.
truefalse     True and false builtins.
tty           Return terminal name.
uname         Print system information.
unlink        Remove a directory entry.
whoami        Print out username of current user.

There is a full working CSV parser ready to use in the examples/loadables directory: csv.c!!

On Debian GNU/Linux based systems, you may have to install the bash-builtins package:

apt install bash-builtins

Using loadable bash-builtins:

Then:

enable -f /usr/lib/bash/csv csv

From there, you could use csv as a bash builtin.

With my sample: 12,22.45,"Hello, ""man"".","A, b.",42

csv -a myArray '12,22.45,"Hello, ""man"".","A, b.",42'
printf "%s\n" "${myArray[@]}" | cat -n
     1      12
     2      22.45
     3      Hello, "man".
     4      A, b.
     5      42

Then, in a loop, processing a file:

while IFS= read -r line;do
    csv -a aVar "$line"
    printf "First two columns are: [ '%s' - '%s' ]\n" "${aVar[0]}" "${aVar[1]}"
done <myfile.csv

This way is clearly quicker and more robust than using any other combination of builtins or forking to any binary.

Unfortunately, depending on your system implementation, if your version of bash was compiled without loadable builtin support, this may not work...

Complete sample with multiline CSV fields.

Conforming to RFC 4180, a string like this single CSV row:

12,22.45,"Hello ""man"",
This is a good day, today!","A, b.",42

should be split as

1  12
2  22.45
3  Hello "man",
   This is a good day, today!
4  A, b.
5  42

Full sample script for parsing CSV containing multiline fields

Here is a small sample file with 1 header line, 4 columns and 3 rows. Because two fields contain newlines, the file is 6 lines long.

Id,Name,Desc,Value
1234,Cpt1023,"Energy counter",34213
2343,Sns2123,"Temperatur sensor
to trigg for alarm",48.4
42,Eye1412,"Solar sensor ""Day /
Night""",12199.21

And a small script able to parse this file correctly:

#!/bin/bash

enable -f /usr/lib/bash/csv csv

file="sample.csv"
exec {FD}<"$file"

read -ru $FD line
csv -a headline "$line"
printf -v fieldfmt '%-8s: "%%q"\\n' "${headline[@]}"
numcols=${#headline[@]}

while read -ru $FD line;do
    while csv -a row "$line" ; (( ${#row[@]} < numcols )) ;do
        read -ru $FD sline || break
        line+=$'\n'"$sline"
    done
    printf "$fieldfmt\\n" "${row[@]}"
done

This may render: (I've used printf "%q" to represent non-printable characters like newlines as $'\n')

Id      : "1234"
Name    : "Cpt1023"
Desc    : "Energy\ counter"
Value   : "34213"

Id      : "2343"
Name    : "Sns2123"
Desc    : "$'Temperatur sensor\nto trigg for alarm'"
Value   : "48.4"

Id      : "42"
Name    : "Eye1412"
Desc    : "$'Solar sensor "Day /\nNight"'"
Value   : "12199.21"

You could find a full working sample there: csvsample.sh.txt or csvsample.sh.

Note:

In this sample, I use the header line to determine the row width (number of columns). If your header line could hold newlines (or if your CSV uses more than one header line), you will have to pass the number of columns as an argument to your script (and the number of header lines).

Warning:

Of course, parsing CSV using this is not perfect! This works for many simple CSV files, but watch out for encoding and security issues!! For example, this module won't be able to handle binary fields!

Read carefully csv.c source code comments and RFC 4180!

Note about quoted multi-line fields

In particular, if a multi-line field is located in the last column, this method won't loop correctly up to the second quote.

For this, you have to check quote parity in $line before parsing it with the csv module.
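A minimal sketch of such a parity check (the helper name quote_parity is mine, not part of the csv loadable):

```shell
#!/bin/bash
# Count double quotes in a line: an odd count means a quoted field is
# still open and the record continues on the next physical line.
# (RFC 4180 escapes quotes by doubling them, which preserves parity.)
quote_parity() {
    local s=${1//[^\"]/}        # strip everything except double quotes
    (( ${#s} % 2 == 0 ))        # exit 0 (true) when the count is even
}

quote_parity 'a,"b",c'      && echo "record complete"
quote_parity 'a,"open field' || echo "record continues on next line"
```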

Strut answered 10/10, 2021 at 10:51 Comment(1)
Of course, parsing csv under bash is not perfect: csv loadable won't be able to handle binary fields and you may encounter encoding issues and/or security issues... Read carefully RFC 4180!!!Strut
P
11

From the man page:

-d delim The first character of delim is used to terminate the input line, rather than newline.

You are using -d, which will terminate the input line on the comma. It will not read the rest of the line. That's why $y is empty.
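A quick demonstration of this, mirroring the asker's test:

```shell
#!/bin/bash
# -d, makes read stop at the first comma, so only "a" is consumed;
# that single word is assigned to x, leaving y empty:
read -d, x y <<< 'a,b,'
echo "x=$x y=$y"    # prints: x=a y=
```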

Protocol answered 26/11, 2010 at 15:35 Comment(0)
S
7

We can parse CSV files with quoted strings, delimited by say |, with the following code:

while read -r line
do
    field1=$(echo "$line" | awk -F'|' '{printf "%s", $1}' | tr -d '"')
    field2=$(echo "$line" | awk -F'|' '{printf "%s", $2}' | tr -d '"')

    echo "$field1 $field2"
done < "$csvFile"

awk parses the string fields into variables and tr removes the quotes.

Slightly slower as awk is executed for each field.

Streaming answered 25/1, 2019 at 8:24 Comment(2)
Good, you can also use a comma (,)Bosomy
Processing a line at a time with Awk is a gross antipattern. awk -F'|' '{ gsub(/"/, ""); print $1, $2 }' "$csvFile"Sublet
C
4

In addition to the answer from @Dennis Williamson, it may be helpful to skip the first line when it contains the header of the CSV:

{
  read
  while IFS=, read -r col1 col2
  do
    echo "I got:$col1|$col2"
  done 
} < myfile.csv
Cavin answered 27/3, 2021 at 8:47 Comment(2)
instead of just read (that will uselessly populate $REPLY variable), you could use _ as garbage variable: read _. Or even, you could store first lines as header which could be useful further in the script IFS=, read -ra headline. ( Have a look at my answer ;)Strut
And maybe could you be interested by this other post about parsing output of commands (Have a look how I parse df -k output).Strut
W
0

If you want to read a CSV file while skipping its header line, here is a solution (note that the counter i must be initialized before the loop):

i=1
while IFS=, read -ra line
do
    test $i -eq 1 && ((i=i+1)) && continue
    for col_val in "${line[@]}"
    do
        echo -n "$col_val|"
    done
    echo
done < "$csvFile"
Wiburg answered 7/10, 2019 at 10:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.