How do I use awk under cygwin to print fields from an excel spreadsheet?
Asked Answered
B

1

2

We seem to be seeing more and more questions about executing awk on Excel spreadsheets so here is a Q/A on how to do that specific thing.

I have this information in an Excel spreadsheet "$D/staff.xlsx" (where "$D" is the path to my Desktop):

Name   Position
Sue    Manager
Bill   Secretary
Pat    Engineer

and I want to print the Position field for a given Name, e.g. output Secretary given the input Bill.

I can currently save as CSV from Excel to get:

$ cat "$D/staff.csv"
Name,Position
Sue,Manager
Bill,Secretary
Pat,Engineer

and then run:

$ awk -F, -v name="Bill" '$1==name{print $2}' "$D/staff.csv"
Secretary

but this is just a small part of a larger task and so I have to be able to do this automatically from a shell script without manually opening Excel to export the CSV file. How do I do that from a Windows PC running cygwin?

Battlefield answered 6/8, 2016 at 14:7 Comment(2)
There are better tools to handle conversions which are much more robust than a simple text parser. See unix.stackexchange.com/questions/23726/… and any other link related to file type conversion from command line.Ainsworth
Also, XLSX is merely a zip file, with xml files inside it. See #11082778Ainsworth
B
5

The combination of the following VBS and shell scripts create a CSV file for each sheet in the Excel spreadsheet:

$ cat xls2csv.vbs
csv_format = 6

Dim strFilename
Dim objFSO
Set objFSO = CreateObject("scripting.filesystemobject")
strFilename = objFSO.GetAbsolutePathName(WScript.Arguments(0))
If objFSO.fileexists(strFilename) Then
  Call Writefile(strFilename)
Else
  wscript.echo "no such file!"
End If
Set objFSO = Nothing

Sub Writefile(ByVal strFilename)
Dim objExcel
Dim objWB
Dim objws

Set objExcel = CreateObject("Excel.Application")
Set objWB = objExcel.Workbooks.Open(strFilename)

For Each objws In objWB.Sheets
  objws.Copy
  objExcel.ActiveWorkbook.SaveAs objWB.Path & "\" & objws.Name & ".csv", csv_format
  objExcel.ActiveWorkbook.Close False
Next

objWB.Close False
objExcel.Quit
Set objExcel = Nothing
End Sub

.

$ cat xls2csv
PATH="$HOME:$PATH"

# the original XLS input file path components
inXlsPath="$1"
inXlsDir=$(dirname "$inXlsPath")
xlsFile=$(basename "$inXlsPath")
xlsBase="${xlsFile%.*}"

# The tmp dir we'll copy the XLS to and run the tool on
# to get the CSVs generated
tmpXlsDir="/usr/tmp/${xlsBase}.$$"
tmpXlsPath="${tmpXlsDir}/${xlsFile}"
absXlsPath="C:/cygwin64/${tmpXlsPath}" # need an absolute path for VBS to work

mkdir -p "$tmpXlsDir"

trap 'rm -f "${tmpXlsDir}/${xlsFile}"; rmdir "$tmpXlsDir"; exit' 0

cp "$inXlsPath" "$tmpXlsDir"

cygstart "$HOME/xls2csv.vbs" "$absXlsPath"

printf "Waiting for \"${tmpXlsDir}/~\$${xlsFile}\" to be created:\n" >&2
while [ ! -f "${tmpXlsDir}/~\$${xlsFile}" ]
do
    # VBS is done when this tmp file is created and later removed
    printf "." >&2
    sleep 1
done
printf " Done.\n" >&2

printf "Waiting for \"${tmpXlsDir}/~\$${xlsFile}\" to be removed:\n" >&2
while [ -f "${tmpXlsDir}/~\$${xlsFile}" ]
do
    # VBS is done when this tmp file is removed
    printf "." >&2
    sleep 1
done
printf " Done.\n" >&2

numFiles=0
for file in "$tmpXlsDir"/*.csv
do
    numFiles=$(( numFiles + 1 ))
done

if (( numFiles >= 1 ))
then
    outCsvDir="${inXlsDir}/${xlsBase}.csvs"
    mkdir -p "$outCsvDir"
    mv "$tmpXlsDir"/*.csv "$outCsvDir"
fi

Now we execute the shell script which internally calls cygstart to run the VBS script to generate the CSV files (one per sheet) in a subdirectory under the same directory where the Excel file exists named based on the Excel file name (e.g. Excel file staff.xlsx produces CSVs directory staff.csvs):

$ ./xls2csv "$D/staff.xlsx"
Waiting for "/usr/tmp/staff.2700/~$staff.xlsx" to be created:
.. Done.
Waiting for "/usr/tmp/staff.2700/~$staff.xlsx" to be removed:
. Done.

There is only one sheet with the default name Sheet1 in the target Excel file "$D/staff.xlsx" so the output of the above is a file "$D/staff.csvs/Sheet1.csv":

$ cat "$D/staff.csvs/Sheet1.csv"
Name,Position
Sue,Manager
Bill,Secretary
Pat,Engineer

$ awk -F, -v name="Bill" '$1==name{print $2}' "$D/staff.csvs/Sheet1.csv"
Secretary

Also see What's the most robust way to efficiently parse CSV using awk? for how to then operate on those CSVs.

See also https://mcmap.net/q/41795/-gawk-3-1-6-1-on-windows-7-x64-pro-gets-0-return-code-using-system-even-on-failed-commands for how to do the opposite, i.e. call a cygwin bash command from a Windows batch file.

Battlefield answered 6/8, 2016 at 14:17 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.