I have a requirement to convert Excel 2010 files to CSV. Currently I'm using Excel Interop to open each file and `SaveAs` CSV, which works well. However, Interop has some issues in the environment where we use it, so I'm looking for another solution.
I found that the way to work with Excel files without Interop is the OpenXML SDK. I put together some code that iterates through all the cells in each sheet and writes them out to a separate CSV file per sheet.
One problem I have is handling blank rows and cells. With this code, blank rows and cells seem to be completely absent, so I have no way of knowing about them. Is there a way to iterate through all rows and cells, including blanks?
```csharp
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

string filename = @"D:\test.xlsx";
string outputDir = Path.GetDirectoryName(filename);
//--------------------------------------------------------
using (SpreadsheetDocument document = SpreadsheetDocument.Open(filename, false))
{
    // The shared string table belongs to the whole workbook, and it is missing
    // when the workbook has no text cells, so use FirstOrDefault() instead of First().
    SharedStringTablePart shareStringPart = document.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
    SharedStringItem[] items = shareStringPart != null
        ? shareStringPart.SharedStringTable.Elements<SharedStringItem>().ToArray()
        : new SharedStringItem[0];

    foreach (Sheet sheet in document.WorkbookPart.Workbook.Descendants<Sheet>())
    {
        WorksheetPart worksheetPart = (WorksheetPart) document.WorkbookPart.GetPartById(sheet.Id);
        Worksheet worksheet = worksheetPart.Worksheet;
        // Create a new filename and save this file out.
        if (string.IsNullOrWhiteSpace(outputDir))
            outputDir = Path.GetDirectoryName(filename);
        string newFilename = string.Format("{0}_{1}.csv", Path.GetFileNameWithoutExtension(filename), sheet.Name);
        newFilename = Path.Combine(outputDir, newFilename);
        using (var outputFile = File.CreateText(newFilename))
        {
            foreach (var row in worksheet.Descendants<Row>())
            {
                StringBuilder sb = new StringBuilder();
                foreach (Cell cell in row.Elements<Cell>())
                {
                    string value = string.Empty;
                    if (cell.CellValue != null)
                    {
                        // If the content of this cell is stored as a shared string, get the text
                        // from the SharedStringTablePart. Otherwise, use the string value of the cell.
                        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                            value = items[int.Parse(cell.CellValue.Text)].InnerText;
                        else
                            value = cell.CellValue.Text;
                    }
                    // To be safe, always use double quotes, and double any embedded quotes.
                    sb.Append(string.Format("\"{0}\",", value.Trim().Replace("\"", "\"\"")));
                }
                outputFile.WriteLine(sb.ToString().TrimEnd(','));
            }
        }
    }
}
```
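The loop above only sees cells that are actually stored. From what I can tell, empty, unformatted cells and rows are normally left out of the sheet XML altogether, but the stored ones carry their position: `Row.RowIndex` is the 1-based row number and `Cell.CellReference` holds an A1-style reference such as `C3` (Excel writes both, although the file format does not strictly require them). A minimal sketch of turning such a reference into a zero-based column index, where the class and method names `CellRef.ColumnIndex` are hypothetical:

```csharp
using System;

public static class CellRef
{
    // Hypothetical helper: converts an A1-style reference such as "C3" or "AA10"
    // into a zero-based column index ("A" -> 0, "Z" -> 25, "AA" -> 26).
    // Only the leading letters are read; the row digits are ignored.
    public static int ColumnIndex(string cellReference)
    {
        if (string.IsNullOrEmpty(cellReference))
            throw new ArgumentException("Cell reference is empty.", nameof(cellReference));

        int column = 0;
        foreach (char c in cellReference)
        {
            char u = char.ToUpperInvariant(c);
            if (u < 'A' || u > 'Z')
                break; // stop at the first character of the row number

            // Column letters are base 26 with no zero digit: A = 1 ... Z = 26.
            column = column * 26 + (u - 'A' + 1);
        }

        if (column == 0)
            throw new ArgumentException("No column letters found.", nameof(cellReference));
        return column - 1;
    }
}
```

For example, `CellRef.ColumnIndex("C3")` returns 2. Inside the row loop, something like `CellRef.ColumnIndex(cell.CellReference.Value)` would give the column of each stored cell, and a jump of more than one from the previous cell's column means empty fields are missing in between.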
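For example, if I have the following data in the Excel file:
```
one,two,three
,,
last,,row
```
I get the following CSV, which is wrong because the blank second row and the blank middle cell of the last row are dropped:
```
one,two,three
last,row
```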
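The expected output needs two kinds of padding: whole rows that are not stored (row 2) and cells missing inside a row (B3). Writing the blank row as `,,` also needs a row width, which the file does not record for an empty row; using the widest column seen is one option. A minimal sketch, assuming the stored cells have already been read into hypothetical `(Row, Column, Value)` tuples with a 1-based row number (from `Row.RowIndex`) and a zero-based column index (from `Cell.CellReference`), with quoting left out for brevity; `SparseToCsv.ToLines` is likewise a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SparseToCsv
{
    // Hypothetical helper: builds dense CSV lines from sparse cells.
    // Each tuple holds a 1-based row number, a zero-based column index and the cell text.
    // Missing rows become lines of empty fields; missing cells become empty fields.
    // Values are joined without quoting or escaping, to keep the sketch short.
    public static List<string> ToLines(IEnumerable<(uint Row, int Column, string Value)> cells)
    {
        var cellList = cells.ToList();
        var lines = new List<string>();
        if (cellList.Count == 0)
            return lines;

        // Pad every line to the widest row seen, so all lines have the same field count.
        int width = cellList.Max(c => c.Column) + 1;
        uint lastRow = cellList.Select(c => c.Row).Max();

        // Index the values by row, then by column.
        var byRow = cellList
            .GroupBy(c => c.Row)
            .ToDictionary(g => g.Key, g => g.ToDictionary(c => c.Column, c => c.Value));

        for (uint row = 1; row <= lastRow; row++)
        {
            // rowCells is null when the row was not stored at all.
            byRow.TryGetValue(row, out var rowCells);
            var fields = new string[width];
            for (int col = 0; col < width; col++)
            {
                if (rowCells != null && rowCells.TryGetValue(col, out var value))
                    fields[col] = value;
                else
                    fields[col] = string.Empty;
            }
            lines.Add(string.Join(",", fields));
        }
        return lines;
    }
}
```

Fed the five stored cells from the example (A1, B1, C1, A3 and C3), it returns `one,two,three`, `,,` and `last,,row`. This version always starts at row 1 and column A, so leading blank rows and columns are kept as empty fields; trimming them instead would mean starting from the smallest row and column seen.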