How do you convert Excel to CSV using OpenXML SDK?
Asked Answered
U

3

7

I have a requirement to convert Excel (2010) files to csv. Currently I'm using Excel Interop to open and SaveAs csv, which works well. However the Interop has some issues in the environemt where we use it, so I'm looking for another solution.

I found the way to work with Excel files without interop is to use the OpenXML SDK. I got some code together to itterate through all the cells in each sheet and simply writes them to another file in CSV.

One problem I have is handling blank rows and cells. It seems that, with this code, blank rows and cells are completely non-existant so I have no way to know about them. Is there away to itterate through all rows and cells, including blanks?

string filename = @"D:\test.xlsx";
string outputDir = Path.GetDirectoryName(filename);
//--------------------------------------------------------

using (SpreadsheetDocument document = SpreadsheetDocument.Open(filename, false))
{

    foreach (Sheet sheet in document.WorkbookPart.Workbook.Descendants<Sheet>())
    {
        WorksheetPart worksheetPart = (WorksheetPart) document.WorkbookPart.GetPartById(sheet.Id);
        Worksheet worksheet = worksheetPart.Worksheet;

        SharedStringTablePart shareStringPart = document.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
        SharedStringItem[] items = shareStringPart.SharedStringTable.Elements<SharedStringItem>().ToArray();

        // Create a new filename and save this file out.
        if (string.IsNullOrWhiteSpace(outputDir))
            outputDir = Path.GetDirectoryName(filename);
        string newFilename = string.Format("{0}_{1}.csv", Path.GetFileNameWithoutExtension(filename), sheet.Name);
        newFilename = Path.Combine(outputDir, newFilename);

        using (var outputFile = File.CreateText(newFilename))
        {
            foreach (var row in worksheet.Descendants<Row>())
            {
                StringBuilder sb = new StringBuilder();
                foreach (Cell cell in row)
                {
                    string value = string.Empty;
                    if (cell.CellValue != null)
                    {
                        // If the content of the first cell is stored as a shared string, get the text
                        // from the SharedStringTablePart. Otherwise, use the string value of the cell.
                        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                            value = items[int.Parse(cell.CellValue.Text)].InnerText;
                        else
                            value = cell.CellValue.Text;
                    }

                    // to be safe, always use double quotes.
                    sb.Append(string.Format("\"{0}\",", value.Trim()));
                }
                outputFile.WriteLine(sb.ToString().TrimEnd(','));
            }
        }
    }
}

If I have the following Excel file data:

one,two,three
,,
last,,row

I will get the following CSV (which is wrong):

one,two,three
last,row
Unsaid answered 26/8, 2011 at 18:14 Comment(1)
This OpenXML code example from MS documentation Just Works. @user6748024 example below doesn't cover every datatype or use InnerText, which you have to.Earthaearthborn
N
6
//Xlsx to Csv
ConvertXlsxToCsv(@"D:\test.xlsx", @"C:\");

internal static void ConvertXlsxToCsv(string SourceXlsxName, string DestinationCsvDirectory)
{
    try
    {
        using (SpreadsheetDocument document = SpreadsheetDocument.Open(SourceXlsxName, false))
        {

            foreach (Sheet _Sheet in document.WorkbookPart.Workbook.Descendants<Sheet>())
            {
                WorksheetPart _WorksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(_Sheet.Id);
                Worksheet _Worksheet = _WorksheetPart.Worksheet;

                SharedStringTablePart _SharedStringTablePart = document.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
                SharedStringItem[] _SharedStringItem = _SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ToArray();

                if (string.IsNullOrEmpty(DestinationCsvDirectory))
                    DestinationCsvDirectory = Path.GetDirectoryName(SourceXlsxName);
                string newFilename = string.Format("{0}_{1}.csv", Path.GetFileNameWithoutExtension(SourceXlsxName), _Sheet.Name);
                newFilename = Path.Combine(DestinationCsvDirectory, newFilename);

                using (var outputFile = File.CreateText(newFilename))
                {
                    foreach (var row in _Worksheet.Descendants<Row>())
                    {
                        StringBuilder _StringBuilder = new StringBuilder();
                        foreach (Cell _Cell in row)
                        {
                            string Value = string.Empty;
                            if (_Cell.CellValue != null)
                            {
                                if (_Cell.DataType != null && _Cell.DataType.Value == CellValues.SharedString)
                                    Value = _SharedStringItem[int.Parse(_Cell.CellValue.Text)].InnerText;
                                else
                                    Value = _Cell.CellValue.Text;
                            }
                            _StringBuilder.Append(string.Format("{0},", Value.Trim()));
                        }
                        outputFile.WriteLine(_StringBuilder.ToString().TrimEnd(','));
                    }
                }
            }
        }
    }
    catch (Exception Ex)
    {
        throw Ex;
    }
}
Nadaba answered 23/8, 2016 at 13:3 Comment(4)
I don't know why this answer isn't at the top. Most reliable, up-to-date and requires no driver installing or having to talk to evil infrastructure people.Hedveh
This code reading large files you may get OutOfMemoryException cause is reading the entire excel into memory. To scale it it should use OpenXmlReader instead. Please take a look of this link: learn.microsoft.com/en-us/office/open-xml/…Guessrope
This has a number of bugs: It doesn't support "," values inside the values (doesn't encode CSV correctly) and it doesn't support missing cells in a row properlyCelik
@revobtz. How to do that? when I tried with reader the data i can see are some repeating numbers onlyByers
B
4

you can use oledb connection and query the excel file, convert the rows to csv format and save the results to a file

here is a simple example i tested for this it creates a different csv file unicode encoded, tab delimited for each sheet in the excel file

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.IO;
using System.Linq;
using System.Text;

namespace XlsTests
{
    class Program
    {
        static void Main(string[] args)
        {
            string _XlsConnectionStringFormat = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=NO;IMEX=1\"";
            string xlsFilename = @"C:\test.xlsx";
            using (OleDbConnection conn = new OleDbConnection(string.Format(_XlsConnectionStringFormat, xlsFilename)))
            {
                try
                {
                    conn.Open();

                    string outputFilenameHeade = Path.GetFileNameWithoutExtension(xlsFilename);
                    string dir = Path.GetDirectoryName(xlsFilename);
                    string[] sheetNames = conn.GetSchema("Tables")
                                              .AsEnumerable()
                                              .Select(a => a["TABLE_NAME"].ToString())
                                              .ToArray();
                    foreach (string sheetName in sheetNames)
                    {
                        string outputFilename = Path.Combine(dir, string.Format("{0}_{1}.csv", outputFilenameHeade, sheetName));
                        using (StreamWriter sw = new StreamWriter(File.Create(outputFilename), Encoding.Unicode))
                        {
                            using (DataSet ds = new DataSet())
                            {
                                using (OleDbDataAdapter adapter = new OleDbDataAdapter(string.Format("SELECT * FROM [{0}]", sheetName), conn))
                                {
                                    adapter.Fill(ds);

                                    foreach (DataRow dr in ds.Tables[0].Rows)
                                    {
                                        string[] cells = dr.ItemArray.Select(a => a.ToString()).ToArray();
                                        sw.WriteLine("\"{0}\"", string.Join("\"\t\"", cells));
                                    }
                                }
                            }
                        }
                    }
                }
                catch (Exception exp)
                {
                    // handle exception
                }
                finally
                {
                    if (conn.State != ConnectionState.Open)
                    {
                        try
                        {
                            conn.Close();
                        }
                        catch (Exception ex)
                        {
                            // handle exception
                        }
                    }
                }
            }
        }
    }
}
Biota answered 26/8, 2011 at 20:0 Comment(1)
Thanks @Adam, but this code does not write out the first row from the excel file. I guess it gets treated as column names? This is not what I want. Know any way to avoid that?Unsaid
C
3

I don't think OpenXml is the right tool for this problem. I would recommend getting the data out of the sheet with an OleDbConnection and then into a csv file with this method.

Once you've got the data in a datatable in memory, you've got a lot more control over the situation.

Crossed answered 26/8, 2011 at 18:52 Comment(6)
Doesn't the OleDbConnection require that Excel is installed? The advantage of Open XML SDK is that Excel is not required.Ld
Nope, not required. It will treat the file as a binary data store, which it is. I end up doing this about every two to three years, for some reason :).Crossed
I should also note that I do a lot of OpenXml work...this would be a case of using a nuke to kill a mosquito.Crossed
@Philipp I agree with Chris. I have done this several times, and always just use an OleDbConnection. You do not need excel, just the correct .net Framework.Shire
@Chris this may not require that Excel is installed, but looks like that OLEDB provider must be installed. is this something I would need to install as a pre-requisite to my application?Unsaid
Yes...if you're creating an installer, any version of the .Net framework merge module should include it as part of the System.Data stuff, including the Client Profile ones, I think.Crossed

© 2022 - 2024 — McMap. All rights reserved.