How to retrieve Tab names from excel sheet using OpenXML
Asked Answered
L

4

19

I have a spreadsheet document that has 182 columns in it. I need to place the spreadsheet data into a data table, tab by tab, but i need to find out as I'm adding data from each tab, what is the tab name, and add the tab name to a column in the data table.

This is how I set up the data table.

I then loop in the workbook and drill down to the sheetData object and walk through each row and column, getting cell data.

DataTable dt = new DataTable();
for (int i = 0; i <= col.GetUpperBound(0); i++)
{
    try
    {
        dt.Columns.Add(new DataColumn(col[i].ToString(), typeof(string)));
    }
    catch (Exception e)
    {
        MessageBox.Show("Uploader  Error" + e.ToString());
        return null;
    }
}

dt.Columns.Add(new DataColumn("SheetName", typeof(string)));

However at the end of the string array that I use for the Data Table, I need to add the tab name. How can I find out the tab name as I'm looping in the sheet in Open XML?

Here is my code so far:

using (SpreadsheetDocument spreadSheetDocument = 
           SpreadsheetDocument.Open(Destination, false))
{
    WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
    Workbook workbook = spreadSheetDocument.WorkbookPart.Workbook;

    Sheets sheets = 
        spreadSheetDocument
            .WorkbookPart
            .Workbook
            .GetFirstChild<DocumentFormat.OpenXml.Spreadsheet.Sheets>();

    OpenXmlElementList list = sheets.ChildElements;

    foreach (WorksheetPart worksheetpart in workbook.WorkbookPart.WorksheetParts)
    {
        Worksheet worksheet = worksheetpart.Worksheet;

        foreach (SheetData sheetData in worksheet.Elements<SheetData>())
        {
            foreach (Row row in sheetData.Elements())
            {
                string[] thisarr = new string[183];
                int index = 0;
                foreach (Cell cell in row.Elements())
                {
                    thisarr[(index)] = GetCellValue(spreadSheetDocument, cell);
                    index++;
                }
                thisarr[182] = ""; //need to add tabname here
                if (thisarr[0].ToString() != "")
                {
                    dt.Rows.Add(thisarr);
                }
            }
        }
    }
}

return dt;

Just a note: I did previously get the tab names from the InnerXML property of "list" in

OpenXmlElementList list = sheets.ChildElements;

however I noticed as I'm looping in the spreadsheet it does not get the tab names in the right order.

Lipkin answered 21/9, 2011 at 17:50 Comment(4)
msdn.microsoft.com/en-us/library/bb507946.aspxBollix
that works fine if i just want to pull the tab names out.. which i can, by parsing out the Inner/outer xml.. but I want to do it within my own For loop.. I am having trouble accessing Sheet while im within the SheetData level..Lipkin
You can't work with the "worksheet" object to get its name? The last code example on that page shows how to loop though a sheet's attributes: presumably the sheetname is one of those attributes (I have no experience myself though).Bollix
well there is a "name" property at the worksheet level, but it does not have the tab name. i just went ahead and did it this way: OpenXmlElementList list = sheets.ChildElements; foreach (OpenXmlElement elm in list) { string xml = elm.OuterXml;Lipkin
K
20

The sheet names are stored in the WorkbookPart in a Sheets element which has children of element Sheet which corresponds to each worksheet in the Excel file. All you have to do is grab the correct index out of that Sheets element and that will be the Sheet you are on in your loop. I added a snippet of code below to do what you want.

int sheetIndex = 0;
foreach (WorksheetPart worksheetpart in workbook.WorkbookPart.WorksheetParts)
{                     
    Worksheet worksheet = worksheetpart.Worksheet;

    // Grab the sheet name each time through your loop
    string sheetName = workbookPart.Workbook.Descendants<Sheet>().ElementAt(sheetIndex).Name;

    foreach (SheetData sheetData in worksheet.Elements<SheetData>())
    {

       ...
    }
    sheetIndex++;
}
Kandrakandy answered 23/9, 2011 at 15:30 Comment(4)
From my experience, using your code snippet, sheetName reads sheet names in correct order (as they are in the file), but sheetData is not read in the same order as they are in the Excel file. As a result code results in mixed names for all sheets.Prothonotary
It seems that the problem comes from the "excel_file.xlsx\xl_rels\workbook.xml.rels" file, which has spreadsheet names and references to the spreadsheet content stored in some random order. If you reorder them manually from 1 to N (Id="rId1", Id="rId2", ..., Id="rIdN"), the spreadsheet names will align with their content after reading the file. No idea how to handle this issue in code though.Prothonotary
As user @Prothonotary pointed out this solution is wrong! The correct value to look at is the relationshipId, which will not coincide with element index every time - it is just coincidence and luck that it is often the case. (See my answer for how to retrieve and use the relationshipId.)Insurgence
All comments above are true, this solution is not correct.Upright
I
49

Here is a handy helper method to get the Sheet corresponding to a WorksheetPart:

public static Sheet GetSheetFromWorkSheet
    (WorkbookPart workbookPart, WorksheetPart worksheetPart)
{
    string relationshipId = workbookPart.GetIdOfPart(worksheetPart);
    IEnumerable<Sheet> sheets = workbookPart.Workbook.Sheets.Elements<Sheet>();
    return sheets.FirstOrDefault(s => s.Id.HasValue && s.Id.Value == relationshipId);
}

Then you can get the name from the sheets Name-property:

Sheet sheet = GetSheetFromWorkSheet(myWorkbookPart, myWorksheetPart);
string sheetName = sheet.Name;

...this will be the "tab name" OP referred to.


For the record the opposite method would look like:

public static Worksheet GetWorkSheetFromSheet(WorkbookPart workbookPart, Sheet sheet)
{
    var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
    return worksheetPart.Worksheet;
}

...and with that we can also add the following method:

public static IEnumerable<KeyValuePair<string, Worksheet>> GetNamedWorksheets
    (WorkbookPart workbookPart)
{
    return workbookPart.Workbook.Sheets.Elements<Sheet>()
        .Select(sheet => new KeyValuePair<string, Worksheet>
            (sheet.Name, GetWorkSheetFromSheet(workbookPart, sheet)));
}

Now you can easily enumerate through all Worksheets including their name.

Throw it all into a dictionary for name-based lookup if you prefer that:

IDictionary<string, WorkSheet> wsDict = GetNamedWorksheets(myWorkbookPart)
    .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);

...or if you just want one specific sheet by name:

public static Sheet GetSheetFromName(WorkbookPart workbookPart, string sheetName)
{
    return workbookPart.Workbook.Sheets.Elements<Sheet>()
        .FirstOrDefault(s => s.Name.HasValue && s.Name.Value == sheetName);
}

(Then call GetWorkSheetFromSheet to get the corresponding Worksheet.)

Insurgence answered 8/1, 2016 at 11:40 Comment(0)
K
20

The sheet names are stored in the WorkbookPart in a Sheets element which has children of element Sheet which corresponds to each worksheet in the Excel file. All you have to do is grab the correct index out of that Sheets element and that will be the Sheet you are on in your loop. I added a snippet of code below to do what you want.

int sheetIndex = 0;
foreach (WorksheetPart worksheetpart in workbook.WorkbookPart.WorksheetParts)
{                     
    Worksheet worksheet = worksheetpart.Worksheet;

    // Grab the sheet name each time through your loop
    string sheetName = workbookPart.Workbook.Descendants<Sheet>().ElementAt(sheetIndex).Name;

    foreach (SheetData sheetData in worksheet.Elements<SheetData>())
    {

       ...
    }
    sheetIndex++;
}
Kandrakandy answered 23/9, 2011 at 15:30 Comment(4)
From my experience, using your code snippet, sheetName reads sheet names in correct order (as they are in the file), but sheetData is not read in the same order as they are in the Excel file. As a result code results in mixed names for all sheets.Prothonotary
It seems that the problem comes from the "excel_file.xlsx\xl_rels\workbook.xml.rels" file, which has spreadsheet names and references to the spreadsheet content stored in some random order. If you reorder them manually from 1 to N (Id="rId1", Id="rId2", ..., Id="rIdN"), the spreadsheet names will align with their content after reading the file. No idea how to handle this issue in code though.Prothonotary
As user @Prothonotary pointed out this solution is wrong! The correct value to look at is the relationshipId, which will not coincide with element index every time - it is just coincidence and luck that it is often the case. (See my answer for how to retrieve and use the relationshipId.)Insurgence
All comments above are true, this solution is not correct.Upright
B
6
    Using spreadsheetDocument As SpreadsheetDocument = spreadsheetDocument.Open("D:\Libro1.xlsx", True)

        Dim workbookPart As WorkbookPart = spreadsheetDocument.WorkbookPart

        workbookPart.Workbook.Descendants(Of Sheet)()



        Dim worksheetPart As WorksheetPart = workbookPart.WorksheetParts.Last
        Dim text As String



        For Each Sheet As Sheet In spreadsheetDocument.WorkbookPart.Workbook.Sheets
            Dim sName As String = Sheet.Name
            Dim sID As String = Sheet.Id

            Dim part As WorksheetPart = workbookPart.GetPartById(sID)
            Dim actualSheet As Worksheet = part.Worksheet

            Dim sheetData As SheetData = part.Worksheet.Elements(Of SheetData)().First

            For Each r As Row In sheetData.Elements(Of Row)()
                For Each c As Cell In r.Elements(Of Cell)()
                    text = c.CellValue.Text
                    Console.Write(text & " ")
                Next
            Next
        Next

    End Using

    Console.Read()
Brauer answered 6/4, 2013 at 12:17 Comment(2)
It would be great if you could add some discussion of why your method works in addition to the code.Amersfoort
This way makes more sense to me as you're searching for the sheet via the name, getting the id of that sheet, and then getting the worksheet based on the name (using linq means you don't need to loop at all). The reply marked as an answer uses an index that requires looping through until you've found what you want.Vulva
B
0
worksheet.GetAttribute("name","").Value
Bollix answered 21/9, 2011 at 20:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.