.NET OpenXML performance issues
Asked Answered
A

5

9

I am attempting to write out an Excel file from an ASP.NET web server using OpenXML. I have about 2100 records and its taking around 20-30 seconds to do this. Any way I can make it faster? Retrieving the 2100 rows from the db takes a fraction of a second. Not sure why manipulating them in memory would take any longer.

Note: ExcelWriter is our custom class, but all its methods are directly from code in this link, http://msdn.microsoft.com/en-us/library/cc861607.aspx

   public static MemoryStream CreateThingReport(List<Thing> things, MemoryStream template)
    {
        SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(template, true);
        WorksheetPart workSheetPart = spreadsheet.WorkbookPart.WorksheetParts.First();

        SharedStringTablePart sharedStringPart = spreadsheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();

        Cell cell = null;
        int index = 0;

        //create cell formatting for header text
        Alignment wrappedAlignment = new Alignment { WrapText = true };
               uint rowOffset = 2;

  foreach (Thing t in things)
        {
            //Received Date
            cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
            index = ExcelWriter.InsertSharedStringItem(t.CreateDate.ToShortDateString(), sharedStringPart);
            cell.CellValue = new CellValue(index.ToString());
            cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.SharedString);

            //Car Part Name
            cell = ExcelWriter.InsertCellIntoWorksheet("B", rowOffset, workSheetPart);
            index = ExcelWriter.InsertSharedStringItem(t.CarPart.Name, sharedStringPart);
            cell.CellValue = new CellValue(index.ToString());
            cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.SharedString);

  rowOffset++; 
   }

 workSheetPart.Worksheet.Save();

        spreadsheet.WorkbookPart.Workbook.Save();
        spreadsheet.Close();

        return template;
Anastase answered 10/5, 2012 at 21:55 Comment(0)
A
7

So it looks like someone in the MSDN community docs ran into similar performance implications. The code below is very inefficient. Someone recommended using a hash table.

For our solution we just removed the insertion of shared strings altogether and went from 1:03 seconds to 0:03 seconds in download time.

//Old: (1:03)
            cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
            index = ExcelWriter.InsertSharedStringItem(thing.CreateDate.ToShortDateString(), sharedStringPart);
            cell.CellValue = new CellValue(index.ToString());
            cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.SharedString);

 //New: (0:03)
             cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
             cell.CellValue = new CellValue(thing.CreateDate.ToShortDateString());
              cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.String);

MSDN Docs (slow solution, they should use a Hash Table instead)

      private static int InsertSharedStringItem(string text, SharedStringTablePart         shareStringPart)
  {
// If the part does not contain a SharedStringTable, create one.
if (shareStringPart.SharedStringTable == null)
{
    shareStringPart.SharedStringTable = new SharedStringTable();
}

int i = 0;

// Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
{
    if (item.InnerText == text)
    {
        return i;
    }

    i++;
}

// The text does not exist in the part. Create the SharedStringItem and return its index.
shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new DocumentFormat.OpenXml.Spreadsheet.Text(text)));
shareStringPart.SharedStringTable.Save();

return i;
 }  
Anastase answered 11/5, 2012 at 15:46 Comment(3)
I am facing same issue... I need to write 1000+ rows and in some cases 10000+ rows and it is getting slow like hell... Here you said you can use hash table can you give a example how? or if any other thing you used to improve performance...Psychotomimetic
I'm looking at 500K rows. Have you made other improvements since this post you could share? I have moved to the SAX method to minimize memory usage. And I see about 1000 rows per ~1.1 seconds. If you get faster than this please share.Elvera
This suggestion doesn't use the SharedStringTable, so while it goes faster, the file size increase dramatically when strings are reusedDiscriminant
E
5

@The Internet

Note that String data type is actually for formulas, for text should use InlineString. See 17.18.11 ST_CellType (Cell Type):

  • inlineStr (Inline String) - Cell containing an (inline) rich string, i.e., one not in the shared string table. If this cell type is used, then the cell value is in the is element rather than the v element in the cell (c element).
  • str (String) - Cell containing a formula string.
Enphytotic answered 19/5, 2014 at 20:36 Comment(0)
D
4

The big improment is more Save() function out of loop

 //Save data
        shareStringPart.SharedStringTable.Save();
        worksheetPart.Worksheet.Save();

For 500 records, for me it change from 10 mins to 1 min.

Drouin answered 12/2, 2015 at 10:43 Comment(2)
This is a really important part of working with OpenXML - since many of the subparts require a save, we tend to put the save inside singular action methods or loops instead of in the outside scope where the save is more performant.Wolpert
Removing excessive 'Save()' calls did improve performance but I found bypassing the Shared String Table altogether and saving all text as 'InLineString' to be the fastest for a couple thousand rows. Three hours down to less then a minute.Catalysis
E
3

@kunjee

If you want performance create all required objects upfront so that are not checked on each invocation of this method. This is why SharedStringTable is passed in as parameter instead of the part.

Dictionaries are for fast, indexed lookup, have better performance than a for loop. Are bit faster than hashtables because are strongly typed so don't require boxing. Being strongly typed is a great benefit anyway.

private static int InsertSharedStringItem(string sharedString, SharedStringTable sharedStringTable, Dictionary<string, int> sharedStrings)
{
    int sharedStringIndex;

    if (!sharedStrings.TryGetValue(sharedString, out sharedStringIndex))
    {
        // The text does not exist in the part. Create the SharedStringItem now.
        sharedStringTable.AppendChild(new SharedStringItem(new Text(sharedString)));

        sharedStringIndex = sharedStrings.Count;

        sharedStrings.Add(sharedString, sharedStringIndex);
    }

    return sharedStringIndex;
}
Enphytotic answered 19/5, 2014 at 20:30 Comment(0)
H
0

As mentioned by The Internet, they should have used a Hashtable and as proposed by zquanghoangz they should have moved the Save() out of the loop.

InlineString does work, but it gives MS Excel a headache when opening the generated file with uninformative error messages which can be repaired, but still gives an annoying pop-up.

static Cell AddCellWithSharedStringText(
    [NotNull]string text, 
    [NotNull]Hashtable texts, 
    [NotNull]SharedStringTablePart shareStringPart
)
{
    if (!texts.ContainsKey(text))
    {
        shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new Text(text)));
        texts[text] = texts.Count;
    }
    var idx = (int)texts[text];
    Cell c1 = new Cell();
    c1.DataType = CellValues.SharedString;
    c1.CellValue = new CellValue(idx.ToString());
    return c1;
}

This solution brought the export time down from ~5 minutes to 6 seconds on a [9880 x 66] grid.

Hy answered 25/10, 2019 at 7:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.