Removing a formula from Excel using OpenXML
Asked Answered
H

2

6

I am trying to remove all formulas from a sheet using openxml. This what I am trying:

internal static void ReplaceFormulaWithValue()
{
    var res = _worksheetPart.Worksheet.GetFirstChild<SheetData>().Elements<Row>();      

    foreach (Row row in _worksheetPart.Worksheet.GetFirstChild<SheetData>().Elements<Row>())
    {
        foreach (Cell cell in row.Elements<Cell>())
        {
            if (cell.CellFormula != null &&
                  cell.CellValue != null)
            {
                string cellRef = cell.CellReference;
                CalculationChainPart calculationChainPart = _spreadSheet.WorkbookPart.CalculationChainPart;
                CalculationChain calculationChain = calculationChainPart.CalculationChain;
                var calculationCells = calculationChain.Elements<CalculationCell>().ToList();
                CalculationCell calculationCell = calculationCells.Where(c => c.CellReference == cellRef).FirstOrDefault();
                //CalculationCell calculationCell = calculationChain.Elements<CalculationCell>().Where(c => c.CellReference == cell.CellReference).FirstOrDefault();

                string value = cell.CellValue.InnerText;
                UpdateCell(cell, DataTypes.String, value);

                cell.CellFormula.Remove();
                calculationCell.Remove();                   
            }
        }
    }
    SaveChanges();
}

Upon opening the excel document, I am receiving the following error:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error192600_01.xml</logFileName><summary>Errors were detected in file 'C:\DEV\ExcelEditor\ExcelEditor\bin\Debug\Chart.xlsx'</summary>
<removedParts summary="Following is a list of removed parts:">
<removedPart>Removed Part: /xl/calcChain.xml part with XML error.  (Calculation properties) Catastrophic failure Line 1, column 138.</removedPart>
</removedParts></recoveryLog>

So I compare the old calcChain.xml file with the newly generated using OpenXML SDK Tool. The old file has the following:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<calcChain xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
 <c i="1" l="1" r="D2"/>
</calcChain>

and the new one after running my code:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<x:calcChain xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">  
</x:calcChain>

Could anyone help if I am missing something here.

Heid answered 25/4, 2017 at 10:2 Comment(0)
R
11

If you are deleting all the formulas, what is the need to have an empty calcChain.xml? Have you tried deleting it?

This works for me:

public static void ReplaceFormulasWithValue()
{
    try
    {
        CalculationChainPart calculationChainPart = _spreadSheet.WorkbookPart.CalculationChainPart;
        CalculationChain calculationChain = calculationChainPart.CalculationChain;
        var calculationCells = calculationChain.Elements<CalculationCell>().ToList();

        foreach (Row row in _worksheetPart.Worksheet.GetFirstChild<SheetData>().Elements<Row>())
        {
            foreach (Cell cell in row.Elements<Cell>())
            {
                if (cell.CellFormula != null && cell.CellValue != null)
                {
                    string cellRef = cell.CellReference;                            
                    CalculationCell calculationCell = calculationCells.Where(c => c.CellReference == cellRef).FirstOrDefault();

                    UpdateCell(cell, DataTypes.String, cell.CellValue.InnerText);

                    cell.CellFormula.Remove();
                    if(calculationCell != null)
                    {                       
                        calculationCell.Remove();
                        calculationCells.Remove(calculationCell);
                    }
                    else
                    {
                        //Something is went wrong - log it
                    }                   
                }
                if (calculationCells.Count == 0)
                     _spreadSheet.WorkbookPart.DeletePart(calculationChainPart);

            }
            _worksheetPart.Worksheet.Save();
        }
    }
    catch(Exception ex)
    {
        Console.WriteLine(ex);
    }
}
Rooster answered 25/4, 2017 at 14:49 Comment(1)
This works. But why would an empty xml cause errors? Is there some relationship that needs to be taken care of?Heid
P
2

Recently, I to stumble across the same problem and searched for a solution. Thus, sorry for the late response.

The story of removing a formula with its CalculationCells from a OpenXML document is quite a little bit longer and you can find a full documentation in ISO-IEC 29500-1-2016-Office Open XML File Formats.Part 1-Fundamentals and Markup Language Reference, 18.6 Calculation Chain, pp 1742-1745 from the ISO - Publicly Available Standards site.

Identical cell references in the calculation chain can occur multiple times in different sheets and multiple times with different attributes within the same sheet. For the removal of a formular you have to remove all references in the calculation chain but for the affected worksheet only. Otherwise, you will get an error opening the spreadsheet with Excel.

Here is an extension method which takes this into account:

/// <summary>
/// Removes a formula from a cell
/// </summary>
/// <param name="c">Cell</param>
/// <param name="wsPart">Worksheet part</param>
/// <param name="wbPart">Workbook part for calculation chain</param>
/// <param name="dicCalculationCells">
/// Dictionary of calculation chain entries. This Parameter can be used for performance optimization 
/// as the dictionary can be used for multiple calls.</param>
/// <returns>Dictionary of calculation chain entries for subsequent calls</returns>
public static Dictionary<uint, List<CalculationCell>> RemoveFormula(this Cell c, WorksheetPart wsPart, WorkbookPart wbPart, Dictionary<uint, List<CalculationCell>> dicCalculationCells = null) {

    // Create dictionary data structure for calculation chain. The calculation chain is splitted by workbook sheets.
    // Identical cell references can occur multiple times in different sheets and multiple times with different attributes
    // within the same sheet.
    // The building of a custom data structure is neccesary as the sheet id can come from the previous CalculationCell in the list. (see
    // https://standards.iso.org/ittf/PubliclyAvailableStandards/c071691_ISO_IEC_29500-1_2016.zip , 18.6 Calculation Chain, pp 1742-1745) 
    // However, Excel saves the sheet id in every CalculationCell.
    Dictionary<uint, List<CalculationCell>> CreateCalculationCellsDictionary(CalculationChainPart ccPart) {
        Dictionary<uint, List<CalculationCell>> dic = new Dictionary<uint, List<CalculationCell>>();
        uint iSheet = default;
        foreach(CalculationCell cc in ccPart.CalculationChain.Elements<CalculationCell>()) {
            iSheet = !string.IsNullOrEmpty(cc.SheetId) && uint.TryParse(cc.SheetId, out uint i) ? i : iSheet;   // If SheetId is omitted, it is assumed to be the same as the i value of the previous cell.
            if(dic.ContainsKey(iSheet))
                dic[iSheet].Add(cc);
            else {
                List<CalculationCell> l = new List<CalculationCell>();
                l.Add(cc);
                dic[iSheet] = l;
            }
        }
        return dic;
    }

    // Remove formula and referrencing calculation chain elements
    if(c.CellFormula != null) {

        // Remove formula from cell
        c.CellFormula.Remove();

        // Create calculation chain data structure
        // REMARKS: Data structure is created only if needed. Without a formula, workload can be saved
        if(dicCalculationCells == null)
            dicCalculationCells = CreateCalculationCellsDictionary(wbPart.CalculationChainPart);

        // Update calculation chain
        uint iSheet = wbPart.GetSheetId(wsPart).GetValueOrDefault();
        if(dicCalculationCells.TryGetValue(iSheet, out List<CalculationCell> l)) {

            // Remove calculation chain entries
            foreach(CalculationCell cc in l.Where(cc => cc.CellReference == c.CellReference).ToList()) {
                cc.Remove();
                l.Remove(cc);
            }

            // Update data structures
            if(l.Count == 0)
                dicCalculationCells.Remove(iSheet);
            if(dicCalculationCells.Count == 0)
                wbPart.DeletePart(wbPart.CalculationChainPart);
        }
    }

    return dicCalculationCells;
}

The local function for creating the calculation chain dictionary is for separation concerns only. The Cell extension method uses another extension method for getting the sheet id of a worksheet part:

/// <summary>
/// Returns the sheet id of a WorksheetPart.
/// </summary>
/// <param name="wbPart">WorkbookPart of the OpenXML Document</param>
/// <param name="wsPart">WorksheetPart</param>
/// <returns>Sheet id as int value</returns>
public static uint? GetSheetId(this WorkbookPart wbPart, WorksheetPart wsPart) {
    string sId = wbPart.GetIdOfPart(wsPart);
    return wbPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().FirstOrDefault(s => s.Id == sId)?.SheetId;
}

You can use the extension method with the following code:

internal static void ReplaceFormulaWithValue() {
    Dictionary<uint, List<CalculationCell>> dicCalculationCells = null;
    foreach(Row row in _worksheetPart.Worksheet.GetFirstChild<SheetData>().Elements<Row>())
        foreach(Cell cell in row.Elements<Cell>())
            dicCalculationCells = cell.RemoveFormula(_worksheetPart, _spreadSheet, dicCalculationCells)
            
    SaveChanges();
}
Penknife answered 19/8, 2022 at 11:29 Comment(1)
I couldn't use this code. "wbPart.GetSheetId()" doesn't exist. However, I could remove formulae from my Excel cells by using: "if (wbPart.CalculationChainPart != null) { wbPart.DeletePart(wbPart.CalculationChainPart); }Digitoxin

© 2022 - 2024 — McMap. All rights reserved.