How to replace a Paragraph's text using the OpenXML SDK
I am parsing some OpenXML Word documents using the .NET OpenXML SDK 2.0. I need to replace certain sentences with other sentences as part of the processing. While iterating over the paragraphs, I know when I've found something I need to replace, but I am stumped as to how to replace it.

For example, let's say I need to replace the sentence "a contract exclusively for construction work that is not building work." with the HTML snippet below, which points to SharePoint Reusable Content.

<span class="ms-rtestate-read ms-reusableTextView" contentEditable="false" id="__publishingReusableFragment" fragmentid="/Sites/Sandbox/ReusableContent/132_.000" >a contract exclusively for construction work that is not building work.</span>

PS: I have already worked out the docx-to-HTML conversion using XSLT, so that is not a problem at this stage.

The InnerText property of the Paragraph node gives me the proper text, but the inner text property itself is not settable. So Regex.Match(currentParagraph.InnerText, currentString).Success returns true and tells me that the current paragraph contains the text I want.

As I said, InnerText itself is not settable, so I tried creating a new paragraph from the modified OuterXml, as shown below.

string modifiedOuterxml = Regex.Replace(currentParagraph.OuterXml, currentString, reusableContentString);
OpenXmlElement parent = currentParagraph.Parent;
Paragraph modifiedParagraph = new Paragraph(modifiedOuterxml);
parent.ReplaceChild<Paragraph>(modifiedParagraph, currentParagraph);

Even though I am not too concerned about the formatting at this level and it doesn't seem to have any, the outerXML seems to have extra elements that defeat the regex.

..."16" /><w:lang w:val="en-AU" /></w:rPr><w:t>a</w:t></w:r><w:proofErr w:type="gramEnd" /> <w:r w:rsidRPr="00C73B58"><w:rPr><w:sz w:val="16" /><w:szCs w:val="16" /><w:lang w:val="en-AU" /></w:rPr><w:t xml:space="preserve"> contract exclusively for construction work that is not building work.</w:t></w:r></w:p>

So in summary, how would I replace the text in an OpenXML Paragraph with other text, even at the expense of losing some of the formatting?

Snowflake answered 25/11, 2010 at 10:35 Comment(0)
S
18

Fixed it myself. The key was to remove all the runs in the current paragraph and create a new run holding the modified text:

string modifiedString = Regex.Replace(currentParagraph.InnerText, currentString, reusableContentString);
currentParagraph.RemoveAllChildren<Run>();
currentParagraph.AppendChild<Run>(new Run(new Text(modifiedString)));
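
A slightly more defensive variant of the same idea might look like the sketch below. It is only an illustration under a few assumptions: `ParagraphTextReplacer` and `ReplaceText` are hypothetical names, the search text is treated as a literal (hence `Regex.Escape`, since the sentence in the question ends with a `.`), and the formatting of the first run is copied onto the new run so that the paragraph does not fall back to default styling. Runs nested inside other elements, such as hyperlinks, are not removed by `RemoveAllChildren<Run>()`, so this probably suits simple paragraphs best.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the OpenXML SDK.
public static class ParagraphTextReplacer
{
    // Replaces every literal occurrence of searchText in the paragraph's text.
    // All direct child runs are collapsed into one run that reuses the
    // formatting of the first run that had any. Returns false if nothing matched.
    public static bool ReplaceText(Paragraph paragraph, string searchText, string replacement)
    {
        string original = paragraph.InnerText;
        string pattern = Regex.Escape(searchText); // treat the search text literally
        if (!Regex.IsMatch(original, pattern))
        {
            return false;
        }

        // A MatchEvaluator avoids "$" in the replacement being read as a group reference.
        string modified = Regex.Replace(original, pattern, m => replacement);

        // Clone the first available run formatting before the runs are removed.
        RunProperties firstProps = paragraph.Elements<Run>()
            .Select(r => r.RunProperties)
            .FirstOrDefault(p => p != null);
        RunProperties props = firstProps == null
            ? null
            : (RunProperties)firstProps.CloneNode(true);

        paragraph.RemoveAllChildren<Run>();

        var run = new Run();
        if (props != null)
        {
            run.AppendChild(props); // run properties must come before the text
        }
        // Preserve leading and trailing spaces in the new text.
        run.AppendChild(new Text(modified) { Space = SpaceProcessingModeValues.Preserve });
        paragraph.AppendChild(run);
        return true;
    }
}
```

With the names from the snippet above, a call would be `ParagraphTextReplacer.ReplaceText(currentParagraph, currentString, reusableContentString);`. Any formatting that differed between the original runs is still lost; only the first run's properties survive.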
Snowflake answered 26/11, 2010 at 0:24 Comment(0)
C
4

A paragraph's text is stored in Text elements inside its runs, so you just have to find the right Text element and update its text, for example:

// mainPart is the document's MainDocumentPart.
var text = mainPart.Document.Descendants<Text>()
    .FirstOrDefault(e => e.Text == "a contract exclusively for construction work that is not building work.");
if (text != null)
{
    text.Text = "New text here";
}
mainPart.Document.Save();
Combings answered 6/1, 2020 at 11:5 Comment(2)
This is a great approach for modifying text without losing the styling. I just tested going down from a table, to its rows, to a cell in a row, to the paragraphs in the cell, and then to the text using var text = paragraph.Descendants<Text>().FirstOrDefault(e => e.Text == "Company Name");. Anyone looking to simply replace text should be able to use this approach. - Chaperone
This works fine if the text to replace is in a single Run. I need to replace tags marked with square brackets, though. Word splits the brackets across Run elements seemingly at random, so sometimes I have a single Run with Text "[myTag]", and sometimes I have 3 Runs: "[", "myTag" and "]". Any idea how to fix this? - Grani
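
One possible way to handle the split-run case raised in the last comment is sketched below. It is not from the thread: `SplitRunReplacer` and `ReplaceFirst` are hypothetical names. The idea is to concatenate the text of all Text elements in the paragraph, locate the match in the combined string, write the replacement into the first Text element the match touches, and cut the matched characters out of the following ones. Each run keeps its own formatting, and the replacement takes on the formatting of the run where the match starts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the OpenXML SDK.
public static class SplitRunReplacer
{
    // Replaces the first occurrence of search in the paragraph, even when
    // the occurrence is spread over several Text elements.
    public static bool ReplaceFirst(Paragraph paragraph, string search, string replacement)
    {
        if (string.IsNullOrEmpty(search))
        {
            return false;
        }

        List<Text> texts = paragraph.Descendants<Text>().ToList();
        string combined = string.Concat(texts.Select(t => t.Text));
        int start = combined.IndexOf(search, StringComparison.Ordinal);
        if (start < 0)
        {
            return false;
        }
        int end = start + search.Length; // exclusive end of the match

        int offset = 0;        // position of the current Text in the combined string
        bool inserted = false; // true once the replacement has been written
        foreach (Text t in texts)
        {
            string value = t.Text ?? string.Empty;
            int tStart = offset;
            int tEnd = offset + value.Length;
            offset = tEnd;

            // Skip Text elements that lie entirely outside the match.
            if (tEnd <= start || tStart >= end)
            {
                continue;
            }

            // Portion of this Text element that belongs to the match.
            int from = Math.Max(start, tStart) - tStart;
            int to = Math.Min(end, tEnd) - tStart;
            string before = value.Substring(0, from);
            string after = value.Substring(to);

            t.Text = inserted ? before + after : before + replacement + after;
            t.Space = SpaceProcessingModeValues.Preserve; // keep edge spaces
            inserted = true;
        }
        return true;
    }
}
```

For the example in the comment, `SplitRunReplacer.ReplaceFirst(paragraph, "[myTag]", "some value")` would put "some value" into the Text that held "[" and leave the other two Text elements empty. Calling it in a loop until it returns false replaces every occurrence, provided the replacement does not itself contain the search text.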
C
1

Using RemoveAllChildren() and then AppendChild() will indeed lose all styling elements unless you spend another big chunk of code putting them back. Nick Hoang's and Goal Man's approaches are better because they do not lose any styles.

Replacing text works best if you use a distinctive symbol such as '#' or '|' as a placeholder in a template docx, for example:

var tag = pghBillAmount.Descendants<WordOpenXML.Text>().FirstOrDefault(p => p.Text == "#");
if (tag != null)
{
    tag.Text = order.BillAmount.ToString("C2");
}

Your bold or highlight styles, etc., will still be there.
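
For completeness, a minimal end-to-end sketch of this placeholder approach, from opening the template to saving it, could look as follows. `TemplateFiller` and `FillPlaceholder` are hypothetical names, and the sketch assumes each placeholder sits in a single Text element of the main document body (headers and footers live in separate parts and are not covered).

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the OpenXML SDK.
public static class TemplateFiller
{
    // Replaces every Text element in the body whose text equals the placeholder.
    // Returns the number of Text elements that were changed.
    public static int FillPlaceholder(string path, string placeholder, string value)
    {
        int replaced = 0;
        // Open the package for editing (second argument: isEditable = true).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Materialize the matches first, then update them.
            var matches = body.Descendants<Text>()
                .Where(t => t.Text == placeholder)
                .ToList();
            foreach (Text text in matches)
            {
                text.Text = value; // the surrounding run keeps its formatting
                replaced++;
            }

            doc.MainDocumentPart.Document.Save();
        }
        return replaced;
    }
}
```

A call such as `TemplateFiller.FillPlaceholder("invoice.docx", "#", order.BillAmount.ToString("C2"))` would mirror the snippet above; the file name here is only an example.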

Consumable answered 4/3, 2023 at 3:23 Comment(0)
