Replace Text in Word document using Open Xml

I have created a .docx file from a Word template. Now I am opening the copied .docx file and want to replace certain text with other data.

I cannot work out how to access the text from the document's main part.

Any help would be appreciated.

Here is my code so far.

private void CreateSampleWordDocument()
    {
        string sourceFile = Path.Combine("D:\\GeneralWelcomeLetter.docx");
        string destinationFile = Path.Combine("D:\\New.docx");
        try
        {
            // Create a copy of the template file and open the copy
            File.Copy(sourceFile, destinationFile, true);
            using (WordprocessingDocument document = WordprocessingDocument.Open(destinationFile, true))
            {
                // Change the document type to Document
                document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
                //Get the Main Part of the document
                MainDocumentPart mainPart = document.MainDocumentPart;
                mainPart.Document.Save();
            }
        }
        catch
        {
        }
    }

Now, how do I find certain text and replace it? I have not been able to get there via Link, so a code hint would be appreciated.

Squinty answered 19/8, 2013 at 14:54 Comment(0)
S
28

Just to give you the idea of how to do it, please try:

using (WordprocessingDocument doc =
           WordprocessingDocument.Open(@"yourpath\testdocument.docx", true))
{
    var body = doc.MainDocumentPart.Document.Body;
    var paras = body.Elements<Paragraph>();

    // Walk paragraph -> run -> text and replace inside each Text element
    foreach (var para in paras)
    {
        foreach (var run in para.Elements<Run>())
        {
            foreach (var text in run.Elements<Text>())
            {
                if (text.Text.Contains("text-to-replace"))
                {
                    text.Text = text.Text.Replace("text-to-replace", "replaced-text");
                }
            }
        }
    }
}

Please note that the match is case-sensitive. The text formatting is not changed by the replacement. Hope this helps.

Selfmortification answered 20/8, 2013 at 15:44 Comment(4)
I had asked you to answer my previous question as well, since your link helped me, so please post an answer there too. - Squinty
@flowerking: If you have a few minutes, could you help out with this? stackoverflow.com/questions/26307691 - Presume
This only replaces text within a single run. However, text may be split across several runs, which must first be concatenated before the replacement can be done. - Barbera
See sersat's response below for a much more robust solution: https://mcmap.net/q/467923/-replace-text-in-word-document-using-open-xml - Isla
S
23

In addition to Flowerking's answer:

When your Word file contains text boxes, his solution will not work. A text box keeps its text inside a TextBoxContent element, so that text never shows up in the foreach loop over the paragraph's Runs.

But if you write

using ( WordprocessingDocument doc =
                    WordprocessingDocument.Open(@"yourpath\testdocument.docx", true))
{
    var document = doc.MainDocumentPart.Document;

    foreach (var text in document.Descendants<Text>()) // <<< Here
    {
        if (text.Text.Contains("text-to-replace"))
        {
            text.Text = text.Text.Replace("text-to-replace", "replaced-text");
        }
    } 
}
        

it will loop over all the Text elements in the document (whether they are in a text box or not), so it will replace the text there too.

Note that this also won't work if the text is split between Runs or text boxes. You need a better solution for those cases. One way to deal with split text is to fix the "template": sometimes simply deleting the placeholder and re-creating it works wonders.
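
For the split-run case mentioned above, one common workaround is to treat each paragraph as a single string. The following is only a minimal sketch, assuming the DocumentFormat.OpenXml NuGet package; the class and method names (SplitRunReplacer, ReplaceInParagraph, ReplaceInFile) are made up for illustration. It joins the paragraph's Text fragments, replaces in the joined string, writes the result into the first fragment and empties the others, so the whole paragraph text ends up with the first run's formatting. Tabs and line breaks between runs are not Text elements and are ignored here.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SplitRunReplacer
{
    // Replaces `find` with `replacement` inside one paragraph, even when the
    // match is spread over several Text elements. Returns true if text changed.
    public static bool ReplaceInParagraph(Paragraph paragraph, string find, string replacement)
    {
        if (string.IsNullOrEmpty(find)) return false;

        // Only take Text elements that belong directly to this paragraph, not to
        // a paragraph nested inside it (for example inside a text box).
        var texts = paragraph.Descendants<Text>()
            .Where(t => ReferenceEquals(t.Ancestors<Paragraph>().FirstOrDefault(), paragraph))
            .ToList();
        if (texts.Count == 0) return false;

        // Join the fragments so the search sees the text the way Word displays it.
        string joined = string.Concat(texts.Select(t => t.Text));
        if (!joined.Contains(find)) return false;

        // Put the whole result into the first fragment and empty the others.
        texts[0].Text = joined.Replace(find, replacement);
        texts[0].Space = SpaceProcessingModeValues.Preserve; // keep leading/trailing spaces
        foreach (var t in texts.Skip(1)) t.Text = string.Empty;
        return true;
    }

    // Applies the replacement to every paragraph of the main document part.
    public static void ReplaceInFile(string path, string find, string replacement)
    {
        using (var doc = WordprocessingDocument.Open(path, true))
        {
            var document = doc.MainDocumentPart.Document;
            foreach (var p in document.Descendants<Paragraph>().ToList())
                ReplaceInParagraph(p, find, replacement);
            document.Save();
        }
    }
}
```

Losing per-run formatting inside a touched paragraph is the price of the simplicity; paragraphs without a match are left untouched.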

Skite answered 26/7, 2015 at 23:50 Comment(8)
Note: you need to be using DocumentFormat.OpenXml.Wordprocessing (my IntelliSense was suggesting a bunch of other things instead). - Museology
This also works: var text = doc.MainDocumentPart.Document.Descendants<Text>().Where(t => t.Text.Contains("text-to-replace")).FirstOrDefault(); - Biggin
It is very common for text to be split into multiple runs (even when each run has the same properties). Among other things, this is caused by the spelling/grammar checker and by the number of editing attempts. Splitting is more common when a placeholder is delimited, e.g. [customer-name]. For replacing placeholders, rather than using standard text, it is better to use content controls, rendered as <w:sdt> elements. Within a content control, the tag name is never split and can always be found. - Tolidine
@KevinSwann I tried inserting content controls, but the tag name is still split sometimes; e.g. [tag] in a content control (both rtf and unformatted) will be split into three parts, with 'tag' being one of them. Could you elaborate on how you made it work with content controls? - Alphonso
Since the appearance of the tags was not that important in my scenario, I just ended up using single characters as tags; just make sure they don't occur in the rest of the document. I used unusual ones (for English at least), such as Â, ß, Ç, Ð etc. They will not be split up. - Alphonso
@Alphonso For a content control, the tag name is different from the text. The tag name is never split. For a plain-text content control it is, in the XML, the element <w:sdt>...<w:tag w:val="customer-name"/>. The content control can always be identified by its tag name. The content-control text, however, can be split in the same way as text in paragraphs can be split into several runs. - Tolidine
@KevinSwann How do you see or change the tag name, and not just the text, inside Word? When I add a content control I only see that I can add text. - Alphonso
@Alphonso The tag name can be seen 1) in the XML, 2) in the properties of the content control on Word's Developer tab, 3) when you find the content control via OpenXml. The tag name should not be changed: it is the ID used to find the content control and then process it, typically to replace text or to act as a placeholder for inserted text. There are more details in my answer here: https://mcmap.net/q/472445/-how-do-you-change-the-content-of-a-content-control-in-word-2007-with-openxml-sdk-2-0 - Tolidine
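
The content-control approach discussed in these comments could look roughly like the sketch below. It assumes the DocumentFormat.OpenXml NuGet package; ContentControlFiller and SetText are hypothetical names. It looks up controls by the value of their w:tag element and, because the visible text of a control may itself be split over several runs, writes the new value into the first Text element and empties the rest. Nested content controls are not treated specially.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlFiller
{
    // Sets the visible text of every content control under `root` whose tag
    // equals `tagName`. Returns the number of controls that were updated.
    public static int SetText(OpenXmlElement root, string tagName, string value)
    {
        int updated = 0;

        // SdtElement is the common base class of SdtRun, SdtBlock and SdtCell.
        foreach (var sdt in root.Descendants<SdtElement>())
        {
            var tag = sdt.GetFirstChild<SdtProperties>()?.GetFirstChild<Tag>();
            if (tag?.Val?.Value != tagName) continue;

            var texts = sdt.Descendants<Text>().ToList();
            if (texts.Count == 0) continue; // no Text element to write into

            texts[0].Text = value;
            foreach (var t in texts.Skip(1)) t.Text = string.Empty;
            updated++;
        }
        return updated;
    }
}
```

With an open WordprocessingDocument called doc, a call might look like ContentControlFiller.SetText(doc.MainDocumentPart.Document, "customer-name", "Paul Jones"), followed by saving the document.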
T
10

My class for replacing long phrases in a Word document that Word has split across different text blocks:

The class itself:

using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace WebBackLibrary.Service
{
    public class WordDocumentService
    {
        private class WordMatchedPhrase
        {
            public int charStartInFirstPar { get; set; }
            public int charEndInLastPar { get; set; }

            public int firstCharParOccurance { get; set; }
            public int lastCharParOccurance { get; set; }
        }

        public WordprocessingDocument ReplaceStringInWordDocumennt(WordprocessingDocument wordprocessingDocument, string replaceWhat, string replaceFor)
        {
            List<WordMatchedPhrase> matchedPhrases = FindWordMatchedPhrases(wordprocessingDocument, replaceWhat);

            Document document = wordprocessingDocument.MainDocumentPart.Document;
            int i = 0;
            bool isInPhrase = false;
            bool isInEndOfPhrase = false;
            foreach (Text text in document.Descendants<Text>()) // <<< Here
            {
                char[] textChars = text.Text.ToCharArray();
                List<WordMatchedPhrase> curParPhrases = matchedPhrases.FindAll(a => (a.firstCharParOccurance.Equals(i) || a.lastCharParOccurance.Equals(i)));
                StringBuilder outStringBuilder = new StringBuilder();
                
                for (int c = 0; c < textChars.Length; c++)
                {
                    if (isInEndOfPhrase)
                    {
                        isInPhrase = false;
                        isInEndOfPhrase = false;
                    }

                    foreach (var parPhrase in curParPhrases)
                    {
                        if (c == parPhrase.charStartInFirstPar && i == parPhrase.firstCharParOccurance)
                        {
                            outStringBuilder.Append(replaceFor);
                            isInPhrase = true;
                        }
                        if (c == parPhrase.charEndInLastPar && i == parPhrase.lastCharParOccurance)
                        {
                            isInEndOfPhrase = true;
                        }

                    }
                    if (isInPhrase == false && isInEndOfPhrase == false)
                    {
                        outStringBuilder.Append(textChars[c]);
                    }
                }
                text.Text = outStringBuilder.ToString();
                i = i + 1;
            }

            return wordprocessingDocument;
        }

        private List<WordMatchedPhrase> FindWordMatchedPhrases(WordprocessingDocument wordprocessingDocument, string replaceWhat)
        {
            char[] replaceWhatChars = replaceWhat.ToCharArray();
            int overlapsRequired = replaceWhatChars.Length;
            int overlapsFound = 0;
            int currentChar = 0;
            int firstCharParOccurance = 0;
            int lastCharParOccurance = 0;
            int startChar = 0;
            int endChar = 0;
            List<WordMatchedPhrase> wordMatchedPhrases = new List<WordMatchedPhrase>();
            //
            Document document = wordprocessingDocument.MainDocumentPart.Document;
            int i = 0;
            foreach (Text text in document.Descendants<Text>()) // <<< Here
            {
                char[] textChars = text.Text.ToCharArray();
                for (int c = 0; c < textChars.Length; c++)
                {
                    char compareToChar = replaceWhatChars[currentChar];
                    if (textChars[c] == compareToChar)
                    {
                        currentChar = currentChar + 1;
                        if (currentChar == 1)
                        {
                            startChar = c;
                            firstCharParOccurance = i;
                        }
                        if (currentChar == overlapsRequired)
                        {
                            endChar = c;
                            lastCharParOccurance = i;
                            WordMatchedPhrase matchedPhrase = new WordMatchedPhrase
                            {
                                firstCharParOccurance = firstCharParOccurance,
                                lastCharParOccurance = lastCharParOccurance,
                                charEndInLastPar = endChar,
                                charStartInFirstPar = startChar
                            };
                            wordMatchedPhrases.Add(matchedPhrase);
                            currentChar = 0;
                        }
                    }
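                    else
                    {
                        // Mismatch: restart, re-checking this char as a possible phrase start
                        currentChar = (textChars[c] == replaceWhatChars[0]) ? 1 : 0;
                        if (currentChar == 1) { startChar = c; firstCharParOccurance = i; }
                    }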
                }
                i = i + 1;
            }

            return wordMatchedPhrases;

        }

    }
}

And an example of how to use it:

public void EditWordDocument(UserContents userContents)
        {
            string filePath = Path.Combine(userContents.PathOnDisk, userContents.FileName);
            WordDocumentService wordDocumentService = new WordDocumentService();
            if (userContents.ContentType.Contains("word") && File.Exists(filePath))
            {
                string saveAs = "modifiedTechWord.docx";
                //
                using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true)) //open source word file
                {
                    Document document = doc.MainDocumentPart.Document;
                    OpenXmlPackage res = doc.SaveAs(Path.Combine(userContents.PathOnDisk, saveAs)); // copy it
                    res.Close();
                }
                using (WordprocessingDocument doc = WordprocessingDocument.Open(Path.Combine(userContents.PathOnDisk, saveAs), true)) // open copy
                {
                    string replaceWhat = "{transform:CandidateFio}";
                    string replaceFor = "ReplaceToFio";
                    var result = wordDocumentService.ReplaceStringInWordDocumennt(doc, replaceWhat, replaceFor); //replace words in copy
                }
            }
        }
Tungusic answered 14/7, 2020 at 17:19 Comment(4)
Best answer I have seen today on Stack Overflow. Can you use this on PDF? - Baptlsta
Thanks. I guess you can, but I haven't tried :) - Tungusic
Well, it seems that OpenXML is for Word documents only. I tried this code on a PDF and it does not seem to work; I get an error that the file is corrupted. I am still searching, if you have a piece of code that works for editing a PDF template file. - Baptlsta
Sorry, I don't have any :( You can try something like this: docs.aspose.com/pdf/net/create-pdf-document docs.aspose.com/pdf/net/manipulate-pdf-document - Tungusic
Z
6

The easiest and most accurate way I have found so far is to use Open-Xml-PowerTools. Personally, I'm on .NET Core, so I use this NuGet package.

using OpenXmlPowerTools;
// ...

protected byte[] SearchAndReplace(byte[] file, IDictionary<string, string> translations)
{
    WmlDocument doc = new WmlDocument(file.Length.ToString(), file);

    foreach (var translation in translations)
        doc = doc.SearchAndReplace(translation.Key, translation.Value, true);

    return doc.DocumentByteArray;
}

Example of use:

var templateDoc = File.ReadAllBytes("templateDoc.docx");
var generatedDoc = SearchAndReplace(templateDoc, new Dictionary<string, string>(){
    {"text-to-replace-1", "replaced-text-1"},
    {"text-to-replace-2", "replaced-text-2"},
});
File.WriteAllBytes("generatedDoc.docx", generatedDoc);

For more information, see Search and Replace Text in an Open XML WordprocessingML Document

Zakaria answered 14/8, 2020 at 21:5 Comment(2)
Doesn't work; it says that a param cannot be 0. Occurred at doc = doc.Search - Bowler
So this helped me, but I was able to do it much more simply. I didn't have to read the file into a byte array, and didn't have to write it out. I just opened it by name, did the search and replace, and then saved it. It found everything (at least on my first try), including instances where part of the word was bold and in a different color. - Melodize
M
3

Maybe this solution is easier:
1. a StreamReader reads all the text,
2. a Regex replaces the old text with the new text, case-insensitively,
3. a StreamWriter writes the modified text back into the document.

 using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
    string docText = null;
    using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        docText = sr.ReadToEnd();

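    // findesReplaces is assumed to map each search pattern to its replacement,
    // e.g. an IDictionary<string, string>
    foreach (var t in findesReplaces)
        docText = new Regex(t.Key, RegexOptions.IgnoreCase).Replace(docText, t.Value);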

    using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        sw.Write(docText);
}
Mims answered 17/6, 2014 at 8:29 Comment(3)
@Roy Do you think it is better now? - Mims
Yes. Thanks for adding a description to your fine answer. - Auld
This won't work if I need to replace something like "{{FullName}}" with "Paul Jones". - Dunbar
S
3

Here is a solution that can find and replace tags in an Open XML (Word) document across text runs (including text boxes):

namespace Demo
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    public class WordDocumentHelper
    {
        class DocumentTag
        {
            public DocumentTag()
            {
                ReplacementText = "";
            }

            public string Tag { get; set; }
            public string Table { get; set; }
            public string Column { get; set; }
            public string ReplacementText { get; set; }

            public override string ToString()
            {
                return ReplacementText ?? (Tag ?? "");
            }
        }

        private const string TAG_PATTERN = @"\[(.*?)[\.|\:](.*?)\]";
        private const string TAG_START = @"[";
        private const string TAG_END = @"]";

        /// <summary>
        /// Clones a document template into the temp folder and returns the newly created clone temp filename and path.
        /// </summary>
        /// <param name="templatePath"></param>
        /// <returns></returns>
        public string CloneTemplateForEditing(string templatePath)
        {
            var tempFile = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()) + Path.GetExtension(templatePath);
            File.Copy(templatePath, tempFile);
            return tempFile;
        }

        /// <summary>
        /// Opens a given filename, replaces tags, and saves. 
        /// </summary>
        /// <param name="filename"></param>
        /// <returns>Number of tags found</returns>
        public int FindAndReplaceTags(string filename)
        {
            var allTags = new List<DocumentTag>();

            using (WordprocessingDocument doc = WordprocessingDocument.Open(path: filename, isEditable: true))
            {
                var document = doc.MainDocumentPart.Document;

                // text may be split across multiple text runs so keep a collection of text objects
                List<Text> tagParts = new List<Text>();

                foreach (var text in document.Descendants<Text>())
                {
                    // search for any fully formed tags in this text run
                    var fullTags = GetTags(text.Text);

                    // replace values for fully formed tags
                    fullTags.ForEach(t => {
                        t = GetTagReplacementValue(t);
                        text.Text = text.Text.Replace(t.Tag, t.ReplacementText);
                        allTags.Add(t);
                    });

                    // continue working on current partial tag
                    if (tagParts.Count > 0)
                    {
                        // working on a tag
                        var joinText = string.Join("", tagParts.Select(x => x.Text)) + text.Text;

                        // see if tag ends with this block
                        if (joinText.Contains(TAG_END))
                        {
                            var joinTag = GetTags(joinText).FirstOrDefault(); // should be just one tag (or none)
                            if (joinTag == null)
                            {
                                throw new Exception($"Misformed document tag in block '{string.Join("", tagParts.Select(x => x.Text)) + text.Text}' ");
                            }

                            joinTag = GetTagReplacementValue(joinTag);
                            allTags.Add(joinTag);

                            // replace first text run in the tagParts set with the replacement value. 
                            // (This means the formatting used on the first character of the tag will be used)
                            var firstRun = tagParts.First();
                            firstRun.Text = firstRun.Text.Substring(0, firstRun.Text.LastIndexOf(TAG_START));
                            firstRun.Text += joinTag.ReplacementText;

                            // replace trailing text runs with empty strings
                            tagParts.Skip(1).ToList().ForEach(x => x.Text = "");

                            // replace all text up to and including the first index of TAG_END
                            text.Text = text.Text.Substring(text.Text.IndexOf(TAG_END) + 1);

                            // empty the tagParts list so we can start on a new tag
                            tagParts.Clear();
                        }
                        else
                        {
                            // no tag end so keep getting text runs
                            tagParts.Add(text);
                        }
                    }

                    // search for new partial tags
                    if (text.Text.Contains("["))
                    {
                        if (tagParts.Any())
                        {
                            throw new Exception($"Misformed document tag in block '{string.Join("", tagParts.Select(x => x.Text)) + text.Text}' ");
                        }
                        tagParts.Add(text);
                        continue;
                    }

                }

                // save the temp doc before closing
                doc.Save();
            }

            return allTags.Count;
        }

        /// <summary>
        /// Gets a unique set of document tags found in the passed fileText using Regex
        /// </summary>
        /// <param name="fileText"></param>
        /// <returns></returns>
        private List<DocumentTag> GetTags(string fileText)
        {
            List<DocumentTag> tags = new List<DocumentTag>();

            if (string.IsNullOrWhiteSpace(fileText))
            {
                return tags;
            }

            // TODO: custom regex for tag matching 
            // this example looks for tags in the formation "[table.column]" or "[table:column]" and captures the full tag, "table", and "column" into match Groups
            MatchCollection matches = Regex.Matches(fileText, TAG_PATTERN);
            foreach (Match match in matches)
            {
                try
                {

                    if (match.Groups.Count < 3
                        || string.IsNullOrWhiteSpace(match.Groups[0].Value)
                        || string.IsNullOrWhiteSpace(match.Groups[1].Value)
                        || string.IsNullOrWhiteSpace(match.Groups[2].Value))
                    {
                        continue;
                    }

                    tags.Add(new DocumentTag
                    {
                        Tag = match.Groups[0].Value,
                        Table = match.Groups[1].Value,
                        Column = match.Groups[2].Value
                    });
                }
                catch
                {

                }
            }

            return tags;
        }

        /// <summary>
        /// Sets the replacement value of the passed tag
        /// </summary>
        /// <returns></returns>
        private DocumentTag GetTagReplacementValue(DocumentTag tag)
        {
            // TODO: custom routine to update tag Replacement Value

            tag.ReplacementText = "foobar";

            return tag;
        }
    }
}
Shanitashank answered 9/7, 2019 at 21:19 Comment(0)
M
1

I'm testing this out for doc generation, but my placeholders were split across run and text nodes. I didn't want to load the whole doc as a single string for regex find/replace, so I worked with the OpenXml api. My idea is to:

  1. clean up the placeholder nodes as a one time operation on the document
  2. find/replace by node value each time it's generated, now that the source is clean.

Testing showed that placeholders were split across runs and text nodes, but not paragraphs. I also found that subsequent placeholders didn't share text nodes, so I didn't handle that. Placeholders follow the pattern {{placeholder_name}}.

First, I needed to get all the text nodes in the paragraph (per @sertsedat):

    var nodes = paragraph.Descendants<Text>();

Testing showed that this function preserves order, which was perfect for my use case since I could iterate through the collection looking for start/stop indicators, and group those nodes that were part of my placeholders.

The grouping function looked for {{ and }} in the text node values to identify nodes that were part of the placeholder and should be deleted, and other nodes, which should be ignored.

Once the start of a node was found, all subsequent nodes, up to and including the termination, would need to be deleted (marked by adding to the TextNodes list), the value of those nodes included in the placeholder StringBuilder, and any text part of the first/last node that was not a part of the placeholder would also need to be saved (thus the string properties). Any incomplete groups when a new placeholder was found or at the end of the sequence should throw an error.

Finally, I used the grouping to update the original doc

foreach (var placeholder in GroupPlaceholders(paragraph.Descendants<Text>()))
{
    var firstTextNode = placeholder.TextNodes[0];
    if (placeholder.PrecedingText != null)
    {
        firstTextNode.Parent.InsertBefore(new Text(placeholder.PrecedingText), firstTextNode);
    }
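    // InsertBefore needs an element, so wrap the collected placeholder text in a Text node
    firstTextNode.Parent.InsertBefore(new Text(placeholder.PlaceholderText.ToString()), firstTextNode);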
    if (placeholder.SubsequentText != null)
    {
        firstTextNode.Parent.InsertBefore(new Text(placeholder.SubsequentText), firstTextNode);
    }
    foreach (var textNode in placeholder.TextNodes) {
        textNode.Remove();                      
    }
}
Magnification answered 5/5, 2021 at 19:54 Comment(0)
T
0

If the text you are looking for is placed between brackets and Word splits your text into multiple runs:

Search the Text nodes (an IEnumerable<Text> materialized as a list):

// SearchIn is assumed to be a List<Text>, e.g. document.Descendants<Text>().ToList()
for (int i = 0; i + 2 < SearchIn.Count; i++) {
    Text TXT = SearchIn[i];
    Text TXT1 = SearchIn[i + 1];
    Text TXT2 = SearchIn[i + 2];

    // Merge "[", "name", "]" spread over three Text nodes into the middle one
    if (TXT.Text.Trim() == "[" && TXT2.Text.Trim() == "]") {
        TXT1.Text = TXT.Text + TXT1.Text + TXT2.Text;

        TXT.Text = "";
        TXT2.Text = "";
    }
}
Thorrlow answered 15/9, 2017 at 11:52 Comment(0)
L
0
Dim doc As WordprocessingDocument = WordprocessingDocument.Open("Chemin", True, New OpenSettings With {.AutoSave = True})

Dim d As Document = doc.MainDocumentPart.Document

Dim txt As Text = d.Descendants(Of Text).Where(Function(t) t.Text = "txtNom").FirstOrDefault

If txt IsNot Nothing Then
 txt.Text = txt.Text.Replace("txtNom", "YASSINE OULARBI")
End If

doc.Close()
Lowpressure answered 15/10, 2019 at 8:56 Comment(0)
F
0

Most of the answers here are wrong for real-world documents.

There are two main solutions. If you have control over the source documents, use Mail Merge fields for find/replace instead of trying to use the text in the document.

If you can't use Mail Merge fields, the solution is to write your own text buffer that combines multiple Text elements. This lets you find/replace text that is split between Text elements, which happens a lot.

This is very hard to write correctly because of all the combinations of splits that can occur! But it has worked for me for several years and millions of documents processed.
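
The answer does not show its buffer, so the following is only one possible sketch of the idea, not the author's implementation. It assumes the DocumentFormat.OpenXml NuGet package; SplitTextBuffer and Replace are hypothetical names. The Text elements are joined into one string, the start offset of each element in that string is recorded, and every match is then cut out of the elements it overlaps, with the replacement inserted into the element where the match starts. Text outside the matches stays in its original run and keeps its formatting.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SplitTextBuffer
{
    // Replaces every occurrence of `find`, even when it spans several Text
    // elements. Returns the number of replacements made.
    public static int Replace(IList<Text> nodes, string find, string replacement)
    {
        if (string.IsNullOrEmpty(find)) return 0;

        // Build the buffer and remember where each element starts in it.
        var starts = new int[nodes.Count];
        var lengths = new int[nodes.Count];
        var sb = new StringBuilder();
        for (int n = 0; n < nodes.Count; n++)
        {
            starts[n] = sb.Length;
            lengths[n] = nodes[n].Text.Length;
            sb.Append(nodes[n].Text);
        }
        string buffer = sb.ToString();

        // Collect non-overlapping match positions, left to right.
        var matches = new List<int>();
        for (int at = buffer.IndexOf(find, StringComparison.Ordinal); at >= 0;
             at = buffer.IndexOf(find, at + find.Length, StringComparison.Ordinal))
            matches.Add(at);

        // Apply from right to left so the offsets to the left stay valid.
        for (int m = matches.Count - 1; m >= 0; m--)
        {
            int from = matches[m], to = from + find.Length; // match is [from, to)
            for (int n = 0; n < nodes.Count; n++)
            {
                int nodeStart = starts[n], nodeEnd = nodeStart + lengths[n];
                if (nodeEnd <= from || nodeStart >= to) continue; // no overlap

                // Cut the overlapping part out of this element.
                int cutStart = Math.Max(from, nodeStart) - nodeStart;
                int cutEnd = Math.Min(to, nodeEnd) - nodeStart;
                string text = nodes[n].Text.Remove(cutStart, cutEnd - cutStart);

                // The element in which the match starts receives the replacement.
                if (nodeStart <= from) text = text.Insert(cutStart, replacement);

                nodes[n].Text = text;
                nodes[n].Space = SpaceProcessingModeValues.Preserve;
            }
        }
        return matches.Count;
    }
}
```

The list would typically hold the Text elements of one paragraph, for example paragraph.Descendants<Text>().ToList(), so that a match cannot run across a paragraph boundary.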

Forthwith answered 13/12, 2021 at 22:37 Comment(2)
This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - From Review - Demeanor
It indicates two solutions. Hopefully readers see this and don't try some of the other answers, which lead to trouble when they scale up in the real world. How can a reviewer know whether my answer is good or not if they are not a domain expert in this topic? - Forthwith
E
-2

Here is the solution from MSDN.

Example from there:

public static void SearchAndReplace(string document)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}
Escalator answered 26/12, 2015 at 19:44 Comment(3)
This is basically useless if Word splits the text you are searching for into multiple runs (or worse)... - Cabriole
I am having this exact problem, @Cabriole, and even when going through the runs I don't know where Word splits the text, so it's giving me a huge headache. - Incommunicative
@Eduardo My friend tried to solve it but ultimately had to go through all the runs manually and try to compose the text. If the Word file is under your control, you can edit its XML and fix the occurrences you need to replace so that they do not span multiple runs. - Cabriole
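
To find out where Word has split a placeholder before fixing the template by hand, it can help to print the Text fragments of each paragraph. This is a small diagnostic sketch, assuming the DocumentFormat.OpenXml NuGet package; RunInspector, Describe and Dump are made-up names. It opens the file read-only and writes one line per non-empty paragraph, with a '|' wherever one Text element ends and the next begins.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class RunInspector
{
    // Returns the paragraph's Text fragments joined with '|' as a separator.
    public static string Describe(Paragraph paragraph) =>
        string.Join("|", paragraph.Descendants<Text>().Select(t => t.Text));

    // Prints one line per paragraph that contains any text.
    public static void Dump(string path)
    {
        using (var doc = WordprocessingDocument.Open(path, false)) // read-only
        {
            foreach (var p in doc.MainDocumentPart.Document.Descendants<Paragraph>())
            {
                string line = Describe(p);
                if (line.Length > 0) Console.WriteLine(line);
            }
        }
    }
}
```

A placeholder that shows up as several fragments in this output is one that a per-Text search will miss.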
