Duplicating Word document using OpenXml and C#

I am using Word and OpenXml to provide mail merge functionality in a C# ASP.NET web application:

1) A document is uploaded with a number of pre-defined strings for substitution.

2) Using the OpenXML SDK 2.0 I open the Word document, get the mainDocumentPart as a string and perform the substitution using Regex.

3) I then create a new document using OpenXML, add a new mainDocumentPart and insert the string resulting from the substitution into this mainDocumentPart.
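Step 2 might be sketched as below. This is a minimal illustration, not the asker's actual code. `MergeStep` and `ReadAndSubstitute` are hypothetical names, and the regex pattern is supplied by the caller. It assumes each placeholder appears unbroken in the XML; Word often splits typed text across several `<w:r>` runs, and a plain regex will not match a placeholder that has been split that way.

```csharp
using System.IO;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;

public static class MergeStep
{
    // Hypothetical helper for step 2: read the raw XML of the main document
    // part (word/document.xml) as a string and apply a regex substitution.
    public static string ReadAndSubstitute(string docxPath, string pattern, string replacement)
    {
        // Open read-only; the source file is never modified.
        using (var doc = WordprocessingDocument.Open(docxPath, false))
        using (var reader = new StreamReader(
                   doc.MainDocumentPart.GetStream(FileMode.Open, FileAccess.Read)))
        {
            string xml = reader.ReadToEnd();
            return Regex.Replace(xml, pattern, replacement);
        }
    }
}
```

The returned string contains only the main part. Styles, numbering and the other parts that the formatting depends on live elsewhere in the package, which is why a new document built from this string alone (step 3) loses them.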

However, all formatting/styles etc. are lost in the new document.

I'm guessing I can copy and add the style definitions, comments, and other parts individually to mimic the original document.

However, is there a method in Open XML to duplicate a document so that I can perform the substitutions on the new copy?

Thanks.

Freewheeling asked 17/7, 2009 at 11:59
Why not use File.Copy(docName, newName)? – Dodecasyllable
Have a look at my answer below for an update on the options you have with the Open XML SDK since 2014/15. – Meshwork
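The File.Copy suggestion combines naturally with the regex approach from the question: copy the whole package on disk so every part comes along, then overwrite only the main document part of the copy. This is a minimal sketch. `CopyThenEdit` and the `transformXml` callback are hypothetical names. It assumes the main part's DOM is never loaded through `MainDocumentPart.Document`; if it were, the SDK could save that DOM over the raw stream when the package is disposed.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class CopyThenEdit
{
    // Copies the template on disk (styles, numbering, headers, etc. come
    // along untouched), then rewrites only the main document part of the copy.
    public static void CopyAndTransform(string templatePath, string outputPath,
                                        Func<string, string> transformXml)
    {
        File.Copy(templatePath, outputPath, overwrite: true);

        using (var doc = WordprocessingDocument.Open(outputPath, true))
        {
            MainDocumentPart main = doc.MainDocumentPart;

            string xml;
            using (var reader = new StreamReader(main.GetStream(FileMode.Open, FileAccess.Read)))
                xml = reader.ReadToEnd();

            // FileMode.Create truncates the part before the new XML is written.
            using (var writer = new StreamWriter(main.GetStream(FileMode.Create, FileAccess.Write)))
                writer.Write(transformXml(xml));
        } // disposing the package writes the changes to outputPath
    }
}
```

For example, `CopyThenEdit.CopyAndTransform("template.docx", "out.docx", xml => Regex.Replace(xml, pattern, value))` keeps the template itself unchanged.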

This piece of code should copy all parts from an existing document to a new one.

using DocumentFormat.OpenXml;            // WordprocessingDocumentType
using DocumentFormat.OpenXml.Packaging;  // WordprocessingDocument

using (var mainDoc = WordprocessingDocument.Open(@"c:\sourcedoc.docx", false))
using (var resultDoc = WordprocessingDocument.Create(@"c:\newdoc.docx",
  WordprocessingDocumentType.Document))
{
  // copy parts from source document to new document
  foreach (var part in mainDoc.Parts)
    resultDoc.AddPart(part.OpenXmlPart, part.RelationshipId);
  // perform replacements in resultDoc.MainDocumentPart
  // ...
}
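The replacement step above is left open. One simple option is to edit the `Text` elements of `resultDoc.MainDocumentPart` directly. The sketch below assumes each placeholder lies inside a single `<w:t>` element; if Word has split it across runs, it will not be found. `TextReplacer` and `ReplaceInText` are hypothetical names.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TextReplacer
{
    // Replaces a placeholder inside individual <w:t> elements of the main
    // document part and returns how many Text elements were changed.
    // Setting Text.Text does not change the tree structure, so it is safe
    // to do while enumerating Descendants<Text>().
    public static int ReplaceInText(WordprocessingDocument doc, string placeholder, string value)
    {
        int changed = 0;
        foreach (Text t in doc.MainDocumentPart.Document.Descendants<Text>())
        {
            if (t.Text.Contains(placeholder))
            {
                t.Text = t.Text.Replace(placeholder, value);
                changed++;
            }
        }
        // Write the modified DOM back to the main document part.
        doc.MainDocumentPart.Document.Save();
        return changed;
    }
}
```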
Wilhoit answered 31/3, 2010 at 16:53
I've been banging my head against the wall for hours messing with that MemoryStream business... This works great and is much more concise. Many thanks! – Genesis
Is there a way to do a similar thing, the only difference being that the content in the mainDoc needs to be appended to the end of an existing document? – Phonoscope
Yes, though it is much more difficult, since a lot of data from parts of both documents needs to be merged. Thankfully, Eric White has built a set of PowerTools for Open XML that handles this otherwise daunting task for you. In particular, take a look at the DocumentBuilder, which I have used to append one document to another in the past. Worked like a charm! – Wilhoit
Why not just copy sourcedoc.docx to newdoc.docx on disk and then update newdoc.docx? – Literator
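For the append scenario raised in the comments, a call to DocumentBuilder from Eric White's PowerTools for Open XML (the Open-Xml-PowerTools project) might look like the sketch below. It assumes that library is referenced. The `Appender` wrapper is a hypothetical name, and the `true` argument asks each source to keep its section properties.

```csharp
using System.Collections.Generic;
using OpenXmlPowerTools;

public static class Appender
{
    // Appends the content of secondPath after firstPath and writes the
    // combined document to outputPath. DocumentBuilder does the merging of
    // the related parts of both sources.
    public static void Append(string firstPath, string secondPath, string outputPath)
    {
        var sources = new List<Source>
        {
            new Source(new WmlDocument(firstPath), true),   // keepSections: true
            new Source(new WmlDocument(secondPath), true),
        };
        DocumentBuilder.BuildDocument(sources, outputPath);
    }
}
```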

I second the recommendation to use content controls. Using them to mark up the areas of your document where you want to perform substitution is by far the easiest way to do it.

As for duplicating the document (and retaining the entire document contents, styles and all), it's relatively easy:

using System.IO;
using DocumentFormat.OpenXml.Packaging;

string documentPath = "full path to your document";
byte[] docAsArray = File.ReadAllBytes(documentPath);

// new MemoryStream() is expandable, so the package can grow while it is edited
using (MemoryStream stream = new MemoryStream())
{
    stream.Write(docAsArray, 0, docAsArray.Length);    // THIS performs doc copy
    using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
    {
        // perform content control substitution here, making sure to call .Save()
        // on any document parts you change.
    }
    File.WriteAllBytes("full path of your new doc to save, including .docx", stream.ToArray());
}

Actually, finding the content controls is a piece of cake using LINQ. The following example finds all the inline (run-level) content controls, such as plain text controls, which are typed as SdtRun:

using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
{
    var mainDocument = doc.MainDocumentPart.Document;
    var contentControls = from sdt in mainDocument.Descendants<SdtRun>() select sdt;

    foreach (var cc in contentControls)
    {
        // drill down through the containment hierarchy to get to 
        // the contained <Text> object
        cc.SdtContentRun.GetFirstChild<Run>().GetFirstChild<Text>().Text = "my replacement string";
    }
}

The <Run> and <Text> elements may not exist yet, but creating them is as simple as:

cc.SdtContentRun.Append(new Run(new Text("my replacement string")));
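Both cases can be folded into one helper. This is an illustration rather than code from the answer, and `ContentControlFiller` and `SetText` are hypothetical names. The helper reuses the first run so that its `RunProperties` formatting survives. It also drops any other runs and removes the `w:showingPlcHdr` flag, so Word no longer treats the content as placeholder text.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlFiller
{
    // Sets the text of an inline (run-level) content control.
    public static void SetText(SdtRun cc, string value)
    {
        // Clear the "showing placeholder" flag, if present.
        cc.GetFirstChild<SdtProperties>()?.GetFirstChild<ShowingPlaceholder>()?.Remove();

        SdtContentRun content = cc.SdtContentRun;
        if (content == null)
        {
            content = new SdtContentRun();
            cc.Append(content);
        }

        // Preserve leading/trailing spaces in the replacement text.
        var text = new Text(value) { Space = SpaceProcessingModeValues.Preserve };

        Run first = content.Elements<Run>().FirstOrDefault();
        if (first == null)
        {
            // No run yet: create one, as in the one-liner above.
            content.Append(new Run(text));
            return;
        }

        // Keep only the first run, and replace all of its Text children.
        foreach (Run extra in content.Elements<Run>().Skip(1).ToList())
            extra.Remove();
        foreach (Text old in first.Elements<Text>().ToList())
            old.Remove();
        first.Append(text);
    }
}
```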

Hope that helps someone. :D

Kersey answered 8/2, 2010 at 20:50
Thank you so much!! I was pulling my hair out because every time I loaded my template, made changes, and saved as a new filename, BOTH files would be updated! Thank you for sharing this method of using a MemoryStream to hold a copy of the template, preventing the actual template file from being corrupted :D :D – Grandiloquence

The original question was asked before a number of helpful features were added to the Open XML SDK. Nowadays, if you already have an open WordprocessingDocument, you can simply clone it and perform whatever transformation you need on the clone.

using DocumentFormat.OpenXml.Packaging;

// Say you have done this somewhere before you want to duplicate your document.
using WordprocessingDocument originalDoc = WordprocessingDocument.Open("original.docx", false);

// Then this is how you can clone the opened WordprocessingDocument.
using var newDoc = (WordprocessingDocument) originalDoc.Clone("copy.docx", true);

// Perform whatever transformation you want to do.
PerformTransformation(newDoc);

You can also clone to a Stream or a Package. Overall, you have the following options:

OpenXmlPackage Clone()

OpenXmlPackage Clone(Stream stream)
OpenXmlPackage Clone(Stream stream, bool isEditable)
OpenXmlPackage Clone(Stream stream, bool isEditable, OpenSettings openSettings)

OpenXmlPackage Clone(string path)
OpenXmlPackage Clone(string path, bool isEditable)
OpenXmlPackage Clone(string path, bool isEditable, OpenSettings openSettings)

OpenXmlPackage Clone(Package package)
OpenXmlPackage Clone(Package package, OpenSettings openSettings)
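For instance, the Clone(Stream, bool) overload gives an editable copy in memory. The source file is never touched, and the resulting bytes can be written wherever you like, such as a file or an HTTP response. This is a sketch, and `InMemoryClone`, `CloneToBytes` and the `edit` callback are hypothetical names that are not part of the SDK.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class InMemoryClone
{
    // Clones an already-open document into a MemoryStream using the
    // Clone(Stream, bool) overload, lets the caller edit the clone, and
    // returns the clone's bytes. The original package is not modified.
    public static byte[] CloneToBytes(WordprocessingDocument original,
                                      Action<WordprocessingDocument> edit)
    {
        using (var stream = new MemoryStream())
        {
            using (var copy = (WordprocessingDocument)original.Clone(stream, true))
            {
                edit(copy);
            } // disposing the clone flushes its changes into the stream
            return stream.ToArray();
        }
    }
}
```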

Have a look at the Open XML SDK documentation for details on those methods.

Having said that, if you have not yet opened the WordprocessingDocument, there are faster ways to duplicate, or clone, the document. I've demonstrated this in my answer on the most efficient way to clone Office Open XML documents.

Meshwork answered 16/11, 2019 at 15:57

I have done some very similar things, but instead of using text substitution strings, I use Word Content Controls. I have documented some of the details in the following blog post, SharePoint and Open Xml. The technique is not specific to SharePoint. You could reuse the pattern in pure ASP.NET or other applications.

Also, I would STRONGLY encourage you to review Eric White's Blog for tips, tricks and techniques regarding Open Xml. Specifically, check out the in-memory manipulation of Open Xml post, and the Word content controls posts. I think you'll find these much more helpful in the long run.

Hope this helps.

Saltine answered 23/7, 2009 at 20:1

As an addendum to the above: what's perhaps more useful is finding content controls that have been tagged (using the Word GUI). I recently wrote some software that populated document templates containing content controls with tags attached. Finding them is just an extension of the above LINQ query:

using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

var mainDocument = doc.MainDocumentPart.Document;
var taggedContentControls = from sdt in mainDocument.Descendants<SdtElement>()
                            let sdtPr = sdt.GetFirstChild<SdtProperties>()
                            let tag = (sdtPr == null ? null : sdtPr.GetFirstChild<Tag>())
                            where (tag != null && tag.Val != null)
                            select new
                            {
                                SdtElem = sdt,
                                // Tag.Val holds the w:val attribute, i.e. the tag name
                                TagName = tag.Val.Value
                            };

I got this code from elsewhere but cannot remember where at the moment; full credit goes to them.

The query just creates an IEnumerable of an anonymous type that contains the content control and its associated tag as properties. Handy!
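Because the query pairs each control with its tag, it can also be turned into a lookup so that a template is filled by tag name. This is a sketch under that assumption. `TaggedControls` and `ByTag` are hypothetical names, and controls without a `w:val` on their tag are skipped.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TaggedControls
{
    // Builds a lookup from w:tag value to the content controls carrying it,
    // using the same SdtProperties/Tag navigation as the query above.
    public static ILookup<string, SdtElement> ByTag(Document document)
    {
        return (from sdt in document.Descendants<SdtElement>()
                let sdtPr = sdt.GetFirstChild<SdtProperties>()
                let tag = sdtPr?.GetFirstChild<Tag>()
                where tag?.Val?.Value != null
                select new { Sdt = sdt, Name = tag.Val.Value })
               .ToLookup(x => x.Name, x => x.Sdt);
    }
}
```

For example, `TaggedControls.ByTag(doc.MainDocumentPart.Document)["CustomerName"]` yields every control tagged CustomerName (an empty sequence if there is none). Each SdtRun entry can then be filled as in the earlier answer.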

Kersey answered 8/2, 2010 at 20:59

If you change the extension of an Open XML document to .zip and open it, you will see that the word subfolder contains a _rels folder listing all the relationships. These relationships point to the parts you mentioned (styles, etc.). You actually need these parts, because they contain the formatting definitions. If you don't copy them, the new document will use the formatting defined in the normal.dot file instead of the formatting defined in the original document. So I think you have to copy them.
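The relationships described above can also be inspected through the SDK instead of unzipping the file. The minimal sketch below uses the hypothetical name `PartLister`. It lists the internal parts that word/_rels/document.xml.rels points to, which are the parts the main document part references. External targets such as hyperlinks are not included in `Parts`.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;

public static class PartLister
{
    // Returns one line per part referenced by the main document part:
    // relationship id, part URI inside the package, and relationship type.
    public static List<string> ListMainPartRelationships(WordprocessingDocument doc)
    {
        var lines = new List<string>();
        foreach (IdPartPair pair in doc.MainDocumentPart.Parts)
        {
            lines.Add($"{pair.RelationshipId}: {pair.OpenXmlPart.Uri} ({pair.OpenXmlPart.RelationshipType})");
        }
        return lines;
    }
}
```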

Esophagus answered 17/7, 2009 at 12:31
Not really answering the question. Read up on how to do it before answering. – Inge
