Merge multiple Word documents into one using Open XML
I have around 10 Word documents that I generate using Open XML and other tools. I would now like to create another Word document and append them to it one by one. I want to use Open XML; any hints would be appreciated. Below is my code:

 private void CreateSampleWordDocument()
    {
        //string sourceFile = Path.Combine("D:\\GeneralLetter.dot");
        //string destinationFile = Path.Combine("D:\\New.doc");
        string sourceFile = Path.Combine("D:\\GeneralWelcomeLetter.docx");
        string destinationFile = Path.Combine("D:\\New.docx");
        try
        {
            // Create a copy of the template file and open the copy
            //File.Copy(sourceFile, destinationFile, true);
            using (WordprocessingDocument document = WordprocessingDocument.Open(destinationFile, true))
            {
                // Change the document type to Document
                document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
                //Get the Main Part of the document
                MainDocumentPart mainPart = document.MainDocumentPart;
                mainPart.Document.Save();
            }
        }
        catch
        {
        }
    }

Update (using AltChunks):

using (WordprocessingDocument myDoc = WordprocessingDocument.Open("D:\\Test.docx", true))
        {
            string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2) ;
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream fileStream = File.Open("D:\\Test1.docx", FileMode.Open))
                chunk.FeedData(fileStream);
            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;
            mainPart.Document
                .Body
                .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
            mainPart.Document.Save();
        } 

Why does this code overwrite the content with the last file when I use multiple files?

Update 2:

 using (WordprocessingDocument myDoc = WordprocessingDocument.Open("D:\\Test.docx", true))
        {

            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 3);
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream fileStream = File.Open("d:\\Test1.docx", FileMode.Open))
            {
                chunk.FeedData(fileStream);
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                mainPart.Document
                    .Body
                    .InsertAfter(altChunk, mainPart.Document.Body
                    .Elements<Paragraph>().Last());
                mainPart.Document.Save();
            }
            using (FileStream fileStream = File.Open("d:\\Test2.docx", FileMode.Open))
            {
                chunk.FeedData(fileStream);
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                mainPart.Document
                    .Body
                    .InsertAfter(altChunk, mainPart.Document.Body
                    .Elements<Paragraph>().Last());
            }
            using (FileStream fileStream = File.Open("d:\\Test3.docx", FileMode.Open))
            {
                chunk.FeedData(fileStream);
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                mainPart.Document
                    .Body
                    .InsertAfter(altChunk, mainPart.Document.Body
                    .Elements<Paragraph>().Last());
            } 
        }

This code is appending the Test2 data twice, in place of Test1 data as well. That means I get:

Test
Test2
Test2

instead of:

Test
Test1
Test2
Dustidustie answered 21/8, 2013 at 7:55 Comment(3)
As Chris pointed out, you are using the same Id for all the AltChunks. They must be unique.Latchstring
OK, it's done now. Thank you for being patient with me.Dustidustie
I am happy to see that you finally solved your issue :) Yep it was related to Altchunkid. I have edited my answer since it was maybe not very clear.Ozonide
O
20

Using only the Open XML SDK, you can use the AltChunk element to merge multiple documents into one.

These two articles provide some samples: "the-easy-way-to-assemble-multiple-word-documents" and "How to Use altChunk for Document Assembly".

EDIT 1

Based on the code that uses AltChunk in your updated question (update #1), here is the VB.NET code I tested, which works for me:

Using myDoc = DocumentFormat.OpenXml.Packaging.WordprocessingDocument.Open("D:\Test.docx", True)
        Dim altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2)
        Dim mainPart = myDoc.MainDocumentPart
        Dim chunk = mainPart.AddAlternativeFormatImportPart(
            DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML, altChunkId)
        Using fileStream As IO.FileStream = IO.File.Open("D:\Test1.docx", IO.FileMode.Open)
            chunk.FeedData(fileStream)
        End Using
        Dim altChunk = New DocumentFormat.OpenXml.Wordprocessing.AltChunk()
        altChunk.Id = altChunkId
        mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements(Of DocumentFormat.OpenXml.Wordprocessing.Paragraph).Last())
        mainPart.Document.Save()
End Using

EDIT 2

The second issue (update#2)

This code is appending the Test2 data twice, in place of Test1 data as well.

is related to altchunkid.

For each document you want to merge in the main document, you need to:

  1. Add an AlternativeFormatImportPart to the MainDocumentPart with an Id that must be unique. This part contains the inserted data.
  2. Add an AltChunk element to the body, with its Id set to reference that AlternativeFormatImportPart.

In your code, you are using the same Id for all the AltChunks. That is why you see the same text several times.

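I am not sure the altChunkId will be unique with your code, since Substring(0, 2) keeps only the two leading digits of the tick count, which change very rarely: string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2);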
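To make the uniqueness concern concrete, here is a minimal sketch that applies the same ID expression to two timestamps a day apart. The class and method names (AltChunkIdDemo, MakeIdFromTicks) are hypothetical and exist only for this illustration; only the "AltChunkId" prefix and the Substring(0, 2) call come from the question.

```csharp
using System;

public static class AltChunkIdDemo
{
    // Hypothetical helper: reproduces the ID expression from the question
    // for an arbitrary tick count so the result can be compared.
    public static string MakeIdFromTicks(long ticks)
    {
        return "AltChunkId" + ticks.ToString().Substring(0, 2);
    }

    public static void Main()
    {
        // Two fixed timestamps one day apart (dates chosen for illustration).
        long first = new DateTime(2013, 8, 21).Ticks;
        long second = new DateTime(2013, 8, 22).Ticks;

        Console.WriteLine(MakeIdFromTicks(first));   // AltChunkId63
        Console.WriteLine(MakeIdFromTicks(second));  // AltChunkId63

        // Both IDs are identical, so they cannot identify two different parts.
        Console.WriteLine(MakeIdFromTicks(first) == MakeIdFromTicks(second)); // True
    }
}
```

Because only the most significant digits are kept, every call made within the same run produces the same ID, which is why one distinct Id per imported part is needed.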

If you don't need a specific value, I recommend not setting the AltChunkId explicitly when you add the AlternativeFormatImportPart. Instead, use the one generated by the SDK, like this:

VB.Net

Dim chunk As AlternativeFormatImportPart = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML)
Dim altchunkid As String = mainPart.GetIdOfPart(chunk)

C#

AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML);
string altchunkid = mainPart.GetIdOfPart(chunk);
Ozonide answered 21/8, 2013 at 8:30 Comment(6)
That is not performing what I want to do also there is no exception coming. I am posting my updated code with Altchunks.Dustidustie
Do I need to do something in the docx file as well, like adding bookmark type other action?Dustidustie
@ItiTyagi No, in my test, I have just created two files with a simple text (Text1 and Text2). And after running this code, the file Test.docx contains the two paragraphs when I open it.Ozonide
You know what, I was using OpenOffice, and the change was not reflected there, but when I opened the same file in Office, it worked.Dustidustie
When I merge multiple files, it overwrites everything with the last one.Dustidustie
@ItiTyagi It shouldn't. Post your complete updated code with multiple files. Also see second edit concerning altchunkId.Ozonide
L
15

There is a nice wrapper API (Document Builder 2.2) around Open XML, designed specifically for merging documents, with the flexibility to choose which paragraphs to merge, and so on. You can download it from here (update: moved to GitHub).

The documentation and screen casts on how to use it are here.

Update: Code Sample

 var sources = new List<Source>();
 //Document Streams (File Streams) of the documents to be merged.
 foreach (var stream in documentstreams)
 {
        var tempms = new MemoryStream();
        stream.CopyTo(tempms);
        sources.Add(new Source(new WmlDocument(stream.Length.ToString(), tempms), true));
 }

  var mergedDoc = DocumentBuilder.BuildDocument(sources);
  mergedDoc.SaveAs(@"C:\TargetFilePath");

Types Source and WmlDocument are from Document Builder API.

You can even add the file paths directly if you choose to as:

sources.Add(new Source(new WmlDocument(@"C:\FileToBeMerged1.docx")));
sources.Add(new Source(new WmlDocument(@"C:\FileToBeMerged2.docx")));

I found this nice comparison between the AltChunk and Document Builder approaches to merging documents; it helps you choose based on your requirements.

You can also use DocX library to merge documents but I prefer Document Builder over this for merging documents.

Hope this helps.

Latchstring answered 21/8, 2013 at 8:17 Comment(5)
Is there a way in open xml through coding as this task is really eating me, and I can't use any other tool etc.Dustidustie
These libraries are open-source wrappers around Open XML. Document Builder uses the Open XML SDK to do the merging, and there are no hard dependencies. Merging documents is not a simple task: along with the content, you have to migrate styles and other Open XML parts without losing the relationships! This becomes a nightmare when you have pictures in the document. The source code of the Document Builder API will give you an idea of what is involved.Latchstring
I just need to append content, as a page, so that I can print in one go.Dustidustie
The simplest way to do it, IMHO, is to use Document Builder. I have added a code snippet. Please check the updated answer.Latchstring
The Nice Comparison between AltChunk and Document Builder has moved (thanks Eric!)Seclusive
M
4

The only thing missing in these answers is the for loop.

For those who just want to copy / paste it:

void MergeInNewFile(string resultFile, IList<string> filenames)
{
    using (WordprocessingDocument document = WordprocessingDocument.Create(resultFile, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = document.AddMainDocumentPart();
        mainPart.Document = new Document(new Body());

        foreach (string filename in filenames)
        {
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML);
            string altChunkId = mainPart.GetIdOfPart(chunk);

            using (FileStream fileStream = File.Open(filename, FileMode.Open))
            {
                chunk.FeedData(fileStream);
            }

            AltChunk altChunk = new AltChunk { Id = altChunkId };
            mainPart.Document.Body.AppendChild(altChunk);
        }

        mainPart.Document.Save();
    }
}

All credits go to Chris and yonexbat

Milium answered 5/3, 2020 at 7:24 Comment(0)
A
1

Easy to use in C#:

using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace WordMergeProject
{
    public class Program
    {
        private static void Main(string[] args)
        {
            byte[] word1 = File.ReadAllBytes(@"..\..\word1.docx");
            byte[] word2 = File.ReadAllBytes(@"..\..\word2.docx");

            byte[] result = Merge(word1, word2);

            File.WriteAllBytes(@"..\..\word3.docx", result);
        }

        private static byte[] Merge(byte[] dest, byte[] src)
        {
            string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString();

            var memoryStreamDest = new MemoryStream();
            memoryStreamDest.Write(dest, 0, dest.Length);
            memoryStreamDest.Seek(0, SeekOrigin.Begin);
            var memoryStreamSrc = new MemoryStream(src);

            using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStreamDest, true))
            {
                MainDocumentPart mainPart = doc.MainDocumentPart;
                AlternativeFormatImportPart altPart =
                    mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
                altPart.FeedData(memoryStreamSrc);
                var altChunk = new AltChunk();
                altChunk.Id = altChunkId;
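                OpenXmlElement lastElem = mainPart.Document.Body.Elements<AltChunk>().LastOrDefault();
                if (lastElem == null)
                {
                    lastElem = mainPart.Document.Body.Elements<Paragraph>().Last();
                }

                // Insert a page break
                Paragraph pageBreakP = new Paragraph();
                Run pageBreakR = new Run();
                Break pageBreakBr = new Break() { Type = BreakValues.Page };

                pageBreakP.Append(pageBreakR);
                pageBreakR.Append(pageBreakBr);

                // Presumably intended: place the page break after the last element,
                // then the chunk after the page break, and save the document
                mainPart.Document.Body.InsertAfter(pageBreakP, lastElem);
                mainPart.Document.Body.InsertAfter(altChunk, pageBreakP);
                mainPart.Document.Save();
            }

            // Read the bytes only after the package is disposed, so all changes are flushed
            return memoryStreamDest.ToArray();
        }
    }
}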
Altdorfer answered 9/4, 2014 at 14:47 Comment(4)
There is something missing from the code in this answer.Cattail
What are you doing with lastElem? It seems to be set but then not used.Cookstove
This method doesn't work. Only the first doc is added to the output.Unchristian
This code is incomplete for required functionality.Spotter
M
0

My solution:

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

namespace TestFusionWord
{
    internal class Program
    {
        public static void MergeDocx(List<string> ListPathFilesToMerge, string DestinationPathFile, bool OverWriteDestination, bool WithBreakPage)
        {
            #region Control arguments

            List<string> ListError = new List<string>();
            if (ListPathFilesToMerge == null || ListPathFilesToMerge.Count == 0)
            {
                ListError.Add("The list passed in the ListPathFilesToMerge parameter contains no files to merge");
            }
            else
            {
                foreach (var item in ListPathFilesToMerge.Where(x => Path.GetExtension(x.ToLower()) != ".docx"))
                {
                    ListError.Add(string.Format("The file '{0}' in the list passed in the ListPathFilesToMerge parameter does not have the .docx extension", item));
                }

                foreach (var item in ListPathFilesToMerge.Where(x => !File.Exists(x)))
                {
                    ListError.Add(string.Format("The file '{0}' in the list passed in the ListPathFilesToMerge parameter does not exist", item));
                }
            }

            if (string.IsNullOrWhiteSpace(DestinationPathFile))
            {
                ListError.Add("The destination file passed in the DestinationPathFile parameter cannot be empty");
            }
            else
            {
                if (Path.GetExtension(DestinationPathFile.ToLower()) != ".docx")
                {
                    ListError.Add(string.Format("The destination file '{0}' given in the DestinationPathFile parameter does not have the .docx extension", DestinationPathFile));
                }

                if (File.Exists(DestinationPathFile) && !OverWriteDestination)
                {
                    ListError.Add(string.Format("The destination file '{0}' already exists. Set the OverWriteDestination argument to true if you want to overwrite it", DestinationPathFile));
                }
            }

            if (ListError.Any())
            {
                string MessageError = "Errors were encountered, details: " + Environment.NewLine + ListError.Select(x => "- " + x).Aggregate((x, y) => x + Environment.NewLine + y);
                throw new ArgumentException(MessageError);
            }

            #endregion Control arguments

            #region Merge Files

            // Delete the destination file (no error is raised if the file does not exist)
            File.Delete(DestinationPathFile);

            // Create an empty destination file
            using (WordprocessingDocument document = WordprocessingDocument.Create(DestinationPathFile, WordprocessingDocumentType.Document))
            {
                MainDocumentPart mainPart = document.AddMainDocumentPart();
                mainPart.Document = new Document(new Body());
                document.MainDocumentPart.Document.Save();
            }

            // Merge the documents
            using (WordprocessingDocument myDoc = WordprocessingDocument.Open(DestinationPathFile, true))
            {
                MainDocumentPart mainPart = myDoc.MainDocumentPart;
                Body body = mainPart.Document.Body;

                for (int i = 0; i < ListPathFilesToMerge.Count; i++)
                {
                    string currentpathfile = ListPathFilesToMerge[i];
                    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML);
                    string altchunkid = mainPart.GetIdOfPart(chunk);

                    using (FileStream fileStream = File.Open(currentpathfile, FileMode.Open))
                        chunk.FeedData(fileStream);

                    AltChunk altChunk = new AltChunk();
                    altChunk.Id = altchunkid;

                    OpenXmlElement last = body.Elements().LastOrDefault(e => e is AltChunk || e is Paragraph);
                    body.InsertAfter(altChunk, last);

                    if (WithBreakPage && i < ListPathFilesToMerge.Count - 1) // If it's not the last file, add a page break
                    {
                        last = body.Elements().LastOrDefault(e => e is AltChunk || e is Paragraph);
                        last.InsertAfterSelf(new Paragraph(new Run(new Break() { Type = BreakValues.Page })));
                    }
                }

                mainPart.Document.Save();
            }

            #endregion Merge Files
        }

        private static int Main(string[] args)
        {
            try
            {
                string DestinationPathFile = @"C:\temp\testfusion\docfinal.docx";

                List<string> ListPathFilesToMerge = new List<string>()
                                    {
                                        @"C:\temp\testfusion\fichier1.docx",
                                        @"C:\temp\testfusion\fichier2.docx",
                                        @"C:\temp\testfusion\fichier3.docx"
                                    };

                ListPathFilesToMerge.Sort(); // Sort so the files are always merged in the same order

                MergeDocx(ListPathFilesToMerge, DestinationPathFile, true, true);

#if DEBUG
                Process.Start(DestinationPathFile); //open file
#endif
                return 0;
            }
            catch (Exception Ex)
            {
                Console.Error.WriteLine(Ex.Message);
                //Log exception here
                return -1;
            }
            

        }
    }
}
Malacostracan answered 1/7, 2022 at 8:48 Comment(0)