Embed contents of a RTF file into a DOCX file using OpenXML SDK

In our old MSWord-97 based system we use COM to interact with a .doc file, and embed an OLE object, so the embedded document is visible in the parent (not as an icon).

We're replacing this with a system that uses the OpenXML SDK to generate .docx files, since the old approach requires having Word on our server. However, we still need to embed the contents of RTF files into the generated DOCX; specifically, we replace a bookmark with the contents of the file.

I found a few examples online but they all differ. When I create a simple example in Word and view the XML, there's a lot of stuff to position/display the embedded object's visual representation, while the embedding itself doesn't seem too horrific. What's the easiest way to do this?

Clitoris answered 28/7, 2010 at 15:12 Comment(2)
Well, I paused on this task but have re-opened it 3.5 years later. I started writing a question on SO and it reminded me this one already existed!Clitoris
Possibly related, maybe it can help someone: social.msdn.microsoft.com/Forums/office/en-US/…Clitoris

You can embed the content of an RTF document into an OpenXML DOCX file by using the AltChunk anchor for external content. The AltChunk (w:altChunk) element specifies a location in your WordprocessingML document at which to insert external content such as an RTF document. The code below uses the AltChunk class in conjunction with the AlternativeFormatImportPart class to embed the content of an RTF document into a DOCX file after the last paragraph:

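using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Open the target document for editing (second argument: isEditable = true).
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(@"your_docx_file.docx", true))
{
  // The relationship ID must be unique within the main document part.
  string altChunkId = "AltChunkId5";

  MainDocumentPart mainDocPart = wordDocument.MainDocumentPart;
  AlternativeFormatImportPart chunk = mainDocPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.Rtf, altChunkId);

  // Read RTF document content.
  string rtfDocumentContent = File.ReadAllText("your_rtf_document.rtf", Encoding.ASCII);

  // FeedData does not seek, so the stream must be positioned at its start.
  using (MemoryStream ms = new MemoryStream(Encoding.ASCII.GetBytes(rtfDocumentContent)))
  {
    chunk.FeedData(ms);
  }

  // The w:altChunk element references the imported part by its relationship ID.
  AltChunk altChunk = new AltChunk();
  altChunk.Id = altChunkId;

  // Embed AltChunk after the last paragraph.
  mainDocPart.Document.Body.InsertAfter(
    altChunk, mainDocPart.Document.Body.Elements<Paragraph>().Last());

  mainDocPart.Document.Save();
} // Disposing the document here writes the package, including its relationships.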
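Since the question asks about placing the RTF content at a bookmark rather than at the end of the document, the same technique can be pointed at a bookmark instead. The following is only a rough sketch under some assumptions: `RtfEmbedder` and `InsertRtfAtBookmark` are hypothetical names, and the chunk is inserted after the paragraph that contains the bookmark (w:altChunk is a block-level element, so it sits next to paragraphs, not inside them). It does not delete any placeholder text the bookmark may enclose.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class RtfEmbedder
{
    // Hypothetical helper (not part of the SDK): imports the RTF stream as an
    // alternative format part and places a w:altChunk after the bookmark.
    public static void InsertRtfAtBookmark(
        WordprocessingDocument doc, string bookmarkName, Stream rtf, string altChunkId)
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;

        // Locate the bookmark start element by name.
        BookmarkStart bookmark = mainPart.Document.Body
            .Descendants<BookmarkStart>()
            .FirstOrDefault(b => b.Name?.Value == bookmarkName);
        if (bookmark == null)
            throw new ArgumentException("Bookmark not found: " + bookmarkName);

        // Store the RTF bytes in the package; FeedData reads from the
        // stream's current position.
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
            AlternativeFormatImportPartType.Rtf, altChunkId);
        chunk.FeedData(rtf);

        AltChunk altChunk = new AltChunk { Id = altChunkId };

        // Insert after the enclosing paragraph, or after the bookmark itself
        // if the bookmark sits directly between block-level elements.
        OpenXmlElement anchor =
            bookmark.Ancestors<Paragraph>().FirstOrDefault() ?? (OpenXmlElement)bookmark;
        anchor.InsertAfterSelf(altChunk);
    }
}
```

To use it, open the document with `WordprocessingDocument.Open(path, true)` inside a `using` block, pass a stream from `File.OpenRead` for the RTF file, and call `Document.Save()` before the block ends. Passing the file stream directly avoids the round trip through an ASCII string.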

If you want to embed a Unicode RTF string into a DOCX file, you have to escape the Unicode characters. For an example, please refer to the following Stack Overflow answer.
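To illustrate what escaping means here: RTF represents a non-ASCII character as `\uN?`, where N is the UTF-16 code unit written as a signed 16-bit decimal number and `?` is a fallback character for readers that do not understand `\u`. The following is a minimal sketch of that idea, not the code from the referenced answer; `RtfUnicode` and `EscapeNonAscii` are made-up names, and it assumes the default of one fallback character per `\u` (no `\uc` override in the document).

```csharp
using System.Globalization;
using System.Text;

public static class RtfUnicode
{
    // Replaces every non-ASCII UTF-16 code unit with an RTF \uN? escape.
    // ASCII characters, including RTF control words, braces and
    // backslashes, are passed through unchanged.
    public static string EscapeNonAscii(string rtf)
    {
        var sb = new StringBuilder(rtf.Length);
        foreach (char c in rtf)
        {
            if (c < 0x80)
            {
                sb.Append(c);
            }
            else
            {
                // RTF expects a signed 16-bit value, so code units above
                // 32767 are written as negative numbers.
                int code = unchecked((short)c);
                sb.Append("\\u")
                  .Append(code.ToString(CultureInfo.InvariantCulture))
                  .Append('?');
            }
        }
        return sb.ToString();
    }
}
```

After this step the string contains only ASCII characters, so reading or writing it with Encoding.ASCII no longer loses information.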

If you encounter the error "the file is corrupt", ensure that you Dispose() or Close() the WordprocessingDocument. If you do not Close() the document, the relationship for the w:altChunk is not stored in the document.xml.rels file.
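One way to check for the missing relationship without unzipping the file by hand is to read the relationships part directly from the package. This is only a diagnostic sketch: `AltChunkCheck` and `HasAltChunkRelationship` are hypothetical names, and it assumes the main document part lives at the usual location, so that its relationships are stored in `word/_rels/document.xml.rels`.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class AltChunkCheck
{
    // Relationship type used for alternative format import parts.
    const string AltChunkRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk";

    // Returns true if the .docx package contains an altChunk relationship
    // with the given ID for the main document part.
    public static bool HasAltChunkRelationship(Stream docx, string relationshipId)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumed default location of the main part's relationships.
            ZipArchiveEntry entry = zip.GetEntry("word/_rels/document.xml.rels");
            if (entry == null)
                return false;

            using (Stream s = entry.Open())
            {
                XDocument rels = XDocument.Load(s);
                XNamespace ns =
                    "http://schemas.openxmlformats.org/package/2006/relationships";
                return rels.Descendants(ns + "Relationship").Any(r =>
                    (string)r.Attribute("Id") == relationshipId &&
                    (string)r.Attribute("Type") == AltChunkRelType);
            }
        }
    }
}
```

Calling it with a stream from `File.OpenRead` on the generated file and the ID used in the w:altChunk element (for example "AltChunkId5") should return true once the document has been closed properly.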

Pyrosis answered 1/12, 2013 at 13:22 Comment(11)
I've used AltChunk before to insert HTML into docx files and it worked like a charm. Definitely the way to goVocational
Hmm, this seemed to be going so well but I just get "the file is corrupt" when trying to open it in Word 2010 after saving changes. I'm virtually using this example exactly now. What should I be checking, where should I be looking?Clitoris
@John: Could you provide a sample document (one that is corrupt)? Then I can take a look at it. I use the Open XML SDK Productivity Tool to check such documents. Is the RTF document you insert complete (valid)? Is the current position of the memory stream zero? Please note that the FeedData method does not seek to the beginning of the stream.Pyrosis
@Pyrosis by unzipping the .docx before/after files and doing a diff, I see that the output dir has a (valid) RTF file, and document.xml has a new element <w:altChunk r:id="AltChunkId5" /> after the last paragraph but nothing else new. Seems something is missing?Clitoris
@John: Could you provide a sample document or the exact code you are using to include the w:altChunk? Please note that the r:id of the w:altChunk element must be unique.Pyrosis
@Pyrosis I'll see if I can find somewhere to host it - but those changes are the exact differences made to the doc... an empty w:altchunk and a new .rtf file. The id is unique as there are no other chunks - but there seems to be no mapping between the rtf file and the altchunk id. Do you know where that should be, what you'd expect to see that's different?Clitoris
@John: Do you save the document? Do you Dispose (or close) the WordprocessingDocument?Pyrosis
@John: You should find a relationship in the document.xml.rels file with the ID of your w:altChunk.Pyrosis
Thanks - I thought I was properly saving the document but calling Close() has fixed the problems and it all works great now. One point - I found using FileStream better as it works both for RTF and DOCX files.Clitoris
@John: Thank you for the feedback. I've extended my answer to reflect our discussion in the comments section. Please accept my answer if it helped you.Pyrosis
@Pyrosis I didn't realise I hadn't! I awarded the bounty without accepting, done now and thanks again.Clitoris

This fella seemed to have figured it out with his own question and answer at How can I embed any file type into Microsoft Word using OpenXml 2.0

Eyelet answered 30/7, 2010 at 7:14 Comment(2)
His solution still requires you to have Word installed, which is a terrible idea for server-side document generation and the entire reason we're creating the new tool in the first place. Apart from anything else, on some server configurations you can't run up Word through COM.Clitoris
Ugh, you're right, right there at the bottom. Seems kind of pointless to offer all that in WordprocessingML, just to ruin it with interop.Eyelet
