Word OpenXml Word Found Unreadable Content

We are trying to manipulate a Word document to remove a paragraph based on certain conditions, but the resulting file is always corrupted. When we try to open it, Word reports the error:

Word found unreadable content

The code below corrupts the file, but if we remove the line

Document document = mdp.Document;

the file is saved and opens without issue. Is there an obvious problem that I am missing?

var readAllBytes = File.ReadAllBytes(@"C:\Original.docx");

using (var stream = new MemoryStream(readAllBytes))
{
    using (WordprocessingDocument wpd = WordprocessingDocument.Open(stream, true))
    {
        MainDocumentPart mdp = wpd.MainDocumentPart;
        Document document = mdp.Document;
    }
}

File.WriteAllBytes(@"C:\New.docx", readAllBytes);

UPDATE:

using (WordprocessingDocument wpd = WordprocessingDocument.Open(@"C:\Original.docx", true))
{
    MainDocumentPart mdp = wpd.MainDocumentPart;
    Document document = mdp.Document;

    document.Save();
}

Running the code above against a physical file, we can still open Original.docx without the error, so the problem seems limited to modifying a stream.

Westward answered 18/6, 2019 at 12:51 Comment(3)
Does Original.docx open correctly in Word? How about if you use the code in the question, but opening the file rather than a stream? - Rachealrachel
Hello, yes, Original.docx opens without any error. The problem is that we receive the file as a stream, have to remove paragraphs starting with a certain text, and then return the modified stream, so we have no physical file to open. - Westward
Sorry, you didn't "ping" me, so I only just saw the comment... According to this article, you're missing a step: converting the byte array to a memory stream the Open XML SDK can work with: learn.microsoft.com/en-us/previous-versions/office/developer/… Keep in mind that a docx is a ZIP package, not a flat file. There's also this: learn.microsoft.com/en-us/office/open-xml/… - Rachealrachel

Here's a method that reads a document into a MemoryStream:

public static MemoryStream ReadAllBytesToMemoryStream(string path)
{
    byte[] buffer = File.ReadAllBytes(path);
    var destStream = new MemoryStream(buffer.Length);
    destStream.Write(buffer, 0, buffer.Length);
    destStream.Seek(0, SeekOrigin.Begin);
    return destStream;
}

Note how the MemoryStream is instantiated. I am passing the capacity rather than the buffer (as in your own code). Why is that?

MemoryStream() and MemoryStream(int) create a resizable MemoryStream instance, which is what you want if your changes may make the document grow. MemoryStream(byte[]), as used in your code, creates an instance whose capacity is fixed at the length of the array. Any write that would expand it throws a NotSupportedException, so it only works if you make no changes to the document or your changes only ever shrink it.
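The difference between the two constructors can be checked without any Word document at all. The following is a minimal sketch; the class and method names are made up for illustration, and the four-byte array merely stands in for the bytes of a .docx file.

```csharp
using System;
using System.IO;

public static class MemoryStreamResizeDemo
{
    // Returns true if one more byte can be appended past the current end of the stream.
    public static bool CanAppend(MemoryStream stream)
    {
        stream.Seek(0, SeekOrigin.End);
        try
        {
            stream.WriteByte(0xFF);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown when a fixed-size MemoryStream would have to expand.
            return false;
        }
    }

    public static void Main()
    {
        byte[] original = { 1, 2, 3, 4 };

        // MemoryStream(byte[]) wraps the array; its capacity is fixed.
        var wrapped = new MemoryStream(original);
        Console.WriteLine(CanAppend(wrapped));   // False

        // MemoryStream(int) owns its buffer; the argument is only the initial capacity.
        var resizable = new MemoryStream(original.Length);
        resizable.Write(original, 0, original.Length);
        Console.WriteLine(CanAppend(resizable)); // True
    }
}
```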

Now, to read a Word document into a MemoryStream, manipulate that Word document in memory, and end up with a consistent MemoryStream, you will have to do the following:

// Get a MemoryStream.
// In this example, the MemoryStream is created by reading a file stored
// in the file system. Depending on the Stream you "receive", it makes
// sense to copy the Stream to a MemoryStream before processing.
MemoryStream stream = ReadAllBytesToMemoryStream(@"C:\Original.docx");

// Open the Word document on the MemoryStream.
using (WordprocessingDocument wpd = WordprocessingDocument.Open(stream, true))
{
    MainDocumentPart mdp = wpd.MainDocumentPart;
    Document document = mdp.Document;
    // Manipulate document ...
}

// After having closed the WordprocessingDocument (by leaving the using statement),
// you can use the MemoryStream for whatever comes next, e.g., to write it to a
// file stored in the file system. ToArray() returns only the bytes written,
// without any unused buffer capacity.
File.WriteAllBytes(@"C:\New.docx", stream.ToArray());
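When handing the bytes of a MemoryStream to something like File.WriteAllBytes, it helps to keep the content length and the buffer size apart. A small sketch (the class name and the sizes are made up for illustration) showing that ToArray returns only the written bytes, while GetBuffer returns the whole backing array including unused capacity:

```csharp
using System;
using System.IO;

public static class BufferVersusContentDemo
{
    public static void Main()
    {
        var stream = new MemoryStream(16);            // initial capacity of 16 bytes
        stream.Write(new byte[] { 1, 2, 3 }, 0, 3);   // only 3 bytes of content

        Console.WriteLine(stream.Length);             // 3
        Console.WriteLine(stream.ToArray().Length);   // 3: content only
        Console.WriteLine(stream.GetBuffer().Length); // 16: whole backing buffer
    }
}
```

For a ZIP-based package such as a .docx file, those extra trailing bytes end up in the written file, so ToArray is the safer choice whenever the stream may have grown.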

Note that you will have to reset stream.Position by calling stream.Seek(0, SeekOrigin.Begin) before any subsequent operation that reads from the current position (e.g., CopyTo, CopyToAsync). Right after leaving the using statement, the stream's position will be equal to its length.
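Putting the pieces together for the scenario from the comments (a stream comes in, a modified stream goes out), a rough sketch could look like the one below. It assumes the DocumentFormat.OpenXml NuGet package. The class and method names are hypothetical, matching on InnerText is only one possible reading of "paragraphs starting with a certain text", and only top-level paragraphs of the body are considered.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ParagraphRemover
{
    // Hypothetical helper: returns a new, rewound stream containing the edited document.
    public static MemoryStream RemoveParagraphsStartingWith(Stream input, string prefix)
    {
        // The parameterless constructor gives a resizable stream.
        var working = new MemoryStream();
        input.CopyTo(working);
        working.Seek(0, SeekOrigin.Begin);

        using (WordprocessingDocument wpd = WordprocessingDocument.Open(working, true))
        {
            var body = wpd.MainDocumentPart?.Document?.Body;
            if (body != null)
            {
                // Materialize the matches first so that removing elements
                // does not disturb the enumeration.
                var matches = body.Elements<Paragraph>()
                    .Where(p => p.InnerText.StartsWith(prefix, StringComparison.Ordinal))
                    .ToList();

                foreach (Paragraph paragraph in matches)
                {
                    paragraph.Remove();
                }
            }
        } // Leaving the using block writes the changes back into 'working'.

        // Rewind so the caller can read the result from the start.
        working.Seek(0, SeekOrigin.Begin);
        return working;
    }
}
```

The caller can then pass the returned stream on directly, or call ToArray() on it if a byte array is needed.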

Edelsten answered 28/11, 2019 at 18:15 Comment(0)
