Reading .Doc File using DocumentFormat.OpenXml dll
When I try to read a .doc file using the DocumentFormat.OpenXml DLL, it throws the error "File contains corrupted data."

The same DLL reads .docx files properly.

Can the DocumentFormat.OpenXml DLL be used to read .doc files?

// Requires: using DocumentFormat.OpenXml.Packaging;
string path = @"D:\Data\Test.doc";
string searchKeyWord = @"java";

private bool SearchWordIsMatched(string path, string searchKeyWord)
{
    // Open read-only (second argument false); searching does not need write access.
    // Exceptions propagate to the caller with their original stack trace,
    // which the previous "throw ex;" would have reset.
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, false))
    {
        var text = wordDoc.MainDocumentPart.Document.InnerText;
        return text.Contains(searchKeyWord);
    }
}
Branching answered 2/4, 2012 at 10:47 Comment(0)
O
18

The old .doc files have a completely different format from the new .docx files. So, no, you can't use the OpenXml library to read .doc files.

To do that, you would either need to manually convert the files first, or you would need to use Office interop, instead of the Open XML SDK you're using now.
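
The interop route could look roughly like the following. This is a minimal sketch, not a tested implementation: it assumes Microsoft Word (2010 or later, for `SaveAs2`) is installed on the machine and that the project references `Microsoft.Office.Interop.Word`. The class and method names (`DocConverter`, `ConvertDocToDocx`) are hypothetical.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

public static class DocConverter
{
    // Hypothetical helper: opens a binary .doc in Word and saves a .docx copy.
    // Both paths should be absolute; Word resolves relative paths on its own terms.
    public static void ConvertDocToDocx(string docPath, string docxPath)
    {
        Word.Application app = new Word.Application();
        Word.Document doc = null;
        try
        {
            // Open read-only so the source file is never modified.
            doc = app.Documents.Open(docPath, ReadOnly: true);

            // wdFormatXMLDocument is the Open XML (.docx) save format.
            doc.SaveAs2(docxPath, Word.WdSaveFormat.wdFormatXMLDocument);
        }
        finally
        {
            // Always close the document and quit Word, or a WINWORD.EXE
            // process is left running in the background.
            if (doc != null)
            {
                doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            }
            app.Quit();
        }
    }
}
```

The resulting .docx can then be passed to `WordprocessingDocument.Open` as in the question.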

Optime answered 2/4, 2012 at 10:57 Comment(3)
Yes, I am waiting for some more answers. - Branching
A year passed. Still waiting. :/ - Uranology
@ShardaprasadSoni, it would be a bad idea to leave this good answer as it is. Please mark it as the answer if it is the correct one. - Transcription
L
6

I'm afraid there won't be any better answer than the ones already given. The Microsoft Word DOC format is binary whereas OpenXML formats such as DOCX are zipped XML files. The OpenXml framework is for working with the latter only.

As suggested, the only other option you have is to use Word interop or a third-party library to convert DOC to DOCX, which you can then work with using the OpenXml library.

Loathe answered 4/7, 2012 at 10:19 Comment(0)
G
3

A .doc file (if created with an older version of Microsoft Word) does not have the same structure as a .docx file (which is basically a zip file with some XML documents).

If your .doc cannot be unzipped (to check, rename the .doc extension to .zip and try to open it), you'll have to manually convert the .doc to a .docx.
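
The same check can be done in code by looking at the first bytes of the file instead of renaming it. This is a minimal sketch, assuming the standard file signatures: ZIP packages such as .docx start with the bytes `50 4B 03 04`, while binary .doc files are OLE compound files that start with `D0 CF 11 E0 A1 B1 1A E1`. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;

public static class WordFileProbe
{
    // ZIP local file header signature ("PK" followed by 0x03 0x04).
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // OLE compound file signature, used by binary .doc files.
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // True if data begins with every byte of signature, in order.
    public static bool StartsWith(byte[] data, byte[] signature)
    {
        return data.Length >= signature.Length
            && data.Take(signature.Length).SequenceEqual(signature);
    }

    public static bool LooksLikeZipPackage(byte[] header)
    {
        return StartsWith(header, ZipSignature);
    }

    public static bool LooksLikeBinaryDoc(byte[] header)
    {
        return StartsWith(header, OleSignature);
    }

    // Reads up to the first 8 bytes of the file (fewer if the file is shorter).
    public static byte[] ReadHeader(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] buffer = new byte[8];
            int total = 0;
            while (total < buffer.Length)
            {
                int read = stream.Read(buffer, total, buffer.Length - total);
                if (read == 0) break; // end of file
                total += read;
            }
            return buffer.Take(total).ToArray();
        }
    }
}
```

For example, `WordFileProbe.LooksLikeZipPackage(WordFileProbe.ReadHeader(path))` returning false for a file named Test.doc suggests it needs converting before the Open XML SDK can open it. A matching signature only shows the container type; it does not prove the file is a valid Word document.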

Georgettageorgette answered 2/4, 2012 at 10:52 Comment(0)
A
0

You can use IFilterTextReader.

// FilterReader comes from the IFilter wrapper library; it derives from TextReader.
string txt;
using (TextReader reader = new FilterReader(path))
{
    txt = reader.ReadToEnd();
}

You can take a look at http://www.codeproject.com/Articles/13391/Using-IFilter-in-C

Airworthy answered 22/10, 2015 at 22:33 Comment(2)
This looks promising. Can you provide a link to the project as well? And perhaps an explanation as to why this works? - Spanjian
Sorry, my English is not that good... but you could take a look at this: codeproject.com/Articles/13391/Using-IFilter-in-C - Sanferd
