Reading RTF File Containing an OLE Embedded Object

Question:

I need to read an RTF file that contains an OLE object as an inner document.

RTF file = [an OLE object (a Word document) is embedded in it]

Sample RTF file that contains a Word document embedded as an OLE object.

Approaches I have tried:

  1. OLE as Image in RTF

That question includes a program that extracts an image embedded as OLE in RTF.

I tried the program from the answer marked as correct, but it does not work for me.

  2. The OpenXML SDK (it cannot open RTF files).

  3. Other SDKs such as GemBox, which cannot open the inner document (i.e. the OLE object in the RTF).

Work I have done:

I got this working with Microsoft.Office.Interop.Word.dll, which gives accurate results, but it will not work on the server.

For example, it opens the RTF file using MS Word, which is installed on the client machine, but Word is not installed on the server.

So this approach is not suitable for me.

I need to open and read the OLE content in the RTF file and store it in a string (for example), because with a string I can do a lot of things.

Does anyone have an idea how to solve this?

Plumbago answered 22/10, 2018 at 10:35 Comment(5)
Your .rtf file doesn't contain an OLE package object (like in my previous answer), but a Word.Document.12 object (.RTF is a text format underneath). Just remove the test for "package" in the sample code so you'll get the data from GetNextTextAsByteArray as a byte[]. From this data, in the Open XML (.zip format) case, just look for the first 'PK' string (or 0x50 0x4B in hex bytes) and this will be the start of the .docx or other document. - Lianaliane
My input is an OLE file. It could not be read or opened by any application, so I added an RTF header to the OLE file to make it an RTF file. Now it can be opened in MS Word. The given file was edited by me. The file is like your case, but instead of an image I have a Word document attached to it. @SimonMourier - Plumbago
OLE is a broad term. Your input is not an OLE package, it's an OLE Word.Document.12 object, which is different from the one in my answer. - Lianaliane
Okay. What is a possible way to do this? Is there any package that helps to read it? @SimonMourier - Plumbago
I told you how to extract the .docx file. - Lianaliane
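
The approach described in the comments above (decode the object data, then look for the ZIP signature) can be sketched roughly as follows. This is only a minimal sketch, not the code from the linked answer: the class and method names (RtfOleSketch, ReadFirstObjData, FindZipStart) are hypothetical, it assumes the payload of the first \objdata group is plain hex text, and it searches for the full 4-byte ZIP local file header (0x50 0x4B 0x03 0x04) instead of just 'PK' to reduce false matches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
static class RtfOleSketch
{
    // Returns the value of a hex digit, or -1 for any other character.
    static int HexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Decodes the hex payload of the first \objdata group in the RTF text.
    // Assumption: the payload is plain hex digits, possibly split by
    // whitespace or line breaks, and ends at the first closing brace.
    public static byte[] ReadFirstObjData(string rtf)
    {
        const string marker = @"\objdata";
        int start = rtf.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) return new byte[0];

        var bytes = new List<byte>();
        int high = -1; // pending high nibble, -1 when there is none
        for (int i = start + marker.Length; i < rtf.Length && rtf[i] != '}'; i++)
        {
            int nibble = HexValue(rtf[i]);
            if (nibble < 0) continue; // skip whitespace and line breaks
            if (high < 0)
            {
                high = nibble;
            }
            else
            {
                bytes.Add((byte)((high << 4) | nibble));
                high = -1;
            }
        }
        return bytes.ToArray();
    }

    // Returns the index of the first ZIP local file header, or -1 if none.
    public static int FindZipStart(byte[] data)
    {
        for (int i = 0; i + 3 < data.Length; i++)
        {
            if (data[i] == 0x50 && data[i + 1] == 0x4B &&
                data[i + 2] == 0x03 && data[i + 3] == 0x04)
            {
                return i;
            }
        }
        return -1;
    }
}
```

To use it, pass the RTF text (for example from File.ReadAllText) to ReadFirstObjData, call FindZipStart on the result, and write the bytes from that offset onward to a .docx file. Treat this as a heuristic: the bytes after the ZIP start may still include trailing data from the OLE container, and a stream inside an OLE compound file is not guaranteed to be stored contiguously, so the result may need trimming or a proper compound-file reader.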

Please use the following code example to extract the OLE object (Word document) from RTF and import it into Aspose.Words’ DOM to read its content. Hope this helps you.

using System;
using Aspose.Words;
using Aspose.Words.Drawing;

// MyDir is assumed to hold the path of the folder that contains SAMPLE.rtf.
Document doc = new Document(MyDir + "SAMPLE.rtf");

// Get the first shape in the document; it is null if there are no shapes.
Shape shape = (Shape)doc.GetChild(NodeType.Shape, 0, true);
if (shape != null && shape.OleFormat != null)
{
    // Save the embedded OLE object to disk.
    string olePath = MyDir + "output" + shape.OleFormat.SuggestedExtension;
    shape.OleFormat.Save(olePath);

    if (shape.OleFormat.SuggestedExtension == ".docx")
    {
        // Import the .docx OLE object into Aspose.Words' DOM and print its text.
        Document ole = new Document(olePath);
        Console.WriteLine(ole.ToString(SaveFormat.Text));
    }
}

I work with Aspose as Developer Evangelist.

Ephah answered 24/10, 2018 at 13:15 Comment(1)
Yes, I found this approach yesterday and it works for me. Thanks for your answer. - Plumbago

Thanks for the above answer. Here is another version of the code, which iterates over all the OLE objects and saves each one under its original file name to a local path.

string MyDir = @"E:\temp\";
Document doc = new Document(MyDir + "Requirement#4.rtf");

NodeCollection nodeColl = doc.GetChildNodes(NodeType.Shape, true);
foreach (var node in nodeColl)
{
    Shape shape1 = (Shape)node;
    if (shape1.OleFormat != null)
    {
        shape1.OleFormat.Save(MyDir + shape1.OleFormat.SuggestedFileName + shape1.OleFormat.SuggestedExtension);
    }
}

Jeannettajeannette answered 29/3, 2019 at 5:26 Comment(0)
