Extract OLE Object (pdf) from Access DB

We are upgrading/converting several old Access databases to MS-SQL. Many of these databases have OLE Object fields that store PDF files. I'm looking for a way to extract these files and store them in our SQL database. Similar questions explain how to do this for image files (jpg, bmp, gif, etc.), but I haven't found an approach that works with PDF.

Berlyn answered 22/6, 2009 at 20:54

I finally got some code working. The trick is to determine which part of the field is the OLE header and remove it. Here is what works for me (based on code found here):

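    // Requires: using System; using System.Text;
    public static byte[] StripOleHeader(byte[] fileData)
    {
        // The first version searched for "%PDF-1.3" only; "%PDF-" matches any PDF version.
        byte[] marker = Encoding.ASCII.GetBytes("%PDF-");
        int startPos = -1;

        // Search the raw bytes rather than a decoded string, so that the
        // match position is a true byte offset into fileData.
        for (int i = 0; i <= fileData.Length - marker.Length && startPos == -1; i++)
        {
            int j = 0;
            while (j < marker.Length && fileData[i + j] == marker[j])
            {
                j++;
            }

            if (j == marker.Length)
            {
                startPos = i;
            }
        }

        if (startPos == -1)
        {
            throw new Exception("Could not find PDF Header");
        }

        // Copy everything from the PDF header to the end of the field.
        byte[] retByte = new byte[fileData.Length - startPos];

        Array.Copy(fileData, startPos, retByte, 0, retByte.Length);

        return retByte;
    }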

Note that this only works for PDF files.
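
The OLE wrapper may also leave bytes after the end of the embedded file. The following is a minimal sketch of one way to drop them, assuming the PDF ends with the standard `%%EOF` marker. `PdfTrimSketch` and `TrimAfterEof` are hypothetical names for illustration and were not part of the code that was tested against the Access data.

```csharp
using System;
using System.Text;

public static class PdfTrimSketch
{
    // Hypothetical helper: drops any bytes that follow the last "%%EOF" marker.
    public static byte[] TrimAfterEof(byte[] pdfData)
    {
        byte[] marker = Encoding.ASCII.GetBytes("%%EOF");

        // Scan backwards so the last marker wins; incrementally updated
        // PDFs can contain more than one "%%EOF".
        for (int i = pdfData.Length - marker.Length; i >= 0; i--)
        {
            int j = 0;
            while (j < marker.Length && pdfData[i + j] == marker[j])
            {
                j++;
            }

            if (j == marker.Length)
            {
                // Keep everything up to and including the marker.
                byte[] trimmed = new byte[i + marker.Length];
                Array.Copy(pdfData, 0, trimmed, 0, trimmed.Length);
                return trimmed;
            }
        }

        // No marker found: return the input unchanged rather than guess.
        return pdfData;
    }
}
```

It would be called on the result of the header stripping, for example `TrimAfterEof(StripOleHeader(fileData))`. Most PDF readers tolerate trailing bytes, so this step is optional.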

Berlyn answered 23/6, 2009 at 18:34

Comment from Forfar: Although this is an older answer, the code worked for me. I only had to change "%PDF-1.3" to "%PDF-1.7" and it correctly stripped out the header. Oddly enough, it also worked when searching for just "%PDF".

OLEtoDisk

"This version saves the entire contents of a table containing OLE Objects to disk. Does NOT require the original application that served as the OLE server to insert the object. Supports all MS Office documents, PDF, All images inserted by MS Photo Editor, MS Paint, and Paint Shop Pro. Also supports extraction of PACKAGE class including original Filename. Contains function to produce a full Inventory of the OLE field including LINKED path and Filenames. Uses Structured Storage API's to read the actual contents of the field"

http://lebans.com/oletodisk.htm

Cytogenetics answered 23/6, 2009 at 1:10

Comment from Berlyn: I've seen (and tried) that. It works to pull out the PDFs, but I am trying to find something that I can integrate into my own (C#) code. Some of these Access DBs have 4+ columns that store PDF files and, ultimately, I just want to copy each file into a table on our SQL server along with all of the other data.
