LoadIFilter() fails on all PDFs (but MS's filtdump.exe doesn't.)

Asked 24/8, 2011 at 15:18 Answered 3/2, 2014 at 11:55

I'm trying to write a C# utility that mimics the behavior of filtdump.exe from the Windows Search SDK (since filtdump doesn't appear to be redistributable itself.) I'm running into a combination of contradictory and/or non-existent documentation and technical problems I can't seem to track down. I'm hoping someone can help eliminate one or the other of those hurdles...

According to MSDN, filtdump uses ILoadFilter::LoadIFilter to load its IFilter. I contend that MSDN is lying, since it also claims ILoadFilter::LoadIFilter only exists on Windows 7, but filtdump works fine on earlier OSes. Process Monitor indicates that it's actually calling LoadIFilter() from query.dll, so that's what I'm doing:

public static class NativeMethods
{
    // From Windows SDK v7.1, NTQuery.h
    [DllImport("query.dll", CharSet = CharSet.Unicode)]
    public static extern int LoadIFilter(
        string pwcsPath,
        [MarshalAs(UnmanagedType.IUnknown)] 
        ref object pUnkOuter,
        ref IFilter ppIUnk);
}

object iUnknown = null;
IFilter filter = null;
var result = NativeMethods.LoadIFilter(args[0], ref iUnknown, ref filter);
if (result != ResultCodes.S_OK)
{
  Console.WriteLine("Failed to load an IFilter for {0}: {1}", args[0], result);
  return;
}

For the most part, this application and filtdump give me the same results -- they can both open and extract text from text files, Word documents, and Outlook emails, and both fail on the same set of other documents that have no IFilter. However, PDFs are giving me a problem. Filtdump manages to open and extract the text from most of the PDFs I've thrown at it, but every single one of the PDFs I try with my own application gives me an HRESULT of 0x80004005, E_FAIL.

This is the same error from this question, but I'm getting it on every PDF and filtdump is not, so I know that the IFilter is working on at least some documents. Has anyone who has done this kind of thing with PDFs before spotted what I'm doing wrong?

Ziagos answered 24/8, 2011 at 15:18 Comment(1)

I have found that I cannot pull text from a PDF unless I have Acrobat Reader installed. The IFilter GUID is e8978da6-047f-4e3d-9c78-cdbe46041603 and the actual IFilter file is C:\Program Files (x86)\Adobe\Reader 10.0\Reader\AcroRdIF.dll. Hopefully you'll find something useful in that. – Clientage 26/1, 2012 at 20:5

You may want to see this blog post. In short, v10 of Adobe's PDF filter uses a whitelist of applications allowed to use the filter, including Microsoft's diagnostic tools like filtdump.exe, supposedly as a “security measure”.

Banister answered 3/2, 2014 at 11:55 Comment(1)

This works. I can't believe this. I renamed my tester tool to filtdump.exe and it started working. – Thilda 28/2, 2014 at 21:2

LoadIFilter fails because the Adobe PDF filter is marked as STA, while C# applications are MTA by default, so the PDF filter cannot be loaded. Try making your application STA, then load the PDF filter.
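One way to try the STA suggestion without changing the whole application is to run the filter work on a dedicated STA thread. The sketch below is only an illustration: `StaRunner` and `RunInSta` are hypothetical names, not part of any SDK, and the simpler alternative of putting `[STAThread]` on `Main` may be enough for a console tool.

```csharp
using System;
using System.Threading;

static class StaRunner
{
    // Hypothetical helper: runs 'work' on a new STA thread and waits for it.
    public static void RunInSta(Action work)
    {
        Exception failure = null;
        var thread = new Thread(() =>
        {
            try { work(); }
            catch (Exception ex) { failure = ex; } // capture so the caller sees it
        });

        // The apartment state must be set before the thread is started.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        if (failure != null)
            throw new InvalidOperationException("Work on the STA thread failed.", failure);
    }
}
```

A call such as `StaRunner.RunInSta(() => { /* call LoadIFilter and read chunks here */ });` keeps the IFilter's creation and use on the same STA thread.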

Ajax

Dipteral answered 22/2, 2012 at 4:49 Comment(0)

I also expect filtdump is using the old Win32 LoadIFilter call which was available from Windows 2000.

I've seen the same problem as yours solved by running the calling process in a job: https://mcmap.net/q/1671208/-activator-createinstance-lt-guid-gt-works-inside-vside-but-not-externally

I also got a similar problem with Reader 10.1.5 installed although the Win32 LoadIFilter() returned E_NOTIMPL not E_FAIL.

Seems like Adobe broke the standard Win32 LoadIFilter() call by removing the ability to load the content into the IFilter via the IStorage interface's Load method but the object still returns that interface as available via QI.

For that problem, on Windows 7 and later you can create the FilterRegistration object, which implements ILoadFilter, and then call ILoadFilter::LoadIFilter() to create the filter COM object. Then get its IPersistStream and call Load() on that with an IStream containing the file content.

For older versions you need to search for the Filter CLSID in the registry first or statically set the Adobe CLSID as a config value if you want to make it constant.
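The registry search mentioned above can be sketched roughly as follows. It assumes the common layout in which the file extension has a `PersistentHandler` subkey whose default value names a handler CLSID, and that handler lists its filter under `PersistentAddinsRegistered\{IID_IFilter}`. `FilterLookup` and its methods are hypothetical names, and extensions that register their handler indirectly (through a ProgID or document CLSID) are not covered.

```csharp
using System;
using Microsoft.Win32;

static class FilterLookup
{
    // IID_IFilter, used as the subkey name under PersistentAddinsRegistered.
    const string IidIFilter = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    // Builds the HKCR-relative path that holds the filter CLSID for a handler.
    public static string AddinKeyPath(string persistentHandlerClsid)
    {
        return @"CLSID\" + persistentHandlerClsid +
               @"\PersistentAddinsRegistered\" + IidIFilter;
    }

    // Reads the default value of an HKCR subkey, or null if it is missing.
    static string ReadDefault(string subKeyPath)
    {
        using (RegistryKey key = Registry.ClassesRoot.OpenSubKey(subKeyPath))
        {
            return key == null ? null : key.GetValue(null) as string;
        }
    }

    // Returns the filter CLSID string for an extension such as ".pdf", or null.
    public static string FindFilterClsid(string extension)
    {
        string handler = ReadDefault(extension + @"\PersistentHandler");
        if (handler == null) return null;
        return ReadDefault(AddinKeyPath(handler));
    }
}
```

If `FilterLookup.FindFilterClsid(".pdf")` returns a value, it can be passed to `Type.GetTypeFromCLSID(new Guid(clsid))` and `Activator.CreateInstance` to create the filter object. Note that a 32-bit and a 64-bit process may see different registrations.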

Joselyn answered 26/3, 2013 at 10:8 Comment(0)
