I'm trying to write a C# utility that mimics the behavior of filtdump.exe from the Windows Search SDK (since filtdump doesn't appear to be redistributable itself). I'm running into a combination of contradictory and/or non-existent documentation and technical problems I can't seem to track down. I'm hoping someone can help eliminate one or the other of those hurdles...
According to MSDN, filtdump uses ILoadFilter::LoadIFilter to load its IFilter. I contend that MSDN is lying, since it also claims ILoadFilter::LoadIFilter only exists on Windows 7, but filtdump works fine on earlier OSes. Process Monitor indicates that it's actually calling LoadIFilter() from query.dll, so that's what I'm doing:
    public static class NativeMethods
    {
        // From Windows SDK v7.1, NTQuery.h
        [DllImport("query.dll", CharSet = CharSet.Unicode)]
        public static extern int LoadIFilter(
            string pwcsPath,
            [MarshalAs(UnmanagedType.IUnknown)]
            ref object pUnkOuter,
            ref IFilter ppIUnk);
    }

    object iUnknown = null;
    IFilter filter = null;
    var result = NativeMethods.LoadIFilter(args[0], ref iUnknown, ref filter);
    if (result != ResultCodes.S_OK)
    {
        Console.WriteLine("Failed to load an IFilter for {0}: {1}", args[0], result);
        return;
    }
For the most part, this application and filtdump give me the same results: they can both open and extract text from text files, Word documents, and Outlook emails, and both fail on the same set of other documents that have no IFilter. However, PDFs are giving me a problem. Filtdump manages to open and extract the text from most of the PDFs I've thrown at it, but every single one of the PDFs I try with my own application gives me an HRESULT of 0x80004005, E_FAIL.
This is the same error as in this question, but I'm getting it on every PDF and filtdump is not, so I know that the IFilter works on at least some documents. Has anyone done this kind of thing with PDFs before who can see what I'm doing wrong?
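In case it helps when comparing error codes: because the return value is an int, the Console.WriteLine call in my code prints the HRESULT as a signed decimal, so E_FAIL shows up as -2147467259 rather than 0x80004005. Below is a minimal sketch of a helper that prints the hex form instead. HResultFormatter and ToHex are names made up for illustration; they are not part of the SDK or of my actual utility.

```csharp
using System;

public static class HResultFormatter
{
    // Hypothetical helper: renders a signed 32-bit HRESULT in the unsigned
    // hex notation used by the SDK headers (e.g. E_FAIL is 0x80004005).
    public static string ToHex(int hresult)
    {
        // Reinterpret the bits as unsigned so that failure codes (high bit set)
        // are not formatted as negative numbers.
        uint bits = unchecked((uint)hresult);
        return "0x" + bits.ToString("X8");
    }
}
```

With this, the failure message could be written as Console.WriteLine("Failed to load an IFilter for {0}: {1}", args[0], HResultFormatter.ToHex(result)), which makes the output directly comparable to the values in the documentation.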