I want to create a new Tika parser to extract metadata from a file. We're already using Tika, so routing this extraction through it would keep our metadata handling consistent.
I think that I've run into this problem/enhancement request for Tika:
Allow passing of files or memory buffers to parsers
I have a console C++ executable that accepts the path to a file as input and then outputs the metadata that it finds, each line consisting of name/value pairs.
The C++ code relies on libraries that expect a file path when accessing the data.
It's not going to be possible to rewrite this executable in Java.
I thought that it would be fairly easy to plug this into Tika. But the Tika parser needs to be in Java and the Tika parser method that needs to be overridden takes an open input stream:
void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)
So I guess my only option is to write the input stream to a temporary file, process that file, and then clean it up. I'd rather not deal with temporary files, or have to worry about stale ones being left behind if something goes wrong before the file gets deleted.
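For reference, the temp-file approach I have in mind looks roughly like this. It is only a sketch using plain JDK calls, not anything from Tika itself: TempFileBridge, PathTask, withTempFile and runTool are names I made up for illustration, and the executable path passed to runTool is a placeholder for my C++ tool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tika): spools a stream to a temp file,
// hands the path to a task, and always deletes the file afterwards.
final class TempFileBridge {

    /** Work that needs a real file path instead of a stream. */
    @FunctionalInterface
    public interface PathTask<T> {
        T run(Path file) throws IOException, InterruptedException;
    }

    public static <T> T withTempFile(InputStream stream, PathTask<T> task)
            throws IOException, InterruptedException {
        // createTempFile already creates an empty file, hence REPLACE_EXISTING below.
        Path tmp = Files.createTempFile("tika-bridge-", ".tmp");
        try {
            Files.copy(stream, tmp, StandardCopyOption.REPLACE_EXISTING);
            return task.run(tmp);
        } finally {
            // Runs on normal return and on exceptions thrown by the copy or the task.
            Files.deleteIfExists(tmp);
        }
    }

    /** Runs an external tool on the file and returns its output lines unparsed. */
    public static List<String> runTool(String executable, Path file)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(executable, file.toString())
                .redirectErrorStream(true) // merge stderr so the pipe cannot fill up unread
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("extractor exited with code " + exit);
        }
        return lines;
    }
}
```

Inside parse(...) this would be called as something like TempFileBridge.withTempFile(stream, p -> TempFileBridge.runTool("/path/to/extractor", p)), with each returned line then split into a name and a value and added to the Metadata object. The finally block covers exceptions, but not a JVM that is killed outright, which is exactly the leftover-file case that bothers me.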
Does anyone have a clever idea about how to cleanly deal with something like this?