How to use Apache Tika on .Net Core?

I need to create a .NET Core console app that uses .NET bindings for Apache Tika. Does anyone have an idea of how to proceed?

I found a wrapper called 'TikaOnDotNet', but it seems to work only with .NET Framework, not .NET Core. Is there a way to make this work? Thanks in advance for your response.

Ajit answered 28/2, 2017 at 21:42 Comment(0)
G
4

Unfortunately, .NET Core doesn't cover 100% of the .NET Framework types, so the wrapper isn't compatible as-is. It would have to be rewritten to some extent to work. Fortunately, it's open source :)

Garlicky answered 28/2, 2017 at 21:51 Comment(1)
I'm thinking of wrapping the Tika jar file with command-line calls from .NET Core, since you can run things like: java -jar tika-app.jar --text mydocument.pdf and just capture the output. Could be a quick fix. Accrescent
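For what it's worth, here is a minimal sketch of that command-line approach in C#. It assumes `java` is on the PATH and that you have downloaded a `tika-app` jar yourself; the class and method names (`TikaCli`, `BuildStartInfo`, `ExtractText`) are made up for illustration, not part of any library. It uses the Tika CLI's `--text` option (short form `-t`) and reads whatever the process writes to standard output.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: shells out to the Tika app jar and captures its output.
public static class TikaCli
{
    // Describes the process: java -jar <jarPath> --text <documentPath>
    public static ProcessStartInfo BuildStartInfo(string jarPath, string documentPath)
    {
        var psi = new ProcessStartInfo("java") // assumes java is on the PATH
        {
            RedirectStandardOutput = true, // capture the extracted text
            UseShellExecute = false,       // required for redirection
            CreateNoWindow = true
        };
        // ArgumentList escapes each argument, so paths with spaces are safe.
        psi.ArgumentList.Add("-jar");
        psi.ArgumentList.Add(jarPath);
        psi.ArgumentList.Add("--text");
        psi.ArgumentList.Add(documentPath);
        return psi;
    }

    // Runs Tika and returns the plain text it prints to standard output.
    public static string ExtractText(string jarPath, string documentPath)
    {
        using Process process = Process.Start(BuildStartInfo(jarPath, documentPath))
            ?? throw new InvalidOperationException("Could not start the java process.");
        // Read before WaitForExit so a full pipe buffer cannot block the child.
        string text = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        if (process.ExitCode != 0)
            throw new InvalidOperationException($"Tika exited with code {process.ExitCode}.");
        return text;
    }
}
```

Calling `TikaCli.ExtractText("tika-app.jar", "mydocument.pdf")` should then return the extracted text as a string. Starting a JVM per document is slow, so this probably only makes sense for small batches.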
R
3

You can use IKVM.Maven.Sdk (https://github.com/ikvmnet/ikvm-maven). I ran into this issue at work recently. The following works on .NET 8 for a PDF file. Add these to the project file:

<ItemGroup>
  <PackageReference Include="IKVM.Maven.Sdk" Version="1.6.9" />
</ItemGroup>

<ItemGroup>
  <MavenReference Include="org.apache.tika:tika-core" Version="2.9.2" />
  <MavenReference Include="org.apache.tika:tika-parsers-standard-package" Version="2.9.0" />
</ItemGroup>
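
using System;
using System.IO;
using ikvm.io;
using org.apache.tika.metadata;
using org.apache.tika.parser;
using org.apache.tika.parser.pdf;
using org.apache.tika.sax;

// Wrap the .NET stream so Tika can read it as a java.io.InputStream.
using FileStream fs = new FileStream("some-file-name-here.pdf", FileMode.Open);
using InputStreamWrapper stream = new InputStreamWrapper(fs);

// Collects the extracted body text. The default handler throws once the text
// exceeds 100,000 characters; use new BodyContentHandler(-1) to lift the limit.
BodyContentHandler handler = new BodyContentHandler();
Parser parser = new PDFParser();

Metadata metadata = new Metadata();
ParseContext parseContext = new ParseContext();
parser.parse(stream, handler, metadata, parseContext);
Console.WriteLine(handler.toString());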
Racklin answered 26/4 at 0:54 Comment(0)
