How do I configure the pom.xml of Tika to stop getting all the license dependency warnings?

I am getting all these warnings from Tika when I try to use it:

Feb 24, 2018 9:24:35 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies. TIFFImageWriter not loaded. tiff files will not be processed See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies. J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.

Feb 24, 2018 9:24:35 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem WARNING: org.xerial's sqlite-jdbc is not loaded. Please provide the jar on your classpath to parse sqlite files. See tika-parsers/pom.xml for the correct version.

I tried adding this (in Tika pom.xml):

            <dependency>
                <groupId>org.bouncycastle</groupId>
                <artifactId>bcprov-jdk15on</artifactId>
                <version>1.57</version>
            </dependency>
            <dependency>
                <groupId>org.bouncycastle</groupId>
                <artifactId>bcmail-jdk15on</artifactId>
                <version>1.57</version>
            </dependency>
            <dependency>
                <groupId>org.bouncycastle</groupId>
                <artifactId>bcpkix-jdk15on</artifactId>
                <version>1.57</version>
            </dependency>
            <dependency>
                <groupId>log4j</groupId>
                <artifactId>log4j</artifactId>
                <version>1.2.17</version>
            </dependency>

            <dependency>
                <groupId>com.levigo.jbig2</groupId>
                <artifactId>levigo-jbig2-imageio</artifactId>
                <version>2.0</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>com.github.jai-imageio</groupId>
                <artifactId>jai-imageio-core</artifactId>
                <version>1.3.1</version>
                <scope>test</scope>
            </dependency>    
            <dependency>
                <groupId>com.github.jai-imageio</groupId>
                <artifactId>jai-imageio-jpeg2000</artifactId>
                <version>1.3.0</version>
                <scope>test</scope>
            </dependency>

            <dependency>
                <groupId>org.xerial</groupId>
                <artifactId>sqlite-jdbc</artifactId>
                <version>3.20.1</version>
            </dependency>

But I still get the same warnings.

How do I resolve this?

UPDATE 1

My dependencies were added here: https://github.com/apache/tika/blob/1.17/pom.xml#L164-L170

I also tried it without the scope set to test. It did not change anything.

The dependencies that I added seem to be for PDFBox, which is a Tika dependency.

Besmirch asked 25/2, 2018 at 4:16. Comments:
How did you get the idea to include the three dependencies with test scope? I'm asking to find out whether this is a documentation problem somewhere. (Segmental)
Actually, the first time I tried it, I did remove the test scope. (Besmirch)
That is the pom for building Tika, not for using it. Tika doesn't distribute these jar files because of their "bad" licenses. (Segmental)
To check whether your plugins are on your classpath, run this code: System.out.println(Arrays.toString(ImageIO.getReaderFileSuffixes())); (Segmental)
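Segmental's one-liner can be expanded into a small standalone check. This is a sketch, not part of Tika: it asks `ImageIO` whether any reader or writer is registered for a given format name. The format names `JBIG2`, `JPEG2000` and `TIFF` are assumptions about what the optional plugins register, and newer JDKs ship their own TIFF plugin, so the TIFF line may print true even while Tika still warns. Run it with the same classpath your application uses, not the Maven test classpath.

```java
import java.util.Arrays;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriter;

public class ImageIOPluginCheck {

    // True if at least one ImageIO reader is registered for the format name
    static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    // True if at least one ImageIO writer is registered for the format name
    static boolean hasWriter(String formatName) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName(formatName);
        return writers.hasNext();
    }

    public static void main(String[] args) {
        // Same check as in the comment: every file suffix an installed reader claims
        System.out.println(Arrays.toString(ImageIO.getReaderFileSuffixes()));

        // Format names ASSUMED to be registered by the optional plugins
        System.out.println("JBIG2 reader:    " + hasReader("JBIG2"));
        System.out.println("JPEG2000 reader: " + hasReader("JPEG2000"));
        System.out.println("TIFF writer:     " + hasWriter("TIFF"));
    }
}
```

If a plugin jar is only on the test classpath, the corresponding line prints `false` when the program is launched normally.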

I added the following dependencies and didn't get any more warnings:

    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.18</version>
    </dependency>
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <version>1.18</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>jbig2-imageio</artifactId>
        <version>3.0.1</version>
    </dependency>
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-jpeg2000</artifactId>
        <version>1.3.0</version>
    </dependency>
Sprit answered 17/8, 2018 at 5:44. Comments:
Note that jai-imageio-jpeg2000 is under the JJ2000 license: "only "JJ2000 Partners" have the right to use and modify". (Instrument)

For Clojure visitors: I fixed it with:

(System/setProperty "tika.config" "tika-config.xml")

in my config.clj file. The XML is just:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
   <service-loader initializableProblemHandler="ignore"/>
</properties>

This XML file is in the "resources" directory, and that directory must be on your classpath.
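A rough Java equivalent of this fix is sketched below. It writes the same XML to a file and points the `tika.config` system property at it. The assumptions are that Tika reads that property (the Clojure fix relies on the same thing), that it accepts a file path as well as a classpath resource name, and that the property is set before Tika builds its first default configuration. The class and method names are hypothetical. Like the XML itself, this only hides the warnings; it does not supply the missing jars.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class QuietTikaConfig {

    // Same XML as in the answer: tells Tika's service loader to ignore
    // initialization problems instead of logging a warning for each one
    static final String CONFIG_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<properties>\n"
            + "  <service-loader initializableProblemHandler=\"ignore\"/>\n"
            + "</properties>\n";

    // Hypothetical helper: writes tika-config.xml into dir and sets the
    // tika.config system property to its absolute path
    static Path install(Path dir) throws IOException {
        Path config = dir.resolve("tika-config.xml");
        Files.write(config, CONFIG_XML.getBytes(StandardCharsets.UTF_8));
        System.setProperty("tika.config", config.toAbsolutePath().toString());
        return config;
    }

    public static void main(String[] args) throws IOException {
        Path config = install(Files.createTempDirectory("tika"));
        System.out.println("tika.config = " + config.toAbsolutePath());
        // Create Tika parsers only after this point (assumption: the
        // property is read when the default configuration is built)
    }
}
```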

Thirst answered 18/8, 2019 at 17:51. Comments:
This also works for a Java client with Tika 1.25; load the config like this: new TikaConfig(getClass().getResourceAsStream("/tika-config.xml")); (Propaganda)
This doesn't actually fix the problem, though, right? It "just" hides it so there are no more warnings? (John)

It's hard to see exactly what is happening because you did not include the entire <dependencies>...</dependencies> section of your pom.xml, but I suspect it is due to optional Maven dependencies. According to the Maven docs, optional dependencies are not pulled in transitively: you need to declare them in your own pom or they will not be loaded.

Additionally, all of your imageio dependencies have <scope>test</scope>, which makes them usable only during unit testing.
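One way to confirm that test-scoped jars are missing at runtime is to try loading a class from each optional jar using the application's own classpath. This is a sketch under stated assumptions: the class names in the map are guesses at each jar's entry point, not quoted from Tika's source, and may differ between versions. The warnings in the question mention `JBIG2ImageReader`, `TIFFImageWriter`, `J2KImageReader` and sqlite-jdbc.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OptionalDepsCheck {

    // True if the named class can be found by this class's loader
    // (initialize=false so no static initializers run)
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OptionalDepsCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ASSUMED entry-point classes of the optional jars
        Map<String, String> deps = new LinkedHashMap<>();
        deps.put("levigo-jbig2-imageio", "com.levigo.jbig2.JBIG2ImageReader");
        deps.put("jbig2-imageio (org.apache.pdfbox)", "org.apache.pdfbox.jbig2.JBIG2ImageReader");
        deps.put("jai-imageio-core (TIFF)", "com.github.jaiimageio.impl.plugins.tiff.TIFFImageWriter");
        deps.put("jai-imageio-jpeg2000", "com.github.jaiimageio.jpeg2000.impl.J2KImageReader");
        deps.put("sqlite-jdbc", "org.sqlite.JDBC");

        for (Map.Entry<String, String> e : deps.entrySet()) {
            String status = isPresent(e.getValue()) ? "on classpath" : "MISSING";
            System.out.printf("%-36s %s%n", e.getKey(), status);
        }
    }
}
```

Run under `mvn test`, a test-scoped jar shows up as present; launched with the runtime classpath, the same jar shows up as MISSING.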

Gibert answered 25/2, 2018 at 5:06. Comments:
I tried it without the test scope. It did not change anything. The rest of <dependencies> is the same as Tika's default. (Besmirch)
Here is the default pom.xml I am using: github.com/apache/tika/blob/1.17/pom.xml (Besmirch)
How are you using Tika? It appears that you can use Tika in a lot of different ways: tika.apache.org/1.17/gettingstarted.html (Gibert)
I am using the pom.xml of Tika itself. I am just installing it (mvn install). (Besmirch)

This is now documented in the error log:

Feb 19, 2019 3:18:44 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.

However, I'd prefer a version of Tika (e.g., with a classifier) that does not include OCR/image processing when I only want to parse text, or an option to turn off the error logging (and only log an error when actually trying to load an unsupported format).

Fitted answered 19/2, 2019 at 14:22

© 2022 - 2024 — McMap. All rights reserved.