Tesseract - ERROR net.sourceforge.tess4j.Tesseract - null

I created a Java application that uses Tesseract to convert a given image or PDF into a string. When I run it on my machine as a JUnit unit test it works fine, but when I run the full system (a RESTful API hosted in Tomcat that receives the image and passes it to Tesseract), I get the following error:

    23:22:36.511 [http-nio-9999-exec-3] ERROR net.sourceforge.tess4j.Tesseract - null
    java.lang.NullPointerException: null
        at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Png(PdfUtilities.java:107)
        at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Tiff(PdfUtilities.java:48)
        at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:343)
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:213)
        at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:197)
        at ocr.OcrUtil.getString(OcrUtil.java:54)
        at com.tapd.server.api.handlers.IRSHandler.uploadIRSImage(IRSHandler.java:65)
        at com.tapd.server.api.WebAPIService.updateParentIrsForm(WebAPIService.java:250)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
        at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
        at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:309)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
        at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:292)
        at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1139)
        at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:460)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:386)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:334)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:108)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:522)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
        at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:349)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:1110)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:785)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1425)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Unknown Source)
    [2016-09-14 23:22:36,512] [ERROR] java.lang.NullPointerException

My guess is that the tessdata folder is not in the right place: once everything is packaged into a JAR and run by Tomcat, it ends up misplaced. However, I couldn't figure out where it should be located, and I have double-checked that all JARs are deployed correctly.

Edit: It appears that Tesseract can't handle the path when the file is on a remote server such as AWS S3. Why is that, and how can I make it use a path from S3? (Yes, the file is public.)

Duster answered 15/9, 2016 at 6:20 Comment(5)
Which version of Tesseract? - Euxenite
I use tess4j version 3.2.1. - Duster
Can you show a Minimal, Complete, and Verifiable example? - Cloakanddagger
Which OS? If not Windows, do you have Ghostscript installed? - Cemetery
It currently runs on Windows; in production it will be Linux. - Duster

As @Piotr R mentioned, the error was that ghostscriptException.getCause() is null. The underlying reason was that the path in the File object passed to Tesseract was not valid. Tesseract's definition of "valid" is narrower than you might expect: it only accepts a local path, so pointing it at a file on AWS S3 throws an error even if the file is public. The solution was to save the file locally and delete it after Tesseract is done.
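A minimal sketch of that workaround, assuming the file is reachable through a plain public URL (for example a public S3 object URL). `LocalCopyOcr`, `ocrRemoteFile` and the `OcrFunction` callback are hypothetical names used for illustration; with tess4j you would presumably pass `tesseract::doOCR`, since `Tesseract.doOCR(File)` expects a local `File`.

```java
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class LocalCopyOcr {

    /** Hypothetical OCR callback; with tess4j this could be tesseract::doOCR. */
    @FunctionalInterface
    interface OcrFunction {
        String apply(File localFile) throws Exception;
    }

    /**
     * Copies the resource at remoteUrl into a local temporary file, runs OCR on
     * that local copy, and always deletes the copy afterwards.
     */
    static String ocrRemoteFile(URL remoteUrl, String suffix, OcrFunction ocr) throws Exception {
        // Keep the original extension (e.g. ".pdf" or ".png"); assumption: the OCR
        // library uses the file name to tell PDFs from images.
        Path tmp = Files.createTempFile("ocr-", suffix);
        try {
            try (InputStream in = remoteUrl.openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            return ocr.apply(tmp.toFile());
        } finally {
            // Runs whether OCR succeeded or threw, so no temp files pile up on the server.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Usage would look like `LocalCopyOcr.ocrRemoteFile(new URL(s3Url), ".pdf", tesseract::doOCR)`, where `s3Url` and `tesseract` are assumed to exist in the calling code. Doing the deletion in `finally` keeps the cleanup in one place instead of repeating it on every error path.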

Duster answered 22/9, 2016 at 6:28 Comment(1)
@Piotr R I don't have the stack trace of the GhostscriptException, nor can I debug it, since it is an external library. The way I access S3 is not relevant, because Tesseract can't access files that are not stored locally; that is exactly the answer I was looking for, and the solution. That said, I really appreciate your help and support. Don't worry: when I mark my own answer as correct, I don't get the bounty. - Duster

My guess is that a GhostscriptException is being thrown but not logged properly, and that this is what causes the NullPointerException:

https://github.com/nguyenq/tess4j/blob/212d72bc2ec8b3a4d4f5a18f1eb01a0622fc5521/src/main/java/net/sourceforge/tess4j/util/PdfUtilities.java#L107

106        } catch (GhostscriptException e) {
107            logger.error(e.getCause().toString(), e);
108        } finally {

At line 107, e.getCause() is (probably) null, and calling toString() on null throws the NPE.

(Per the Javadoc, getCause() can return null: https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html#getCause(). GhostscriptException also allows the cause to be null: http://grepcode.com/file/repo1.maven.org/maven2/org.ghost4j/ghost4j/1.0.0/org/ghost4j/GhostscriptException.java)
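The failure mode described above can be reproduced without tess4j or Ghostscript. In this sketch `FakeGhostscriptException` is a hypothetical stand-in for `org.ghost4j.GhostscriptException`, `unsafeDescribe` mirrors the `e.getCause().toString()` call at line 107, and `safeDescribe` is one possible null-safe alternative (an illustration, not the library's actual fix).

```java
class NullCauseDemo {

    /** Hypothetical stand-in for org.ghost4j.GhostscriptException, created without a cause. */
    static class FakeGhostscriptException extends Exception {
        FakeGhostscriptException(String message) {
            super(message); // no cause supplied, so getCause() returns null
        }
    }

    /** Same pattern as PdfUtilities line 107: dereferences the cause unconditionally. */
    static String unsafeDescribe(Exception e) {
        return e.getCause().toString(); // NullPointerException when there is no cause
    }

    /** Null-safe variant: falls back to the exception's own toString() when there is no cause. */
    static String safeDescribe(Exception e) {
        Throwable cause = e.getCause();
        return cause != null ? cause.toString() : e.toString();
    }
}
```

With `safeDescribe`, the log line would carry the original Ghostscript message instead of the logger itself failing with a NullPointerException that hides the real error.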

To verify this (without recompiling tess4j), you could start your program in debug mode and set a breakpoint at line 107. This will show you the real exception.

Euxenite answered 19/9, 2016 at 7:4 Comment(7)
I suggest replacing e.getCause().toString() with String.valueOf(e.getCause()) in the OP's code to be safe in this case. - Brunet
I managed to get as far as understanding that GhostscriptException is null, but the real question is why. How can I resolve it, and why doesn't it happen when I run it locally (JUnit)? - Duster
"understanding that GhostscriptException is null" - this is not correct. The GhostscriptException is not null; it is a valid Exception instance. Only ghostscriptException.getCause() is null. To address this, start your app in debug mode and check the exception message; it should contain more details. - Euxenite
To fully resolve this issue you would have to raise a bug against tess4j: github.com/nguyenq/tess4j/issues (and maybe send a pull request). If you need extra guidance on a temporary workaround, you can reach me on chat. - Euxenite
@Brunet - the problem here is that it's a third-party library (a PR is required to fix this bug). - Euxenite
@Duster - I've submitted an issue against the tess4j library: github.com/nguyenq/tess4j/issues/41 - Euxenite
@Piotr R - Thank you very much! - Duster

Resources I used: Windows 10 (also tried on Windows Server 2016), Java, Maven

Status: works well both on my local machine and on a VM

  1. Download Tess4J-3.4.8 from http://tess4j.sourceforge.net/ and add its location to your PATH environment variable (System Properties > Advanced system settings > Environment Variables).
  2. Add the following Maven dependencies:

        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>4.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.ghost4j</groupId>
            <artifactId>ghost4j</artifactId>
            <version>1.0.1</version>
        </dependency>
        <dependency>
            <groupId>net.sourceforge.lept4j</groupId>
            <artifactId>lept4j</artifactId>
            <version>1.7.0</version>
        </dependency>

  3. Get libtesseract302.dll from http://api.256file.com/libtesseract302.dll/en-download-56466.html and copy it to the "C:\Windows\System32" folder. Don't forget to add it to your PATH environment variable as well (Advanced system settings).

  4. Download and install the Visual C++ 2015 Redistributable or the Visual C++ 2017 Redistributable (I installed both); see https://programmer.help/blogs/net.sourceforge.tess4j.tesseractexception-java.lang.nullpointerexception.html

Then restart your PC.

  5. To be on the safe side, also add the required JAR files locally if you don't already have them (see the image).

    Don't forget to add the JAR locations to your PATH environment variable as well (Advanced system settings).

[Image: list of required JAR files]
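Several of the steps above depend on the right directories being on the PATH. A small diagnostic sketch (the class and method names are hypothetical) that reports which directory in a path list, such as `System.getenv("PATH")` or `System.getProperty("java.library.path")`, contains a given file like `libtesseract302.dll`:

```java
import java.io.File;
import java.util.Optional;

class NativeLibCheck {

    /**
     * Returns the first file named fileName found in the directories of a path list
     * (entries separated by File.pathSeparator: ';' on Windows, ':' elsewhere).
     */
    static Optional<File> findInPathList(String pathList, String fileName) {
        if (pathList == null) {
            return Optional.empty(); // variable or property not set
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries, e.g. from a doubled or trailing separator
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

For example, printing `NativeLibCheck.findInPathList(System.getenv("PATH"), "libtesseract302.dll")` from inside the Tomcat process shows whether the server sees the same PATH as your IDE, which is one common reason a setup works locally but not when deployed.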

Rhineland answered 12/4, 2020 at 19:31 Comment(0)
