Tess4J - Native library (linux-x86-64/libtesseract.so) not found in resource path
I

5

5

I'm using Tess4J (a JNA wrapper around Tesseract) and trying to call tess.doOCR(myFile) to OCR text from a single-page PDF.

I have Ghostscript installed (via yum install ghostscript), and gs -h works correctly.

My app server runs a 64-bit JVM, and I have gsdll64.dll and the 64-bit Tesseract DLLs liblept168.dll and libtesseract302.dll on the class path.

When tess.doOCR(myFile) is called, this is logged:

GPL Ghostscript 8.70 (2014-09-22)
Copyright (C) 2014 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1

But then it just stops there. The program doesn't go any further.

UPDATE --

It looks like the real issue is from this error:

java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract.so) not found in resource path

After looking around a lot, I don't see a convenient place to find this libtesseract.so file, and I'm not sure what it takes to get this onto my Linux app server. I read that maybe I need to download some C++ runtime, but I don't see a Linux download for that. Any advice would be much appreciated.

Or is this something to do with a symbolic link?

Inflated answered 26/10, 2014 at 20:35 Comment(0)
S
5

The fix was simple for me: just run sudo apt-get install tesseract-ocr from the command line. On Linux you don't need to worry about the DLL libraries or the JVM version. Installing Tesseract with apt-get will do the trick.
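To confirm that the package manager actually put a Tesseract shared object on the machine, a quick scan of the usual library directories can help. This is a minimal sketch using only the Java standard library; the class name FindLibTesseract and the directory list in main are assumptions for illustration, and the real install location depends on the distribution.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FindLibTesseract {

    // Returns every file named libtesseract.so* found directly inside the given directories.
    static List<Path> find(List<Path> dirs) throws IOException {
        List<Path> hits = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // skip directories that do not exist on this system
            }
            // The glob also matches versioned names such as libtesseract.so.3
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "libtesseract.so*")) {
                for (Path p : stream) {
                    hits.add(p);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed candidate locations; adjust for your distribution.
        List<Path> dirs = Arrays.asList(
                Paths.get("/usr/lib"),
                Paths.get("/usr/lib64"),
                Paths.get("/usr/local/lib"),
                Paths.get("/usr/lib/x86_64-linux-gnu"));
        List<Path> hits = find(dirs);
        if (hits.isEmpty()) {
            System.out.println("No libtesseract.so* found in the scanned directories");
        } else {
            for (Path hit : hits) {
                System.out.println(hit);
            }
        }
    }
}
```

If nothing is printed for any directory, the UnsatisfiedLinkError from the question is the expected outcome, since there is no native library for JNA to load.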

Scad answered 14/5, 2015 at 10:23 Comment(1)
Yeah, looking back, the issue (I think) was that I was using the yum package manager (on some kind of Red Hat distro), and tesseract-ocr was not a convenient download. As I recall, it was a nightmare to get it working without having it available through package management. I definitely think switching to Ubuntu or something Debian-based (with apt-get) makes it a lot easier to get Tesseract working...Inflated
H
2

Tess4J should include the required libraries. However, you need to extract them first.

This should do the trick:

File tmpFolder = LoadLibs.extractTessResources("win32-x86-64"); // replace platform
System.setProperty("java.library.path", tmpFolder.getPath());

You should replace the argument of extractTessResources(..) with your platform. You can find possible options by looking into the Tess4J jar file.

This way you do not need to install Tesseract on your system.

Recently I wrote a blog post about Tess4J in which I used this technique. Maybe it can help if you need further information or a running example project.
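For the step of looking into the Tess4J jar to find the available platform folders, the listing can be done from Java as well. This is a rough sketch using only java.util.jar; the class name ListJarFolders is hypothetical, and the jar path is whatever Tess4J jar you have locally. It prints every top-level folder, so package folders will appear next to any platform folders such as win32-x86-64.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class ListJarFolders {

    // Collects the first path segment of every jar entry that lives inside a folder.
    static Set<String> topLevelFolders(String jarPath) throws IOException {
        Set<String> folders = new TreeSet<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                int slash = name.indexOf('/');
                if (slash > 0) {
                    folders.add(name.substring(0, slash));
                }
            }
        }
        return folders;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ListJarFolders <path-to-jar>");
            return;
        }
        for (String folder : topLevelFolders(args[0])) {
            System.out.println(folder);
        }
    }
}
```

If no folder matching your platform shows up in the output, the jar does not bundle native libraries for it, and extracting resources will not help on that system.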

Heder answered 26/8, 2020 at 15:26 Comment(0)
T
1

Those DLLs are for Windows. For Linux, you'll need to install Tesseract or build it from source.

That GS version, 8.70, is quite old. The latest Ghost4J library that Tess4J uses is not compatible with that.
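The platform difference can be seen from Java itself. The sketch below uses System.mapLibraryName from the standard library to show which file name a short library name maps to on the current OS. JNA performs its own lookup, so this is only an illustration of the naming convention, not of JNA's exact search; the class name NativeLibName is made up for the example.

```java
class NativeLibName {

    // Maps a short library name to the platform-specific file name.
    static String fileNameFor(String libName) {
        return System.mapLibraryName(libName);
    }

    public static void main(String[] args) {
        // On Linux this maps "tesseract" to libtesseract.so, which is the
        // file named in the UnsatisfiedLinkError; on Windows it maps to a .dll.
        System.out.println(System.getProperty("os.name") + " -> " + fileNameFor("tesseract"));
    }
}
```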

Theurer answered 26/10, 2014 at 23:2 Comment(6)
is it possible to specify a different version when executing yum install ghostscript? otherwise, what is the simplest way to install GhostScript on Linux without yum install? p.s. thank you for so actively helping those trying to work with Tess4J here on SO and other placesInflated
Looks like you have to build it from the source, if the latest is not available from the repository.Theurer
I switched from a Red Hat distro to Ubuntu, and it made installing Tesseract and Ghostscript so much easier. apt-get install tesseract got Tesseract 3.03 set up and working, and apt-get install ghostscript got Ghostscript 9.10 working fine. Dumb question: if Tesseract and Ghostscript are installed and working on their own, do I only need the JARs from Tess4J (and not the traineddata, tessdata folder, DLLs, other stuff)?Inflated
Yes, you do. Make sure to use a version compatible with your Tesseract version.Theurer
From my experience on Ubuntu 14.04 LTS, all I needed to do was apt-get install tesseract-ocr and ghostscript. Then I pointed the TESSDATA_PREFIX env variable to the directory apt-get installed Tesseract to (but I still needed to setDataPath on my Tess4J instance, even though the env var existed...). Then I included the JARs that came with Tess4J's download (tess4j, ghostscript, log4j, imageio) on the class path... and that's all it took to get it working. So it seems apt-get install tesseract-ocr got me the proper DLLs and eng.traineddata...Inflated
Proper .so, not .dll.Theurer
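The data-path step described in the comments above can be sanity-checked before any OCR call. This is a small sketch that reads TESSDATA_PREFIX and looks for eng.traineddata; the class and method names are hypothetical. It tries both the prefix itself and a tessdata subfolder, on the assumption that, depending on the Tesseract version, the variable may point at either one.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TessdataCheck {

    // True when <dataDir>/<lang>.traineddata exists as a regular file.
    static boolean hasLanguage(Path dataDir, String lang) {
        return Files.isRegularFile(dataDir.resolve(lang + ".traineddata"));
    }

    // Returns the directory that holds the language file, or null if neither candidate does.
    static Path resolveDataDir(Path prefix, String lang) {
        if (hasLanguage(prefix, lang)) {
            return prefix;
        }
        Path nested = prefix.resolve("tessdata"); // assumed layout: prefix is the parent folder
        return hasLanguage(nested, lang) ? nested : null;
    }

    public static void main(String[] args) {
        String prefix = System.getenv("TESSDATA_PREFIX");
        if (prefix == null) {
            System.out.println("TESSDATA_PREFIX is not set");
            return;
        }
        Path dataDir = resolveDataDir(Paths.get(prefix), "eng");
        System.out.println(dataDir == null
                ? "eng.traineddata not found under " + prefix
                : "Found eng.traineddata in " + dataDir);
    }
}
```

Whether the Tess4J instance wants this directory or its parent as its data path likely depends on the Tess4J and Tesseract versions in use, so treat the printed location as a starting point.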
P
0
sudo apt-get update
sudo apt-get install tesseract-ocr 

Then download the test data (tessdata) with git:

https://github.com/tesseract-ocr/tessdata
Piracy answered 12/4, 2020 at 21:41 Comment(2)
It is not clear how your answer addresses the question. Why will downloading test data correct resource not found in path?Cease
It installs tesseract that contains the library in question and adds it to the library path afaik.Unguiculate
D
0

If you run your service with Docker Compose, run apt-get install tesseract-ocr on an Ubuntu image in the service's Dockerfile; afterwards, use COPY to copy the libs from the Ubuntu image to your server.

Danit answered 22/4 at 7:43 Comment(1)
When you post answers with links to resources with which you have an affiliation, you are expected to disclose that affiliation in the post. Please edit your answer to include your affiliation. Otherwise your post may be deleted. Please see How to not be a spammer.Titled
