Error while installing tesseract-ocr
F

4

6

I want to use pytesseract for OCR, so I installed it. Before that, though, I needed to install tesseract-ocr. I am using Windows 8.1. I opened the command line and ran pip install tesseract-ocr. The output of that command is shown below.

I do not understand what is happening here. How should I read this output, and how can I successfully install Tesseract on my PC?

C:\Users\HarshLaptop>pip install tesseract-ocr
Collecting tesseract-ocr
  Using cached https://files.pythonhosted.org/packages/e2/0d/dcee3dd0fc4c7bcd181
25a98f8ba6d9db7aecaa40770595203e312649587/tesseract-ocr-0.0.1.tar.gz
Requirement already satisfied: cython in c:\users\harshlaptop\anaconda3\lib\site
-packages (from tesseract-ocr) (0.25.2)
Building wheels for collected packages: tesseract-ocr
  Running setup.py bdist_wheel for tesseract-ocr ... error
  Complete output from command c:\users\harshlaptop\anaconda3\python.exe -u -c "
import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\
\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open
)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __f
ile__, 'exec'))" bdist_wheel -d C:\Users\HARSHL~1\AppData\Local\Temp\pip-wheel-s
j29zfyo --python-tag cp36:
  running bdist_wheel
  running build
  running build_py
  file tesseract_ocr.py (for module tesseract_ocr) not found
  file tesseract_ocr.py (for module tesseract_ocr) not found
  running build_ext
  building 'tesseract_ocr' extension
  creating build
  creating build\temp.win-amd64-3.6
  creating build\temp.win-amd64-3.6\Release
  C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c
 /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\harshlaptop\anaconda3\include -Ic:\
users\harshlaptop\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual S
tudio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10
240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Pro
gram Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows
Kits\8.1\include\winrt" /EHsc /Tptesseract_ocr.cpp /Fobuild\temp.win-amd64-3.6\R
elease\tesseract_ocr.obj
  tesseract_ocr.cpp
  tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'leptonic
a/allheaders.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN
\\x86_amd64\\cl.exe' failed with exit status 2

  ----------------------------------------
  Failed building wheel for tesseract-ocr
  Running setup.py clean for tesseract-ocr
Failed to build tesseract-ocr
Installing collected packages: tesseract-ocr
  Running setup.py install for tesseract-ocr ... error
    Complete output from command c:\users\harshlaptop\anaconda3\python.exe -u -c
 "import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Tem
p\\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', op
en)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, _
_file__, 'exec'))" install --record C:\Users\HARSHL~1\AppData\Local\Temp\pip-rec
ord-vnlr99lk\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    file tesseract_ocr.py (for module tesseract_ocr) not found
    file tesseract_ocr.py (for module tesseract_ocr) not found
    running build_ext
    building 'tesseract_ocr' extension
    creating build
    creating build\temp.win-amd64-3.6
    creating build\temp.win-amd64-3.6\Release
    C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe
/c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\harshlaptop\anaconda3\include -Ic
:\users\harshlaptop\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual
 Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.
10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\P
rogram Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Window
s Kits\8.1\include\winrt" /EHsc /Tptesseract_ocr.cpp /Fobuild\temp.win-amd64-3.6
\Release\tesseract_ocr.obj
    tesseract_ocr.cpp
    tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'lepton
ica/allheaders.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\B
IN\\x86_amd64\\cl.exe' failed with exit status 2

    ----------------------------------------
Command "c:\users\harshlaptop\anaconda3\python.exe -u -c "import setuptools, tok
enize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\\pip-install-x8nz3uhm\
\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.rea
d().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" insta
ll --record C:\Users\HARSHL~1\AppData\Local\Temp\pip-record-vnlr99lk\install-rec
ord.txt --single-version-externally-managed --compile" failed with error code 1
in C:\Users\HARSHL~1\AppData\Local\Temp\pip-install-x8nz3uhm\tesseract-ocr\
Forceful answered 17/6, 2018 at 12:14 Comment(4)
Your Python comes from the Anaconda distribution. In that case it is usually better to prefer conda over pip. Have you tried conda install tesseract? Afterclap
Please read "Under what circumstances may I add "urgent" or other similar phrases to my question, in order to obtain faster answers?" The summary is that this is not an ideal way to address volunteers, and is probably counterproductive to obtaining answers. Please refrain from adding this to your questions. Erotic
@Ingaz PackageNotFoundError Forceful
Try using pytesseract. Peradventure
D
8

I had exactly the same issue, using Visual Studio 2017 on a Windows 10 machine with Python 3.6 installed. What worked for me was:

  1. Download and install the tesseract-ocr executable from https://github.com/UB-Mannheim/tesseract/wiki. (The script below assumes a Windows system with Tesseract installed to the default location suggested by the installer, i.e. C:\Program Files (x86)\Tesseract-OCR.) See https://github.com/tesseract-ocr/tesseract/wiki for more information on installing the pre-built binary package on different operating systems, including Windows.
  2. Ensure you have the Python Imaging Library ('PIL') or the 'pillow' package installed for opening images. (Installing PIL didn't work in my setup, but pillow did, i.e. pip install pillow.) You need this because pytesseract requires it. See https://pypi.org/project/pytesseract/0.2.5/ for more information.
  3. Then, to use it in your code, set the tesseract_cmd path as follows:

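    from PIL import Image
    import pytesseract

    # Point pytesseract at the Tesseract executable (adjust if you installed it elsewhere).
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'

    try:
        img = Image.open('path/to/image.png')
        text = pytesseract.image_to_string(img)
        print(text)
    except OSError as exc:
        # For example, the image file could not be opened.
        print(exc)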
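
The hard-coded path above only works if Tesseract was installed to exactly that location. As a minimal sketch, and not part of pytesseract itself, the lookup could be made a little more forgiving. Here find_tesseract and DEFAULT_CANDIDATES are hypothetical names, and the second candidate path is only an assumption about where an installer might put the program:

```python
import os
import shutil

# Hypothetical fallback locations; adjust to wherever the installer put Tesseract.
DEFAULT_CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=DEFAULT_CANDIDATES, name="tesseract"):
    """Return a path to the Tesseract executable, or None if it cannot be found."""
    # Prefer whatever is already reachable through PATH.
    on_path = shutil.which(name)
    if on_path:
        return on_path
    # Otherwise fall back to the first candidate that exists on disk.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

The result can then be assigned to pytesseract.pytesseract.tesseract_cmd; if it is None, Tesseract is probably not installed where the script expects it.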
    

Hope it helps.

Duston answered 25/10, 2018 at 12:5 Comment(0)
A
0

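You need to install Leptonica. Tesseract depends on it, and the build in your log fails because the compiler cannot find the header leptonica/allheaders.h.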
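
For reading a long build log like the one in the question, the line that matters is the compiler's C1083 error. A rough sketch, assuming the log is available as text; missing_headers is a hypothetical helper, and the sample lines are copied from the question's output:

```python
import re

# MSVC reports a missing header as:
#   fatal error C1083: Cannot open include file: '<name>'
MISSING_INCLUDE = re.compile(r"fatal error C1083: Cannot open include file: '([^']+)'")


def missing_headers(log_text):
    """Return the headers the compiler could not find, in order, without repeats."""
    # The console wrapped long lines, so a header name may be split across two
    # lines; dropping the line breaks stitches it back together.
    joined = re.sub(r"[\r\n]+", "", log_text)
    found = []
    for header in MISSING_INCLUDE.findall(joined):
        if header not in found:
            found.append(header)
    return found


log = """\
  tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'leptonic
a/allheaders.h': No such file or directory
    tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'lepton
ica/allheaders.h': No such file or directory
"""
print(missing_headers(log))  # ['leptonica/allheaders.h']
```

Everything else in the log is pip retrying the same build a second time; the missing header is the actual cause.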

Afterthought answered 17/6, 2018 at 13:5 Comment(4)
What is Leptonica? Forceful
Leptonica is a library and a dependency of Tesseract: github.com/DanBloomberg/leptonica Afterthought
I already have Visual Studio installed. Do I still need Leptonica? Forceful
Yes, my friend. It is an image-processing library that Tesseract uses and depends on. Afterthought
N
0

To install Leptonica, follow this link:

conda install -c conda-forge leptonica

However, this alone will not resolve the error when installing tesseract-ocr.

You need to install Tesseract using the Windows installer available here. Then install the Python wrapper:

pip install pytesseract

Last but not least, set the Tesseract path in your script after importing the pytesseract library, as below (remember that the installation path may be different on your machine):

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
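
Before relying on that path, it can help to check that it really points at a working program. A minimal sketch using only the standard library; executable_version is a hypothetical helper, and it assumes the program accepts --version, which the tesseract command does:

```python
import subprocess


def executable_version(cmd):
    """Run `cmd --version` and return the first line of its output, or None on failure."""
    try:
        result = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # older Tesseract builds may print the version to stderr
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The path is wrong, the file is not executable, or the program hung.
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None
```

If executable_version(r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe') returns None, the path most likely needs adjusting for your installation.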
Normalie answered 3/9, 2020 at 15:48 Comment(0)
I
0

You can install tesseract with:

  • pip install tesseract

pip install tesseract-ocr doesn't seem to work, so I just downloaded the .traineddata file for the language I need from https://github.com/tesseract-ocr/tesseract and pasted it into the tessdata folder on my local machine.
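
Tesseract language files use the .traineddata extension and live in a folder named tessdata. To confirm that a downloaded file landed in the right place, something like the following sketch could list what is there; installed_languages is a hypothetical helper, and the folder you pass in depends on your installation:

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the language codes that have a .traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    # For example, eng.traineddata becomes "eng".
    return sorted(p.stem for p in folder.glob("*.traineddata"))
```

A code that shows up in this list can then be passed to pytesseract through the lang argument of image_to_string.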

Implicative answered 16/6, 2022 at 12:26 Comment(0)
