pytesseract error Windows Error [Error 2]
T

4

3

Hi, I am trying to use the Python library pytesseract to extract text from an image. Here is the code:

from PIL import Image
from pytesseract import image_to_string
print image_to_string(Image.open(r'D:\new_folder\img.png'))

But I got the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Python27\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified

I have not found a specific solution to this. Can anyone tell me what to do? Does anything else need to be downloaded, and if so, from where?

Thanks in advance :)

Thanhthank answered 14/1, 2017 at 16:34 Comment(0)
G
4

I had the same trouble and quickly found the solution after reading this post:

OSError: [Errno 2] No such file or directory using pytesser

You just need to adapt it to Windows. In pytesseract.py, replace the following line:

tesseract_cmd = 'tesseract'

with:

tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'

(Each backslash must be doubled because \ is the escape character in Python string literals.)
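As a small illustration of the escaping point, the sketch below compares three ways of writing the same Windows path. The variable names are made up for this example, and the path is the install location assumed in this answer, which may differ on your machine.

```python
# Doubled backslashes: each "\\" in the literal is one backslash in the string.
escaped = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'

# Raw string: the r prefix turns off backslash escapes, so single backslashes work.
raw = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'

# Forward slashes: no escaping needed at all.
forward = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'

# The first two literals produce exactly the same string.
print(escaped == raw)

# The forward-slash form is the same path with the separators swapped.
print(escaped.replace('\\', '/') == forward)
```

Any of the three spellings can be assigned to tesseract_cmd; the raw string is usually the easiest to read.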

Gombosi answered 18/1, 2017 at 9:18 Comment(0)
M
2

You're getting the exception because subprocess can't find the Tesseract binary (the tesseract executable).
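A minimal sketch of that failure mode, assuming Python 3: starting a program that the OS cannot locate raises an OSError (FileNotFoundError on Python 3, WindowsError on Python 2 under Windows), which is what the traceback in the question shows. The helper name can_launch is hypothetical and not part of pytesseract.

```python
import subprocess


def can_launch(command):
    """Return True if the OS can start `command`, False if it cannot.

    `can_launch` is a hypothetical helper written for this example.
    """
    try:
        proc = subprocess.Popen(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError:
        # Raised when the executable is missing or cannot be executed.
        return False
    proc.communicate()  # Wait for the process to finish and drain its pipes.
    return True


if __name__ == "__main__":
    print("tesseract can be launched:", can_launch("tesseract"))
```

If this prints False, pytesseract will fail in the same way, because it starts tesseract through subprocess as well.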

The installation is a 3 step process:

1. Download/Install system level libs/binaries:

Installation instructions are available for the various operating systems. On macOS you can install it directly using brew.

Install Google Tesseract OCR (additional info how to install the engine on Linux, Mac OSX and Windows). You must be able to invoke the tesseract command as tesseract. If this isn’t the case, for example because tesseract isn’t in your PATH, you will have to change the “tesseract_cmd” variable at the top of tesseract.py. Under Debian/Ubuntu you can use the package tesseract-ocr. For Mac OS users, please install homebrew package tesseract.

For Windows:

An installer for the old version 3.02 is available for Windows from our download page. This includes the English training data. If you want to use another language, download the appropriate training data, unpack it using 7-zip, and copy the .traineddata file into the 'tessdata' directory, probably C:\Program Files\Tesseract-OCR\tessdata.

To access tesseract-OCR from any location you may have to add the directory where the tesseract-OCR binaries are located to the Path variables, probably C:\Program Files\Tesseract-OCR.

You can download the .exe from here.


2. Install the Python package

pip install pytesseract

3. Finally, you need to have the tesseract binary in your PATH.

Or, you can set it at run-time:

import pytesseract

pytesseract.pytesseract.tesseract_cmd = '<path-to-tesseract-bin>'

For Windows:

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
  • The above line only takes effect for the running script. For a permanent solution, add the directory containing tesseract.exe to the PATH, such as PATH=%PATH%;"C:\Program Files (x86)\Tesseract-OCR".

  • Besides that, make sure the TESSDATA_PREFIX Windows environment variable is set to the directory that contains the tessdata directory. For example:

    TESSDATA_PREFIX=C:\Program Files (x86)\Tesseract-OCR

i.e. tessdata location is: C:\Program Files (x86)\Tesseract-OCR\tessdata
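The two environment settings above can also be applied from Python for the current process only, which may be handy when you cannot change the system settings. This is a sketch under assumptions: configure_tesseract_env is a hypothetical helper, and the install directory is the one used in this answer.

```python
import os


def configure_tesseract_env(install_dir, env=None):
    """Append `install_dir` to PATH and set TESSDATA_PREFIX in `env`.

    Hypothetical helper for this example. `env` defaults to os.environ,
    so the change affects this process and the child processes it starts.
    """
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if install_dir not in parts:
        parts.append(install_dir)  # Avoid adding the same directory twice.
    env["PATH"] = os.pathsep.join(parts)
    # As described above: the directory that contains the tessdata directory.
    env["TESSDATA_PREFIX"] = install_dir
    return env


if __name__ == "__main__":
    configure_tesseract_env(r"C:\Program Files (x86)\Tesseract-OCR")
    print(os.environ["TESSDATA_PREFIX"])
```

Nothing is written to the Windows settings, so the change disappears when the script exits.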


Your example:

from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
print pytesseract.image_to_string(Image.open(r'D:\new_folder\img.png'))
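
If you would rather not hard-code a single location, one possible approach is to look on the PATH first and then fall back to the install directories mentioned in this thread. This is only a sketch, assuming Python 3.3+ for shutil.which; find_tesseract and CANDIDATE_DIRS are hypothetical names, not part of pytesseract.

```python
import os
import shutil

# Install directories mentioned in this thread; adjust them for your machine.
CANDIDATE_DIRS = [
    r"C:\Program Files (x86)\Tesseract-OCR",
    r"C:\Program Files\Tesseract-OCR",
]


def find_tesseract(command="tesseract", candidate_dirs=CANDIDATE_DIRS,
                   exe_name="tesseract.exe"):
    """Return a path to the tesseract executable, or None if none is found."""
    on_path = shutil.which(command)  # Search the directories listed in PATH.
    if on_path:
        return on_path
    for directory in candidate_dirs:
        full_path = os.path.join(directory, exe_name)
        if os.path.isfile(full_path):
            return full_path
    return None


if __name__ == "__main__":
    print(find_tesseract())
```

When the function returns a path, assign it to pytesseract.pytesseract.tesseract_cmd before calling image_to_string. When it returns None, Tesseract is most likely not installed.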

Magnific answered 26/9, 2017 at 7:38 Comment(0)
H
0

You need the Tesseract OCR engine (tesseract.exe) installed on your machine. If its location is not in your PATH, provide the complete path in pytesseract.py (tesseract.py).

README

Install Google Tesseract OCR (additional info how to install the engine on Linux, Mac OSX and Windows). You must be able to invoke the tesseract command as tesseract. If this isn't the case, for example because tesseract isn't in your PATH, you will have to change the "tesseract_cmd" variable at the top of tesseract.py. Under Debian/Ubuntu you can use the package tesseract-ocr. For Mac OS users, please install homebrew package tesseract.

Another thread

Hydrazine answered 22/9, 2017 at 11:23 Comment(0)
T
0

I faced the same problem with pytesseract. I would suggest working in a Linux environment to avoid such errors. Run the following commands on Linux:

pip install pytesseract
sudo apt-get update
sudo apt-get install tesseract-ocr

Hope this does the job.

Tamratamsky answered 28/6, 2018 at 13:19 Comment(0)
