You're getting exception because subprocess isn't able to find the binaries (tesser executable).
The installation is a 3 step process:
1.Download/Install system level libs/binaries:
For various OS here's the help. For MacOS you can directly install it using brew.
Install Google Tesseract OCR (additional info how to install the
engine on Linux, Mac OSX and Windows). You must be able to invoke the
tesseract command as tesseract. If this isn’t the case, for example
because tesseract isn’t in your PATH, you will have to change the
“tesseract_cmd” variable at the top of tesseract.py. Under
Debian/Ubuntu you can use the package tesseract-ocr. For Mac OS users.
please install homebrew package tesseract.
For Windows:
An installer for the old version 3.02 is available for Windows from
our download page. This includes the English training data. If you
want to use another language, download the appropriate training data,
unpack it using 7-zip, and copy the .traineddata file into the
'tessdata' directory, probably C:\Program Files\Tesseract-OCR\tessdata
.
To access tesseract-OCR from any location you may have to add the
directory where the tesseract-OCR binaries are located to the Path
variables, probably C:\Program Files\Tesseract-OCR
.
Can download the .exe from here.
2.Install Python package
pip install pytesseract
3.Finally, you need to have tesseract binary in you PATH.
Or, you can set it at run-time:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '<path-to-tesseract-bin>'
For Windows:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
The above line will make it work temporarily, for permanent solution add the tesseract.exe
to the PATH
- such as PATH=%PATH%;"C:\Program Files (x86)\Tesseract-OCR
".
Beside that make sure that TESSDATA_PREFIX
Windows environment variable is set to the directory, containing tessdata directory. For example:
TESSDATA_PREFIX=C:\Program Files (x86)\Tesseract-OCR
i.e. tessdata location is: C:\Program Files (x86)\Tesseract-OCR\tessdata
Your example:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
print pytesseract.image_to_string(Image.open(r'D:\new_folder\img.png'))