I want to perform text recognition on images using Python. I have installed Anaconda, and now I want to install Tesseract, which also requires Leptonica. I could not find clear instructions for doing this on Windows, and I do not want to install Visual Studio just for Leptonica. Could anybody provide clear instructions for installing Leptonica and Tesseract on Windows, without Visual Studio, for use with Anaconda? Thanks.
Here is a simple set of steps to get the Tesseract 3.05 dev version (as of 04/22/2016) working on both Windows 7 and Windows 8 machines:
1- Install Tesseract using the executable installer from the official tesseract-ocr page (version 3.02 for Windows will suffice).
2- Download the following two files for the Tesseract 3.05 dev version from http://domasofan.spdns.eu/tesseract/
There are 2 exe files:
- tesseract-core-yyyymmdd.exe: the Tesseract core application, without language data.
- tesseract-langs-yyyymmdd.exe: all the language data available for Tesseract.
(yyyymmdd stands for a date: 4-digit year, 2-digit month, 2-digit day.)
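If you end up with several downloads, a small Python helper can read the date stamp out of the file name. This is only a sketch: the function name parse_package_name and the example file name tesseract-core-20160422.exe are made up for illustration, and the pattern simply follows the naming described above.

```python
import re
from datetime import datetime

# Matches the two package names described above, e.g. tesseract-core-yyyymmdd.exe
PACKAGE_PATTERN = re.compile(r"^tesseract-(core|langs)-(\d{8})\.exe$")


def parse_package_name(filename):
    """Return (kind, date) for a matching file name, or None otherwise.

    kind is "core" or "langs"; date is a datetime.date built from yyyymmdd.
    A stamp that is not a real calendar date raises ValueError.
    """
    match = PACKAGE_PATTERN.match(filename)
    if match is None:
        return None
    kind, stamp = match.groups()
    return kind, datetime.strptime(stamp, "%Y%m%d").date()


if __name__ == "__main__":
    # Hypothetical file name, used only to show the call.
    print(parse_package_name("tesseract-core-20160422.exe"))
```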
The app is portable so you can install it on a USB stick or in another location.
Sub-steps to install these:
- Download the tesseract-core and tesseract-langs packages.
- Double-click the tesseract-core package and extract it to a directory of your choice (for example, a new temporary folder called "Tess_temp").
- Double-click the tesseract-langs package and extract it into a \tessdata subfolder of that same "Tess_temp" folder. For example, if you extracted tesseract-core to c:\Tess_temp, tesseract-langs needs to go to c:\Tess_temp\tessdata.
- Now copy everything in "Tess_temp" into the folder where Tesseract 3.02 was installed in step 1 (usually C:\Program Files (x86)\Tesseract-OCR), replacing the 3.02 files with the 3.05 ones.
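Since you are using Python anyway, the last copy step could also be scripted. The following is a minimal sketch, not a tested installer: the function name overlay_install is made up, the two paths are just the example locations from the steps above, and it assumes Python 3.8 or later (for dirs_exist_ok). Writing into Program Files will probably require an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path


def overlay_install(staging_dir, install_dir):
    """Copy everything under staging_dir into install_dir.

    Files that already exist in install_dir (the old 3.02 files) are
    overwritten; files that exist only in install_dir are left alone.
    Returns the sorted relative paths of all files now in install_dir.
    """
    staging = Path(staging_dir)
    target = Path(install_dir)
    if not staging.is_dir():
        raise FileNotFoundError(f"staging folder not found: {staging}")
    # dirs_exist_ok=True merges into the existing folder instead of
    # failing because the destination is already there.
    shutil.copytree(staging, target, dirs_exist_ok=True)
    return sorted(p.relative_to(target).as_posix()
                  for p in target.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Assumed example paths from the steps above; adjust to your machine.
    staging = Path(r"C:\Tess_temp")
    install = Path(r"C:\Program Files (x86)\Tesseract-OCR")
    if staging.is_dir() and install.is_dir():
        overlay_install(staging, install)
    else:
        print("Expected folders not found; nothing copied.")
```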
It should now work with the 3.05 version on Windows. Copy a sample image test.png (containing text) into this Tesseract-OCR folder, open a cmd window, and type the following commands:
Go to the Tesseract folder:
cd "C:\Program Files (x86)\Tesseract-OCR"
Run Tesseract on test.png:
tesseract -l eng test.png test_text -psm 6
It will show you:
Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
Congratulations! (Check test_text.txt for the extracted text.)
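To call this from Python (for example inside Anaconda), one option is to run the same command through subprocess and read the resulting text file. This is a rough sketch under a few assumptions: the helper names build_command and ocr_image are made up, the executable is assumed to be tesseract.exe inside the install folder used above, and the flags simply mirror the command line shown earlier.

```python
import subprocess
from pathlib import Path

# Assumed install location, taken from the steps above; adjust if needed.
TESSERACT_DIR = Path(r"C:\Program Files (x86)\Tesseract-OCR")


def build_command(image, output_base, lang="eng", psm=6, exe="tesseract"):
    """Return the argument list for: tesseract -l eng image output_base -psm 6"""
    return [str(exe), "-l", lang, str(image), str(output_base),
            "-psm", str(psm)]


def ocr_image(image, output_base, tesseract_dir=TESSERACT_DIR, **options):
    """Run Tesseract on image and return the text written to output_base.txt.

    Relative paths are resolved against tesseract_dir, which mirrors the
    "cd" step in the manual instructions.
    """
    tesseract_dir = Path(tesseract_dir)
    exe = tesseract_dir / "tesseract.exe"
    cmd = build_command(image, output_base, exe=exe, **options)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(cmd, cwd=tesseract_dir, check=True)
    return (tesseract_dir / f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    if (TESSERACT_DIR / "tesseract.exe").exists():
        print(ocr_image("test.png", "test_text"))
    else:
        print("tesseract.exe not found in", TESSERACT_DIR)
```

Keeping the command construction in its own function makes it easy to check the arguments without Tesseract being installed. Writing the output file into Program Files may need administrator rights, so pointing output_base at a folder you own is probably the safer choice.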