python-tesseract giving different results than the command line tesseract

I'm using the python-tesseract wrapper to OCR an image. However, for certain images I get different results from what the tesseract command-line tool produces. On the command line I run: tesseract myimg.png myimg && more myimg.txt

However, the results from the python-tesseract wrapper are different.

I suspect the wrapper is not finding liblept, because this snippet prints False:

import tesseract
import ctypes
import os
print "HAVE_LIBLEPT=",tesseract.isLibLept()

Sometimes I also get these errors when using the wrapper, but not from the command-line tesseract:

Error in pixReduceRankBinary2: hs must be at least 2
Error in pixDilateBrick: pixs not defined
Error in pixExpandReplicate: pixs not defined
Error in pixAnd: pixs1 not defined
Error in pixDilateBrick: pixs not defined
Error in pixExpandReplicate: pixs not defined
Error in pixAnd: pixs2 not defined
Telephone Company Suspicious Activity

Does anyone know what could be causing the mismatch? And how can I tell the wrapper where to find liblept? Since the command-line tesseract is working fine, I assume it is finding liblept properly:

$ tesseract --version
tesseract 3.02.02
 leptonica-1.69
  libjpeg 8d : libpng 1.5.14 : libtiff 4.0.3 : zlib 1.2.5
Apprentice answered 13/10, 2013 at 20:36 Comment(1)
Did you find an answer to this? I am comparing the result of the tesseract CLI run with -l eng --oem 3 --psm 11 against pytesseract.image_to_data(im, config='--psm 11 --oem 3 -l eng'), and I can easily see that pytesseract gives me different text, and less relevant text as well. Doesn't pytesseract inherently use tesseract 4.0? - Ope
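
One way to narrow this kind of mismatch down is to run both paths on the same file with identical options and diff the text. The sketch below is only an illustration for the pytesseract case from the comment above, not the older python-tesseract wrapper from the question. It assumes the tesseract binary is on PATH and is recent enough to accept --oem/--psm and "stdout" as the output base. The helper names (cli_args, wrapper_config, diff_outputs, compare) are hypothetical, and it uses image_to_string rather than image_to_data so the two outputs are directly comparable. As far as I know, pytesseract shells out to whatever tesseract binary is installed, so the version it "uses" is the installed one; get_tesseract_version() reports it.

```python
import difflib
import subprocess


def cli_args(image_path, lang="eng", oem=3, psm=11):
    """Build the tesseract CLI argument list; 'stdout' sends the text to stdout."""
    return ["tesseract", image_path, "stdout",
            "-l", lang, "--oem", str(oem), "--psm", str(psm)]


def wrapper_config(oem=3, psm=11):
    """Build the matching config string to pass to pytesseract."""
    return "--oem {} --psm {}".format(oem, psm)


def diff_outputs(cli_text, wrapper_text):
    """Return unified-diff lines between the two OCR outputs (empty if equal)."""
    return list(difflib.unified_diff(
        cli_text.splitlines(), wrapper_text.splitlines(),
        fromfile="cli", tofile="pytesseract", lineterm=""))


def compare(image_path, lang="eng", oem=3, psm=11):
    """Run the CLI and pytesseract on the same file and return the diff."""
    import pytesseract  # imported lazily so the helpers above work without it

    cli_text = subprocess.run(
        cli_args(image_path, lang, oem, psm),
        capture_output=True, text=True, check=True).stdout
    # Passing the file path (not an image object) lets tesseract read the
    # original file, so no image conversion happens inside the wrapper.
    wrapper_text = pytesseract.image_to_string(
        image_path, lang=lang, config=wrapper_config(oem, psm))
    print("tesseract version seen by pytesseract:",
          pytesseract.get_tesseract_version())
    return diff_outputs(cli_text, wrapper_text)
```

Calling compare("myimg.png") should return an empty list if both paths agree. If the diff only appears when an in-memory image is passed instead of the path, the difference is probably coming from how the image is converted and saved before tesseract sees it, rather than from the OCR engine itself.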
