How to reproduce:
- Create a new image in Paint (any size)
- Add the letter A to this image
- Try to recognize it -> Tesseract will not find any letters
- Copy-paste this letter 5-6 times into the image
- Try to recognize it again -> Tesseract will find all the letters
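The steps above can also be sketched in code instead of Paint. This is only a rough reproduction aid, assuming Pillow and pytesseract are installed and that a TrueType font named DejaVuSans.ttf is available on the system. The helper names (letter_positions, make_test_image, recognize) are made up for this illustration.

```python
def letter_positions(count, start=20, spacing=60, top=20):
    """Top-left (x, y) coordinates for `count` letters placed in a row."""
    return [(start + i * spacing, top) for i in range(count)]


def make_test_image(count, size=(420, 100), font_path="DejaVuSans.ttf"):
    """Draw the letter A `count` times on a white canvas.

    font_path is an assumption; point it at any TrueType font you have.
    """
    from PIL import Image, ImageDraw, ImageFont  # imported lazily

    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, 48)
    for xy in letter_positions(count):
        draw.text(xy, "A", fill="black", font=font)
    return img


def recognize(img, config=""):
    """Run Tesseract on the image with an optional config string."""
    import pytesseract  # imported lazily

    return pytesseract.image_to_string(img, config=config).strip()
```

Calling recognize(make_test_image(1)) and then recognize(make_test_image(6)) should let you compare the one-letter and six-letter cases from the steps; the exact output depends on your Tesseract version and font.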
Why?
You must set the "page segmentation mode" to "single char".
For example, on Android you do the following:
api.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_CHAR);
api.SetPageSegMode(tesseract::PSM_SINGLE_CHAR); for C++ users ;) – Tricycle
--psm 10 – Ehudd
engine.DefaultPageSegMode = PageSegMode.SingleChar; for C# users (engine is TesseractEngine) – Tactician
Python code to do that configuration looks like this:
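import cv2
import pytesseract

# Load the image and convert OpenCV's BGR channel order to RGB
img = cv2.imread("path to some image")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# --psm 10 treats the image as a single character;
# the whitelist restricts the result to letters and digits
text = pytesseract.image_to_string(
    img,
    lang="eng",
    config="--psm 10 -c tessedit_char_whitelist="
           "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
)
print(text)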
The --psm flag defines the page segmentation mode. According to the Tesseract documentation, 10 means:
Treat the image as a single character.
So to recognize a single character you just need to use the --psm 10 flag.
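If you would rather call the tesseract executable directly than go through a wrapper, a minimal sketch could look like the following. It assumes the tesseract binary is on your PATH. The helper names build_command and recognize are made up for this example, and the dictionary lists only a few of the modes printed by tesseract --help-psm. Older 3.x releases spell the flag -psm instead of --psm.

```python
import subprocess

# A few page segmentation modes, as listed by `tesseract --help-psm`.
PSM_MODES = {
    3: "Fully automatic page segmentation, but no OSD (default)",
    7: "Treat the image as a single text line",
    8: "Treat the image as a single word",
    10: "Treat the image as a single character",
}


def build_command(image_path, psm=10):
    """Build the argument list for a tesseract run that prints to stdout."""
    if psm not in PSM_MODES:
        raise ValueError(f"PSM mode not covered by this sketch: {psm}")
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def recognize(image_path, psm=10):
    """Run tesseract on the image and return the recognized text."""
    result = subprocess.run(
        build_command(image_path, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The default mode 3 runs a full page layout analysis, so for an image holding one isolated letter you would pass psm=10, or psm=8 for a single word.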
You need to set Tesseract's page segmentation mode to "single character."
tesseract $image $outbase -psm 10. The -psm sets the page segmentation mode, and mode 10 is for single characters. It's all in the man page. – Planet
Have you seen this?
https://code.google.com/p/tesseract-ocr/issues/detail?id=581
The bug list shows it as "no longer an issue".
Add this code before initializing Tesseract:
baseApi.setVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
Another option, in case PageSegMode.SingleChar is still not working, is to reduce the number of colors in the image (binarization, for example) and then use PageSegMode.SingleChar.
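A minimal sketch of that binarization step in Python, assuming Pillow and pytesseract are installed. The fixed cutoff of 128 and the helper names (make_threshold_table, ocr_single_char_binarized) are assumptions for illustration; an adaptive method such as Otsu thresholding may suit uneven lighting better.

```python
def make_threshold_table(cutoff=128):
    """Lookup table mapping each 8-bit gray level to pure black or white."""
    return [255 if level >= cutoff else 0 for level in range(256)]


def ocr_single_char_binarized(path, cutoff=128):
    """Binarize the image, then recognize it as a single character."""
    from PIL import Image  # imported lazily
    import pytesseract

    gray = Image.open(path).convert("L")                # grayscale first
    binary = gray.point(make_threshold_table(cutoff))   # then two levels only
    return pytesseract.image_to_string(binary, config="--psm 10").strip()
```

Here the threshold is applied through a 256-entry lookup table, so every pixel ends up either 0 or 255 before Tesseract sees the image.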
PageSegMode.SingleChar fixed the issue. – Flask