I want to read a specific character sequence with Tesseract, as in this post: Tesseract OCR: is it possible to force a specific pattern?
I have tried bazaar pattern matching in Tesseract with the pattern \A\A\d\d\d,
but the OCR still recognizes other words that don't match.
I have also tried the "tessedit_char_whitelist" parameter, but it does not let me constrain which character appears at which position.
- I run the command:
tesseract image.jpg result -l eng bazaar
and I get this message:
Please provide at least 4 concrete characters at the beginning of the pattern
Invalid user pattern
\A\A\d\d\d
Tesseract Open Source OCR Engine v3.01 with Leptonica
- image.jpg :
The result :
AB123 ABC12 A1234 12345 ABCD1
This is wrong: I only wanted to capture the sequence "AB123".
Can somebody tell me why the pattern in my user-patterns file has no effect? For the configuration, I strictly followed the bazaar tutorial.
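If the pattern file cannot be made to work, one possible workaround is to filter the recognized text afterwards. The following is a minimal sketch, assuming the OCR output is plain whitespace-separated text like the result above. The function name `filter_tokens` is made up for illustration, and the expression uses Python's `re` syntax, not Tesseract's user-pattern syntax.

```python
import re

# Two uppercase letters followed by three digits, matched as a whole word.
PATTERN = re.compile(r"\b[A-Z]{2}[0-9]{3}\b")


def filter_tokens(ocr_text):
    """Return only the tokens in the OCR output that match PATTERN."""
    return PATTERN.findall(ocr_text)


if __name__ == "__main__":
    # Sample text copied from the OCR result shown in the question.
    print(filter_tokens("AB123 ABC12 A1234 12345 ABCD1"))
```

This does not make Tesseract itself more accurate; it only discards tokens that do not have the expected shape after recognition.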
Try \w\w\d\d\d, \A is not what you want for all "characters". Try it here. – Sedberry
I tried \w\w\d\d\d and I get the same error: Please provide at least 4 concrete characters at the beginning of the pattern Invalid user pattern \w\w\d\d\d. – Paraphernalia
I tried TEST\w\w\d\d\d and tested with the words TESTAB123 TESTABC12 etc. I no longer get the error Please provide at least 4 concrete characters at the beginning of the pattern, but I still get Invalid user pattern TEST\w\w\d\d\d. I don't understand why it is invalid. – Paraphernalia
Have you tried [A-Z][A-Z][0-9][0-9][0-9]? Did you define it in /path/to/eng.user-patterns? Does /path/to/configs/bazaar contain user_patterns_suffix user-patterns? Just guessing... – Bulb
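The two file checks from the last comment can be automated. This is only a sketch under assumptions: it supposes the patterns file sits directly in the tessdata directory as `<lang>.user-patterns` and the config lives in a `configs` subdirectory, and the helper name `check_bazaar_setup` is hypothetical. It checks that the files exist and are wired together, not whether Tesseract accepts the pattern itself.

```python
import os
import tempfile


def check_bazaar_setup(tessdata_dir, lang="eng", suffix="user-patterns"):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    # Check 1: the patterns file, e.g. <tessdata_dir>/eng.user-patterns (assumed layout).
    patterns_path = os.path.join(tessdata_dir, f"{lang}.{suffix}")
    if not os.path.isfile(patterns_path):
        problems.append(f"missing patterns file: {patterns_path}")

    # Check 2: configs/bazaar must contain "user_patterns_suffix <suffix>".
    config_path = os.path.join(tessdata_dir, "configs", "bazaar")
    try:
        with open(config_path, encoding="utf-8") as fh:
            settings = [line.split() for line in fh if line.strip()]
    except OSError:
        problems.append(f"missing config file: {config_path}")
    else:
        if ["user_patterns_suffix", suffix] not in settings:
            problems.append(f"{config_path} does not set user_patterns_suffix {suffix}")
    return problems


if __name__ == "__main__":
    # Build a throwaway directory that mimics the assumed layout, then check it.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "configs"))
        with open(os.path.join(root, "configs", "bazaar"), "w", encoding="utf-8") as fh:
            fh.write("user_patterns_suffix user-patterns\n")
        with open(os.path.join(root, "eng.user-patterns"), "w", encoding="utf-8") as fh:
            fh.write("\\A\\A\\d\\d\\d\n")
        print(check_bazaar_setup(root))
```

Pointing the function at the real tessdata directory would show whether either file is missing or the suffix line is absent.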