Tesseract user-pattern is not applied

I want to run OCR on this image:

[image]

The text has a predefined format: the first five characters are letters, the next four are digits, and the last one is a letter.

When I run the following command

$ tesseract in.png stdout

I get the output BDVPD474SQ
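The mismatch can be made concrete with a quick check. This is only a minimal sketch: it assumes the letters are uppercase ASCII, and `matches_format` is a hypothetical helper name, not part of Tesseract.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the argument is five uppercase
# letters, then four digits, then one uppercase letter (assumed format).
matches_format() {
    printf '%s\n' "$1" | grep -Eq '^[A-Z]{5}[0-9]{4}[A-Z]$'
}

ocr_output="BDVPD474SQ"   # what tesseract printed for in.png

if matches_format "$ocr_output"; then
    echo "format ok"
else
    # Reached for BDVPD474SQ: the ninth character is S, not a digit.
    echo "format mismatch"
fi
```

A check like this does not fix the recognition, but it makes it easy to see whether a change in configuration actually produces a string in the expected format.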

So I tried user patterns. I created a config file named bazaar in the directory /usr/share/tesseract-ocr/tessdata/configs, with the following content:

load_system_dawg     F
load_freq_dawg       F
user_patterns_suffix user-patterns

I also created a file named eng.user-patterns in the directory /usr/share/tesseract-ocr/tessdata, with the following content:

\A\A\A\A\A\d\d\d\d\A
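For anyone reproducing this without touching the system tessdata directory, here is a rough sketch of an equivalent setup. It assumes a Tesseract build whose command line accepts `--user-patterns` and `-c` (recent versions list them under `tesseract --help-extra`); the scratch directory is my own choice and not part of the original setup.

```shell
#!/bin/sh
# Sketch: keep the pattern file in a scratch directory instead of
# /usr/share/tesseract-ocr/tessdata (assumption: --user-patterns is supported).
workdir=$(mktemp -d)

# One pattern per line: five letters, four digits, one letter.
# printf with %s writes the backslashes literally.
printf '%s\n' '\A\A\A\A\A\d\d\d\d\A' > "$workdir/eng.user-patterns"

# Only call tesseract if it is installed and the image is present.
if command -v tesseract >/dev/null 2>&1 && [ -f in.png ]; then
    # The two -c options mirror the settings in the bazaar config file.
    tesseract in.png stdout \
        --user-patterns "$workdir/eng.user-patterns" \
        -c load_system_dawg=F \
        -c load_freq_dawg=F
fi
```

This only changes where the files live; it is not expected to change the recognition result by itself.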

Still, I get the same result:

$ tesseract in.png stdout bazaar
BDVPD474SQ

What am I doing wrong? Has anyone accomplished this with Tess4j?

Embroider answered 30/10, 2015 at 5:18 Comment(4)
I had to delete my post, since it was obviously wrong. I looked at the source ( code.google.com/p/tesseract-ocr/source/browse/dict/trie.h ), which shows that your pattern is correct. I also tried your example and got the same result. - Cornhusking
@JoOko So can we say that this feature is not implemented? - Embroider
And this still seems to be the case all these years later? :\ - Ardellearden
Still facing the same issue; there isn't much on how to use it. - Polyhistor

You can add the option --oem 0 to ensure user patterns apply. See this PR comment.

Since I am on tesseract 5.3.3, I had to tweak your input image to reproduce a similar behavior:

[image]

I specify the user pattern \A\A\A\A\A\d\d\d\A\A to force recognition of the partially erased ninth character as a letter.

With --oem 0, Tesseract returns BDVPD474SQ (it reads an S).

Without the option, Tesseract returns BDVPD4745Q (identifying a 5).
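A sketch of how the two runs can be compared side by side. It assumes in.png and eng.user-patterns are in the current directory, that the build accepts `--user-patterns`, and that the installed eng.traineddata contains the legacy model (as far as I know, the tessdata_fast and tessdata_best files are LSTM-only, so `--oem 0` fails with them). `run_ocr` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: compare the default engine with the legacy engine (--oem 0).
# Assumes in.png and eng.user-patterns exist in the current directory.
run_ocr() {
    # "$@" carries any extra options, for example: --oem 0
    tesseract in.png stdout --user-patterns eng.user-patterns "$@"
}

if command -v tesseract >/dev/null 2>&1 && [ -f in.png ] && [ -f eng.user-patterns ]; then
    echo "default engine: $(run_ocr)"
    echo "legacy engine:  $(run_ocr --oem 0)"
else
    echo "skipped: tesseract, in.png or eng.user-patterns not found"
fi
```

Errors are left on stderr on purpose, so a missing legacy model shows up as a message rather than as an empty result.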

Monah answered 13/2, 2024 at 9:46 Comment(0)
