Using Tesseract for handwriting recognition

I was wondering how accurate Tesseract can be for handwriting recognition when the input is capital letters, each written in its own little box on a form.

I know you can train it to recognise your own handwriting to some extent, but the problem in my case is that I need it to work across many different people's handwriting. Can anyone point me in the right direction?

Thanks a lot.

Norite answered 18/9, 2016 at 10:5 Comment(0)

In short, you would have to train the Tesseract engine to recognize the handwriting. Take a look at this link:

Tesseract handwriting with dictionary training

This is what the linked post says:

It's possible to train tesseract to recognize handwriting. Here are the instructions: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract

But don't expect very good results. Academic studies have typically reported accuracy topping out at about 90%. Here are a couple references for words and numbers. So if your use case can tolerate roughly one error in ten, this might work for you.
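To put that figure in perspective for a boxed form, here is a rough back-of-the-envelope sketch. It assumes the 90% is a per-character accuracy and that errors in different boxes are independent; neither assumption comes from the cited studies, and `field_accuracy` is just an illustrative helper name.

```python
def field_accuracy(char_accuracy: float, length: int) -> float:
    """Probability that every character in a field of `length` boxes is read
    correctly, assuming independent errors (an assumption, not a measured fact)."""
    if not 0.0 <= char_accuracy <= 1.0:
        raise ValueError("char_accuracy must be between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return char_accuracy ** length


if __name__ == "__main__":
    # Compare a few typical field lengths at 90% per-character accuracy.
    for n in (1, 5, 10, 20):
        share = field_accuracy(0.90, n)
        print(f"{n:2d} characters: {share:.1%} of fields fully correct")
```

Under these assumptions a ten-character field comes out fully correct only about 35% of the time, so it is worth planning for manual review or validation of the recognised fields.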

Also here is a good academic article written on this subject:

Recognition of Handwritten Textual Annotations using Tesseract Open Source OCR Engine for information Just In Time (iJIT)

Oswald answered 19/9, 2016 at 15:8 Comment(6)
Thanks! That was very helpful. - Norite
@hcam1 How does Tesseract compare in terms of accuracy to other OCR-as-a-service applications? - Bandolier
FYI, I used tesseract for R but did not get very accurate results with handwriting recognition. Have you tried using it in R? - Spatterdash
I have not used it in R personally, but you need to train the engine to recognize the handwriting. You should also design the forms you need to recognize in a way that increases the chance of good recognition. Here is a forum post with information on how to design your forms to get the most accurate results: leadtools.com/support/forum/posts/… - Oswald
@TedTaylorofLife, Tesseract as-is is not very good compared to other OCR-as-a-service applications, but it gives you a base to work with and customize for your application (since it's open source). If you do not have the time to spend training and customizing Tesseract, then closed-source OCR-as-a-service applications are probably more accurate, since they have engineers and resources and have already done most of the work for you. - Oswald
I wonder how accurate Tesseract would be for handwriting applications if you limit it to only numbers: TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789"); - Graver
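
For anyone who wants to try the whitelist idea from the last comment on a form with one character per box, here is a minimal sketch in Python. It assumes the third-party pytesseract wrapper and a local Tesseract install; `build_config` and `read_box` are hypothetical helper names, and page segmentation mode 10 (treat the image as a single character) is my choice for boxed input, not something the thread prescribes. Depending on the Tesseract version and engine mode, the whitelist may be ignored, so check the output rather than relying on it.

```python
import string

DIGITS = string.digits             # "0123456789", as in the comment above
CAPITALS = string.ascii_uppercase  # for forms that use capital letters


def build_config(whitelist: str, psm: int = 10) -> str:
    """Build a Tesseract config string that restricts output to `whitelist`.

    psm=10 asks Tesseract to treat the image as a single character,
    which matches a form with one character per box.
    """
    if not whitelist or any(ch.isspace() for ch in whitelist):
        raise ValueError("whitelist must be non-empty and contain no whitespace")
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_box(image, whitelist: str = DIGITS) -> str:
    """Run OCR on one cropped box (a PIL image or an image file path)."""
    # Imported lazily so build_config works even without pytesseract installed.
    import pytesseract

    text = pytesseract.image_to_string(image, config=build_config(whitelist))
    return text.strip()
```

A call such as `read_box("box_03.png", CAPITALS)` (the file name is made up) would then return whatever single capital letter Tesseract thinks is in that box. Restricting the character set narrows the choices but does not replace training on handwriting.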
