Tesseract handwriting with dictionary training

Asked 7/9, 2012 at 0:39 Answered 4/11, 2012 at 18:3

I have a dictionary of words in a text file, one word per line. I want to recognize handwriting using Tesseract and output the nearest matching line in the text file.

This is the first time I'll be using Tesseract. It's already in my project workspace; I just need the training data.

Is it possible to train Tesseract to do this?

Passe asked 7/9, 2012 at 0:39 Comment(4)

Handwriting is hard to recognize due to the lines that can possibly connect letters, and due to the large variations between instances of letters. Tesseract works well for recognizing text consisting of crisp, clean letters. – Halothane 7/9, 2012 at 0:42

@Halothane But would it be possible if I train it on the possible characters? – Passe 7/9, 2012 at 0:51

Tesseract was never really designed for handwriting recognition or connected scripts (which is why Arabic OCR is so hard for Tesseract to manage). You might be able to do it for very cleanly written individual letters, but not for arbitrary handwriting. – Halothane 7/9, 2012 at 0:57

Ha, too bad, I was designing this app for doctors' handwriting. :( @Blender, do you know of any API similar to Tesseract that can do handwriting recognition? – Passe 7/9, 2012 at 1:2

It's possible to train Tesseract to recognize handwriting. Here are the instructions: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract

But don't expect very good results. Academics have typically reported accuracy topping out at about 90%. Here are a couple of references for words and numbers. So if your use case can tolerate an error in at least 1 of every 10 results, this might work for you.
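Since the goal is to output the nearest matching line from a word list, some of those recognition errors can be absorbed after OCR by fuzzy-matching the raw text against the dictionary. Below is a minimal sketch of that idea using Python's standard `difflib`. The function name `nearest_line` and the sample words are made up for illustration, and the OCR text is assumed to come from Tesseract in whatever way your project already calls it; this is not something Tesseract does for you.

```python
import difflib


def nearest_line(ocr_text, dictionary_lines, cutoff=0.0):
    """Return the dictionary line most similar to ocr_text, or None.

    ocr_text is assumed to be the raw string produced by the OCR step.
    cutoff is the minimum similarity ratio (0.0 to 1.0) a line must reach.
    """
    # Drop blank lines and surrounding whitespace from the word list.
    candidates = [line.strip() for line in dictionary_lines if line.strip()]
    # Compare case-insensitively, but return the original spelling.
    lowered = [c.lower() for c in candidates]
    matches = difflib.get_close_matches(
        ocr_text.strip().lower(), lowered, n=1, cutoff=cutoff
    )
    if not matches:
        return None
    return candidates[lowered.index(matches[0])]


if __name__ == "__main__":
    # Hypothetical word list; in practice, read the lines of your text file.
    words = ["Aspirin", "Ibuprofen", "Paracetamol"]
    print(nearest_line("ibuprofan", words))
```

With the default `cutoff=0.0` the function always returns some line as long as the list is not empty. Raising the cutoff (for example to 0.6) makes it return `None` when nothing is reasonably close, which may be the safer choice for something like prescriptions.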

Reuben answered 4/11, 2012 at 18:3 Comment(4)

Link is broken. Has the document been migrated to GitHub with the rest of the code? I couldn't find it at a glance. – Introject 11/1, 2017 at 22:43

Thanks for the answer. Could you provide exactly WHERE the instructions to train handwritten text are in that link? – Cattleya 15/6, 2017 at 1:31

This link is gone, and the Tesseract FAQ now says to use Lipi Toolkit. Is this right? – Imperturbation 27/11, 2017 at 20:3

Training Tesseract document on GitHub: github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract – Echeverria 22/1, 2018 at 9:12
