TensorFlow model for OCR

I am new to TensorFlow and I am trying to build a model that can perform OCR on my images. I have to read 9 characters (a fixed count in all images), both numbers and letters. My model would be similar to this one:

https://matthewearl.github.io/2016/05/06/cnn-anpr/

My question is: should I first train my model on each character separately and then combine the characters to get the full label, or should I train on the full label directly?

I know that I need to pass the model the images plus the label for each image. What is the format of those labels? Is it a text file? I am a bit confused about that part, so any explanation of the label format that is passed to the model would be helpful. Thanks.

Adjuvant answered 25/4, 2017 at 12:21 Comment(3)

I'd recommend training on all labels combined. That's the cleanest solution. If that fails, then you can try different methods. You usually pass in a one-hot encoded vector as the label. For example, with dogs and cats, you'd have the label cat represented as [1,0] and dog as [0,1]. – Tellurize 25/4, 2017 at 13:34

OK, thanks. How can I pass, for example, the label "17C31T2F"? – Adjuvant 25/4, 2017 at 13:57

The HASYv2 dataset of handwritten symbols of size 32px x 32px might be interesting for you. – Locular 26/4, 2017 at 7:43
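As a minimal sketch of the one-hot label format described in the comments above: a multi-character label such as "17C31T2F" can be encoded one character at a time, giving one one-hot row per character position. The alphabet (digits followed by uppercase letters) is an assumption, and the names `ALPHABET`, `encode_label`, and `decode_label` are illustrative, not part of TensorFlow or any other library.

```python
import string

# Assumed alphabet: digits followed by uppercase letters (36 classes).
ALPHABET = string.digits + string.ascii_uppercase


def encode_label(text, alphabet=ALPHABET):
    """Return one one-hot row (a list of 0/1 ints) per character of `text`."""
    rows = []
    for ch in text:
        row = [0] * len(alphabet)
        # str.index raises ValueError if the character is not in the alphabet.
        row[alphabet.index(ch)] = 1
        rows.append(row)
    return rows


def decode_label(rows, alphabet=ALPHABET):
    """Invert encode_label by taking the argmax of each row."""
    return "".join(
        alphabet[max(range(len(row)), key=row.__getitem__)] for row in rows
    )


encoded = encode_label("17C31T2F")
decoded = decode_label(encoded)
```

Each image's label then becomes a fixed-size grid of zeros and ones (one row per character position), which can be flattened into a single vector if the model expects one. The same decoding step can be applied to a model's per-position output scores to recover the predicted string.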

I'd recommend training an end-to-end OCR model with attention. You can try Attention OCR, which we used to transcribe street names: https://github.com/tensorflow/models/tree/master/research/attention_ocr

My guess is that it should work pretty well for your case. Refer to the answer https://stackoverflow.com/a/44461910 for instructions on how to prepare the data for it.

Geminius answered 26/6, 2017 at 21:22 Comment(4)

Thanks for your response, Alexander. I will try to do it the way you suggested. – Adjuvant 27/6, 2017 at 11:31

Hi Alexander, do you think the Attention OCR model would work on license plates? For example, a number plate like this one: i.cbc.ca/1.3112890.1434422741!/fileImage/httpImage/… And assuming we have enough data to train, do you know roughly what accuracy the model can reach? Thanks. – Maurizio 19/8, 2017 at 7:2

@Adjuvant have you tried the attention ocr out? Does it work for you? Thanks. – Maurizio 21/8, 2017 at 7:34

Hi Bob, unfortunately I couldn't make it work for myself. For that project we used a different OCR solution that does not use AI. – Adjuvant 21/8, 2017 at 9:31

There are a couple of ways to deal with this (the following list is not exhaustive).

1) The first one is word classification directly from your image. If your vocabulary of 9-character words is limited, you can train a word-specific classifier. You can then convolve this classifier with your image and select the word with the highest probability.

2) The second option is to train a character classifier, find all characters in your image, and find the most likely line that has the 9 characters you are looking for.

3) The third option is to train a text detector and find all possible text boxes, then read each text box with a sequence-based model and select the most likely solution that follows your constraints. A simple sequence-based model is introduced in the following paper: http://ai.stanford.edu/~ang/papers/ICPR12-TextRecognitionConvNeuralNets.pdf. Other sequence-based models could be based on HMMs, Connectionist Temporal Classification, attention-based models, etc.

4) The fourth option is to use attention-based models that work end-to-end, first finding the text and then outputting the characters one by one.

There are many other ways to solve this problem. Some options even use third-party solutions such as ABBYY or Tesseract.
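To make the sequence-based idea in option 3 a bit more concrete, here is a rough sketch of greedy decoding for Connectionist Temporal Classification output, followed by a check against the fixed-length constraint from the question. It assumes the model emits one score vector per horizontal frame and that class index 0 is the CTC blank. The names `ctc_greedy_decode` and `satisfies_constraints`, the alphabet, and the toy frames are all illustrative assumptions, not taken from the linked paper or from any library.

```python
import string

BLANK = 0  # assumed index of the CTC blank symbol
# Assumed alphabet: index 0 is a placeholder for the blank,
# followed by digits and uppercase letters.
ALPHABET = "-" + string.digits + string.ascii_uppercase


def ctc_greedy_decode(frame_scores, alphabet=ALPHABET, blank=BLANK):
    """Pick the best class per frame, collapse repeats, then drop blanks."""
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    chars = []
    previous = None
    for index in best_path:
        # Emit a character only when the class changes and is not the blank.
        if index != previous and index != blank:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)


def satisfies_constraints(text, length=9):
    """Check the question's constraint: exactly `length` letters or digits."""
    allowed = set(string.digits + string.ascii_uppercase)
    return len(text) == length and all(ch in allowed for ch in text)


# Toy frames: one-hot scores following the class path [2, 2, 0, 2, 8].
toy_path = [2, 2, 0, 2, 8]
frames = [[1.0 if i == k else 0.0 for i in range(len(ALPHABET))] for k in toy_path]
decoded = ctc_greedy_decode(frames)
```

The blank between the two frames of class 2 is what lets the decoder output the same character twice in a row. Candidates read from different text boxes can then be filtered with `satisfies_constraints` before picking the most likely one.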

Thiazine answered 25/4, 2017 at 16:44 Comment(1)

Thanks. Are there any examples available for 1, 2 and 4? In your opinion, which way would be the best, which would be the easiest to go with, and why? – Adjuvant 25/4, 2017 at 18:0
