TensorFlow - Text recognition in image [closed]

I am new to TensorFlow and to Deep Learning. I am trying to recognize text in natural scene images. I used to work with an OCR engine, but I would like to use Deep Learning instead. The text always has the same format: ABC-DEF 88:88.

What I have done so far is recognize each character/digit individually. I cropped the image around every character (so each picture gives me 10 characters) to build my training and test sets, and then built a neural network with two convolutional layers. My training set was therefore a set of single-character pictures, and the labels were just the characters/digits.

But I want to go further. I would like to feed in the full picture and get the entire text as output (not one character at a time, as in my previous model).

Thank you in advance for any help.

Bussey answered 15/2, 2017 at 4:56 Comment(0)

The difficulty is that you don't know where the text is. Given an image, one solution is to slide a window over it to crop different parts, then use a classifier to decide whether each cropped area contains text. If it does, run your character/digit recognizer on that area to tell which characters/digits they really are.
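
A minimal sketch of the sliding-window step, in plain Python. The function names, the `(left, top, width, height)` box layout and the `contains_text` callback are assumptions made for illustration; `contains_text` stands in for whatever text/no-text classifier you train, and nothing here is TensorFlow-specific.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    stride: int) -> Iterator[Box]:
    """Yield every window position that fits entirely inside the image."""
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            yield (left, top, win_w, win_h)


def detect_text(img_w: int, img_h: int, win_w: int, win_h: int, stride: int,
                contains_text: Callable[[Box], bool]) -> List[Box]:
    """Return the windows accepted by the (hypothetical) text classifier."""
    return [box
            for box in sliding_windows(img_w, img_h, win_w, win_h, stride)
            if contains_text(box)]


if __name__ == "__main__":
    # Placeholder classifier: pretend only the top-left window holds text.
    hits = detect_text(640, 480, 200, 50, 16,
                       lambda box: box[0] == 0 and box[1] == 0)
    print(hits)
```

In a real pipeline, `contains_text` would crop the pixels for each box and run them through the trained classifier. A smaller stride gives finer localization but more windows to classify.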

So you need to train another classifier: given a cropped image (the crop should be slightly larger than your text area), decide whether there is text inside.

Just construct a training set (positive samples are text areas, negative samples are other areas randomly cropped from the full images) and train the classifier on it.
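
One possible way to build such a training set, sketched in plain Python. It assumes you already have an annotated text box per image; the helper names, the padding margin and the "no overlap at all" rule for negatives are illustrative choices, not something prescribed above.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def pad_box(box: Box, margin: int, img_w: int, img_h: int) -> Box:
    """Grow a box by `margin` pixels on each side, clipped to the image."""
    left, top, w, h = box
    new_left, new_top = max(left - margin, 0), max(top - margin, 0)
    new_right = min(left + w + margin, img_w)
    new_bottom = min(top + h + margin, img_h)
    return (new_left, new_top, new_right - new_left, new_bottom - new_top)


def overlap_area(a: Box, b: Box) -> int:
    """Area of the intersection of two boxes (0 if they do not intersect)."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def sample_negatives(img_w: int, img_h: int, text_box: Box, count: int,
                     rng: random.Random, max_tries: int = 10000) -> List[Box]:
    """Pick random crops, the same size as text_box, that do not overlap it."""
    _, _, w, h = text_box
    negatives: List[Box] = []
    for _ in range(max_tries):
        if len(negatives) == count:
            break
        box = (rng.randint(0, img_w - w), rng.randint(0, img_h - h), w, h)
        if overlap_area(box, text_box) == 0:
            negatives.append(box)
    return negatives


if __name__ == "__main__":
    rng = random.Random(0)
    positive = pad_box((100, 100, 200, 50), 8, 640, 480)  # slightly enlarged
    negatives = sample_negatives(640, 480, positive, 20, rng)
    # Label 1 = contains text, label 0 = background.
    dataset = [(positive, 1)] + [(box, 0) for box in negatives]
    print(len(dataset))
```

The boxes would then be used to crop the actual pixels before training. Whether partially overlapping crops should count as negatives, positives, or be discarded is a design decision this sketch sidesteps by rejecting any overlap.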

Guiana answered 15/2, 2017 at 9:2 Comment(5)
Thanks, but must this classifier (the sliding-window one) be a convnet? Should the training set contain multi-character text areas or just single characters? Bussey
A convnet is fine and easy to implement if you are using TensorFlow, Caffe or another deep learning framework, but it might be slow in the detection phase (you need to slide the window across the whole image, so each image produces many windows). Other models also work, such as a boosting method with Haar-like features (search for "haar like feature adaboost cascade" and you will find a lot of material on face detection). Guiana
@alexattia It is better for the training set to contain multiple characters. That way you can use a larger window and reduce false positives. If the area is too small, other things may be reported as letters/digits. For example, the algorithm may take a vertical edge for the digit "1", which is terrible. Guiana
OK, I'll try it! What do you think of this: matthewearl.github.io/2016/05/06/cnn-anpr ? It uses just one convnet instead of the two algorithms you described (detection + classification). Bussey
The project you mentioned is great and highly relevant! Try to reuse it instead of building a new one from scratch! Guiana
