Digit Recognition on CNN

I am testing printed digits (0-9) with a Convolutional Neural Network. It gives 99+% accuracy on the MNIST dataset, but when I tried it on digits rendered with fonts installed on my computer (Arial, Calibri, Cambria, Cambria Math, Times New Roman) and trained on the images generated from those fonts (104 images per digit: 25 fonts, roughly 4 slightly different images per font), the training error rate does not go below 80%, i.e. 20% accuracy. Why?

Here is "2" number Images sample -

"2" Number Images

I resized every image to 28 x 28.

Here are more details:

Training images are 28 x 28. The network parameters follow the LeNet-5 architecture (a rough code sketch follows the list below):

Input Layer - 28x28
| Convolutional Layer - (ReLU Activation)
| Pooling Layer - (Tanh Activation)
| Convolutional Layer - (ReLU Activation)
| Local Layer (120 neurons) - (ReLU)
| Fully Connected (Softmax Activation, 10 outputs)
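
Roughly, in Keras terms (just a sketch of the topology above, not my actual code; the filter counts 6 and 16 are assumed from the standard LeNet-5 description):

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(6, (5, 5), activation="relu"),    # Convolutional Layer (ReLU)
        layers.AveragePooling2D((2, 2)),                 # Pooling Layer
        layers.Activation("tanh"),                       # Tanh applied after pooling
        layers.Conv2D(16, (5, 5), activation="relu"),    # Convolutional Layer (ReLU)
        layers.Flatten(),
        layers.Dense(120, activation="relu"),            # Local Layer (120 neurons)
        layers.Dense(10, activation="softmax"),          # Fully Connected, 10 outputs
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])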

This works, giving 99+% accuracy on MNIST. Why is it so bad with computer-generated fonts? A CNN can handle a lot of variance in data.

Acute answered 15/7, 2016 at 7:1 Comment(2)
What is the full topology you use? Is it the original LeNet5, or have you altered any of the hidden layers? If you train a new model from scratch, overfitting should look like another 99+% success rate; your 20% suggests a much different problem of some sort.Transonic
Yes, it is the original LeNet5; the layers are as mentioned above. It works with the MNIST dataset but not with mine. My dataset has 1036 images, 104 per digit.Acute

I see two likely problems:

Preprocessing: MNIST is not only 28px x 28px, but also:

The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

Source: MNIST website
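
For comparison, here is a minimal sketch of that normalization (my own rough Python, assuming a white-on-black uint8 numpy array as input; invert first if your font renders are black-on-white):

    import numpy as np
    from PIL import Image

    def mnist_style_preprocess(img):
        """Fit the ink into a 20x20 box (keeping aspect ratio, anti-aliased),
        then center it by center of mass inside a 28x28 field."""
        ys, xs = np.nonzero(img > 0)                      # bounding box of the ink
        img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

        h, w = img.shape
        scale = 20.0 / max(h, w)                          # longest side -> 20 px
        new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
        img = np.array(Image.fromarray(img).resize((new_w, new_h), Image.BILINEAR))

        canvas = np.zeros((28, 28), dtype=img.dtype)
        ys, xs = np.mgrid[:new_h, :new_w]
        cy = (ys * img).sum() / img.sum()                 # center of mass (row)
        cx = (xs * img).sum() / img.sum()                 # center of mass (col)
        top = min(max(int(round(14 - cy)), 0), 28 - new_h)
        left = min(max(int(round(14 - cx)), 0), 28 - new_w)
        canvas[top:top + new_h, left:left + new_w] = img
        return canvas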

Overfitting:

  • MNIST has 60,000 training examples and 10,000 test examples. How many do you have?
  • Did you try dropout (see paper)?
  • Did you try dataset augmentation techniques? (e.g. slightly shifting the image, perhaps changing the aspect ratio a bit; you could also add noise - however, I don't think those will help; see the sketch after this list)
  • Did you try smaller networks? (And how big are your filters / how many filters do you have?)
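
For the dropout and augmentation points, a hedged Keras sketch (the layer sizes and shift/zoom values here are guesses to tune, not something I have tested on your data; x_train / y_train are hypothetical names for your images and one-hot labels):

    from tensorflow import keras
    from tensorflow.keras import layers
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # A much smaller network with dropout before the classifier.
    small_model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(8, (5, 5), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(10, activation="softmax"),
    ])
    small_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

    # Augmentation: shift each 28x28 digit by up to 2 px and zoom slightly.
    datagen = ImageDataGenerator(width_shift_range=2, height_shift_range=2, zoom_range=0.1)
    # small_model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=50)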

Remarks

Interesting idea! Did you try simply applying the trained MNIST network on your data? What are the results?
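
A minimal sketch of that experiment, assuming a Keras model saved after MNIST training and font digits already preprocessed to the MNIST format (all file names here are hypothetical):

    import numpy as np
    from tensorflow import keras

    mnist_net = keras.models.load_model("mnist_cnn.h5")   # hypothetical saved model
    font_x = np.load("font_digits.npy") / 255.0           # shape (N, 28, 28, 1), hypothetical file
    font_y = np.load("font_labels.npy")                   # integer labels 0-9

    pred = mnist_net.predict(font_x).argmax(axis=1)
    print("accuracy on font digits:", float((pred == font_y).mean()))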

Hardunn answered 17/7, 2016 at 11:24 Comment(1)
Good points. Just to add another option, Batch Normalization (arxiv.org/abs/1502.03167) could also be considered.Trevatrevah

It may be an overfitting problem. That can happen when your network is too complex for the problem it is trying to solve. Check this article: http://es.mathworks.com/help/nnet/ug/improve-neural-network-generalization-and-avoid-overfitting.html

Lesbos answered 15/7, 2016 at 8:1 Comment(3)
Should I remove layers from the network? Any suggestions you can provide?Acute
I am not very familiar with CNNs, but I guess you could have too many hidden layers. Maybe this could be useful for you: cs231n.github.io/neural-networks-1/#arch <<...it seems that smaller neural networks can be preferred if the data is not complex enough to prevent overfitting.>>Lesbos
How many epochs are you training your network? What is the size of your dataset?Godber

It definitely looks like an issue of overfitting. I see that you have two convolution layers, two max pooling layers and two fully connected layers. But how many weights in total? You only have 96 examples per class, which is certainly smaller than the number of weights in your CNN. Remember that you want at least 5 times more instances in your training set than weights in your CNN.
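
As a rough sanity check (the 60,000 figure below is the parameter count usually quoted for the original LeNet-5, not a measurement of your network; model.count_params() in Keras would give the exact number):

    n_weights = 60_000        # assumed, order of magnitude for LeNet-5
    n_examples = 1036         # dataset size stated in the question
    print("rule of thumb wants >=", 5 * n_weights, "examples; you have", n_examples)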

You have two solutions to improve your CNN:

  • Shake each instance in the training set: shift each number by about 1 pixel in every direction. That alone multiplies your training set by 9 (see the sketch after this list).
  • Use a transformer layer. It will apply an elastic deformation to each number at each epoch, which strengthens learning a lot by artificially increasing your training set. Moreover, it will make the network much more effective at predicting other fonts.
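
A sketch of the shaking idea (assuming each digit is a 28x28 numpy array with an empty border, so a 1-pixel shift does not cut off any ink):

    import numpy as np

    def shake(images):
        """images: (N, 28, 28) array. Returns (9*N, 28, 28) with every
        combination of -1/0/+1 pixel shifts; remember to np.tile the labels by 9."""
        shifted = []
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                shifted.append(np.roll(np.roll(images, dy, axis=1), dx, axis=2))
        return np.concatenate(shifted, axis=0)
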
Gadoid answered 17/7, 2016 at 9:13 Comment(0)
