MNIST recognition using Keras
Asked Answered
How can I train the model to recognize five numbers in one picture? The code is as follows:

import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense, Input
from keras.models import Model, Sequential

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(28, 140, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dropout(0.5))

Here should be a loop for recognizing each number in the picture, but I don't know how to realize it.

model.add(Dense(11, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(X_train, y_train,
          batch_size=1000,
          epochs=8,
          verbose=1,
          validation_data=(X_valid, y_valid))

The picture of combined mnist number is as follows:

[image: combined MNIST numbers in one picture]

Obsecrate answered 5/4, 2017 at 7:57 Comment(1)
To summarise, there are basically two approaches to this problem: preprocess the image with something like OpenCV to pull out the digits you want to identify and then run a standard single-digit CNN, OR do the whole thing with a CNN of some type as described below (a brute-force CNN trained on multiple digits, an RNN, etc.). If the images are predictably formatted, then OpenCV is a good choice, and is the route I have chosen for now!Libretto

I suggest two possible approaches:

Case 1- The images are nicely structured.

In the example you provided, this is indeed the case, so if your data looks like the image in the link you provided, I suggest this approach.

In the link you provided, every image basically consists of five 28-by-28-pixel images stacked side by side. In this case, I suggest cutting each image into 5 pieces and training your model on the pieces as with the usual MNIST data (for example, using the code you provided). Then, when you want to apply your model to classify new data, cut each new image into 5 pieces as well, classify each piece with your model, and write the 5 predicted digits next to each other as the output.

So, regarding this sentence:

Here should be a loop for recognizing each number in the picture, but I don't know how to realize it

you don't need a loop; just cut your images.
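The cutting itself is plain array slicing; here is a minimal NumPy sketch (the `combined` array is a random stand-in for your loaded data):

```python
import numpy as np

# Stand-in for a batch of combined images: 3 samples, 28x140 grayscale, 1 channel
rng = np.random.default_rng(0)
combined = rng.random((3, 28, 140, 1))

# Split the width axis (140) into five 28-pixel-wide digit crops,
# then flatten to a batch of single-digit images in sample-major order
crops = np.stack(np.split(combined, 5, axis=2), axis=1)   # (3, 5, 28, 28, 1)
digits = crops.reshape(-1, 28, 28, 1)                     # (15, 28, 28, 1)
```

Each 28-by-28 crop can then be fed to a standard single-digit MNIST classifier, and the five per-crop predictions concatenated to form the final answer.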

Case 2- The images are not nicely structured.

In this case, each image is labeled with 5 numbers, so each row in y_train (and y_valid) will be a 0/1 vector with 55 entries. The first 11 entries are the one-hot encoding of the first number, the next 11 entries are the one-hot encoding of the second number, and so on. Each row in y_train will therefore have exactly 5 entries equal to 1 and the rest equal to 0.
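Building such a label row can be sketched like this (the `encode_labels` helper is illustrative, not part of the question's code):

```python
import numpy as np

def encode_labels(digits, num_classes=11):
    """Concatenate the one-hot encodings of each of the 5 digits
    into a single 55-entry 0/1 label vector."""
    y = np.zeros(len(digits) * num_classes)
    for position, digit in enumerate(digits):
        y[position * num_classes + digit] = 1.0
    return y

row = encode_labels([9, 7, 5, 4, 10])   # 1s at indices 9, 18, 27, 37, 54
```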

In addition, instead of using a softmax activation on the output layer with the categorical_crossentropy loss, use a sigmoid activation with the 'binary_crossentropy' loss (see further discussion of the reasons here and here).

To summarize, replace this:

model.add(Dense(11, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
          optimizer=keras.optimizers.Adadelta(),
          metrics=['accuracy'])

with this:

model.add(Dense(55, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adadelta())
Metaplasia answered 29/8, 2017 at 14:54 Comment(0)

The classic work in this area is 'Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks'.

Keras model (functional, not sequential):

from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.models import Model
from keras.optimizers import Adam

inputs = Input(shape=(28, 140, 1), name="input")
x = inputs
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)
x = Flatten()(x)
x = Dropout(0.5)(x)
digit1 = Dense(10, activation='softmax', name='digit1')(x)
digit2 = Dense(10, activation='softmax', name='digit2')(x)
digit3 = Dense(10, activation='softmax', name='digit3')(x)
digit4 = Dense(10, activation='softmax', name='digit4')(x)
digit5 = Dense(10, activation='softmax', name='digit5')(x)
predictions = [digit1, digit2, digit3, digit4, digit5]
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

PS: You may use 11 classes per head to cover the 10 digits plus an empty space.
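One detail worth noting: with five output heads, `fit` expects `y` as a list of five arrays, one per head. A sketch with random stand-in data (the `model.fit` call is shown commented out since the data here is fake):

```python
import numpy as np

num_samples = 4
X = np.random.rand(num_samples, 28, 140, 1).astype("float32")
labels = np.random.randint(0, 10, size=(num_samples, 5))  # 5 digits per sample

# One one-hot (num_samples, 10) array per output head
y = [np.eye(10)[labels[:, i]] for i in range(5)]

# model.fit(X, y, batch_size=2, epochs=1)  # each head trains on its own array
```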

Uredo answered 31/8, 2017 at 3:41 Comment(1)
All the answers are helpful, this seemed the closest answer for the original question and why I have awarded bonus here.Libretto

Since you already have a very well behaved image, all you have to do is expand the number of classes in your model.

You can use 5 × 11 = 55 classes instead of just 11.

The first 11 classes identify the first number, the following 11 classes identify the second number and so on. A total of 55 classes, 11 classes for each position in the image.

So, in short:

  • X_training will be the entire image, as shown in the link, shaped as (28,140) or (140,28), depending on which method you use to load the images.
  • Y_training will be a 55-element vector, shape (55,), telling which digit appears in each of the 5 positions.

Example: for the first image, with 9,7,5,4,10, you'd create Y_training with the following positions containing the value 1:

  • Y_training[9] = 1
  • Y_training[18] = 1 #(18=7+11)
  • Y_training[27] = 1 #(27=5+22)
  • Y_training[37] = 1 #(37=4+33)
  • Y_training[54] = 1 #(54=10+44)

Create your model layers the way you want, pretty much the same as a regular MNIST model; that is, there's no need for loops or anything like that.

But it will probably need to be a little bigger than before.

You will not be able to use categorical_crossentropy anymore, since you will have 5 correct classes per image instead of just 1. If you use "sigmoid" activations at the end, binary_crossentropy is a good replacement.

Make sure your last layer fits the 55-element vector, for instance Dense(55).
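At prediction time the 5 digits can then be recovered by taking the argmax within each 11-class block; a sketch (the `decode` helper is illustrative):

```python
import numpy as np

def decode(pred, num_classes=11, num_digits=5):
    """Recover the 5 digits from a 55-entry output vector
    by taking the argmax inside each 11-class block."""
    return [int(np.argmax(pred[i * num_classes:(i + 1) * num_classes]))
            for i in range(num_digits)]

# Round-trip check on the example labels 9, 7, 5, 4, 10
y = np.zeros(55)
for position, digit in enumerate([9, 7, 5, 4, 10]):
    y[position * 11 + digit] = 1.0
decoded = decode(y)   # [9, 7, 5, 4, 10]
```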

Parasol answered 29/8, 2017 at 13:49 Comment(0)
D
0

This problem was tackled by Yann LeCun in the 1990s. You can find demos and papers on his website.

A less general solution is to train a CNN on single MNIST digits and use that CNN for inference on images like the one you provided. Prediction is done by sliding the trained CNN across the multi-digit image and applying post-processing to aggregate the results, possibly also estimating bounding boxes.
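The sliding itself can be sketched with plain slicing (the `sliding_windows` helper, window width, and stride are illustrative choices):

```python
import numpy as np

def sliding_windows(image, width=28, stride=14):
    """Return all horizontal 28-pixel-wide crops of a (28, W) image,
    stepping by `stride` pixels; each crop is classified separately."""
    h, w = image.shape
    return [image[:, x:x + width] for x in range(0, w - width + 1, stride)]

windows = sliding_windows(np.zeros((28, 140)))   # 9 windows for a 140-wide image
```

Post-processing (for example, non-maximum suppression over the per-window confidence scores) then aggregates the window predictions into the final digit sequence.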

A very general solution, one that can handle a variable number of digits at different scales and positions, is to build a model that predicts the bounding boxes of the numbers and performs classification on them. There is a recent line of such models: R-CNN, Fast R-CNN, and Faster R-CNN.

You can find a python implementation of Faster-RCNN on github.

Deathwatch answered 29/8, 2017 at 15:39 Comment(0)
