How to improve digit recognition of a model trained on MNIST?
I am working on handprinted multi-digit recognition in Java, using the OpenCV library for preprocessing and segmentation, and a Keras model trained on MNIST (test accuracy 0.98) for recognition.

The recognition seems to work quite well, apart from one thing. The network quite often fails to recognize the ones (number "one"). I can't figure out if it happens due to preprocessing / incorrect implementation of the segmentation, or if a network trained on standard MNIST simply hasn't seen ones that look like my test cases.

Here's what the problematic digits look like after preprocessing and segmentation:

[source image] becomes [preprocessed image] and is classified as 4.

[source image] becomes [preprocessed image] and is classified as 7.

[source image] becomes [preprocessed image] and is classified as 4. And so on...

Is this something that could be fixed by improving the segmentation process? Or rather by enhancing the training set?

Edit: Enhancing the training set (data augmentation) would definitely help, and I am already testing that, but the question of correct preprocessing still remains.

My preprocessing consists of resizing, converting to grayscale, binarization, inversion, and dilation. Here's the code:

Mat resized = new Mat();
Imgproc.resize(image, resized, new Size(), 8, 8, Imgproc.INTER_CUBIC);

Mat grayscale = new Mat();
Imgproc.cvtColor(resized, grayscale, Imgproc.COLOR_BGR2GRAY);

Mat binImg = new Mat(grayscale.size(), CvType.CV_8U);
Imgproc.threshold(grayscale, binImg, 0, 255, Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);

Mat inverted = new Mat();
Core.bitwise_not(binImg, inverted);

Mat dilated = new Mat(inverted.size(), CvType.CV_8U);
int dilation_size = 5;
Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_CROSS, new Size(dilation_size, dilation_size));
Imgproc.dilate(inverted, dilated, kernel, new Point(-1,-1), 1);

The preprocessed image is then segmented into individual digits as follows:

List<Mat> digits = new ArrayList<>();
List<MatOfPoint> contours = new ArrayList<>();
Imgproc.findContours(preprocessed.clone(), contours, new Mat(), Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

// code to sort contours
// code to check that contour is a valid char

List<Rect> rects = new ArrayList<>();

for (MatOfPoint contour : contours) {
    Rect boundingBox = Imgproc.boundingRect(contour);
    rects.add(boundingBox);
}

for (int i = 0; i < rects.size(); i++) {
    Rect boundingBox = rects.get(i);
    Mat digit = new Mat(preprocessed, boundingBox);

    int border = 50;
    Mat result = digit.clone();
    Core.copyMakeBorder(result, result, border, border, border, border, Core.BORDER_CONSTANT, new Scalar(0, 0, 0));

    Imgproc.resize(result, result, new Size(28, 28));
    digits.add(result);
}
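The `// code to sort contours` placeholder above typically sorts the bounding boxes left-to-right so the digits are fed to the classifier in reading order. A hypothetical sketch, using java.awt.Rectangle as a stand-in for OpenCV's Rect (both expose a public `x` field):

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortBoxes {
    // Sort digit bounding boxes by their left edge (reading order).
    static List<Rectangle> sortLeftToRight(List<Rectangle> boxes) {
        List<Rectangle> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingInt(r -> r.x));
        return sorted;
    }
}
```

The same one-liner works on OpenCV `Rect` objects, since `Rect` also has a public `int x`.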
Stacistacia answered 15/10, 2019 at 16:23 Comment(2)
Are you using the mask or the (masked?) original grayscale pixels as input for your classification?Ramses
@Ramses I'm using the preprocessed (binarized, inverted, dilated) version, the one that matches the MNIST training set. There are examples of the number "1" after preprocessing in my post.Stacistacia

After some research and experiments, I came to the conclusion that the image preprocessing itself was not the problem (I did change some suggested parameters, e.g. dilation size and shape, but they were not crucial to the results). What did help, however, were the following two things:

  1. As @f4f noticed, I needed to collect my own dataset with real-world data. This already helped tremendously.

  2. I made important changes to my segmentation preprocessing. After getting individual contours, I first size-normalize the images to fit into a 20x20 pixel box (as they are in MNIST). After that I center the box in the middle of a 28x28 image using the center of mass (which for binary images is the mean position of the white pixels across both dimensions).

Of course, there are still difficult segmentation cases, such as overlapping or connected digits, but the above changes answered my initial question and improved my classification performance.
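The 20x20 size normalization and center-of-mass shift described in point 2 can be sketched in plain Java. This is a hypothetical, OpenCV-free illustration on a 0/1 int matrix (assuming a non-empty digit crop); in the real pipeline the same steps would be applied to Mats:

```java
public class MnistNormalize {
    // Nearest-neighbor resize of a binary (0/1) image.
    static int[][] resize(int[][] src, int newH, int newW) {
        int h = src.length, w = src[0].length;
        int[][] dst = new int[newH][newW];
        for (int y = 0; y < newH; y++)
            for (int x = 0; x < newW; x++)
                dst[y][x] = src[y * h / newH][x * w / newW];
        return dst;
    }

    // Fit the digit into a 20x20 box (preserving aspect ratio), then paste it
    // into a 28x28 canvas so its center of mass lands at (14, 14).
    static int[][] normalize(int[][] digit) {
        int h = digit.length, w = digit[0].length;
        double scale = 20.0 / Math.max(h, w);
        int nh = Math.max(1, (int) Math.round(h * scale));
        int nw = Math.max(1, (int) Math.round(w * scale));
        int[][] small = resize(digit, nh, nw);
        // Center of mass = mean position of the on-pixels.
        double cy = 0, cx = 0, mass = 0;
        for (int y = 0; y < nh; y++)
            for (int x = 0; x < nw; x++)
                if (small[y][x] > 0) { cy += y; cx += x; mass++; }
        cy /= mass; cx /= mass;
        int[][] out = new int[28][28];
        int offY = (int) Math.round(14 - cy), offX = (int) Math.round(14 - cx);
        for (int y = 0; y < nh; y++)
            for (int x = 0; x < nw; x++) {
                int ty = y + offY, tx = x + offX;
                if (ty >= 0 && ty < 28 && tx >= 0 && tx < 28)
                    out[ty][tx] = small[y][x];
            }
        return out;
    }
}
```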

Stacistacia answered 31/1, 2020 at 15:36 Comment(0)

I believe that your problem is the dilation process. I understand that you wish to normalize image sizes, but you shouldn't break the proportions: resize by a single factor, chosen so that the larger axis reaches the desired size without the other axis exceeding it, and fill the rest of the image with the background color. It's not that "standard MNIST just hasn't seen the number one which looks like your test cases"; your preprocessing makes your images look like different, trained digits (the ones that do get recognized).

Overlap of the source and processed images

If you maintained the correct aspect ratio of your images (source and post-processed), you would see that you did not just resize the image but "distorted" it. This can be the result of either non-homogeneous dilation or incorrect resizing.
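The padding arithmetic this answer suggests can be sketched as follows (a hypothetical helper; the name `letterbox` and its return layout are illustrative). In OpenCV you would then call Imgproc.resize with the computed size and Core.copyMakeBorder with the computed pads:

```java
public class Letterbox {
    // Returns {newH, newW, padTop, padBottom, padLeft, padRight} for scaling
    // an h x w crop into a target x target square without distortion.
    static int[] letterbox(int h, int w, int target) {
        // One scale factor for both axes, driven by the larger dimension.
        double scale = (double) target / Math.max(h, w);
        int nh = Math.max(1, (int) Math.round(h * scale));
        int nw = Math.max(1, (int) Math.round(w * scale));
        // Pad the short side with background to reach target x target.
        int padTop = (target - nh) / 2, padBottom = target - nh - padTop;
        int padLeft = (target - nw) / 2, padRight = target - nw - padLeft;
        return new int[] { nh, nw, padTop, padBottom, padLeft, padRight };
    }
}
```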

Bernardina answered 20/10, 2019 at 12:51 Comment(5)
I believe that @Bernardina has a point. Try not to change the aspect ratio of the digits.Rotterdam
Sorry, I don't quite follow. Do you think my dilation process or my resizing process is the problem? I only resize the image in the beginning with this line Imgproc.resize(image, resized, new Size(), 8, 8, Imgproc.INTER_CUBIC);. Here the aspect ratio stays the same, so where do I break the proportions?Stacistacia
@Bernardina in answer to your edits above: yes, I don't just resize the image, I apply different operations, one of them being dilation, which is a morphological operation that causes slight "distortion" because it makes bright regions within an image "grow". Or do you mean the resizing at the very end, where I make the images 28x28?Stacistacia
@youngpanda, you might find the discussion here #28525936 interesting. It might give you a clue why your approach doesn't bring good resultsBernardina
@Bernardina thank you for the link, I am familiar with LeNet, but it's nice to read againStacistacia

There are already some answers posted, but none of them addresses your actual question about image preprocessing.

For my part, I don't see any significant problems with your implementation; for a study project, it's well done.

But one thing you may have missed. There are basic operations in mathematical morphology: erosion and dilation (which you use). And there are complex operations: various combinations of the basic ones (e.g. opening and closing). Wikipedia is not the best CV reference, but you may start with it to get the idea.

Usually it's better to use opening instead of erosion and closing instead of dilation, since in those cases the original binary image changes much less (but the desired effect of cleaning sharp edges or filling gaps is still achieved). So in your case you should check closing (image dilation followed by erosion with the same kernel). An extra-small image is greatly modified when you dilate even with a tiny kernel (a single pixel is a sizable fraction of the image), which matters much less on larger images.
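Closing can be sketched in plain Java as dilation followed by erosion with the same 3x3 cross kernel, equivalent in spirit to OpenCV's Imgproc.morphologyEx with Imgproc.MORPH_CLOSE (this tiny-grid implementation is illustrative only):

```java
public class Closing {
    // Offsets of a 3x3 cross structuring element (center included).
    static final int[][] CROSS = { {0, -1}, {0, 1}, {-1, 0}, {1, 0}, {0, 0} };

    // Dilation: a pixel turns on if any kernel neighbor is on.
    static boolean[][] dilate(boolean[][] img) {
        int h = img.length, w = img[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                for (int[] d : CROSS) {
                    int ny = y + d[0], nx = x + d[1];
                    if (ny >= 0 && ny < h && nx >= 0 && nx < w && img[ny][nx])
                        out[y][x] = true;
                }
        return out;
    }

    // Erosion: a pixel stays on only if all kernel neighbors are on.
    static boolean[][] erode(boolean[][] img) {
        int h = img.length, w = img[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                boolean all = true;
                for (int[] d : CROSS) {
                    int ny = y + d[0], nx = x + d[1];
                    all &= ny >= 0 && ny < h && nx >= 0 && nx < w && img[ny][nx];
                }
                out[y][x] = all;
            }
        return out;
    }

    // Closing = dilation then erosion: fills small gaps while largely
    // preserving the original stroke shape.
    static boolean[][] close(boolean[][] img) {
        return erode(dilate(img));
    }
}
```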

To visualize the idea, see the following pictures (from the OpenCV tutorials: 1, 2):

dilation: original symbol and dilated one

closing: original symbol and closed one

Hope it helps.

Darb answered 24/10, 2019 at 10:7 Comment(5)
Thank you for the input! Actually it is not a study project, so what would be the problem then? My image is quite big when I apply dilation; 8x8 is not the size of the image, it's the resizing factor for height and width. But trying different morphological operations can still be an improvement option. I didn't know about opening and closing, I will try them out! Thank you.Stacistacia
My fault, I misread the resize call as if 8*8 were the new size. In case you want to use OCR in the real world, you should consider transfer learning of your original net on data typical of your area of use. At least check whether it improves accuracy; generally it should.Darb
Another thing to check is the preprocessing order: grayscale -> binarize -> invert -> resize. Resizing is a costly operation and I don't see a need to apply it to the color image. Also, symbol segmentation may be done without contour detection (with something less costly) if you have some specific input format, but that may be hard to implement.Darb
If I had another dataset apart from MNIST, I could try transfer learning :) I will try to change the preprocessing order and get back to you. Thank you! I haven't yet found any easier option than contour detection for my problem...Stacistacia
Ok. You can collect a dataset yourself from the images you will run OCR on; it's a common practice.Darb

So, you need a complex approach, because every step of your computing cascade depends on the previous results. Your algorithm has the following stages:

  1. Image preprocessing

As mentioned earlier, if you apply resizing, you lose information about the aspect ratio of the image. You have to do the same preprocessing of digit images to get the same results that were implied in the training process.

A better way is to crop the image to fixed-size pictures. In that variant you won't need contour finding and resizing of the digit image before the training process. Then you could make a little change in your crop algorithm for better recognition: simply find the contour and put your digit, without resizing, at the center of the relevant image frame for recognition.

Also you should pay more attention to the binarization algorithm. I have experience studying the effect of binarization threshold values on learning error: I can say that this is a very significant factor. You may try other binarization algorithms to check this idea. For example, you may use this library for testing alternative binarization algorithms.
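For reference, the idea behind Imgproc.THRESH_OTSU (which the question already uses) can be sketched in plain Java; comparing its chosen threshold against alternatives is one way to probe the binarization sensitivity mentioned above. This is an illustrative reimplementation, not OpenCV's code:

```java
public class Otsu {
    // Otsu's method: pick the threshold that maximizes the between-class
    // variance of the grayscale histogram (values 0..255).
    static int threshold(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) hist[v]++;
        int total = gray.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];
        long sumB = 0;
        int wB = 0, result = 0;
        double best = -1;
        for (int t = 0; t < 256; t++) {
            wB += hist[t];                  // background weight
            if (wB == 0) continue;
            int wF = total - wB;            // foreground weight
            if (wF == 0) break;
            sumB += (long) t * hist[t];
            double mB = (double) sumB / wB;            // background mean
            double mF = (double) (sumAll - sumB) / wF; // foreground mean
            double between = (double) wB * wF * (mB - mF) * (mB - mF);
            if (between > best) { best = between; result = t; }
        }
        return result;
    }
}
```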

  2. Learning algorithm

To improve the quality of recognition, use cross-validation during the training process. This helps you avoid overfitting to your training data. For example, you may read this article, which explains how to use it with Keras.

Sometimes a high accuracy measure doesn't say anything about the real recognition quality, because the trained ANN hasn't found the pattern in the training data. This may be connected with the training process or the input dataset, as explained above, or it may be caused by the choice of ANN architecture.

  3. ANN architecture

It's a big problem. How do you define the best ANN architecture to solve the task? There is no universal way to do that, but there are a few ways to get closer to the ideal. For example, you could read this book. It helps you build a better vision of your problem. Also you may find here some heuristic formulas to fit the number of hidden layers/elements for your ANN, and here you will find a little overview of this.

I hope this helps.

Valdovinos answered 22/10, 2019 at 14:24 Comment(5)
1. If I understand you correctly, I can't crop to a fixed size: it's a picture of a multi-digit number, and all cases are different in size/place etc. Or did you mean something different? Yes, I tried different binarization methods and tweaked parameters, if that's what you mean. 2. Actually the recognition on MNIST is great, there's no overfitting happening; the accuracy I mentioned is test accuracy. Neither the network nor its training are the problem. 3. Thanks for all the links. I am quite happy with my architecture though, but of course there's always room for improvement.Stacistacia
Yes, you got it. But you always have the possibility to make your dataset more unified. In your case it's better to crop digit images by their contours, as you already do. After that, expand the digit images to a unified size in accordance with the maximal digit-image size on the x and y scales. You could use the center of the digit contour region to do that. It will give you cleaner input data for your training algorithm.Valdovinos
Do you mean that I have to skip dilation? In the end I already center the image, when I apply the border (50 px on each side). After that I am resizing each digit to 28x28, since this is the size we need for MNIST. Do you mean I can resize to 28x28 differently?Stacistacia
Yes, dilation is undesirable. Your contours may have different ratios of height to width; that's why you need to improve your algorithm here. At least you should give the images the same ratio. Since your input pictures are 28x28, you must prepare images with the same 1:1 ratio on the x and y scales. You should use not a 50 px border on each picture side, but X, Y px borders which satisfy the condition: contourSizeX + borderSizeX == contourSizeY + borderSizeY. That's all.Valdovinos
I already tried without dilation (forgot to mention in the post). It didn't change any results... My border number was experimental. Ideally I would need my digits to fit 20x20 box (size-normalized as such in the dataset) and after that shift it using the center of mass...Stacistacia

© 2022 - 2024 — McMap. All rights reserved.