Simple Digit Recognition OCR in OpenCV-Python

I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit. I would like to train with them.

OpenCV ships a sample, letter_recog.py, but I couldn't figure out how to use it. I don't understand what samples, responses, etc. are. It also loads a txt file at the start, which I didn't understand at first.

After searching a bit, I found letter_recognition.data in the cpp samples. I used it to write some code for cv2.KNearest, modeled on letter_recog.py (just for testing):

import numpy as np
import cv2

fn = 'letter-recognition.data'
a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
samples, responses = a[:,1:], a[:,0]

model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print(results.ravel())

It gave me an array of size 20000, and I don't understand what it is.

Questions:

1) What is the letter_recognition.data file? How can I build that file from my own data set?

2) What does results.ravel() denote?

3) How can we write a simple digit recognition tool using the letter_recognition.data file (with either KNearest or SVM)?

Nava answered 23/2, 2012 at 12:37 Comment(0)

Well, I decided to work through my own question and solve the above problem myself. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how. (It is just for learning how to use KNearest for simple OCR purposes.)

1) My first question was about letter_recognition.data file that comes with OpenCV samples. I wanted to know what is inside that file.

It contains a letter, along with 16 features of that letter.
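
As a rough illustration of that layout, each line can be split into a class label and a 16-element feature vector. The row below is made up in the same shape (one letter followed by 16 integers), not copied from the file, and parse_row is a hypothetical helper name:

```python
import numpy as np

def parse_row(line):
    """Split 'LETTER,f1,...,f16' into (label_index, feature_vector)."""
    fields = line.strip().split(',')
    label = ord(fields[0]) - ord('A')  # 'A' -> 0, 'B' -> 1, ...
    features = np.array([float(f) for f in fields[1:]], dtype=np.float32)
    return label, features

# Made-up row in the same format: one letter followed by 16 integers.
label, features = parse_row("C,4,7,5,5,2,6,8,1,7,10,7,9,2,8,3,8")
print(label, features.shape)
```

This mirrors what the converters argument in the question's np.loadtxt call does: the letter becomes the response and the remaining 16 columns become one sample row.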

This Stack Overflow post helped me find it. The 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers. (I didn't understand some of the features toward the end, though.)

2) I knew that without understanding all those features, it would be difficult to use that method. I tried some other papers, but all of them were a little difficult for a beginner.

So I decided to take all the pixel values as my features. (I was not worried about accuracy or performance; I just wanted it to work, even with minimal accuracy.)

I took the below image for my training data:

[Image: the training image]

(I know the amount of training data is small. But since all the digits are of the same font and size, I decided to try it.)

To prepare the data for training, I wrote a small script in OpenCV. It does the following:

  1. Loads the image.
  2. Selects the digits (by finding contours and applying constraints on the area and height of the characters to avoid false detections).
  3. Draws a bounding rectangle around one digit and waits for a key press. We then press the digit key corresponding to the digit in the box.
  4. Once the digit key is pressed, it resizes the box to 10x10 and saves the 100 pixel values in one array (here, samples) and the manually entered digit in another array (here, responses).
  5. Saves both arrays to separate text files (generalsamples.data and generalresponses.data).
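
Steps 3 to 5 can be sketched without OpenCV, using a hand-made 10x10 array in place of a real digit crop. The key_to_digit helper and the `& 0xFF` mask are assumptions added here, not part of the script below: on some platforms waitKey returns the key code with extra high bits set (one commenter below reports getting 1048586), and masking to the low byte is a common way to cope with that.

```python
import io
import numpy as np

DIGIT_KEYS = range(48, 58)  # ASCII codes for '0'..'9'

def key_to_digit(key):
    """Return the digit for a waitKey-style code, or None for other keys."""
    key &= 0xFF  # assumption: drop modifier bits that some platforms set
    return key - 48 if key in DIGIT_KEYS else None

samples = np.empty((0, 100), np.float32)
responses = []

# Stand-in for one 10x10 thresholded digit crop (values 0 or 255).
roismall = np.zeros((10, 10), np.uint8)
roismall[2:8, 4:6] = 255

digit = key_to_digit(ord('1'))
if digit is not None:
    responses.append(digit)
    # One sample = the 100 pixel values flattened into a single row.
    samples = np.append(samples, roismall.reshape((1, 100)), 0)

# Round-trip through text, as np.savetxt / np.loadtxt would with a file.
buf = io.StringIO()
np.savetxt(buf, samples)
buf.seek(0)
reloaded = np.loadtxt(buf, np.float32, ndmin=2)
```

Each labeled digit adds one 100-column row to samples and one entry to responses, so the two arrays always stay the same length.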

Once every digit in the training data (train.png) has been labeled by hand, the image looks like this:

[Image: the training image with every labeled digit boxed]

Below is the code I used for the above purpose (of course, not so clean):

import sys

import numpy as np
import cv2

im = cv2.imread('pitrain.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
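# the original passed 1,1 here; these named constants have the same values
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY_INV,11,2)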

#################      Now finding Contours         ###################

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

samples =  np.empty((0,100))
responses = []
keys = list(range(48,58))  # key codes for '0'..'9'

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        
        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print("training complete")

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)

Now we move on to the training and testing part.

For testing, I used the image below, which has the same kind of digits I used for training.

[Image: the test image]

For training we do as follows:

  1. Load the text files we saved earlier (generalsamples.data and generalresponses.data).
  2. Create an instance of the classifier we are using (KNearest in this case).
  3. Use the KNearest.train function to train on the data.

For testing purposes, we do as follows:

  1. Load the image used for testing.
  2. Process the image as before and extract each digit using contour methods.
  3. Draw a bounding box around it, resize it to 10x10, and store its pixel values in an array as before.
  4. Use the KNearest.find_nearest() function to find the training item nearest to the one we gave. (If lucky, it recognizes the correct digit.)
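
To make step 4 concrete, here is a minimal brute-force sketch of what a 1-nearest-neighbour lookup does. It is a NumPy stand-in for illustration, not OpenCV's implementation; find_nearest_1 and the tiny 4-pixel "images" are made up.

```python
import numpy as np

def find_nearest_1(train_samples, train_labels, queries):
    """Brute-force 1-nearest-neighbour: one predicted label per query row."""
    # Squared Euclidean distance between every query and every training row.
    diffs = queries[:, None, :] - train_samples[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # index of the closest training row
    return train_labels[nearest].reshape(-1, 1)

# Two tiny 4-pixel "images" labelled 0 and 1 (made-up data).
train_samples = np.float32([[0, 0, 0, 0], [255, 255, 255, 255]])
train_labels = np.float32([0, 1])

queries = np.float32([[250, 240, 255, 251], [3, 0, 10, 0]])
results = find_nearest_1(train_samples, train_labels, queries)
print(results.ravel())
```

The result has one row per query, so ravel() just flattens it into one predicted label per input sample. That is also why calling find_nearest on the whole letter data set in the question returned 20000 values: one prediction for each row that was passed in.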

I included the last two steps (training and testing) in a single script below:

import cv2
import numpy as np

#######   training part    ############### 
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))

model = cv2.KNearest()
model.train(samples,responses)

############################# testing part  #########################

im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
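# the original passed 1,1 here; these named constants have the same values
thresh = cv2.adaptiveThreshold(gray,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY_INV,11,2)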

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
            string = str(int((results[0][0])))
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)

And it worked, below is the result I got:

[Image: the recognition result]


Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.

But anyway, this is a good starting point for beginners (I hope so).

Nava answered 8/3, 2012 at 15:35 Comment(28)
+1 Long post, but very educational. This should go to opencv tag info – Brutus
In case anyone's interested, I made a proper OO engine from this code, along with some bells and whistles: github.com/goncalopp/simple-ocr-opencv – Greenhead
Hi, the google docs link referred to by this post doesn't work for me. – Brindisi
@Ricardo: edited the link, check if it works, or search for the name of the paper; the first link in google turns it up. – Nava
When you're extracting the features, is it possible to do it in order? Or is the border detection always random? Thanks in advance – Muff
I didn't get you. Are you asking about the order in which contours are found in opencv? Then it is not in a good order. I don't know if it is random, but it is not in any order we know, like left-to-right, top-to-bottom, or vice versa. It causes some difficulty in some cases. We have to manually order it according to our criteria. (If this is not what you meant to ask, please clarify.) – Nava
Note that there is no need for using SVM and KNN when you have a well defined perfect font. For instance, the digits 0, 4, 6, 9 form one group, the digits 1, 2, 3, 5, 7 form another, and 8 another. This group is given by the Euler number. Then "0" has no endpoints, "4" has two, and "6" and "9" are distinguished by centroid position. "3" is the only one, in the other group, with 3 endpoints. "1" and "7" are distinguished by the skeleton length. When considering the convex hull together with the digit, "5" and "2" have two holes and they can be distinguished by the centroid of the largest hole. – Pimp
Well, thanks for this information on the Euler number. I didn't know that. Anyway, in this case I know it is a pretty straightforward problem. But my aim was to understand how to use the kNearest function and how to develop a simple OCR at the most basic level. That is all I wanted. – Nava
@AbidRahmanK, the first code runs perfectly. But unfortunately I am getting this error while running the second code. OpenCV Error: Bad argument (train data must be floating-point matrix) in cvCheckTrainData, file /build/buildd/opencv-2.3.1/modules/ml/src/inner_functions.cpp, line 857 Traceback (most recent call last): File "num1.py", line 10, in <module> model.train(samples,responses) cv2.error: /build/buildd/opencv-2.3.1/modules/ml/src/inner_functions.cpp:857: error: (-5) train data must be floating-point matrix in function cvCheckTrainData How can I fix this? – Frown
@Frown: check if samples and responses are floating point. If not, convert them to floating point. – Nava
@AbidRahmanK: I am getting generalresponses.data and generalsamples.data files as empty. Actually the first program gives responses as an empty list in my case and gives 1048586 as key. – Frown
@Frown: did you use the same images I used? The first part of the code is set up for this image only. – Nava
@AbidRahmanK: Yes brother. And it showed as you said. But the files are empty :( – Frown
@rash: did you get the solution for your problem? I was doing this thing and getting the same issue. – Achievement
Got the problem. Thank you. It was a great tutorial. I was making a small mistake. If anyone else faces the same issue as me and @Frown, it is because you are pressing the wrong key. For each number in the box, you have to press that number's key so that it gets trained on it. Hope that helps. – Achievement
For those who installed OpenCV 2.4+: several APIs have changed. I made a repo that works fine for me. Hope it helps. – Glow
A stellar tutorial. Thank you! There are a few changes needed to get this to work with the latest (3.1) version of OpenCV: contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE) => _,contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE), model = cv2.KNearest() => model = cv2.ml.KNearest_create(), model.train(samples,responses) => model.train(samples,cv2.ml.ROW_SAMPLE,responses), retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1) => retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1) – Mohawk
Thanks. Yeah, this has become pretty old and I haven't explored the 3.x version recently. I think I should revisit this once. – Nava
@JohannesBrodwall Thanks for your update. Quick note: your last correction is slightly off and should read: retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1) => retval, results, neigh_resp, dists = model.findNearest(roismall, k = 1) – Sedimentology
It's a really good explanation, but how can we find the numbers themselves in an image? In the image above every contour is a number, but in the real world there are contours which are not numbers. @AbidRahmanK – Citizenry
I'm curious to know how fast this approach is compared to Tesseract OCR when recognizing only numbers. Has anybody tested both approaches? – Tradein
I tested it myself, so here are my results in case anyone wants a clear idea of how fast this approach can be. I used a small (38x24 px) image containing the characters 123 and timed from when the image to analyse is loaded until I get the string variable, using OpenCV 3 on Windows 10 x64 with an Intel i7-3540M CPU and Tesseract 4.0 alpha (untrained). AbidRahmanK's code took 0.0510 seconds; Tesseract with "-psm 6" took 0.8970 seconds. So @AbidRahmanK wins by far on speed. I haven't compared accuracy beyond this very simple test, which gave 100% accuracy. – Tradein
I tried your code and indeed the digits were perfectly recognized. But how do you recognize the dot? – Skelly
Can we do the same thing using Java? – Tailrace
I tried your second code and get the following error message at the line model.train(samples,responses): TypeError: only size-1 arrays can be converted to Python scalars. I don't know what I am doing wrong. Any ideas? – Carberry
@Carberry Did you solve your problem? I am getting the same error. – Sedulity
@Carberry Use model.train(samples, cv2.ml.ROW_SAMPLE, responses) – Tectrix
I updated the code and added more images. For every character I trained on about 500 images, and the files are created, but they are extremely large: classifications.txt is about 12 MB and the flattened_images file is about 11.2 GB. What am I doing wrong? – Midgut

For those interested in C++, the code below may help. Thanks to Abid Rahman for the nice explanation.

The procedure is the same as above, but the contour finding uses only the first hierarchy level, so the algorithm uses only the outer contour of each digit.
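
The hierarchy walk used in the loops below can be sketched in a few lines of Python. Each hierarchy entry holds [next, previous, first_child, parent] indices, with -1 meaning "none". The hierarchy here is hand-written for illustration, top_level_contours is a hypothetical helper, and the sketch assumes, as the C++ code does, that contour 0 is on the first level.

```python
def top_level_contours(hierarchy):
    """Yield indices of contours on the first hierarchy level.

    Each entry is [next, previous, first_child, parent]; -1 means 'none'.
    """
    i = 0 if len(hierarchy) > 0 else -1
    while i >= 0:
        yield i
        i = hierarchy[i][0]  # follow the 'next sibling' link

# Hand-written hierarchy: contours 0 and 2 are outer boundaries,
# contour 1 is a hole inside 0 (like the inside of a '0' or '6').
hierarchy = [
    [2, -1, 1, -1],   # contour 0: next sibling is 2, first child is 1
    [-1, -1, -1, 0],  # contour 1: a hole whose parent is 0
    [-1, 0, -1, -1],  # contour 2: previous sibling is 0, no next
]
outer = list(top_level_contours(hierarchy))
print(outer)
```

Skipping the hole contours this way means a digit such as 0 or 8 yields one sample rather than one per enclosed region. With the Python bindings, the hierarchy returned by cv2.findContours has shape (1, N, 4), so hierarchy[0] would be passed instead.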

Code for creating sample and Label data

//Process image to extract contour
Mat thr,gray,con;
Mat src=imread("digit.png",1);
cvtColor(src,gray,CV_BGR2GRAY);
threshold(gray,thr,200,255,THRESH_BINARY_INV); //Threshold to find contour
thr.copyTo(con);

// Create sample and label data
vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;
Mat sample;
Mat response_array;  
findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE ); //Find contour

for( int i = 0; i >= 0 && i < (int)contours.size(); i=hierarchy[i][0] ) // iterate through first hierarchy level contours; hierarchy[i][0] is -1 after the last one
{
    Rect r= boundingRect(contours[i]); //Find bounding rect for each contour
    rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,0,255),2,8,0);
    Mat ROI = thr(r); //Crop the image
    Mat tmp1, tmp2;
    resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR ); //resize to 10X10
    tmp1.convertTo(tmp2,CV_32FC1); //convert to float
    sample.push_back(tmp2.reshape(1,1)); // Store  sample data
    imshow("src",src);
    int c=waitKey(0); // Read corresponding label for contour from keyboard
    c-=0x30;     // Convert ASCII to integer value
    response_array.push_back(c); // Store label to a mat
    rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,255,0),2,8,0);    
}

// Store the data to file
Mat response,tmp;
tmp=response_array.reshape(1,1); //make continuous
tmp.convertTo(response,CV_32FC1); // Convert  to float

FileStorage Data("TrainingData.yml",FileStorage::WRITE); // Store the sample data in a file
Data << "data" << sample;
Data.release();

FileStorage Label("LabelData.yml",FileStorage::WRITE); // Store the label data in a file
Label << "label" << response;
Label.release();
cout<<"Training and Label data created successfully....!! "<<endl;

imshow("src",src);
waitKey();

Code for training and testing

Mat thr,gray,con;
Mat src=imread("dig.png",1);
cvtColor(src,gray,CV_BGR2GRAY);
threshold(gray,thr,200,255,THRESH_BINARY_INV); // Threshold to create input
thr.copyTo(con);


// Read stored sample and label for training
Mat sample;
Mat response,tmp;
FileStorage Data("TrainingData.yml",FileStorage::READ); // Read training data to a Mat
Data["data"] >> sample;
Data.release();

FileStorage Label("LabelData.yml",FileStorage::READ); // Read label data to a Mat
Label["label"] >> response;
Label.release();


KNearest knn;
knn.train(sample,response); // Train with sample and responses
cout<<"Training completed.....!!"<<endl;

vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;

//Create input sample by contour finding and cropping
findContours( con, contours, hierarchy,CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE );
Mat dst(src.rows,src.cols,CV_8UC3,Scalar::all(0));

for( int i = 0; i >= 0 && i < (int)contours.size(); i=hierarchy[i][0] ) // iterate through each first hierarchy level contour; hierarchy[i][0] is -1 after the last one
{
    Rect r= boundingRect(contours[i]);
    Mat ROI = thr(r);
    Mat tmp1, tmp2;
    resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR );
    tmp1.convertTo(tmp2,CV_32FC1);
    float p=knn.find_nearest(tmp2.reshape(1,1), 1);
    char name[4];
    sprintf(name,"%d",(int)p);
    putText( dst,name,Point(r.x,r.y+r.height) ,0,1, Scalar(0, 255, 0), 2, 8 );
}

imshow("src",src);
imshow("dst",dst);
imwrite("dest.jpg",dst);
waitKey();

Result

In the result, the dot in the first line is detected as 8, because we haven’t trained for the dot. Also, I treat every contour in the first hierarchy level as a sample input; the user can avoid this by filtering on contour area.
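
A minimal sketch of such a filter, working on bounding boxes rather than on real contours: keep_digit_sized is a hypothetical helper, the boxes are made up, and the thresholds are borrowed from the Python answer above (area > 50, height > 28), so they would need tuning for another image. Bounding-box area is used here as a simple stand-in for contour area.

```python
def keep_digit_sized(boxes, min_area=50, min_height=28):
    """Drop bounding boxes (x, y, w, h) that are too small to be a digit."""
    return [(x, y, w, h) for (x, y, w, h) in boxes
            if w * h > min_area and h > min_height]

# Made-up boxes: two digit-sized regions and one small dot.
boxes = [(10, 12, 20, 32), (40, 12, 22, 33), (66, 40, 5, 5)]
digits_only = keep_digit_sized(boxes)
print(digits_only)
```

With a filter like this, the dot never reaches the classifier, so it cannot be mislabeled as a digit.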

Results

Iterate answered 3/1, 2014 at 11:13 Comment(4)
I tried to run this code. I was able to create the sample and label data. But when I run the test-training file, it fails with the error *** stack smashing detected ***, and hence I am not getting a proper final image like the one you show above (digits in green). – Keats
I changed char name[4]; in your code to char name[7]; and I no longer get the stack-related error, but I am still not getting the correct results. I am getting an image like this: i.imgur.com/qRkV2B4.jpg – Keats
@Keats Make sure that the number of contours you get matches the number of digits in the image; also try printing the result to the console. – Iterate
Hello, could we load a trained net to use? – Pankhurst

I had some problems generating the training data, because it was sometimes hard to identify the last selected character, so I rotated the image 1.5 degrees. Now each character is selected in order, and the test still shows a 100% accuracy rate after training. Here is the code:

import numpy as np
import cv2

def rotate_image(image, angle):
  image_center = tuple(np.array(image.shape[1::-1]) / 2)
  rot_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
  result = cv2.warpAffine(image, rot_mat, image.shape[1::-1], flags=cv2.INTER_LINEAR)
  return result

img = cv2.imread('training_image.png')
cv2.imshow('orig image', img)
whiteBorder = [255,255,255]
# extend the image border
image1 = cv2.copyMakeBorder(img, 80, 80, 80, 80, cv2.BORDER_CONSTANT, None, whiteBorder)
# rotate the image 1.5 degrees clockwise for ease of data entry
image_rot = rotate_image(image1, -1.5)
#crop_img = image_rot[y:y+h, x:x+w]
cropped = image_rot[70:350, 70:710]
cv2.imwrite('rotated.png', cropped)
cv2.imshow('rotated image', cropped)
cv2.waitKey(0)

For sample data, I made some changes to the script, like this:

import sys
import numpy as np
import cv2

def sort_contours(contours, x_axis_sort='LEFT_TO_RIGHT', y_axis_sort='TOP_TO_BOTTOM'):
    # initialize the reverse flag
    x_reverse = False
    y_reverse = False
    if x_axis_sort == 'RIGHT_TO_LEFT':
        x_reverse = True
    if y_axis_sort == 'BOTTOM_TO_TOP':
        y_reverse = True
    
    boundingBoxes = [cv2.boundingRect(c) for c in contours]
    
    # sorting on x-axis 
    sortedByX = zip(*sorted(zip(contours, boundingBoxes),
    key=lambda b:b[1][0], reverse=x_reverse))
    
    # sorting on y-axis 
    (contours, boundingBoxes) = zip(*sorted(zip(*sortedByX),
    key=lambda b:b[1][1], reverse=y_reverse))
    # return the list of sorted contours and bounding boxes
    return (contours, boundingBoxes)

im = cv2.imread('rotated.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
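# the original passed 1,1 here; these named constants have the same values
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY_INV,11,2)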

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
contours, boundingBoxes = sort_contours(contours, x_axis_sort='LEFT_TO_RIGHT', y_axis_sort='TOP_TO_BOTTOM')

samples =  np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)

        if  h>28 and h < 40:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.ubyte)
responses = responses.reshape((responses.size,1))
print("training complete")

np.savetxt('generalsamples.data',samples,fmt='%i')
np.savetxt('generalresponses.data',responses,fmt='%i')
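
The ordering logic in sort_contours can be tried out on plain tuples. The sketch below uses made-up bounding boxes and a hypothetical reading_order helper that applies the same two-pass stable sort (by x, then by y). One possible reading of why the slight rotation helps, offered here as an assumption: after a small clockwise tilt, y grows with x inside each row, so the final sort by y also puts each row in left-to-right order.

```python
def reading_order(boxes):
    """Two-pass stable sort, as in sort_contours: by x, then by y."""
    by_x = sorted(boxes, key=lambda b: b[0])
    return sorted(by_x, key=lambda b: b[1])

# Made-up boxes (x, y, w, h): two rows of two characters each.
# 'jittered': y varies slightly and arbitrarily within a row.
jittered = [(60, 50, 20, 30), (10, 52, 20, 30), (10, 10, 20, 30), (60, 9, 20, 30)]
# 'tilted': y increases with x within a row, as after a small clockwise tilt.
tilted = [(60, 52, 20, 30), (10, 50, 20, 30), (10, 10, 20, 30), (60, 12, 20, 30)]

print([b[:2] for b in reading_order(jittered)])
print([b[:2] for b in reading_order(tilted)])
```

In the jittered case the right-hand box of each row comes first because its y happens to be smaller, while the tilted case comes out in reading order.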
Diminish answered 29/5, 2021 at 7:5 Comment(0)
