Image preprocessing with OpenCV before doing character recognition (tesseract)

I'm trying to develop a simple PC application for license plate recognition (Java + OpenCV + Tess4j). The images aren't very good right now (they will be better in the future). I want to preprocess the image for tesseract, and I'm stuck on detecting the license plate (rectangle detection).

My steps:

1) Source Image

True Image

// Load the source photo (imread returns an empty Mat if the file cannot be read)
Mat img = Imgcodecs.imread("sample_photo.jpg");
Imgcodecs.imwrite("preprocess/True_Image.png", img);

2) Gray Scale

Mat imgGray = new Mat();
Imgproc.cvtColor(img, imgGray, Imgproc.COLOR_BGR2GRAY);
Imgcodecs.imwrite("preprocess/Gray.png", imgGray);

3) Gaussian Blur

Mat imgGaussianBlur = new Mat(); 
Imgproc.GaussianBlur(imgGray,imgGaussianBlur,new Size(3, 3),0);
Imgcodecs.imwrite("preprocess/gaussian_blur.png", imgGaussianBlur);  

4) Adaptive Threshold

Mat imgAdaptiveThreshold = new Mat();
Imgproc.adaptiveThreshold(imgGaussianBlur, imgAdaptiveThreshold, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 99, 4);
Imgcodecs.imwrite("preprocess/adaptive_threshold.png", imgAdaptiveThreshold);

The 5th step should go here: detection of the plate region (probably without deskewing for now).

I cropped the needed region from the image (after the 4th step) in Paint, and got:

plate region

Then I did OCR (via tesseract, tess4j):

File imageFile = new File("preprocess/adaptive_threshold_AFTER_PAINT.png");
ITesseract instance = new Tesseract();
instance.setLanguage("eng");
instance.setTessVariable("tessedit_char_whitelist", "acekopxyABCEHKMOPTXY0123456789");
String result = instance.doOCR(imageFile); 
System.out.println(result);

and got a (good enough?) result: "Y841ox EH" (almost correct).

How can I detect and crop the plate region after the 4th step? Do I need to change (improve) anything in steps 1-4? I would like to see an example implemented with Java + OpenCV (not JavaCV).
Thanks in advance.

EDIT (thanks to @Abdul Fatir's answer): Here is a working (for me at least) code sample (Netbeans + Java + OpenCV + Tess4j) for anyone interested in this question. The code is not the best, but I wrote it just for studying.
http://pastebin.com/H46wuXWn (do not forget to put tessdata folder into your project folder)

Bemoan answered 18/5, 2016 at 14:8 Comment(3)
You could try analyzing the contours. However, it might be more reliable to use a cascade classifier to locate the license plate (test your algorithm with a white car and see how it works). Deskew the plate so it's horizontal. You should also add a phase before tesseract: segment the license plate into individual characters (vertical projection will probably work well given the quality of your image) and feed only those to tesseract. Stifling
Can you post the image after step 4 as well? I think you should be able to detect the plate border by extracting contours and filtering them on size and h/w ratio. Once you have the contour, you can undo the projection transformation, since you know it is a rectangle. Harborage
@RobAu, Yeah sure: i.imgur.com/chrNMYX.png Bemoan
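
A minimal sketch of the vertical-projection segmentation suggested in the first comment, in plain Java with no OpenCV dependency. The class name VerticalProjection, the int[][] image representation and the minPixels parameter are illustrative assumptions, not something from this thread. It also assumes character pixels are non-zero, so the black-on-white output of step 4 would have to be inverted first.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a binarized plate into character column ranges. */
final class VerticalProjection {

    /** Counts non-zero (character) pixels in every column of plate[row][col]. */
    static int[] columnSums(int[][] plate) {
        int width = plate.length == 0 ? 0 : plate[0].length;
        int[] sums = new int[width];
        for (int[] row : plate) {
            for (int col = 0; col < width; col++) {
                if (row[col] != 0) {
                    sums[col]++;
                }
            }
        }
        return sums;
    }

    /**
     * Returns {startCol, endColExclusive} for each run of consecutive columns
     * whose pixel count is greater than minPixels (assumed noise tolerance).
     */
    static List<int[]> segments(int[][] plate, int minPixels) {
        int[] sums = columnSums(plate);
        List<int[]> out = new ArrayList<>();
        int start = -1; // -1 means "currently in a gap between characters"
        for (int col = 0; col <= sums.length; col++) {
            boolean ink = col < sums.length && sums[col] > minPixels;
            if (ink && start < 0) {
                start = col;                      // a character run begins
            } else if (!ink && start >= 0) {
                out.add(new int[] {start, col});  // the run ended at col
                start = -1;
            }
        }
        return out;
    }
}
```

Each returned range could then be cropped and passed to tesseract on its own. Characters that touch each other, or a plate frame that spans the full width, would merge into one run, so this only works on reasonably clean crops.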

Here's how I suggest you should do this task.

  1. Convert to Grayscale.
  2. Gaussian Blur with 3x3 or 5x5 filter.
  3. Apply Sobel Filter to find vertical edges.

    Imgproc.Sobel(gray, dst, -1, 1, 0); // dx = 1, dy = 0: x-derivative, which responds to vertical edges

  4. Threshold the resultant image to get a binary image.
  5. Apply a morphological close operation using suitable structuring element.
  6. Find contours of the resulting image.
  7. Find minAreaRect of each contour. Select rectangles based on aspect ratio and minimum and maximum area.
  8. For each selected contour, find edge density. Set a threshold for edge density and keep the rectangles that exceed it as possible plate regions.
  9. A few rectangles will remain after this. You can filter them based on orientation or any criteria you deem suitable.
  10. Clip these detected rectangular portions from the image after adaptiveThreshold and apply OCR.

a) Result after Step 5

b) Result after Step 7. Green ones are all the minAreaRects and the Red ones are those which satisfy the following criteria: Aspect Ratio range (2,12) & Area range (300,10000)

c) Result after Step 9. Selected rectangle. Criteria: Edge Density > 0.5
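
The size criteria quoted for result (b) can be sketched as a small predicate. This is plain Java, not the answer author's actual code: the class name PlateSizeFilter is hypothetical, treating the ranges as open intervals is an assumption, and so is taking the longer side over the shorter one (minAreaRect may report the sides in either order).

```java
/** Hypothetical helper: size check for a rotated rectangle from minAreaRect. */
final class PlateSizeFilter {
    // Ranges quoted in the answer: aspect ratio (2, 12), area (300, 10000).
    static final double MIN_ASPECT = 2.0;
    static final double MAX_ASPECT = 12.0;
    static final double MIN_AREA = 300.0;
    static final double MAX_AREA = 10000.0;

    /**
     * width and height are the rectangle's side lengths in pixels,
     * e.g. the size.width and size.height fields of an OpenCV RotatedRect.
     */
    static boolean isPlateCandidate(double width, double height) {
        if (width <= 0 || height <= 0) {
            return false; // degenerate rectangle
        }
        // Assumption: compare long side to short side, whatever the rotation.
        double aspect = Math.max(width, height) / Math.min(width, height);
        double area = width * height;
        return aspect > MIN_ASPECT && aspect < MAX_ASPECT
            && area > MIN_AREA && area < MAX_AREA;
    }
}
```

The thresholds depend on image resolution and on how far the camera is from the car, so they would need tuning for other photos.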

EDIT

For edge-density, what I did in the above examples is the following.

  1. Apply Canny Edge detector directly to input image. Let the cannyED image be Ic.
  2. Multiply results of Sobel filter and Ic. Basically, take an AND of Sobel and Canny images.
  3. Gaussian Blur the resultant image with a large filter. I used 21x21.
  4. Threshold the resulting image using Otsu's method. You'll get a binary image.
  5. For each red rectangle, rotate the portion inside this rectangle (in the binary image) to make it upright. Loop through the pixels of the rectangle and count white pixels. (How to rotate?)

Edge Density = No. of White Pixels in the Rectangle/Total no. of Pixels in the rectangle

  6. Choose a threshold for edge density.

NOTE: Instead of going through steps 1 to 3, you can also use the binary image from step 5 for calculating edge density.
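
A minimal sketch of the edge-density ratio described above, in plain Java on an int[][] binary image. The class name EdgeDensity and the array representation are assumptions for illustration, and the rectangle is taken to be upright already (the rotation step is not shown).

```java
/** Hypothetical helper: fraction of white pixels inside an upright rectangle. */
final class EdgeDensity {

    /**
     * binary[row][col] != 0 means a white pixel.
     * (x, y) is the rectangle's top-left corner; w and h are its size in pixels.
     */
    static double edgeDensity(int[][] binary, int x, int y, int w, int h) {
        if (w <= 0 || h <= 0) {
            throw new IllegalArgumentException("rectangle must not be empty");
        }
        int white = 0;
        for (int row = y; row < y + h; row++) {
            for (int col = x; col < x + w; col++) {
                if (binary[row][col] != 0) {
                    white++;
                }
            }
        }
        // No. of white pixels / total no. of pixels in the rectangle
        return (double) white / ((double) w * h);
    }
}
```

With OpenCV itself, the white-pixel count of a cropped Mat is available through Core.countNonZero, so the explicit loop is only needed when working on raw arrays.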

Alrich answered 19/5, 2016 at 8:58 Comment(3)
Thank you for such a detailed answer! I did everything you described except step "c", which is "Edge density". Until that step the algorithm worked well: I played a bit with thresholding and ratios (thanks to "Mastering OpenCV Chapter 5" github.com/MasteringOpenCV/code/blob/master/… in particular the "verifySizes" function), and for some photos it worked well enough without the Edge Density criterion. Can you explain how I can check a RotatedRect (as returned by minAreaRect) against the Edge Density criterion? Bemoan
Hi! Please check the edited answer. Please upvote and mark as the answer if it works for you. :) Alrich
Yeah, I skipped steps 1-4 (from the Edit paragraph) and went straight to step 5. I cropped each obtained rectangle (usually there are 1-3 "possible" plates) from the image (the adaptiveThreshold Mat object). Then I counted the white pixels (countNonZero) and the total number of pixels, which gave me the density (roughly 0.6 or higher is good) and the rectangle I needed. Tesseract did its job well too (I think I'll segment each individual character on the plate for better recognition later, as @Dan advised). Bemoan

Actually, OpenCV ships a pre-trained model specifically for Russian license plates: haarcascade_russian_plate_number

There is also an open-source ANPR project for Russian license plates: plate_recognition. It does not use tesseract, but it has a quite good pre-trained neural network.

Bousquet answered 5/9, 2016 at 2:45 Comment(1)
Well, thanks for the response. I've already seen this project. It is good, but it is a lot of C++ and Qt (which I am not good at). I guess that cropping each symbol from the plate (as detected by the cascade) and passing it to the tesseract engine can work too, and that is easy to do in Java. I would like to port that C++ project to Java (or bind it via JNI), but I don't have that much time nowadays. Bemoan
  • Find all connected components (the white areas) and determine their outlines.
  • Filter them by size (as a fraction of the image), width/height ratio and white/black ratio to retrieve candidate plates.
  • Undo the perspective transformation of the rectangle.
  • Remove the bolts.
  • Pass the image to the OCR engine.
Harborage answered 19/5, 2016 at 8:34 Comment(0)
