OpenCV - Removal of noise in image
I have an image of a table (shown below). In the rightmost column, the background is filled with noise.

How can I detect the areas that contain noise? I want to apply a filter only to those areas, because I need to run OCR on the image and filtering the whole image reduces the overall recognition accuracy.

Also, what kind of filter is best for removing this background noise?

enter image description here

Everywhere answered 16/2, 2017 at 11:15 Comment(5)
You might want to remove personal information from your sample image. – Incisor
What language is the document in? – Kelula
It's Danish. – Everywhere
Are all the documents black and white? And would you accept a solution that only removes the noise to improve OCR accuracy? That would be easier. – Kelula
All images are black and white. Yes, I need a solution that removes the noise without reducing recognition accuracy on the rest of the text (the parts without noise). – Everywhere
I
21

I tried some filters/operations in OpenCV and it seems to work pretty well.

Step 1: Dilate the image -

kernel = np.ones((5, 5), np.uint8)
dilated = cv2.dilate(img, kernel, iterations=1)  # returns a new image; img is not modified

Dilated Image

As you can see, the noise is gone but the characters have become very thin and faint, so I eroded the dilated image.

Step 2: Erode the dilated image -

kernel = np.ones((5, 5), np.uint8)
eroded = cv2.erode(dilated, kernel, iterations=1)  # operates on the Step 1 output

Eroded dilated image

As you can see, the noise is gone; however, some characters in the other columns are broken. I would recommend running these operations on the noisy column only. You can use HoughLines to find the last column, extract only that column, run dilation + erosion on it, and put the result back in place of the corresponding column in the original image. Note that dilation followed by erosion is the morphological operation called closing, which you can call directly using -

closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

As @Ermlg suggested, medianBlur with a kernel size of 3 also works very well.

median = cv2.medianBlur(img, 3)

Median Blur

Alternative Step

As you can see, all of these filters work, but it is better to apply them only to the part of the image that contains the noise. To find that part, use the following:

edges = cv2.Canny(img, 50, 150, apertureSize=3)  # img is grayscale here
# minLineLength and maxLineGap must be passed by keyword: in the Python binding
# the fifth positional argument is the optional output array `lines`.
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 100, minLineLength=1000, maxLineGap=50)
for line in lines:
    for x1, y1, x2, y2 in line:
        print(x1, y1)
# This prints the start coordinates of all detected lines. Pick the line whose x value
# lies in (0.75 * w, w), where w is the width of the entire image. For this image that
# gives (x1, y1) = (1896, 766).
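The selection rule described above can be written as a small helper. This is only a sketch: `pick_column_start` and `max_tilt` are hypothetical names, the segments are made-up values in the (N, 1, 4) layout that HoughLinesP returns, and the width 2197 is assumed from the result image size quoted in another answer.

```python
import numpy as np

def pick_column_start(lines, w, max_tilt=5):
    """Return (x, y_top) of the longest near-vertical segment whose x lies in
    (0.75 * w, w), or None if no segment qualifies. Hypothetical helper."""
    best, best_len = None, -1
    for x1, y1, x2, y2 in np.asarray(lines).reshape(-1, 4):
        if abs(int(x1) - int(x2)) > max_tilt:   # skip segments that are not vertical
            continue
        if not (0.75 * w < x1 < w):             # keep only the right-hand quarter
            continue
        length = abs(int(y2) - int(y1))
        if length > best_len:
            best, best_len = (int(x1), int(min(y1, y2))), length
    return best

# Made-up segments: a vertical line on the left, a vertical line in the right
# quarter, and a horizontal line across the page.
segments = np.array([[[100, 900, 100, 50]],
                     [[1896, 3000, 1896, 766]],
                     [[50, 766, 2000, 766]]], dtype=np.int32)

start = pick_column_start(segments, w=2197)
```

Preferring the longest qualifying segment is a design choice made for this sketch; on real documents it may be necessary to merge nearby segments or to check several candidates.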

Then you can extract just this part:

extract = img[y1:h, x1:w]  # w, h are the width and height of the image

Extracted image

Then apply the filter (median or closing) to this extracted image. After removing the noise, put the filtered image back in place of the noisy part of the original image:

img[y1:h, x1:w] = median  # median is the filtered extract

The C++ equivalent copies the filtered extract into the matching region of img:

extract.copyTo(img(cv::Rect(x1, y1, w - x1, h - y1)));

Final Result with alternate method

Final Result

Hope it helps!

Ignazio answered 1/5, 2017 at 12:46 Comment(6)
Do you have a code example that can detect the last column with the noise? As you say yourself, no matter which filter you apply, the text will always be harder to recognize, so I only want to apply the filter to the part of the image with noise. – Everywhere
Yes, I do. Give me some time and I will add it to the answer. – Ignazio
Added the method to detect the last column. Let me know if it answers your question. – Ignazio
Very nice. :) I will look into it in the coming week. – Everywhere
The second option did it! – Honeysweet
Thanks, but I think if you have a black background with white text, the steps should be erode -> dilate. – Crossbar
K
9

My solution uses thresholding and produces the result in 4 steps.

  1. Read the image with OpenCV 3.2.0.
  2. Apply GaussianBlur() to smooth the image, especially the gray (noisy) region.
  3. Mask the image so that the text becomes white and everything else black.
  4. Invert the mask to get black text on a white background.

The code is in Python 2.7 and can easily be ported to C++.

import numpy as np
import cv2
import matplotlib.pyplot as plt
# In a Jupyter notebook, also run: %matplotlib inline

# read the Danish document image (loaded as BGR)
img = cv2.imread('./imagesStackoverflow/danish_invoice.png')

# apply GaussianBlur to smooth the image (kernel 5 wide x 3 high, sigmaX = 1)
blur = cv2.GaussianBlur(img, (5, 3), 1)

# pixels whose channels all lie in [0, 150] (dark text) become white (255);
# everything else, including the gray noise, becomes black (0)
mask = cv2.inRange(blur, (0, 0, 0), (150, 150, 150))

# invert the mask to get black text on a white background
res = 255 - mask

plt.figure(1)
plt.subplot(121), plt.imshow(img[:, :, ::-1]), plt.title('original')
plt.subplot(122), plt.imshow(blur[:, :, ::-1]), plt.title('blurred')  # BGR -> RGB for display
plt.figure(2)
plt.subplot(121), plt.imshow(mask, cmap='gray'), plt.title('masked')
plt.subplot(122), plt.imshow(res, cmap='gray'), plt.title('result')
plt.show()

The plots produced by the code are shown below for reference.

enter image description here

Here is the result image at 2197 x 3218 pixels.

enter image description here

Kelula answered 1/5, 2017 at 15:21 Comment(2)
This still applies the filter to the whole image. I need a solution that applies the filter ONLY to the part with noise. – Everywhere
@Everywhere OK. Is the noisy region fixed and known, as in the sample image you attached? If not, do you have more sample documents to show? – Kelula
T
3

As far as I know, a median filter is the best way to reduce this kind of noise. I would recommend a median filter with a 3x3 window; see the function cv::medianBlur().

But be careful when combining any noise filtering with OCR: it can decrease recognition accuracy.

I would also recommend trying the pair of functions cv::erode() and cv::dilate(), but I'm not sure it will be a better solution than cv::medianBlur() with a 3x3 window.

Tomlinson answered 16/2, 2017 at 11:21 Comment(2)
If noise filtering could lead to poor OCR, could you then detect the areas where the noise is located (if there is any) and apply the filter only there? – Everywhere
@Everywhere It is not easy to separate noise from the fine structure of characters. But in your case I think it is not a problem, because the font is large enough compared to the noise. – Tomlinson
P
2

I would go with a median blur (probably with a 5x5 kernel).

If you are planning to run OCR on the image, I would advise the following:

  1. Filter the image using a median filter.
  2. Find contours in the filtered image; you will get only text contours (call them F).
  3. Find contours in the original image (call them O).
  4. Keep only the contours in O that intersect a contour in F.

Faster solution:

  1. Find contours in the original image.
  2. Filter them based on size.
Pechora answered 16/2, 2017 at 11:34 Comment(0)
B
2

Result:

enter image description here

Bootstrap answered 4/5, 2017 at 16:11 Comment(0)
H
1

If you are very worried about removing pixels in a way that could hurt OCR, and want to avoid adding artifacts (i.e. stay as close to the original as possible), then you should create a blob filter that deletes any blobs smaller than n pixels or so.

I am not going to write code, but I know this works great because I use it myself, though I don't use OpenCV (I wrote my own multithreaded blob filter for speed reasons). Sorry, but I cannot share my code here; I am just describing how to do it.

Hurling answered 4/5, 2017 at 14:39 Comment(0)
B
1

If processing time is not an issue, a very effective method in this case would be to compute all black connected components and remove those smaller than a few pixels. This would remove all the noisy dots (apart from those touching a valid component) but preserve all characters and the document structure (lines and so on).

The function to use would be connectedComponentsWithStats (you will probably need to produce the negative image first; the threshold function with THRESH_BINARY_INV would work in this case), drawing white rectangles where small connected components were found.

In fact, this method could be used to find characters, defined as connected components of a given minimum and maximum size and with an aspect ratio in a given range.

Bootstrap answered 5/5, 2017 at 8:14 Comment(2)
As I suggested earlier. – Hurling
@user3800527 True, I missed that. My answer adds some hints for an OpenCV implementation. – Bootstrap
D
1

I faced the same issue and this is the best solution I found: convert the source image to grayscale, apply the fastNlMeansDenoising function, and then apply a threshold.

Like this -

fastNlMeansDenoising(gray, dst, 3.0, 21, 7);  // h = 3.0, templateWindowSize = 21, searchWindowSize = 7
threshold(dst, finaldst, 150, 255, THRESH_BINARY);

You can also adjust the threshold according to the background noise in your image, e.g. threshold(dst, finaldst, 200, 255, THRESH_BINARY);

NOTE - If your column lines get removed, you can take a mask of the column lines from the source image and apply it to the denoised result using bitwise operations such as AND, OR, XOR.

Dennadennard answered 9/3, 2018 at 7:14 Comment(0)
F
-2

Try thresholding the image like this. Make sure your src is in grayscale. Pixels brighter than the threshold are set to 255 and all others to 0. Note that because THRESH_OTSU is set, OpenCV computes the threshold automatically and ignores the 150 passed here; drop that flag if you want a fixed threshold of 150.

threshold(src, output, 150, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);

You might want to invert the image first, since you are trying to remove the gray pixels, and invert it again after the operation to get your desired result.

Featured answered 20/4, 2017 at 9:42 Comment(1)
If you look carefully at the pixels in the input image, you'll see that the input here is already a binary image, with pixels at either 0 or 255. – Bootstrap
