Detect text area in an image using python and opencv

Asked 12/6, 2016 at 6:12 Answered 27/9, 2019 at 3:18

python opencv image-processing computer-vision ocr

I want to detect the text areas in images using Python 2.7 and OpenCV 2.4.9 and draw a rectangle around each one, as shown in the example image below.

I am new to image processing so any idea how to do this will be appreciated.

Domino answered 12/6, 2016 at 6:12 Comment(8)

If possible, use OpenCV 3.1 and use the scene text detection feature. – Stepmother 12/6, 2016 at 6:15

@Stepmother I cannot upgrade it since there are other components of the project. – Domino 12/6, 2016 at 7:53

You have to look for the color in the image, but the text has a similar color to the rest of your image, so that might be difficult. If you only need the text itself, there is a library called 'tesseract' – Ceyx 12/6, 2016 at 9:33

Are you looking for a "tool-like" solution? (A ready made function from a module or something like that) or would you be alright with doing it from first principles? It is relatively easy to do just that (detect text) in conditions such as those you describe here. Also, you missed the word "LIN" in the cupboard at the north-west of the large bedroom. Would you like to be able to catch those letters as well? – Glairy 12/6, 2016 at 10:41

@Glairy Doing it from first principles is what I want. I only want to detect the marked words – Domino 12/6, 2016 at 11:10

And this is THE image or are there other cases you might need to cover? Is it possible to upload a few representative cases? – Glairy 12/6, 2016 at 11:28

@Glairy Simply put, what I want to do is detect the text areas in blueprints of houses like this one. – Domino 12/6, 2016 at 12:8

I presume you realise the text is black and everything else is grey-blue? Is this always the case? If so, the answer is simple. – Gesso 25/7, 2016 at 16:51

There are multiple ways to go about detecting text in an image.

I recommend looking at this question here, as it may answer your case as well. Although it is not in Python, the code can be easily translated from C++ to Python (just look at the API and convert the methods from C++ to Python; it is not hard, and I did it myself when I tried their code for my own separate problem). The solutions there may not work for your case, but I recommend trying them out.

If I were to go about this, I would use the following process:

Prep your image: If all of the images you want to process are roughly like the one you provided, where the actual design consists of a range of gray colors and the text is always black, I would first white out all content that is not black (or already white). Doing so leaves only the black text.

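# NumPy and OpenCV are both required
import numpy as np
import cv2

# whites out every pixel of a BGR image whose brightness (HSV value channel)
# lies in [lower_val, upper_val], so that only the darker text remains
def remove_gray(img, lower_val, upper_val):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    # accept any hue and saturation; OpenCV stores hue as 0-179 for 8-bit images
    lower_bound = np.array([0, 0, lower_val], dtype=np.uint8)
    upper_bound = np.array([179, 255, upper_val], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower_bound, upper_bound)  # 255 where the pixel is in range
    result = img.copy()
    result[mask > 0] = (255, 255, 255)  # paint the in-range (design) pixels white
    return result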

Now that all you have left is the black text, the goal is to get those boxes. As stated before, there are different ways of going about this.

Stroke Width Transform (SWT)

The typical way to find text areas is the stroke width transform, as described in "Detecting Text in Natural Scenes with Stroke Width Transform" by Boris Epshtein, Eyal Ofek, and Yonatan Wexler. To be honest, if it is as fast and reliable as I believe it is, then it is a more efficient method than my code below. You can still use the code above to remove the blueprint design, though, and that may help the overall performance of the SWT algorithm.

Here is a C library that implements their algorithm, but it is stated to be very raw, and its documentation is stated to be incomplete. Obviously, a wrapper is needed to use this library with Python, and at the moment I do not see an official one offered.

The library I linked is CCV. It is meant to be used in your applications, not to recreate algorithms. So it is a ready-made tool, which goes against the OP's wish to build this from "first principles", as stated in the comments. Still, it is useful to know it exists if you don't want to code the algorithm yourself.

Home Brewed Non-SWT Method

If you have metadata for each image, say in an XML file, that states how many rooms are labeled in each image, then you can read that file, get the number of labels in the image, and store it in a variable, say num_of_labels. Now put your image through a while loop that erodes it at a rate you specify, finding the external contours on each pass and stopping once the number of external contours matches num_of_labels. Then simply find each contour's bounding box and you are done.

# erodes image based on given kernel size (erosion = expands black areas)
# img must be a single-channel (grayscale) image
def erode(img, kern_size=3):
    retval, img = cv2.threshold(img, 254.0, 255.0, cv2.THRESH_BINARY)  # threshold to deal with only black and white.
    kern = np.ones((kern_size, kern_size), np.uint8)  # make a kernel for erosion based on given kernel size.
    eroded = cv2.erode(img, kern, iterations=1)  # erode your image to blobbify black areas
    y, x = eroded.shape  # get shape of image to make a 1px white border around it, to avoid problems with findContours.
    cv2.rectangle(eroded, (0, 0), (x - 1, y - 1), 255, 1)  # draws the border in place
    return eroded

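# finds contours of eroded image; returns (img, contours, hierarchy)
def prep(img, kern_size=3):
    img = erode(img, kern_size)
    retval, img = cv2.threshold(img, 200.0, 255.0, cv2.THRESH_BINARY_INV)  # invert colors for findContours
    found = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # find contours of image
    contours, hierarchy = found[-2:]  # OpenCV 3.x returns 3 values, 2.4 and 4.x return 2
    return img, contours, hierarchy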

# given img & number of desired blobs, returns (img, contours, hierarchy) of the blobs.
def blobbify(img, num_of_labels, kern_size=3, dilation_rate=10):
    current = prep(img.copy(), kern_size)  # erode img and check the current contour count.
    previous = current
    while len(current[1]) > num_of_labels:
        kern_size += dilation_rate  # grow the kernel to grow the blobs. Keep dilation_rate even so kern_size stays odd.
        previous = current
        current = prep(img.copy(), kern_size)  # erode img with the larger kernel and check the contour count again.
    if len(current[1]) < num_of_labels:
        return previous  # overshot: too many blobs merged, so fall back to the last result before that
    return current

# finds bounding boxes of all contours; each box is (x, y, w, h)
def bounding_box(contours):
    bBox = []
    for curve in contours:
        box = cv2.boundingRect(curve)
        bBox.append(box)  # must be inside the loop, otherwise only the last box is kept
    return bBox

The resulting boxes from the above method will have space around the labels, and this may include part of the original design if the boxes are applied to the original image. To avoid this, make regions of interest from your newfound boxes and trim the white space. Then save that ROI's shape as your new box.
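
The trimming step could look roughly like the sketch below. It assumes the design-free image is a grayscale NumPy array with a white background and that boxes are (x, y, w, h) tuples as returned by cv2.boundingRect; the name tighten_box and the white_thresh cutoff are made up for illustration.

```python
import numpy as np

def tighten_box(gray, box, white_thresh=250):
    """Shrink an (x, y, w, h) box so it hugs the non-white pixels inside it."""
    x, y, w, h = box
    roi = gray[y:y + h, x:x + w]               # region of interest via NumPy slicing
    ys, xs = np.nonzero(roi < white_thresh)    # coordinates of the dark (text) pixels
    if len(xs) == 0:
        return box                             # nothing but white inside: leave the box alone
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    # translate the ROI-relative extent back into image coordinates
    return (int(x + x0), int(y + y0), int(x1 - x0 + 1), int(y1 - y0 + 1))

# Synthetic check: a dark 20x10 "label" inside a loose 40x30 box.
page = np.full((50, 50), 255, np.uint8)
page[20:30, 15:35] = 0
tight = tighten_box(page, (5, 10, 40, 30))
print(tight)
```

Run the trim on the cleaned image, not the original, otherwise leftover design pixels inside the loose box will stop it from shrinking.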

Perhaps you have no way of knowing how many labels will be in the image. If this is the case, then I recommend playing around with erosion values until you find the best one to suit your case and get the desired blobs.

Or you could try finding contours on the remaining content after removing the design, and combine the bounding boxes into one rectangle based on their distance from each other.

Once you have found your boxes, simply apply them to the original image and you are done.
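
A rough sketch of the distance-based merge, in plain Python. It assumes boxes are (x, y, w, h) tuples as returned by cv2.boundingRect; the helper names and the max_gap parameter are hypothetical, and the right gap value has to be tuned per image.

```python
def boxes_close(a, b, max_gap):
    """True if boxes a and b are at most max_gap pixels apart on both axes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # gap along each axis; zero or negative means the boxes overlap on that axis
    gap_x = max(ax, bx) - min(ax + aw, bx + bw)
    gap_y = max(ay, by) - min(ay + ah, by + bh)
    return gap_x <= max_gap and gap_y <= max_gap

def union(a, b):
    """Smallest (x, y, w, h) box containing both a and b."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)

def merge_boxes(boxes, max_gap=10):
    """Repeatedly merge any two boxes that are close until none are left to merge."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_close(boxes[i], boxes[j], max_gap):
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the merged box may now reach others
    return boxes

# Three letter boxes 2px apart form one word; the fourth box is far away.
letters = [(10, 10, 5, 8), (17, 10, 5, 8), (24, 10, 5, 8), (100, 50, 5, 8)]
words = merge_boxes(letters, max_gap=4)
print(words)
```

The scan is quadratic in the number of boxes, which should be fine for the handful of labels on a floor plan.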

Scene Text Detection Module in OpenCV 3

As mentioned in the comments on your question, OpenCV 3 already provides scene text detection (not document text detection). I understand you cannot switch versions, but I decided to include this at the end for those who have the same question and are not limited to an older OpenCV version. Documentation for the scene text detection can be found with a simple Google search.

The OpenCV module for text detection also comes with text recognition that uses Tesseract, a free open-source text recognition engine. The downside of Tesseract, and therefore of OpenCV's scene text recognition module, is that it is not as refined as commercial applications and is time-consuming to use, which hurts performance. But it's free to use, so it's the best we have without paying money if you want text recognition as well.

Honestly, I lack the experience and expertise in both OpenCV and image processing to provide a detailed way of implementing their text detection module. The same goes for the SWT algorithm. I only got into this stuff these past few months, but as I learn more I will edit this answer.

Pannikin answered 24/7, 2016 at 16:38 Comment(1)

I have been reading about this and there are a couple of implementations on Python of SWT that might be useful for you: [1] github.com/marrrcin/swt-python [2] github.com/mypetyak/StrokeWidthTransform – Cuff 8/2, 2018 at 10:20

Here's a simple image processing approach using only thresholding and contour filtering:

Obtain binary image. Load image, convert to grayscale, Gaussian blur, and adaptive threshold
Combine adjacent text. We create a rectangular structuring kernel then dilate to form a single contour
Filter for text contours. We find contours and filter using contour area. From here we can draw the bounding box with cv2.rectangle()

Using this original input image (removed red lines)

After converting the image to grayscale and Gaussian blurring, we adaptive threshold to obtain a binary image

Next we dilate to combine the text into a single contour

From here we find contours and filter using a minimum threshold area (in case there was small noise). Here's the result

If we wanted to, we could also extract and save each ROI using Numpy slicing

Code

import cv2

# Load image, grayscale, Gaussian blur, adaptive threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (9,9), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,11,30)

# Dilate to combine adjacent text contours
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9,9))
dilate = cv2.dilate(thresh, kernel, iterations=4)

# Find contours, highlight text areas, and extract ROIs
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

ROI_number = 0
for c in cnts:
    area = cv2.contourArea(c)
    if area > 10000:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)
        # ROI = image[y:y+h, x:x+w]
        # cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
        # ROI_number += 1

cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()

Sculpture answered 27/9, 2019 at 3:18 Comment(2)

I would like to add a small but important point for future readers working on these kinds of tasks: be sure to keep DPI in mind too. The same image at 300 dpi might not give the same results as at 72 dpi. – Leicester 16/4, 2020 at 10:14

@PrameshBajracharya yes, depending on the size of your image, you may have to adjust the contour area threshold value or change the dilate kernel sizes. Unfortunately, there is no single solution for all images when using image processing to extract objects – Sculpture 23/10, 2020 at 21:50
