Finding contours with lines of text in OpenCV

I am writing a text recognition program and have a problem with sorting contours. The program works fine for a single line of text, but on a whole block of text it fails to detect the lines about 80% of the time. What would be an efficient way to extract one line of text, and then each of the other lines, one at a time?

What I want to achieve:

enter image description here

Symbolist answered 9/6, 2018 at 19:18 Comment(0)

There is a sequence of steps to achieve this:

  1. Find the optimum threshold to binarize your image. I used Otsu's threshold.
  2. Find a suitable morphological operation that merges each line of text into a single region along the horizontal direction. Choose a kernel that is wider than it is tall.
  3. Draw bounding boxes over the resulting contours.

UPDATE

Here is the implementation:

import cv2

image_path = 'C:/Users/Desktop/text.jpg'

img = cv2.imread(image_path)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

#--- performing Otsu threshold (inverted, so the text becomes white on black) ---
ret, thresh1 = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
cv2.imshow('thresh1', thresh1)

#--- choosing the right kernel
#--- kernel size of 3 rows (to join dots above letters 'i' and 'j')
#--- and 15 columns to join neighboring letters in words and neighboring words
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))  # size is (width, height)
dilation = cv2.dilate(thresh1, rect_kernel, iterations = 1)
cv2.imshow('dilation', dilation)

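#--- Finding contours ---
#--- findContours returns 3 values in OpenCV 3.x and 2 values in OpenCV 4.x; [-2:] works with both
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]

im2 = img.copy()
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(im2, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('final', im2)
cv2.waitKey(0)  # keep the windows open until a key is pressed
cv2.destroyAllWindows()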
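
The question also asks for the lines one at a time. `cv2.findContours` does not promise to return contours in reading order, so one possible approach is to sort the bounding boxes by their top edge before cropping. This is only a sketch: the helper names `sort_boxes_top_to_bottom` and `crop_lines` and the sample boxes are invented for illustration, and it assumes each text line ended up as a single contour after dilation.

```python
def sort_boxes_top_to_bottom(boxes):
    """Sort (x, y, w, h) boxes by top edge, then by left edge."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


def crop_lines(image, boxes):
    """Yield one cropped region per box, in top-to-bottom order.

    `image` is expected to be a NumPy array such as the one returned
    by cv2.imread; boxes come from cv2.boundingRect.
    """
    for x, y, w, h in sort_boxes_top_to_bottom(boxes):
        yield image[y:y + h, x:x + w]


# Made-up boxes, in an arbitrary order such as findContours might return.
boxes = [(12, 90, 300, 22), (10, 10, 310, 20), (11, 50, 280, 21)]
ordered = sort_boxes_top_to_bottom(boxes)
print(ordered)
```

With the code above, something like `crop_lines(img, [cv2.boundingRect(c) for c in contours])` would hand back the lines from top to bottom. If a line is split into several boxes, they would first need to be grouped by vertical overlap, which this sketch does not attempt.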

Darwin answered 9/6, 2018 at 19:47 Comment(6)
@Mithor I'm sorry, I only have it in Python. - Darwin
Pretty slick. I like that trick of dilating with a wide kernel to join letters and words. - Amalgamation
Thank you. Saved me a lot of time. - Cracow
Is there any way to extract this text in string format? - Litotes
@Rudrashah You can perform OCR on the extracted portion to get the result in string format. - Darwin
Ain't it a neat solution! - Wobble
