Text Detection: Getting Bounding Boxes

I have a black-and-white image of a text document, and I want to get a list of bounding boxes, one for each character. I attempted an algorithm myself, but it takes excessively long and is only somewhat successful. Are there any Python libraries I can use to find the bounding boxes? I've been looking into OpenCV, but the documentation is hard to follow. In this tutorial, I can't even tell whether the bounding boxes were found, because I can't easily work out what the functions actually do.

Hallucination answered 24/4, 2018 at 11:15 Comment(1)
The link you provided involves the watershed algorithm, which is not required when you want to separate clear text. You had better try THIS. - Unthread

You can use boundingRect(). Make sure the image background is black and the text is white. The code below draws rectangles around the text in your image. To get a list of every rectangle, add a code segment that suits your requirements.

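import cv2

# Read as grayscale; the text should end up white on a black background.
img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)
# Binarize with Otsu's method (use cv2.THRESH_BINARY_INV for dark text on a light page).
_, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# OpenCV 3.x returns (image, contours, hierarchy); 2.4 and 4.x return (contours, hierarchy).
# Taking the last two values works with all of them.
contours, hier = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]
for c in contours:
    # get the bounding rect
    x, y, w, h = cv2.boundingRect(c)
    # draw a white rectangle to visualize the bounding rect
    cv2.rectangle(img, (x, y), (x + w, y + h), 255, 1)

# outline the contours themselves in white (the image is single-channel)
cv2.drawContours(img, contours, -1, 255, 1)

cv2.imwrite("output.png", img)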
Apicella answered 24/4, 2018 at 14:35

I would suggest that you look into the boundingRect() function in the OpenCV library:

https://docs.opencv.org/2.4/doc/tutorials/imgproc/shapedescriptors/bounding_rects_circles/bounding_rects_circles.html

The documentation is written for C++, but the same approach can be implemented in Python as well.

Gathers answered 24/4, 2018 at 11:27
