Merge nearby bounding boxes into one

Asked 9/4, 2019 at 12:58 Answered 5/4, 2022 at 15:14

python python-3.x machine-learning deep-learning computer-vision

I am new to Python and I am using "Quickstart: Extract printed text (OCR) using the REST API and Python" from Computer Vision to detect text in sales fliers. The algorithm returns the coordinates Ymin, Xmin, Ymax, and Xmax and draws a bounding box for each line of text, as shown in the next image.

But I want to group texts that are close together into a single bounding box. For the image above, that would give 2 bounding boxes, each containing the texts that are closest to each other.

The code below provides the coordinates Ymin, Xmin, Ymax, and Xmax and draws a bounding box for each line of text.

import requests
import numpy as np
# If you are using a Jupyter notebook, uncomment the following line.
# %matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from PIL import Image
from io import BytesIO

# Replace <Subscription Key> with your valid subscription key.
subscription_key = "<Subscription Key>"
assert subscription_key

vision_base_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/"

ocr_url = vision_base_url + "ocr"

# Set image_url to the URL of an image that you want to analyze.
image_url = "https://cdn-ayb.akinon.net/cms/2019/04/04/e494dce0-1e80-47eb-96c9-448960a71260.jpg"

headers = {'Ocp-Apim-Subscription-Key': subscription_key}
params  = {'language': 'unk', 'detectOrientation': 'true'}
data    = {'url': image_url}
response = requests.post(ocr_url, headers=headers, params=params, json=data)
response.raise_for_status()

analysis = response.json()

# Extract the word bounding boxes and text.
line_infos = [region["lines"] for region in analysis["regions"]]
word_infos = []
for line in line_infos:
    for word_metadata in line:
        for word_info in word_metadata["words"]:
            word_infos.append(word_info)
word_infos

# Display the image and overlay it with the extracted text.
plt.figure(figsize=(100, 20))
image = Image.open(BytesIO(requests.get(image_url).content))
ax = plt.imshow(image)
texts_boxes = []
texts = []
for word in word_infos:
    bbox = [int(num) for num in word["boundingBox"].split(",")]
    text = word["text"]
    origin = (bbox[0], bbox[1])
    patch  = Rectangle(origin, bbox[2], bbox[3], fill=False, linewidth=3, color='r')
    ax.axes.add_patch(patch)
    plt.text(origin[0], origin[1], text, fontsize=2, weight="bold", va="top")
#     print(bbox)
    new_box = [bbox[1], bbox[0], bbox[1]+bbox[3], bbox[0]+bbox[2]]
    texts_boxes.append(new_box)
    texts.append(text)
#     print(text)
plt.axis("off")
texts_boxes = np.array(texts_boxes)
texts_boxes

Output bounding boxes

array([[  68,   82,  138,  321],
       [ 202,   81,  252,  327],
       [ 261,   81,  308,  327],
       [ 364,  112,  389,  182],
       [ 362,  192,  389,  305],
       [ 404,   98,  421,  317],
       [  92,  421,  146,  725],
       [  80,  738,  134, 1060],
       [ 209,  399,  227,  456],
       [ 233,  399,  250,  444],
       [ 257,  400,  279,  471],
       [ 281,  399,  298,  440],
       [ 286,  446,  303,  458],
       [ 353,  394,  366,  429]])

But I want to merge them when they are close to each other.

Gyatt answered 9/4, 2019 at 12:58 Comment(4)

Do you mean merging boxes A and B when A is completely included in B, or merging them if they intersect? – Pokelogan 9/4, 2019 at 13:13

Thank you for your reply. Not exactly: I mean merging the boxes that are close together into one bigger box, so that the new box contains all of the nearby boxes. – Gyatt 9/4, 2019 at 14:46

So you want to merge boxes that are close enough? I think you should define the rules: which situations should be merged together, and which should be kept separate. – Pokelogan 9/4, 2019 at 14:52

Exactly, the rule is that only boxes with a small distance between them should be merged. – Gyatt 9/4, 2019 at 15:24

Thank you @recnac, your algorithm helped me solve it.

My solution was this: merge the text boxes that are close to each other into a new box, so that each new box contains the nearby texts.

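import copy

# Distance limit between texts to be merged
dist_limit = 40

# Copies of the text and box lists. texts_boxes is a NumPy array, which does
# not support `del`, so the boxes are copied into plain Python lists.
texts_copied = copy.deepcopy(texts)
texts_boxes_copied = [list(box) for box in texts_boxes]


# Return the smallest box that covers both input boxes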
def merge_boxes(box1, box2):
    return [min(box1[0], box2[0]), 
         min(box1[1], box2[1]), 
         max(box1[2], box2[2]),
         max(box1[3], box2[3])]



# Compute the distance between a text box and an object box
def calc_sim(text, obj):
    # text: ymin, xmin, ymax, xmax
    # obj: ymin, xmin, ymax, xmax
    text_ymin, text_xmin, text_ymax, text_xmax = text
    obj_ymin, obj_xmin, obj_ymax, obj_xmax = obj

    x_dist = min(abs(text_xmin-obj_xmin), abs(text_xmin-obj_xmax), abs(text_xmax-obj_xmin), abs(text_xmax-obj_xmax))
    y_dist = min(abs(text_ymin-obj_ymin), abs(text_ymin-obj_ymax), abs(text_ymax-obj_ymin), abs(text_ymax-obj_ymax))

    dist = x_dist + y_dist
    return dist

# Main algorithm for merging texts
def merge_algo(texts, texts_boxes):
    for i, (text_1, text_box_1) in enumerate(zip(texts, texts_boxes)):
        for j, (text_2, text_box_2) in enumerate(zip(texts, texts_boxes)):
            if j <= i:
                continue
            # Merge the pair if their distance is less than the defined limit
            if calc_sim(text_box_1, text_box_2) < dist_limit:
                # Create a new box
                new_box = merge_boxes(text_box_1, text_box_2)
                # Create a new text string
                new_text = text_1 + ' ' + text_2

                texts[i] = new_text
                # Delete the merged text
                del texts[j]
                texts_boxes[i] = new_box
                # Delete the merged text box
                del texts_boxes[j]
                # Return the updated texts and boxes after one merge
                return True, texts, texts_boxes

    return False, texts, texts_boxes


need_to_merge = True

# Repeat until no more boxes can be merged
while need_to_merge:
    need_to_merge, texts_copied, texts_boxes_copied = merge_algo(texts_copied, texts_boxes_copied)

texts_copied
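
A note on the distance used above: `calc_sim` compares edge coordinates only, so a small box lying well inside a much larger one can still get a large distance and stay unmerged. Below is a minimal sketch of an alternative, assuming the same (ymin, xmin, ymax, xmax) layout. `box_gap` is a hypothetical helper name, not part of the original solution. It measures the empty space between two boxes on each axis, which is 0 wherever they overlap.

```python
def box_gap(box_a, box_b):
    """Sum of the horizontal and vertical gaps between two boxes.

    Boxes are (ymin, xmin, ymax, xmax). The gap on an axis is 0 when the
    boxes overlap on that axis, so nested boxes get a distance of 0.
    """
    a_ymin, a_xmin, a_ymax, a_xmax = box_a
    b_ymin, b_xmin, b_ymax, b_xmax = box_b
    x_gap = max(0, max(a_xmin, b_xmin) - min(a_xmax, b_xmax))
    y_gap = max(0, max(a_ymin, b_ymin) - min(a_ymax, b_ymax))
    return x_gap + y_gap


# Two stacked boxes taken from the question's output.
print(box_gap([202, 81, 252, 327], [261, 81, 308, 327]))  # 9
# A small box nested inside a larger one (made-up values).
print(box_gap([0, 0, 100, 1000], [40, 400, 60, 600]))     # 0
```

It could be swapped in for `calc_sim` inside `merge_algo` with the same `dist_limit` comparison. Whether summing the two gaps or thresholding each axis separately works better probably depends on the flier layout.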

Gyatt answered 15/4, 2019 at 15:27 Comment(1)

Thank you so much, it's working really well. If you know of any other resources, please share. Still, some boxes are not merging for me. – Chemmy 15/11, 2022 at 4:15

You can check the boundaries of two boxes (x_min, x_max, y_min, y_max): if any difference is less than close_dist, they should be merged into a new box. Then do this repeatedly, in two for loops:

from itertools import product

close_dist = 20

# common version
def should_merge(box1, box2):
    for i in range(2):
        for j in range(2):
            for k in range(2):
                if abs(box1[j * 2 + i] - box2[k * 2 + i]) <= close_dist:
                    return True, [min(box1[0], box2[0]), min(box1[1], box2[1]), max(box1[2], box2[2]),
                                  max(box1[3], box2[3])]
    return False, None


# use product, more concise
def should_merge2(box1, box2):
    a = (box1[0], box1[2]), (box1[1], box1[3])
    b = (box2[0], box2[2]), (box2[1], box2[3])

    if any(abs(a_v - b_v) <= close_dist for i in range(2) for a_v, b_v in product(a[i], b[i])):
        return True, [min(*a[0], *b[0]), min(*a[1], *b[1]), max(*a[0], *b[0]), max(*a[1], *b[1])]

    return False, None

def merge_box(boxes):
    for i, box1 in enumerate(boxes):
        for j, box2 in enumerate(boxes[i + 1:]):
            is_merge, new_box = should_merge(box1, box2)
            if is_merge:
                boxes[i] = None
                boxes[j] = new_box
                break

    boxes = [b for b in boxes if b]
    return boxes

test code:

boxes = [[68, 82, 138, 321],
         [202, 81, 252, 327],
         [261, 81, 308, 327],
         [364, 112, 389, 182],
         [362, 192, 389, 305],
         [404, 98, 421, 317],
         [92, 421, 146, 725],
         [80, 738, 134, 1060],
         [209, 399, 227, 456],
         [233, 399, 250, 444],
         [257, 400, 279, 471],
         [281, 399, 298, 440],
         [286, 446, 303, 458],
         [353, 394, 366, 429]]

print(merge_box(boxes))

output:

[[286, 394, 366, 458], [261, 81, 421, 327], [404, 98, 421, 317], [80, 738, 134, 1060], [353, 394, 366, 429]]

I cannot do a visual test, so please test it for me.

Hope that helps you, and comment if you have further questions. : )
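
One thing to watch in `merge_box`: `j` comes from `enumerate(boxes[i + 1:])`, so it indexes the slice rather than `boxes`, and `boxes[j] = new_box` can overwrite an unrelated entry. The following is a sketch of one way to keep the same edge-distance rule while indexing the full list and repeating until nothing more merges. The names `edges_close`, `union_box` and `merge_until_stable` are made up for this example.

```python
def edges_close(box1, box2, close_dist=20):
    """Same rule as should_merge: any two edges on one axis within close_dist."""
    for axis in range(2):  # 0: y edges, 1: x edges
        for a in (box1[axis], box1[axis + 2]):
            for b in (box2[axis], box2[axis + 2]):
                if abs(a - b) <= close_dist:
                    return True
    return False


def union_box(box1, box2):
    """Smallest (ymin, xmin, ymax, xmax) box covering both inputs."""
    return [min(box1[0], box2[0]), min(box1[1], box2[1]),
            max(box1[2], box2[2]), max(box1[3], box2[3])]


def merge_until_stable(boxes, close_dist=20):
    boxes = [list(b) for b in boxes]  # work on a copy of the input
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):  # j indexes the full list
                if edges_close(boxes[i], boxes[j], close_dist):
                    boxes[i] = union_box(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return boxes


print(merge_until_stable([[0, 0, 10, 10], [15, 200, 25, 210], [500, 500, 510, 510]]))
# [[0, 0, 25, 210], [500, 500, 510, 510]]
```

Because the edge rule fires whenever any two edges line up within `close_dist` on a single axis, even for boxes that are far apart on the other axis, chained merges can still produce very large boxes. Requiring closeness on both axes would be a stricter variant.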

Pokelogan answered 9/4, 2019 at 15:54 Comment(2)

Brother, thank you very much for your response. I tried your code and it works well, but with a large sales flyer your algorithm creates a bounding box where there is no text. Please take a look at this image, where I drew green boxes around texts that are close together; that is exactly the result I am trying to obtain: i.ibb.co/c1ttrkb/Capture.png. As I explained before, my algorithm detects the text piece by piece, and I want to merge texts that are close together into a single bounding box. – Gyatt 10/4, 2019 at 14:27

Hm, if I apply your solution, a lot of bboxes go missing: imgur.com/a/mv9HGaW – Geneva 30/4, 2021 at 17:38

You can use OpenCV and apply dilation and blackhat transforms to preprocess the image before running your code.

Revet answered 15/4, 2019 at 5:33 Comment(1)

Note that this is not a viable solution if you want to reduce a list of contours while needing the precision of undilated diffing. – Geneva 2/5, 2021 at 2:52

Made an easy-to-read solution:

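import cv2
import imutils

# Usage (run this after the functions below are defined; `frame` is your binary image)
contours = get_contours(frame)
boxes = [cv2.boundingRect(c) for c in contours]
boxes = merge_boxes(boxes, x_val=40, y_val=20)  # x_val and y_val are per-axis distance thresholds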

def get_contours(frame):  # Returns a list of contours
    contours = cv2.findContours(frame, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = imutils.grab_contours(contours)
    return contours


def merge_boxes(boxes, x_val, y_val):
    size = len(boxes)
    if size < 2:
        return boxes

    if size == 2:
        if boxes_mergeable(boxes[0], boxes[1], x_val, y_val):
            boxes[0] = union(boxes[0], boxes[1])
            del boxes[1]
        return boxes

    boxes = sorted(boxes, key=lambda r: r[0])
    i = size - 2
    while i >= 0:
        if boxes_mergeable(boxes[i], boxes[i + 1], x_val, y_val):
            boxes[i] = union(boxes[i], boxes[i + 1])
            del boxes[i + 1]
        i -= 1
    return boxes


def boxes_mergeable(box1, box2, x_val, y_val):
    (x1, y1, w1, h1) = box1
    (x2, y2, w2, h2) = box2
    return max(x1, x2) - min(x1, x2) - minx_w(x1, w1, x2, w2) < x_val \
        and max(y1, y2) - min(y1, y2) - miny_h(y1, h1, y2, h2) < y_val


def minx_w(x1, w1, x2, w2):
    return w1 if x1 <= x2 else w2


def miny_h(y1, h1, y2, h2):
    return h1 if y1 <= y2 else h2


def union(a, b):
    x = min(a[0], b[0])
    y = min(a[1], b[1])
    w = max(a[0] + a[2], b[0] + b[2]) - x
    h = max(a[1] + a[3], b[1] + b[3]) - y
    return x, y, w, h
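
The functions above work on `cv2.boundingRect`-style tuples `(x, y, w, h)`, whereas the boxes in the question are stored as `[ymin, xmin, ymax, xmax]`. In case the two need to be combined, here is a small sketch of the conversion in both directions. `to_xywh` and `to_yxyx` are names invented for this example.

```python
def to_xywh(box):
    """Convert [ymin, xmin, ymax, xmax] to (x, y, w, h)."""
    ymin, xmin, ymax, xmax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)


def to_yxyx(rect):
    """Convert (x, y, w, h) back to [ymin, xmin, ymax, xmax]."""
    x, y, w, h = rect
    return [y, x, y + h, x + w]


rect = to_xywh([68, 82, 138, 321])  # first box from the question
print(rect)           # (82, 68, 239, 70)
print(to_yxyx(rect))  # [68, 82, 138, 321]
```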

Kiblah answered 5/4, 2022 at 15:14 Comment(1)

For newcomers, check the solution at #66490874 – Loopy 14/7, 2022 at 8:23


Hi, I think your problem can be solved with easyocr.

import easyocr

reader = easyocr.Reader(['en']) 

# paragraph=True asks EasyOCR to combine nearby results into paragraphs
result = reader.readtext('image_name.jpg', paragraph=True)

print(result)

Ton answered 9/4, 2019 at 12:58 Comment(1)

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Cod 17/11, 2021 at 18:27
