How to sort contours left to right, while going top to bottom, using Python and OpenCV

I'm finding the contours in an image containing digits and characters, for OCR. I need the contours sorted left to right within each line, with the lines going top to bottom. Right now, the contours aren't sorted that way.

[Image: sample input containing several lines of text]

For example, the contours of the image above come out in random order.

What I need is the order D, o, y, o, u, k, n, o, w, s, o, m, e, o, n, e, r, . (dot), i (without dot), c, h, ... and so on. I've tried a couple of methods that first look at the y-coordinate and then combine it with the x-coordinate into a sort key. Right now I have the sorting code below. It works for the first two lines, but in the third line the sorting breaks down. The main problem seems to be characters such as i, j, ?, . (dot) and , (comma), where the y-coordinate of the dot varies even though it belongs to the same line. What would be a good solution?

import cv2
import numpy as np

# contours, maxArea, areaRatio and raw_image are defined earlier in the script
rect = []
for ctr in contours:
    if cv2.contourArea(ctr) > maxArea * areaRatio:
        rect.append(cv2.boundingRect(cv2.approxPolyDP(ctr, 1, True)))

# rect contains one (x, y, w, h) bounding box per contour;
# use the tallest box to estimate the line height
max_line_height = max(h for (x, y, w, h) in rect)

mlh = max_line_height * 2
mlw = raw_image.shape[1]  # width of the input image
rect = np.asarray(rect)
s = rect.astype(np.uint32)  # prevent overflows
# band index (y // mlh) times the image width, plus x:
# boxes in the same band sort left to right
order = mlw * (s[:, 1] // mlh) + s[:, 0]
sort_order = np.argsort(order)
rect = rect[sort_order]
Nickinickie asked 6/8, 2016 at 14:42
Mousetail: Please provide a clear example of what isn't working in the third line.
Nickinickie: The contours of the third line of the image are being sorted as stress.ed,hav..i,n,g. and so on. The dots appear in random places among the other letters, pushing those letters out of their proper sorted positions.
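
The failure can be reproduced without OpenCV. The sketch below mimics the question's single-key sort on plain (x, y, w, h) tuples; the helper name fixed_band_order and all box values are made up for illustration. Because each band is a fixed number of pixels tall and starts at a multiple of that height, a text line that straddles a band edge gets split, and a small box such as the dot of an "i" can fall into the band of the previous line.

```python
def fixed_band_order(boxes, image_width):
    """Mimic the question's single-key sort on (x, y, w, h) tuples (hypothetical helper)."""
    # band height: twice the tallest box, like mlh in the question
    band = 2 * max(h for (_, _, _, h) in boxes)
    # key = band index * image width + x, so (for x < image_width) every box
    # in one band comes before every box in the next band
    return sorted(boxes, key=lambda b: image_width * (b[1] // band) + b[0])


# Hypothetical boxes for two text lines; the tallest box is 30 px, so bands are 60 px.
line1 = [(10, 5, 20, 30), (40, 6, 20, 28), (70, 5, 20, 30)]
# (45, 56, 6, 6) stands in for the dot of an "i": its top (y=56) is above the band edge at 60
line2 = [(10, 62, 20, 30), (45, 56, 6, 6), (45, 64, 6, 24), (60, 61, 20, 30)]

result = fixed_band_order(line1 + line2, image_width=200)
# the dot comes out third, before (70, 5, 20, 30) from the first line
print(result)
```

Here the dot from the second line is sorted into the middle of the first line, which is the kind of misplacement described in the comment above.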

I like that you're trying to solve the problem with a single sort. But as you said, the variation of y within each line can break your algorithm, and max_line_height is something you would probably have to tweak for different inputs.

So instead, I would propose a slightly different algorithm that still runs in O(n log n) time, dominated by the sorts. The idea is that if you look only at the vertical extent (the y-range) of each box, boxes from line N+1 never overlap boxes from lines 1 to N (assuming the text lines themselves don't overlap vertically), while boxes within one line do overlap each other. So you can sort all the boxes by their top y, walk through them one by one to find the "breaking points" between lines, and then sort the boxes within each line by their x.

Here is a less Pythonic solution:

# rect must be a Python list of (x, y, w, h) boxes; if it is a NumPy array
# (as at the end of the question's code), convert it first: rect = rect.tolist()
# sort all rects by their y
rect.sort(key=lambda b: b[1])
# initially the line bottom is set to be the bottom of the first rect
line_bottom = rect[0][1] + rect[0][3] - 1
line_begin_idx = 0
for i in range(len(rect)):
    # when a new box's top is below the current line's bottom,
    # it's a new line
    if rect[i][1] > line_bottom:
        # sort the previous line by x
        rect[line_begin_idx:i] = sorted(rect[line_begin_idx:i], key=lambda b: b[0])
        line_begin_idx = i
    # regardless of whether it's a new line or not,
    # always update the line bottom
    line_bottom = max(rect[i][1] + rect[i][3] - 1, line_bottom)
# sort the last line
rect[line_begin_idx:] = sorted(rect[line_begin_idx:], key=lambda b: b[0])

Now rect should be sorted in the way you want.
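
The same line-grouping idea can be wrapped in a function that returns the lines separately, which helps when the OCR output needs line breaks. This is a minimal sketch, not the answerer's code: the name sort_reading_order and the sample boxes are hypothetical, and boxes are assumed to be plain (x, y, w, h) tuples such as those returned by cv2.boundingRect.

```python
def sort_reading_order(boxes):
    """Group (x, y, w, h) boxes into text lines, top line first,
    and sort the boxes of each line left to right."""
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[1])    # top to bottom by top edge
    lines = [[boxes[0]]]
    line_bottom = boxes[0][1] + boxes[0][3] - 1  # lowest pixel row of the current line
    for b in boxes[1:]:
        if b[1] > line_bottom:                   # starts below the current line: new line
            lines.append([b])
        else:
            lines[-1].append(b)
        # as in the answer, always update the line bottom to the lowest bottom seen so far
        line_bottom = max(line_bottom, b[1] + b[3] - 1)
    return [sorted(line, key=lambda b: b[0]) for line in lines]


# Hypothetical boxes for two text lines, in arbitrary order;
# (45, 56, 6, 6) stands in for the dot of an "i".
boxes = [(60, 61, 20, 30), (10, 5, 20, 30), (45, 64, 6, 24), (70, 5, 20, 30),
         (45, 56, 6, 6), (40, 6, 20, 28), (10, 62, 20, 30)]
lines = sort_reading_order(boxes)
flat = [b for line in lines for b in line]  # single reading-order list
print(lines)
```

One caveat of the top-edge/bottom-edge test: if a small box such as an i-dot does not vertically overlap any other box of its line (for example, a line with no tall letters), the next box starts below the dot's bottom and the dot ends up on a line of its own.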

Bromo answered 10/8, 2016 at 6:3

I used this method and it worked for me. In my case, there were 5 contours in each row:

import cv2
import numpy as np

def x_cord_contour(contour):
    # Returns the x coordinate of the contour's centroid
    # (assumes a non-degenerate contour, i.e. M['m00'] != 0)
    M = cv2.moments(contour)
    return int(M['m10'] / M['m00'])

def y_cord_contour(contour):
    # Returns the y coordinate of the contour's centroid
    M = cv2.moments(contour)
    return int(M['m01'] / M['m00'])

# questionCnts is the list of contours found earlier
# Sort top to bottom using the y_cord_contour function
contours_top_to_bottom = sorted(questionCnts, key=y_cord_contour)

# In my example every row contains 5 contours, so take them 5 at a time
for (q, i) in enumerate(np.arange(0, len(contours_top_to_bottom), 5)):
    # sort the contours of the current row from left to right
    cnts = sorted(contours_top_to_bottom[i:i + 5], key=x_cord_contour)

    # loop over the sorted contours
    for (j, c) in enumerate(cnts):
        # construct a mask that reveals only the current contour
        # and do whatever you want with it
        pass

Please correct me if I'm wrong.

Matrilineage answered 27/7, 2020 at 18:18
Windowlight: This assumes there will always be 5 contours per row. For a varying number of contours per row, this solution will not work.
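
One way to drop the fixed 5-per-row assumption is to split rows wherever the sorted centroid y values jump by more than a tolerance. This is a sketch of a different technique from the answer above (gap-based splitting), and the function name group_rows and its row_tol parameter (in pixels) are hypothetical. It assumes the jitter of centroid y within a row is smaller than row_tol and the vertical gap between rows is larger. With contours you would pass the answer's helpers, e.g. group_rows(questionCnts, x_cord_contour, y_cord_contour, row_tol=20), where 20 is an arbitrary example value; the test below uses plain (cx, cy) tuples so it runs without OpenCV.

```python
def group_rows(items, x_key, y_key, row_tol):
    """Split items into rows wherever consecutive y values (after sorting)
    differ by more than row_tol, then sort each row left to right."""
    ordered = sorted(items, key=y_key)  # top to bottom
    rows = []
    prev_y = None
    for item in ordered:
        y = y_key(item)
        if prev_y is None or y - prev_y > row_tol:
            rows.append([])             # a large jump in y starts a new row
        rows[-1].append(item)
        prev_y = y
    return [sorted(row, key=x_key) for row in rows]


# Hypothetical centroids (cx, cy): a row of 3 items and a row of 2 items.
centroids = [(50, 101), (10, 98), (30, 100), (20, 200), (5, 203)]
rows = group_rows(centroids, x_key=lambda p: p[0], y_key=lambda p: p[1], row_tol=20)
print(rows)
```

Unlike the fixed-count loop, the number of items per row is not needed; the trade-off is that row_tol has to be chosen for the spacing of the input.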
