I'm finding the contours for an image with digits and characters, for OCR. So, I need the contours to be sorted left to right, while going line to line, i.e. top to bottom. Right now, the contours aren't sorted that way.
For example, the contours for the above image is sorted randomly.
What I need is the sorting as D,o,y,o,u,k,n,o,w,s,o,m,e,o,n,e,r,.(dot),i(without dot),c,h...and so on. I've tried couple of methods where we first observe the y-coordinate and then use some keys and the x-coordinate. Like right now, I have the following sorting code. It works for the first 2 lines. Then in the 3rd line, the sorting somehow doesn't happen. The main problem seem to be in the letters such as i, j, ?, (dot), (comma), etc where the y axis of the (dot) varies, despite belonging to the same line. So what might be a good solution for this?
for ctr in contours:
if cv2.contourArea(ctr) > maxArea * areaRatio:
rect.append(cv2.boundingRect(cv2.approxPolyDP(ctr,1,True)))
#rect contains the contours
for i in rect:
x = i[0]
y = i[1]
w = i[2]
h = i[3]
if(h>max_line_height):
max_line_height = h
mlh = max_line_height*2
max_line_width = raw_image.shape[1] #width of the input image
mlw = max_line_width
rect = np.asarray(rect)
s = rect.astype( np.uint32 ) #prevent overflows
order= mlw*(s[:,1]/mlh)+s[:,0]
sort_order= np.argsort( order )
rect = rect[ sort_order ]