What do the keys of the dictionary returned by the following Tesseract code signify?

You called an API to get information about text in your image.

The best way to think about the response is as a set of boxes (rectangles) on the image, each highlighting an area of text.

The result set contains boxes at several different levels.

Check the value of the level key to see which level a box belongs to. The supported values are:

1: page
2: block
3: paragraph
4: line
5: word

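An image can contain several boxes at the same level, so the page_num, block_num, par_num, line_num and word_num attributes give each box's position among its siblings and within its parent hierarchy.

The left, top, width and height values define the box: left and top are the pixel coordinates of its top-left corner, and width and height are its size.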
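As a small illustration of how these four values describe a rectangle, here is a sketch that turns one row into the two corner points a drawing routine typically needs. The helper name box_corners and the coordinates are made up for the example; they are not real Tesseract output.

```python
def box_corners(d, i):
    """Return ((x1, y1), (x2, y2)) for row i of an image_to_data-style dictionary."""
    x, y = d['left'][i], d['top'][i]
    w, h = d['width'][i], d['height'][i]
    # Top-left corner, then bottom-right corner.
    return (x, y), (x + w, y + h)

# Made-up coordinates for a single box (assumption, for illustration only).
d = {'left': [10], 'top': [20], 'width': [100], 'height': [30]}
print(box_corners(d, 0))  # ((10, 20), (110, 50))
```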

Let's look at a sample to see how it works.

Assume we have a picture with 2 words on the same line.

For that picture Tesseract returns 6 boxes: 1 for the page, 1 for the block, 1 for the paragraph, 1 for the line and 2 for the words.

This is the data you get:

'level': [1, 2, 3, 4, 5, 5]
'page_num': [1, 1, 1, 1, 1, 1]
'block_num': [0, 1, 1, 1, 1, 1]
'par_num': [0, 0, 1, 1, 1, 1]
'line_num': [0, 0, 0, 1, 1, 1]
'word_num': [0, 0, 0, 0, 1, 2]
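To make the rows easier to read, here is a minimal sketch that labels each entry of the data above with its level name and its position in the hierarchy. LEVEL_NAMES and describe_rows are names invented for this example, and the sketch assumes the level numbers 1 to 5 map to page, block, paragraph, line and word in that order.

```python
# Assumed mapping from the numeric level to a readable name.
LEVEL_NAMES = {1: 'page', 2: 'block', 3: 'paragraph', 4: 'line', 5: 'word'}

def describe_rows(d):
    """Return (level_name, (page, block, par, line, word)) for every row."""
    rows = []
    for i, level in enumerate(d['level']):
        path = (d['page_num'][i], d['block_num'][i], d['par_num'][i],
                d['line_num'][i], d['word_num'][i])
        rows.append((LEVEL_NAMES[level], path))
    return rows

# The sample data shown above: one line containing two words.
sample = {
    'level':     [1, 2, 3, 4, 5, 5],
    'page_num':  [1, 1, 1, 1, 1, 1],
    'block_num': [0, 1, 1, 1, 1, 1],
    'par_num':   [0, 0, 1, 1, 1, 1],
    'line_num':  [0, 0, 0, 1, 1, 1],
    'word_num':  [0, 0, 0, 0, 1, 2],
}

for name, path in describe_rows(sample):
    print(name, path)
# page (1, 0, 0, 0, 0)
# block (1, 1, 0, 0, 0)
# paragraph (1, 1, 1, 0, 0)
# line (1, 1, 1, 1, 0)
# word (1, 1, 1, 1, 1)
# word (1, 1, 1, 1, 2)
```

Note how a counter stays at 0 until the row reaches that level: the line box has word_num 0, and the two word boxes share the same page, block, paragraph and line numbers.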

The code below draws the boxes of every level on the image:

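import cv2
import pytesseract
from pytesseract import Output

# `image` is an image already loaded with OpenCV, e.g. via cv2.imread(...)
d = pytesseract.image_to_data(image, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    # left/top are the box's top-left corner; width/height give its size
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)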
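Drawing every level stacks the page, block, paragraph and line rectangles on top of the word rectangles. If you only want one level, you can filter the rows first. The sketch below is one way to do that; indices_at_level is a hypothetical helper, and it assumes level 5 marks word boxes.

```python
def indices_at_level(d, level):
    """Return the row indices whose 'level' value equals the requested level."""
    return [i for i, lv in enumerate(d['level']) if lv == level]

# The 'level' column from the two-word sample above.
sample_levels = {'level': [1, 2, 3, 4, 5, 5]}

word_rows = indices_at_level(sample_levels, 5)
print(word_rows)  # [4, 5]
```

In the drawing loop you would then iterate over indices_at_level(d, 5) instead of range(n_boxes) to outline words only.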
