What do the keys of the dictionary output of the following code in Tesseract signify?

I am using the following code in Python:

    img = cv2.imread('../Image_documents/6.png')
    d = pytesseract.image_to_data(img, output_type=Output.DICT)
    pprint.pprint(d)

I am getting the following keys in the dictionary:

'block_num', 'conf', 'level', 'line_num', 'page_num', 'par_num', 'text', 'top', 'width', 'word_num', 'height', 'left'.

What do these keys signify?

I tried to find these in the official Tesseract documentation. If you have links that explain them, please share them or explain the keys yourself.
Veronica answered 21/6, 2019 at 7:38

You called an API that returns information about the text found in your image.

The best way to think about the response is as a collection of boxes (rectangles) on the image, each marking an area of text.

The result set contains boxes at several different levels.

Check the value of the level key to see which level a box belongs to. Below are the supported values:

  1. page
  2. block
  3. paragraph
  4. line
  5. word

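An image can contain multiple boxes of the same level. The page_num, block_num, par_num, line_num and word_num attributes give a box's position among its siblings and within its parent hierarchy.

The left and top values give the pixel coordinates of the box's top-left corner, and width and height give its size.

Of the two remaining keys, text holds the recognized string (empty for boxes above word level) and conf is the recognition confidence, which is -1 for boxes that are not words.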

Let's look at a sample to see how it works.

Assume we have a picture with 2 words on the same line.

For that picture tesseract returns 6 boxes: 1 for the page, 1 for the block, 1 for the paragraph, 1 for the line and 2 for the words.

This is the data you get:

  • 'level': [1, 2, 3, 4, 5, 5]
  • 'page_num': [1, 1, 1, 1, 1, 1]
  • 'block_num': [0, 1, 1, 1, 1, 1]
  • 'par_num': [0, 0, 1, 1, 1, 1]
  • 'line_num': [0, 0, 0, 1, 1, 1]
  • 'word_num': [0, 0, 0, 0, 1, 2]
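
To make the sample easier to read, here is a minimal sketch that labels each row with its level name and its position in the hierarchy. It needs no image or Tesseract install: the dictionary is just the sample data above typed in by hand, and the names LEVEL_NAMES and describe are made up for this illustration, not part of pytesseract.

```python
from collections import Counter

# Level numbers as listed in the answer (illustrative mapping, not a pytesseract API).
LEVEL_NAMES = {1: "page", 2: "block", 3: "paragraph", 4: "line", 5: "word"}

# The sample values from the answer: two words on one line.
d = {
    "level":     [1, 2, 3, 4, 5, 5],
    "page_num":  [1, 1, 1, 1, 1, 1],
    "block_num": [0, 1, 1, 1, 1, 1],
    "par_num":   [0, 0, 1, 1, 1, 1],
    "line_num":  [0, 0, 0, 1, 1, 1],
    "word_num":  [0, 0, 0, 0, 1, 2],
}

HIERARCHY_KEYS = ("page_num", "block_num", "par_num", "line_num", "word_num")


def describe(d, i):
    """Return (level name, (page, block, par, line, word)) for row i."""
    name = LEVEL_NAMES[d["level"][i]]
    path = tuple(d[key][i] for key in HIERARCHY_KEYS)
    return name, path


rows = [describe(d, i) for i in range(len(d["level"]))]
for name, path in rows:
    print(f"{name:<9} page/block/par/line/word = {path}")

# How many boxes exist at each level.
boxes_per_level = Counter(name for name, _ in rows)
print(dict(boxes_per_level))
```

A counter at a deeper level than the box itself stays 0, which is why the page row reads (1, 0, 0, 0, 0) and only the two word rows have a non-zero word_num.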

The code below draws the boxes of every level on the image:

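    import cv2
    import pytesseract
    from pytesseract import Output

    image = cv2.imread('../Image_documents/6.png')  # path from the question
    d = pytesseract.image_to_data(image, output_type=Output.DICT)
    n_boxes = len(d['level'])
    for i in range(n_boxes):
        # (left, top) is the box's top-left corner; width and height are its size.
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)  # green, 2 px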
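
The question also lists the 'text' and 'conf' keys. As a rough sketch of how they combine with the hierarchy numbers, the snippet below keeps only word-level rows above a confidence threshold and joins them back into lines. The dictionary is hand-written with made-up coordinates, confidences and text, and words_by_line is a hypothetical helper, not a pytesseract function. Confidences may come back as ints, floats or strings depending on the pytesseract version, so the sketch converts them with float().

```python
from collections import defaultdict

# Hypothetical output for an image showing "Hello world" on one line.
# All coordinates, confidences and text values are invented for illustration.
d = {
    "level":     [1, 2, 3, 4, 5, 5],
    "page_num":  [1, 1, 1, 1, 1, 1],
    "block_num": [0, 1, 1, 1, 1, 1],
    "par_num":   [0, 0, 1, 1, 1, 1],
    "line_num":  [0, 0, 0, 1, 1, 1],
    "word_num":  [0, 0, 0, 0, 1, 2],
    "left":      [0, 10, 10, 10, 10, 70],
    "top":       [0, 12, 12, 12, 12, 12],
    "width":     [200, 120, 120, 120, 50, 60],
    "height":    [50, 20, 20, 20, 20, 20],
    "conf":      ["-1", "-1", "-1", "-1", "96", "91"],
    "text":      ["", "", "", "", "Hello", "world"],
}

WORD_LEVEL = 5  # level value used for word boxes


def words_by_line(d, min_conf=0.0):
    """Group recognized words by (page, block, paragraph, line)."""
    lines = defaultdict(list)
    for i, level in enumerate(d["level"]):
        if level != WORD_LEVEL:
            continue  # pages, blocks, paragraphs and lines carry no text
        if float(d["conf"][i]) < min_conf or not d["text"][i].strip():
            continue  # skip low-confidence or empty words
        key = (d["page_num"][i], d["block_num"][i],
               d["par_num"][i], d["line_num"][i])
        lines[key].append(d["text"][i])
    return {key: " ".join(words) for key, words in lines.items()}


lines = words_by_line(d, min_conf=60)
print(lines)
```

With this made-up data both words pass the threshold of 60 and end up under the single line key (1, 1, 1, 1). The threshold itself is an arbitrary choice for the example.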
Fairy answered 30/11, 2019 at 1:47
