You called an API to get information about text in your image.
The best way to think about the response is as a set of boxes (rectangles) on the image that highlight text areas.
The result set contains boxes at several different levels.
You can check the value of the 'level' key to see which level a box belongs to. Below are the supported levels:
- page
- block
- paragraph
- line
- word
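As a small sketch, the numeric 'level' values can be mapped to the names listed above. The mapping assumes 1 is the page and 5 is a word, which is consistent with the sample further down; `LEVEL_NAMES` and `level_name` are illustrative names, not part of pytesseract.

```python
# Assumed mapping from the integer 'level' value to the level names listed above.
LEVEL_NAMES = {1: "page", 2: "block", 3: "paragraph", 4: "line", 5: "word"}

def level_name(level):
    """Return the name for a numeric level, or 'unknown' for anything else."""
    return LEVEL_NAMES.get(int(level), "unknown")

print([level_name(l) for l in [1, 2, 3, 4, 5, 5]])
# ['page', 'block', 'paragraph', 'line', 'word', 'word']
```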
An image can contain multiple boxes of the same level, so these attributes define a box's position in the list and in the parent hierarchy: page_num, block_num, par_num, line_num, word_num.
The left, top, width, and height values define the position and size of the box.
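To make the geometry concrete, here is a minimal sketch that turns those four values into the two corner points of a rectangle. It assumes the usual image convention where the origin is the top-left corner and y grows downward; `box_corners` is a hypothetical helper and the numbers are made up for illustration.

```python
def box_corners(left, top, width, height):
    """Return ((x1, y1), (x2, y2)): the top-left and bottom-right corners of a box."""
    return (left, top), (left + width, top + height)

# Hypothetical values, not taken from a real OCR run.
print(box_corners(10, 20, 100, 30))  # ((10, 20), (110, 50))
```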
Let's look at a sample to see how it works.
Assume we have a picture with 2 words on the same line.
For that picture Tesseract returns 6 boxes: 1 for the page, 1 for the block, 1 for the paragraph, 1 for the line, and 2 for the words.
This is the data you get:
- 'level': [1, 2, 3, 4, 5, 5]
- 'page_num': [1, 1, 1, 1, 1, 1]
- 'block_num': [0, 1, 1, 1, 1, 1]
- 'par_num': [0, 0, 1, 1, 1, 1]
- 'line_num': [0, 0, 0, 1, 1, 1]
- 'word_num': [0, 0, 0, 0, 1, 2]
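In this sample, a 0 in one of the *_num lists means the box sits above that level: the page box has block_num 0, while both word boxes carry the numbers of their page, block, paragraph, and line. The sketch below uses exactly the sample data above to group the words by the line they belong to. `WORD_LEVEL` and `words_by_line` are illustrative names.

```python
from collections import defaultdict

# The sample result from above, copied verbatim.
d = {
    "level":     [1, 2, 3, 4, 5, 5],
    "page_num":  [1, 1, 1, 1, 1, 1],
    "block_num": [0, 1, 1, 1, 1, 1],
    "par_num":   [0, 0, 1, 1, 1, 1],
    "line_num":  [0, 0, 0, 1, 1, 1],
    "word_num":  [0, 0, 0, 0, 1, 2],
}

WORD_LEVEL = 5  # in this sample, the level 5 entries are the word boxes

# Key each word by (page, block, paragraph, line) so words on the same line end up together.
words_by_line = defaultdict(list)
for i, level in enumerate(d["level"]):
    if level == WORD_LEVEL:
        line_key = (d["page_num"][i], d["block_num"][i],
                    d["par_num"][i], d["line_num"][i])
        words_by_line[line_key].append(d["word_num"][i])

print(dict(words_by_line))  # {(1, 1, 1, 1): [1, 2]}
```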
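The code below draws the boxes for all levels on the image:

    import cv2
    import pytesseract
    from pytesseract import Output

    # image is assumed to be loaded already, e.g. with cv2.imread
    d = pytesseract.image_to_data(image, output_type=Output.DICT)
    n_boxes = len(d['level'])
    for i in range(n_boxes):
        # left/top give the top-left corner, width/height give the size
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)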
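If only one level is of interest, for example just the word boxes, the same loop can be restricted by the 'level' value. This is a rough sketch rather than the only way to do it; `indices_for_level` is a hypothetical helper, and the drawing part is left as a comment so the snippet runs without OpenCV installed.

```python
def indices_for_level(d, level):
    """Return the indices of all boxes whose 'level' equals the requested one."""
    return [i for i, lvl in enumerate(d["level"]) if lvl == level]

# Using the 'level' list from the sample: the two word boxes are the last two entries.
print(indices_for_level({"level": [1, 2, 3, 4, 5, 5]}, 5))  # [4, 5]

# With a real result dict d and a loaded image, the drawing loop would become:
#   for i in indices_for_level(d, 5):
#       x, y, w, h = d['left'][i], d['top'][i], d['width'][i], d['height'][i]
#       cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```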