Unable to understand the coordinates in a document extracted with the Tesseract OCR engine

Asked 31/8, 2013 at 16:38 Answered 14/3, 2018 at 17:16

I ran Tesseract on an image document and the extraction succeeded, but I am not able to understand the coordinates in the extracted output.

Problem description:

The output shows coordinates, but I would like to know whether they represent pixels or something else. They come in groups of four, like title="bbox 10 13 43 46", so what are 10, 13, 43 and 46? What positions do they represent?

Complete output after extraction:

   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>
</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract'/>
</head>
<body>
<div class='ocr_page' id='page_1' title='image "D:\ABC.tif"; bbox 0 0 464 101'>
    <div class='ocr_carea' id='block_1_1' title="bbox 10 13 330 55">
    <p class='ocr_par'>
        <span class='ocr_line' id='line_1_1' title="bbox 10 13 330 55">
            <span class='ocr_word' id='word_1_1' title="bbox 10 13 43 46">
                <span class='ocrx_word' id='xword_1_1' title="x_wconf -1"><strong>hi</strong></span>
            </span> 
            <span class='ocr_word' id='word_1_2' title="bbox 148 13 268 47">
                <span class='ocrx_word' id='xword_1_2' title="x_wconf -1"><strong>whats</strong></span>
            </span> 
            <span class='ocr_word' id='word_1_3' title="bbox 283 22 330 55">
                <span class='ocrx_word' id='xword_1_3' title="x_wconf -1"><strong>up</strong></span>
            </span>
        </span>
    </p>
    </div>
</div>
</body>
</html>

Incommodious answered 31/8, 2013 at 16:38 Comment(1)

Can you show the image you used as input? – Kathrinekathryn 31/8, 2013 at 17:16

For anybody who is still wondering how the coordinate system works, I finally figured it out. The four numbers are

10 13 43 46 = startx, starty, endx, endy

If you want the width and height of the word, that would be

width = endx - startx, height = endy - starty

Split the string on ' ', drop the leading bbox token, and there you go.
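A minimal Python sketch of that parsing step, in case it helps. The function names parse_bbox and box_size are just illustrative, and it assumes the title may hold several properties separated by ';', as on the ocr_page line in the question.

```python
def parse_bbox(title):
    """Return (startx, starty, endx, endy) from an hOCR title attribute."""
    # A title can carry more than one property, separated by ';',
    # so look for the one whose first token is 'bbox'.
    for prop in title.split(";"):
        parts = prop.split()
        if parts and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:5])
    raise ValueError("no bbox in title: %r" % title)


def box_size(bbox):
    """Return (width, height) of a (startx, starty, endx, endy) box."""
    startx, starty, endx, endy = bbox
    return endx - startx, endy - starty


bbox = parse_bbox("bbox 10 13 43 46")
width, height = box_size(bbox)
print(bbox, width, height)  # (10, 13, 43, 46) 33 33
```

For the word "hi" in the question this gives a box 33 wide and 33 high.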

Eczema answered 18/2, 2016 at 15:40 Comment(0)

Maybe this will help someone in the future. I think the image speaks for itself. You can compute the height or the top offset (for CSS) from those values (e.g. height = y1 - y0).
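As a rough sketch of that calculation in Python: the helper name bbox_to_css is made up for illustration, and it assumes the box is given as x0 y0 x1 y1 in pixels measured from the top-left corner of the image, with the overlay shown at the image's native size.

```python
def bbox_to_css(x0, y0, x1, y1):
    """Map a bbox to CSS absolute-positioning values in pixels.

    Assumes one image pixel equals one CSS pixel (no scaling).
    """
    return {
        "left": x0,
        "top": y0,
        "width": x1 - x0,
        "height": y1 - y0,
    }


# Box of the word "whats" from the question.
css = bbox_to_css(148, 13, 268, 47)
style = "; ".join(f"{name}: {value}px" for name, value in css.items())
print(style)  # left: 148px; top: 13px; width: 120px; height: 34px
```

If the image is displayed scaled, each value would need to be multiplied by the same scale factor.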

Selfassertion answered 14/3, 2018 at 17:16 Comment(1)

Except the y axis is reversed, as in most graphical applications, github.com/kba/hocr-spec/issues/34#issuecomment-252418295 – Gareri 22/10, 2020 at 21:44

These numbers should be the positions of the corners of a box (a rectangle) that contains one word.

That is the hOCR format.

According to your document, Tesseract recognized the sentence "hi whats up".
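To pull each word and its box out of output like that, something along these lines might work. It is only a sketch using Python's standard html.parser; the class name WordBoxes is my own, and it assumes every word is a span with class 'ocr_word' carrying a bbox in its title, as in the question's output.

```python
from html.parser import HTMLParser


class WordBoxes(HTMLParser):
    """Collect (text, bbox) for every span with class 'ocr_word'."""

    def __init__(self):
        super().__init__()
        self.words = []    # finished (text, (x0, y0, x1, y1)) pairs
        self._bbox = None  # bbox of the word currently being read
        self._depth = 0    # open <span> count inside that word
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._bbox is not None:
            # Already inside a word: only track nested spans.
            if tag == "span":
                self._depth += 1
        elif tag == "span" and "ocr_word" in (attrs.get("class") or "").split():
            for prop in (attrs.get("title") or "").split(";"):
                parts = prop.split()
                if parts and parts[0] == "bbox":
                    self._bbox = tuple(int(p) for p in parts[1:5])
                    self._depth = 1
                    self._text = []

    def handle_endtag(self, tag):
        if self._bbox is not None and tag == "span":
            self._depth -= 1
            if self._depth == 0:  # the ocr_word span itself just closed
                self.words.append(("".join(self._text).strip(), self._bbox))
                self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


hocr = """
<span class='ocr_line' id='line_1_1' title="bbox 10 13 330 55">
  <span class='ocr_word' id='word_1_1' title="bbox 10 13 43 46">
    <span class='ocrx_word' id='xword_1_1' title="x_wconf -1"><strong>hi</strong></span>
  </span>
  <span class='ocr_word' id='word_1_2' title="bbox 148 13 268 47">
    <span class='ocrx_word' id='xword_1_2' title="x_wconf -1"><strong>whats</strong></span>
  </span>
  <span class='ocr_word' id='word_1_3' title="bbox 283 22 330 55">
    <span class='ocrx_word' id='xword_1_3' title="x_wconf -1"><strong>up</strong></span>
  </span>
</span>
"""

parser = WordBoxes()
parser.feed(hocr)
parser.close()
for text, box in parser.words:
    print(text, box)
```

Each printed line is a recognized word followed by its four bbox numbers.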

Kathrinekathryn answered 31/8, 2013 at 17:18 Comment(3)

Please let me know the position of these words. – Incommodious 1/9, 2013 at 4:6

Are they represented in pixels, as Left, Top, Right, Bottom positions? – Incommodious 1/9, 2013 at 4:30

First link on Wikipedia here. I gave you a link and you didn't use it. – Kathrinekathryn 1/9, 2013 at 8:45
