Google vision Text Detection response to be line by line

Asked 3/10, 2017 at 15:27 Answered 25/5, 2020 at 19:49

I am using the Google vision api to perform text recognition on receipt images. I am getting some nice results returned but the format in which the return is quite unreliable. If there is a large gap between text the readout will print the line below instead of the line next to it.

For example, with the following Recipt Image i get the below response:

    4x Löwenbräu Original a 3,00 12,00 1
    8x Weissbier dunkel a 3,30 26,401
    3x Hefe-Weissbier a 3,30 9,90 1
    1x Saft 0,25
    1x Grosses Wasser
    1x Vegetarische Varia
    1x Gyros
    1x Baby Kalamari Gefu
    2x Gyros Folie
    1x Schafskäse Ofen
    1x Bifteki Metaxa
    1x Schweinefilet Meta
    1x St ifado
    1x Tee
    2,50 1
    2,40 1
    9,90 1
    8,90 1
    12,90
    a 9,9019,80 1
    6,90 1
    11,90 1
    13,90 1
    14,90 1
    2,10 1

Which starts of well and as expected but then becomes fairly un helpful when trying to connect prices to text etc. The ideal response would be as follows:

    4x Löwenbräu Original a 3,00 12,00 1
    8x Weissbier dunkel    a 3,30 26,401
    3x Hefe-Weissbier      a 3,30 9,90 1
    1x Saft 0,25                  2,50 1
    1x Grosses Wasser             2,40 1
    1x Vegetarische Varia         9,90 1
    1x Gyros                      8,90 1
    1x Baby Kalamari Gefu        12,90 1
    2x Gyros Folie         a 9,9019,80 1
    1x Schafskäse Ofen            6,90 1
    1x Bifteki Metaxa            11,90 1
    1x Schweinefilet Meta        13,90 1
    1x St ifado                  14,90 1
    1x Tee                        2,10 1

Or close to that.

Is there a formatting request you can add to the api to get different responses? I have had success when using tessereact where you can change the output format to achieve this result and was wondering if the vision api has something similar.

I understand the api returns letter coordinates which could be used but i was hoping not to have to go into that kind of depth.

Polybius answered 3/10, 2017 at 15:27 Comment(0)

You can add feature hints to your JSON request. For image of a receipt like this, DOCUMENT_TEXT_DETECTION give good results:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://i.sstatic.net/TRTXo.png"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}

You can copy the above JSON and paste it into Request Body in the Try This API pane on the documentation page. Result:

4x LOwenbräu Original a 3,00 12,00 1
8x Weissbier dunkel a 3, 3026, 40 1
3x Hefe-Weissbier a 3,30990 1
1x Saft 0,25 2, 50 1
1x Grosses Wasser 2, 40 1
1x Vegetarische Varia 9,90 1
1x Gyros 8,90 1
1x Baby Kalamari Gefu 12,90 !
2x Gyros Folie a 9,9019, 80 1
1x Schaf skäse Ofen 6,90 1
1x Bifteki Metaxa 11,90 1
1x Schweinefilet Meta 13,90 1
1x Stifado 14, 90 1
1x Tee 2, 10 1

Googie Vision is much less configurable than Tesseract at the moment. Because Google is behind both projects, guess which one gonna get higher priority in the future?

Jeer answered 3/10, 2017 at 19:45 Comment(2)

Amazing thank you so much! I wish they would make tesseract as an api i could call as its just to big to incorporate into an app. – Polybius 4/10, 2017 at 12:56

Hey, can you tell me how can I do this in python? – Selfimmolation 24/9, 2022 at 9:35

This might be a late answer but adding it for future reference. For text which are very far apart the DOCUMENT_TEXT_DETECTION also does not provide proper line segmentation.

The following code does simple line segmentation based on the character polygon coordinates.

https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision

Moonshiner answered 16/1, 2018 at 10:20 Comment(0)

You can add feature hints to your JSON request. For image of a receipt like this, DOCUMENT_TEXT_DETECTION give good results:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://i.sstatic.net/TRTXo.png"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}

You can copy the above JSON and paste it into Request Body in the Try This API pane on the documentation page. Result:

4x LOwenbräu Original a 3,00 12,00 1
8x Weissbier dunkel a 3, 3026, 40 1
3x Hefe-Weissbier a 3,30990 1
1x Saft 0,25 2, 50 1
1x Grosses Wasser 2, 40 1
1x Vegetarische Varia 9,90 1
1x Gyros 8,90 1
1x Baby Kalamari Gefu 12,90 !
2x Gyros Folie a 9,9019, 80 1
1x Schaf skäse Ofen 6,90 1
1x Bifteki Metaxa 11,90 1
1x Schweinefilet Meta 13,90 1
1x Stifado 14, 90 1
1x Tee 2, 10 1

Googie Vision is much less configurable than Tesseract at the moment. Because Google is behind both projects, guess which one gonna get higher priority in the future?

Jeer answered 3/10, 2017 at 19:45 Comment(2)

Amazing thank you so much! I wish they would make tesseract as an api i could call as its just to big to incorporate into an app. – Polybius 4/10, 2017 at 12:56

Hey, can you tell me how can I do this in python? – Selfimmolation 24/9, 2022 at 9:35

Google Vision API has a method fullTextAnnotation which returns a full text string with \n specifying the end of the line, you can try that. Hope it works for you.

Epithalamium answered 25/5, 2020 at 19:49 Comment(0)

Recommended topics

Hot tags