Google Cloud Vision - Numbers and Numerals OCR

I've been trying to implement an OCR program in Python that reads numbers in a specific format, XXX-XXX. I used the text recognition feature of Google's Cloud Vision API, but the results were unreliable. Out of 30 high-contrast 1280 x 1024 BMP images, only a handful produced the correct output, or at least included the correct output in the results. The program tends to omit some numbers, return text in non-English languages, or sneak in a few special characters.

The goal is to at least output the correct numbers consecutively; it doesn't matter if the results are sprinkled with other junk. Is there a way to help the program recognize numbers better, for example by limiting the results to a specific format, or to numbers only?

Centralize answered 16/9, 2016 at 22:6 Comment(0)
A
6

At this moment it is not possible to add constraints or to give a specific expected number format to Vision API requests, as mentioned here (by the Project Manager of Cloud Vision API).

You can also check all the possible request parameters in the API reference; none of them lets you specify a number format. Currently the only options are:

  • latLongRect: specifies the location of the image
  • languageHints: indicates the expected language for text_detection (list of supported languages here)
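As a rough illustration of where languageHints goes, here is a minimal sketch that only builds the JSON body for a TEXT_DETECTION request against the REST `images:annotate` method. The helper name `build_text_detection_request` is hypothetical, and sending the request (authentication, HTTP client) is deliberately left out.

```python
import base64
import json


def build_text_detection_request(image_bytes, language_hints=None):
    """Build the JSON body for a single TEXT_DETECTION request.

    Hypothetical helper: it only assembles the payload, it does not call the API.
    """
    request = {
        # The REST API expects the image content as a base64 string.
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "TEXT_DETECTION"}],
    }
    if language_hints:
        # languageHints lives inside imageContext, next to latLongRect.
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}


if __name__ == "__main__":
    body = build_text_detection_request(b"placeholder image bytes", ["en"])
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be posted as JSON to the `images:annotate` endpoint; the hint is only a hint, so the service may still return other scripts.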

I assume you already checked out the multiple responses (with different included image regions) to see if you could reconstruct the text using the location of different digits?
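One way to try that reconstruction is sketched below. It assumes the response's `textAnnotations` list has already been parsed into Python dictionaries, that the first entry is the full detected text and the rest are individual fragments, and that all fragments sit on a single line so sorting by the left edge is enough. The function names are hypothetical, and dropping every non-digit, non-hyphen character before matching is my own choice, not something the API does.

```python
import re

# Exactly three digits, a hyphen, three digits, not embedded in a longer digit run.
CODE_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{3}(?!\d)")


def left_edge(annotation):
    """Return the smallest x coordinate of the fragment's bounding polygon."""
    vertices = annotation["boundingPoly"]["vertices"]
    # Coordinates equal to 0 can be omitted from the JSON, hence the default.
    return min(vertex.get("x", 0) for vertex in vertices)


def find_codes(text_annotations):
    """Rebuild the line from left to right and pull out XXX-XXX matches."""
    # Skip the first entry, which repeats the whole detected text.
    fragments = sorted(text_annotations[1:], key=left_edge)
    joined = "".join(fragment["description"] for fragment in fragments)
    # Discard junk characters so stray symbols do not split a code.
    cleaned = re.sub(r"[^0-9-]", "", joined)
    return CODE_PATTERN.findall(cleaned)
```

Because junk is removed before matching, a stray letter between two digits is silently ignored; if that is too permissive for your images, match against `joined` instead of `cleaned`.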

Note that the Vision API and text_detection are not optimized for your data specifically. If you have a lot of annotated data, another option is to build your own model using TensorFlow. This blog post explains a system setup to detect number plates (with a specific number format). All the code is available on GitHub, and the problem seems closely related to yours.

Addy answered 24/9, 2016 at 20:33 Comment(0)
H
10

I am unable to tell you why this works; perhaps it has to do with how the language is read (o vs 0, l vs 1, etc.). But I have read that whenever you use OCR and are specifically looking for numbers, you should set the detection language to "Korean". It works exceptionally well for me and has improved the accuracy greatly.
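If lookalike characters such as o vs 0 and l vs 1 still slip through, a post-processing pass is a complementary option. This is a different technique from the language setting, and the mapping table below is purely my assumption about which confusions occur; the names `LOOKALIKES` and `extract_codes` are hypothetical. A minimal sketch:

```python
import re

# Assumed lookalike table: letters that OCR may return in place of digits.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

# Three digits, a hyphen, three digits, not part of a longer digit run.
CODE = re.compile(r"(?<!\d)\d{3}-\d{3}(?!\d)")


def extract_codes(ocr_text):
    """Map lookalike letters to digits, then return every XXX-XXX match."""
    normalized = ocr_text.translate(LOOKALIKES)
    return CODE.findall(normalized)


if __name__ == "__main__":
    print(extract_codes("Code: l2O-456"))
```

The mapping is applied to the whole string, so it only makes sense when everything you care about in the image is numeric.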

Henbit answered 1/10, 2016 at 17:43 Comment(1)
I can confirm this - using Korean also improves number OCR for the OCR.space API. Namnama
