Google Vision API does not recognize single digits

I have a project that uses the Google Vision API's DOCUMENT_TEXT_DETECTION feature to extract text from document images.

The API often has trouble recognizing single digits, as you can see in this image:

enter image description here

I suppose the problem could be related to a noise-removal algorithm that treats isolated single digits as noise. Is there a way to improve the Vision response in these situations (for example, by adjusting the noise threshold or other parameters)?

At other times Vision confuses digits with letters:

enter image description here

But if I specify languageHints = 'en' or 'mt' as a parameter, these digits are ignored by the OCR. Is there a way to force the recognition of digits or Latin characters?

Gull answered 20/3, 2018 at 14:12 Comment(5)
I don't know the exact reasons, but there also seems to be a problem with block sizes: they are too big, so some numbers can be missed or misinterpreted. Look for an option for controlling segment sizes, if there is one. - Limulus
You can try TEXT_DETECTION. As explained in the documentation, DOCUMENT_TEXT_DETECTION is optimized for dense text, and the images you used don't seem to be that case. - Brister
Thanks @enlelin. Unfortunately I need to extract text from written documents, which often have zones with different text density. In my case DOCUMENT_TEXT_DETECTION works significantly better, but it has trouble recognizing isolated characters. - Gull
Did you find a way to fix this? - Divebomb
I am experiencing this problem too. Has anyone fixed this already? Thanks - Hideout

Unfortunately I think the Vision API is optimized for the two ends of the spectrum: dense text (DOCUMENT_TEXT_DETECTION) on one end, and arbitrary bits of text (TEXT_DETECTION) on the other. As you noted in the comments, the regular TEXT_DETECTION works better for these stray single digits while DOCUMENT_TEXT_DETECTION works better overall.
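
If you want to compare the two modes on the same image, here is a minimal sketch of a request body for the REST `images:annotate` method. It sends one request per feature type, because as far as I know DOCUMENT_TEXT_DETECTION takes precedence when both features are listed in a single request. The helper name `build_annotate_body` is my own, not part of any client library, and you still have to POST the body with your own API key or OAuth token.

```python
import base64

# REST endpoint for synchronous annotation; authentication is not shown here.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_body(image_bytes, language_hints=None):
    """Build an images:annotate body with one request per OCR mode.

    build_annotate_body is a hypothetical helper name used for illustration.
    """
    content = base64.b64encode(image_bytes).decode("ascii")
    requests = []
    for feature_type in ("TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION"):
        request = {
            "image": {"content": content},
            "features": [{"type": feature_type}],
        }
        # languageHints is optional; leaving it out lets the API auto-detect.
        if language_hints:
            request["imageContext"] = {"languageHints": list(language_hints)}
        requests.append(request)
    return {"requests": requests}
```

The responses come back in the same order as the requests, so the first entry holds the TEXT_DETECTION result and the second the DOCUMENT_TEXT_DETECTION result.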

As far as I've heard, there are no current plans to try to cover both of these in a single way, but it's possible that this could improve in the future.
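
In the meantime, one possible client-side workaround (my own suggestion, not something the API offers) is to run both modes and add to the DOCUMENT_TEXT_DETECTION words any TEXT_DETECTION word whose bounding box overlaps none of them. The sketch below assumes you have already reduced each word's `boundingPoly` to a `(text, (x_min, y_min, x_max, y_max))` tuple; that tuple layout and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (x_min, y_min, x_max, y_max) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_words(document_words, text_words):
    """Keep every DOCUMENT_TEXT_DETECTION word, then add TEXT_DETECTION words
    whose box overlaps none of them (e.g. an isolated digit the dense pass missed).
    """
    merged = list(document_words)
    for text, box in text_words:
        if not any(overlaps(box, doc_box) for _, doc_box in document_words):
            merged.append((text, box))
    # Rough reading order: top to bottom, then left to right.
    merged.sort(key=lambda word: (word[1][1], word[1][0]))
    return merged


document_words = [("Total", (10, 10, 60, 30))]
text_words = [("Total", (11, 10, 59, 29)), ("7", (100, 12, 110, 28))]
print(merge_words(document_words, text_words))
```

Here the duplicate "Total" from TEXT_DETECTION is dropped because it overlaps the document word, while the stray "7" is kept. Whether a plain overlap test is strict enough will depend on your documents.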

I think there have been other requests for more fine-tuning and hinting on what you're looking to detect (e.g., here and here), but this doesn't seem to be available yet. Perhaps in the future you'll be able to provide more hints on the format of the text that you're looking to find in images (e.g., phone numbers, single digits, etc.).

Stubbed answered 28/5, 2019 at 17:23 Comment(0)