I wrote a basic Python script that calls the GCP Vision API. My aim is to send it an image of a product and retrieve (with OCR) the words written on the box. I have a predefined list of brands, so I can search the text returned by the API for a brand name and detect which brand it is.
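To make the brand lookup concrete, here is a minimal sketch of how the returned text could be searched. The function name `find_brands` and the example brand list are hypothetical and only illustrate the idea; the real list and matching rules may differ.

```python
def find_brands(ocr_text, brands):
    """Return every brand from `brands` that occurs in `ocr_text`.

    Matching is case-insensitive and ignores differences in whitespace,
    so a brand split across OCR lines by a newline still matches.
    """
    # Collapse all runs of whitespace (including newlines) to single spaces.
    haystack = " ".join(ocr_text.casefold().split())
    return [
        brand
        for brand in brands
        if " ".join(brand.casefold().split()) in haystack
    ]


if __name__ == "__main__":
    # Hypothetical brand list, for illustration only.
    known_brands = ["Bausch", "Acuvue", "Air Optix"]
    print(find_brands("FOR ASTIGMATISM\n1-DAY ACUVUE\nMOIST", known_brands))
```

A plain substring test like this only works if the OCR actually returns the brand word, which is exactly what fails in the script output further down.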
My Python script is the following:
import io
from google.cloud import vision
from google.cloud.vision import types
import os
import cv2
import numpy as np

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "**************************"


def detect_text(file):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()

    with io.open(file, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)

    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')

    for text in texts:
        print('\n"{}"'.format(text.description))

        vertices = (['({},{})'.format(vertex.x, vertex.y)
                     for vertex in text.bounding_poly.vertices])

        print('bounds: {}'.format(','.join(vertices)))


file_name = "Image.jpg"
img = cv2.imread(file_name)
detect_text(file_name)
For now, I am experimenting with the following product image (951 × 335 resolution):
Its brand is Acuvue.
The problem is the following. When I test the online demo of the GCP Cloud Vision API, I get the following text result for this image:
FOR ASTIGMATISM 1-DAY ACUVUE MOIST WITH LACREON™ 30 Lenses BRAND CONTACT LENSES UV BLOCKING
(The JSON result returns all of the above words, including the word Acuvue, which is the one that matters to me, but the JSON is too long to post here.)
So the online demo detects the text on the product quite well, and in particular it accurately detects the word Acuvue (which is the brand). However, when I call the same API from my Python script with the same image, I get the following result:
Texts:
"1.DAY
FOR ASTIGMATISM
WITH
LACREONTM
MOIS
30 Lenses
BRAND CONTACT LENSES
UV BLOCKING
"
bounds: (221,101),(887,101),(887,284),(221,284)
"1.DAY"
bounds: (221,101),(312,101),(312,125),(221,125)
"FOR"
bounds: (622,107),(657,107),(657,119),(622,119)
"ASTIGMATISM"
bounds: (664,107),(788,107),(788,119),(664,119)
"WITH"
bounds: (614,136),(647,136),(647,145),(614,145)
"LACREONTM"
bounds: (600,151),(711,146),(712,161),(601,166)
"MOIS"
bounds: (378,162),(525,153),(528,200),(381,209)
"30"
bounds: (614,177),(629,178),(629,188),(614,187)
"Lenses"
bounds: (634,178),(677,180),(677,189),(634,187)
"BRAND"
bounds: (361,210),(418,210),(418,218),(361,218)
"CONTACT"
bounds: (427,209),(505,209),(505,218),(427,218)
"LENSES"
bounds: (514,209),(576,209),(576,218),(514,218)
"UV"
bounds: (805,274),(823,274),(823,284),(805,284)
"BLOCKING"
bounds: (827,276),(887,276),(887,284),(827,284)
But this does not detect the word "Acuvue" at all, whereas the demo does!
Why is this happening?
Can I fix something in my Python script to make it work properly?
[...] DOCUMENT_TEXT_DETECTION request instead of a TEXT_DETECTION request? (example) – Plautus
[...] detect_text(file_name) with document_detect_text(file_name)? This gives me the following error: name 'document_detect_text' is not defined – Bryantbryanty
[...] response = ... line and texts = ... line should be changed using the document_text_detection method and full_text_annotation attribute as shown in the example I linked. My hope is that the more robust detector will find "Acuvue" but at a confidence that the standard detector considers too low to include. – Plautus
[...] text: "FOR ASTIGMATISM\n1-DAY ACUVUE\nMOIST\nWITH\nLACREON\342\204\242\n30 Lenses\nBRAND CONTACT LENSES\nUV BLOCKING\n". However, I am wondering what these numbers `\342\204\242` mean in it. – Bryantbryanty
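For readers following the comments, here is a rough sketch of the change they describe: calling `document_text_detection` and reading `full_text_annotation.text` instead of `text_annotations`. The function name `document_detect_text` is hypothetical (it is not defined by the library, which is why the earlier attempt raised a `NameError`), and passing the image as a plain dict is an assumption made so the sketch does not depend on a particular version of the client library.

The numbers in the last comment look like octal byte escapes, as used when the response is printed in protobuf text form: `\342\204\242` are the bytes 0xE2 0x84 0xA2, which is the UTF-8 encoding of ™ (U+2122). The hypothetical helper `decode_octal_escapes` below just illustrates that mapping.

```python
import re


def document_detect_text(path, client=None):
    """Return the full text that DOCUMENT_TEXT_DETECTION finds in the image at `path`."""
    if client is None:
        # Imported lazily so the helper can also be exercised with a stub client.
        from google.cloud import vision
        client = vision.ImageAnnotatorClient()

    with open(path, "rb") as image_file:
        content = image_file.read()

    # Assumption: the image is passed as a plain dict instead of an Image message.
    response = client.document_text_detection(image={"content": content})
    if response.error.message:
        raise RuntimeError(response.error.message)

    # full_text_annotation.text holds the whole page as a single string.
    return response.full_text_annotation.text


def decode_octal_escapes(escaped):
    r"""Decode '\342\204\242'-style octal byte escapes (ASCII input) into UTF-8 text."""
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",
        lambda match: bytes([int(match.group(1), 8)]),
        escaped.encode("ascii"),
    )
    return raw.decode("utf-8")


if __name__ == "__main__":
    print(decode_octal_escapes(r"LACREON\342\204\242"))  # LACREON™
```

With credentials configured, `print(document_detect_text("Image.jpg"))` should print the text itself rather than the escaped form, so the ™ character appears directly.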