Using Textract for OCR locally

I want to extract text from images using Python. (Tessaract lib does not work for me because it requires installation).

I have found boto3 lib and Textract, but I'm having trouble working with it. I'm still new to this. Can you tell me what I need to do in order to run my script correctly.

This is my code:

import cv2
import boto3
import textract


#img = cv2.imread('slika2.jpg') #this is jpg file
with open('slika2.pdf', 'rb') as document:
    img = bytearray(document.read())

textract = boto3.client('textract',region_name='us-west-2')

response = textract.detect_document_text(Document={'Bytes': img}). #gives me error
print(response)

When I run this code, I get:

botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DetectDocumentText operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

I have also tried this:

# Document
documentName = "slika2.jpg"

# Read document content
with open(documentName, 'rb') as document:
    imageBytes = bytearray(document.read())

# Amazon Textract client
textract = boto3.client('textract',region_name='us-west-2')

# Call Amazon Textract
response = textract.detect_document_text(Document={'Bytes': imageBytes}) #ERROR

#print(response)

# Print detected text
for item in response["Blocks"]:
    if item["BlockType"] == "LINE":
        print ('\033[94m' +  item["Text"] + '\033[0m')

But I get this error:

botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DetectDocumentText operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

Im noob in this, so any help would be good. How can I read text form my image or pdf file?

I have also added this block of code, but the error is still Unable to locate credentials.

session = boto3.Session(
    aws_access_key_id='xxxxxxxxxxxx',
    aws_secret_access_key='yyyyyyyyyyyyyyyyyyyyy'
)

import boto3 # boto3 client client = boto3.client( 'textract', region_name='us-west-2', aws_access_key_id='xxxxxxx', aws_secret_access_key='xxxxxxx' ) # Read image with open('slika2.png', 'rb') as document: img = bytearray(document.read()) # Call Amazon Textract response = client.detect_document_text( Document={'Bytes': img} ) # Print detected text for item in response["Blocks"]: if item["BlockType"] == "LINE": print ('\033[94m' + item["Text"] + '\033[0m')

Recommended topics

Hot tags