I want to use the Amazon Textract OCR service to read text from a PDF file. My problem is that I want to pass a local file, without using an S3 bucket. I tested this with image files and it works well, but it does not work for PDF files.
This is the code where I get an error:
response = textract.start_document_text_detection(DocumentLocation="sample2.pdf")
Error:
Invalid type for parameter DocumentLocation, value: sample2.pdf, type: <class 'str'>, valid types: <class 'dict'>
Second attempt:
response = textract.start_document_text_detection(DocumentLocation={"name":"sample2.pdf"})
Error:
Unknown parameter in DocumentLocation: "name", must be one of: S3Object
Third attempt:
response = textract.start_document_text_detection(Document={'Bytes': "sample2.pdf"})
Error:
Unknown parameter in input: "Document", must be one of: DocumentLocation, ClientRequestToken, JobTag, NotificationChannel, OutputConfig
What should I do? Is there a way to make Textract work for PDF documents without S3?
Bytes: docs.aws.amazon.com/textract/latest/dg/API_Document.html – Sellers
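To illustrate the Bytes pointer above: the errors in the question show that start_document_text_detection only accepts an S3Object inside DocumentLocation, so a local file cannot go through that call. A minimal sketch of the alternative follows, using the synchronous detect_document_text call with the raw file contents (not the file name) as Bytes. It assumes the synchronous call accepts PDF input in your region and SDK version, which as far as I know holds only for single-page documents under the size limit; multi-page PDFs would still need the asynchronous S3 route. The helper name detect_text_from_local_file is hypothetical, and the client is passed in so the function can be tried without AWS credentials.

```python
from pathlib import Path


def detect_text_from_local_file(client, path):
    """Send a local file's raw bytes to Textract and return the detected lines.

    `client` is expected to behave like boto3.client("textract").
    Assumption: the synchronous API accepts this file type (single-page PDF,
    PNG or JPEG) and the file is within the service's size limit.
    """
    # Read the file contents; passing the file name string as Bytes does not work.
    data = Path(path).read_bytes()

    # Synchronous call: takes Document={"Bytes": ...}, no S3 bucket involved.
    response = client.detect_document_text(Document={"Bytes": data})

    # The response is a list of blocks; keep only the text of LINE blocks.
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
```

With a real client this could be called as detect_text_from_local_file(boto3.client("textract"), "sample2.pdf"). If the service rejects the PDF as an unsupported document, one workaround is to convert each page to an image first, since image input already works in the question.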