AWS Textract InvalidParameterException

I have a .NET Core client application that uses Amazon Textract with S3, SNS, and SQS, following the AWS documentation "Detecting and Analyzing Text in Multipage Documents" (https://docs.aws.amazon.com/textract/latest/dg/async.html).

I created an AWS role with the AmazonTextractServiceRole policy and added the following trust relationship, as per the documentation (https://docs.aws.amazon.com/textract/latest/dg/api-async-roles.html):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "textract.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

I subscribed the SQS queue to the topic and gave the Amazon SNS topic permission to send messages to the Amazon SQS queue, as per the AWS documentation.

All resources, including the S3 bucket, the SNS topic, and the SQS queue, are in the same us-west-2 region.

The following method fails with a generic "InvalidParameterException" error: "Request has invalid parameters".

However, if the NotificationChannel assignment is commented out, the code works and returns a valid job ID.

The error message does not indicate which parameter is invalid. Any help would be highly appreciated.

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.Textract.Model;

public async Task<string> ScanDocument()
{
    // ARNs exactly as posted in the question (account ID redacted).
    string roleArn = "aws:iam::xxxxxxxxxxxx:instance-profile/MyTextractRole";
    string topicArn = "aws:sns:us-west-2:xxxxxxxxxxxx:AmazonTextract-My-Topic";
    string bucketName = "mybucket";
    string filename = "mytestdoc.pdf";

    // Channel Textract uses to publish the job-completion message to SNS.
    var notificationChannel = new NotificationChannel
    {
        RoleArn = roleArn,
        SNSTopicArn = topicArn
    };

    var request = new StartDocumentAnalysisRequest
    {
        DocumentLocation = new DocumentLocation
        {
            S3Object = new S3Object { Bucket = bucketName, Name = filename }
        },
        FeatureTypes = new List<string> { "TABLES", "FORMS" },
        // Commenting out this line makes the call succeed.
        NotificationChannel = notificationChannel
    };

    var response = await this._textractService.StartDocumentAnalysisAsync(request);
    return response.JobId;
}
Egyptian answered 30/11, 2019 at 5:52 Comment(0)
2

After several days of analyzing the issue, I was able to resolve it. According to the documentation, the SNS topic only needs the SendMessage action on the SQS queue. However, the call only started working after I changed the permission to allow all SQS actions. Even so, the AWS error message is misleading and confusing.
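
For readers comparing the two settings, here is a minimal sketch of what the queue access policy might look like in each case. It assumes the usual shape of an SQS policy that lets an SNS topic deliver messages; the helper name build_queue_policy and both ARNs are hypothetical placeholders, not values from the question.

```python
import json


def build_queue_policy(queue_arn, topic_arn, all_actions=False):
    """Return an SQS access policy letting one SNS topic deliver to the queue.

    Hypothetical helper for illustration only. all_actions=True mirrors the
    "all SQS actions" setting described in the answer above.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:*" if all_actions else "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict the grant to messages coming from this topic.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


# Placeholder ARNs (assumptions, not taken from the question).
QUEUE_ARN = "arn:aws:sqs:us-west-2:123456789012:my-textract-queue"
TOPIC_ARN = "arn:aws:sns:us-west-2:123456789012:AmazonTextract-My-Topic"

documented = build_queue_policy(QUEUE_ARN, TOPIC_ARN)
permissive = build_queue_policy(QUEUE_ARN, TOPIC_ARN, all_actions=True)
print(json.dumps(permissive, indent=2))
```

The resulting JSON would be set as the queue's Policy attribute. Allowing sqs:* is broader than the documented sqs:SendMessage grant, so it may be worth narrowing it again once the setup works.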

Egyptian answered 2/12, 2019 at 5:34 Comment(0)
12

Debugging Invalid AWS Requests

The AWS SDK validates your request object locally, before dispatching it to the AWS servers. This validation can fail with unhelpfully opaque errors, like the one in the question.

As the SDK is open source, you can inspect the source to help narrow down the invalid parameter.

Before we look at the code: the SDK (and its documentation) is generated from special JSON files that describe the API, its requirements, and how to validate them.

I'm going to use the Node.js SDK as an example, but similar approaches may work for the other SDKs, including .NET.

In our case (AWS Textract), the latest API version is 2018-06-27. Sure enough, the JSON source file is on GitHub, here.

In my case, experimentation narrowed the issue down to the ClientRequestToken. The error was an opaque InvalidParameterException. I searched for it in the SDK source JSON file, and sure enough, on line 392:

"ClientRequestToken": {
  "type": "string",
  "max": 64,
  "min": 1,
  "pattern": "^[a-zA-Z0-9-_]+$"
},

A whole bunch of undocumented requirements!

In my case the token I was using violated the regex (pattern in the above source code). Changing my token code to satisfy the regex solved the problem.

I recommend this approach for these sorts of opaque type errors.
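
As a rough illustration of this approach, the sketch below checks a candidate value against the constraints quoted above before any request is sent. The constraint values are copied from the ClientRequestToken shape shown earlier; the function name find_violations and the sample tokens are hypothetical.

```python
import re

# Constraints copied from the ClientRequestToken shape quoted above.
CLIENT_REQUEST_TOKEN_SHAPE = {
    "type": "string",
    "max": 64,
    "min": 1,
    "pattern": "^[a-zA-Z0-9-_]+$",
}


def find_violations(value, shape):
    """List which constraints of a string shape the value breaks."""
    problems = []
    if "min" in shape and len(value) < shape["min"]:
        problems.append("shorter than min length %d" % shape["min"])
    if "max" in shape and len(value) > shape["max"]:
        problems.append("longer than max length %d" % shape["max"])
    if "pattern" in shape and not re.fullmatch(shape["pattern"], value):
        problems.append("does not match pattern %s" % shape["pattern"])
    return problems


# A file name contains "." which the pattern does not allow.
print(find_violations("mytestdoc.pdf", CLIENT_REQUEST_TOKEN_SHAPE))
# Replacing the dot with a hyphen satisfies every constraint.
print(find_violations("mytestdoc-pdf", CLIENT_REQUEST_TOKEN_SHAPE))
```

The same function should work for any string shape in the JSON file that uses min, max, and pattern, which makes it easier to tell which parameter the service is likely to reject.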

Legman answered 21/10, 2020 at 2:21 Comment(3)
"A whole bunch of undocumented requirements!" This is spot on! - Berliner
Please give this feedback on the AWS docs. It's an easy fix on their side. - Earhart
Helpful post. An alternative option, often simpler to locate, is the AWS API documentation for the underlying HTTP/REST APIs. For example, the StartDocumentAnalysis reference indicates the valid shape of the parameters, e.g. ClientRequestToken is optional, 1-64 characters in length, and must match the pattern ^[a-zA-Z0-9-_]+$ - Basketwork
2

Invoking Textract from Python, I received the same error until I truncated the ClientRequestToken to 64 characters:

        response = client.start_document_text_detection(
            DocumentLocation={
                'S3Object':{
                    'Bucket': bucket,
                    'Name' : fileName
                }
            },
            ClientRequestToken= fileName[:64],
            NotificationChannel= {
                "SNSTopicArn": "arn:aws:sns:us-east-1:AccountID:AmazonTextractXYZ",
                "RoleArn": "arn:aws:iam::AccountId:role/TextractRole"
            }
        )
        print('Processing started : %s' % json.dumps(response))
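
Truncation takes care of the length limit, but a file name can still contain characters (such as "." or spaces) outside the ^[a-zA-Z0-9-_]+$ pattern mentioned in this thread. One possible workaround, which is an assumption and not part of the answer above, is to derive the token from a SHA-256 hash of the S3 location: the hex digest is always 64 characters drawn from 0-9 and a-f. The helper name token_for and the sample key are hypothetical.

```python
import hashlib
import re

# Length and character constraints for ClientRequestToken, as quoted in this thread.
TOKEN_PATTERN = re.compile(r"^[a-zA-Z0-9-_]{1,64}$")


def token_for(bucket, key):
    """Derive a deterministic token from the S3 bucket and object key."""
    digest = hashlib.sha256(("%s/%s" % (bucket, key)).encode("utf-8"))
    return digest.hexdigest()  # 64 lowercase hex characters


token = token_for("mybucket", "reports/2019/my test doc.pdf")
print(token, len(token))
```

Because the token is deterministic, submitting the same bucket and key again reuses the same token. Since ClientRequestToken acts as an idempotency token, that may or may not be what you want; mixing in a timestamp or version before hashing is one way to force a new job.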
Earhart answered 25/12, 2022 at 2:17 Comment(0)
1

You need to change the permissions to allow all SQS actions, and then use the code below:


import time

import boto3

# Assumes AWS credentials and a default region are already configured.
textract = boto3.client("textract")


def startJob(s3BucketName, objectName):
    # Start asynchronous text detection on a document stored in S3.
    response = textract.start_document_text_detection(
        DocumentLocation={
            'S3Object': {
                'Bucket': s3BucketName,
                'Name': objectName
            }
        })

    return response["JobId"]


def isJobComplete(jobId):
    # Poll every 5 seconds until the job is no longer IN_PROGRESS.
    # For production use cases, use SNS based notification
    # Details at: https://docs.aws.amazon.com/textract/latest/dg/api-async.html
    time.sleep(5)
    response = textract.get_document_text_detection(JobId=jobId)
    status = response["JobStatus"]
    print("Job status: {}".format(status))

    while status == "IN_PROGRESS":
        time.sleep(5)
        response = textract.get_document_text_detection(JobId=jobId)
        status = response["JobStatus"]
        print("Job status: {}".format(status))

    return status


def getJobResults(jobId):
    # Collect every page of results by following NextToken until it is absent.
    pages = []

    response = textract.get_document_text_detection(JobId=jobId)
    pages.append(response)
    print("Resultset page received: {}".format(len(pages)))
    nextToken = response.get('NextToken')

    while nextToken:
        response = textract.get_document_text_detection(JobId=jobId, NextToken=nextToken)
        pages.append(response)
        print("Resultset page received: {}".format(len(pages)))
        nextToken = response.get('NextToken')

    return pages
Kyser answered 25/1, 2022 at 23:45 Comment(1)
Remember that Stack Overflow isn't just intended to solve the immediate problem, but also to help future readers find solutions to similar problems, which requires understanding the underlying code. This is especially important for members of our community who are beginners, and not familiar with the syntax. Given that, can you edit your answer to include an explanation of what you're doing and why you believe it is the best approach?Unerring
0

In my case, when I took a picture with the camera or picked an image from the gallery, the image was larger than 5 MB, and the Textract SDK's Analyze Expense call failed with "Invalid Parameter Exception". Textract supports a maximum image size of 5 MB. After I compressed the image to less than 5 MB, the Analyze Expense call worked. Hope it helps someone. Have a great day!
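
A small pre-flight check along these lines can catch the problem before the request is sent. This is only a sketch, assuming the 5 MB limit described above applies to the raw image bytes you pass in; the name fits_size_limit and the sample payloads are hypothetical.

```python
# 5 MB limit as described in the answer above (treated here as an assumption).
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def fits_size_limit(image_bytes, limit=MAX_IMAGE_BYTES):
    """Return True when the image payload is within the size limit."""
    return len(image_bytes) <= limit


# Dummy payloads standing in for real image data.
small = b"\x00" * 1024
large = b"\x00" * (MAX_IMAGE_BYTES + 1)
print(fits_size_limit(small), fits_size_limit(large))
```

If the check fails, the image can be compressed or resized and checked again before calling Textract.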

Hardnett answered 11/7 at 5:31 Comment(0)
