AWS Glue error - Invalid input provided while running python shell program

Asked 27/7, 2022 at 11:5 Answered 19/1, 2023 at 18:34

Solved python amazon-web-services amazon-s3 aws-glue aws-glue-spark

I have a Glue job that runs Python shell code. When I try to run it, I get the error below:

Job Name : xxxxx Job Run Id : yyyyyy failed to execute with exception Internal service error : Invalid input provided

It is not specific to my code. The error appears even if the script contains only

import boto3
print('loaded')

I am getting the error right after clicking the run job option. What is the issue here?

Unstained answered 27/7, 2022 at 11:5 Comment(3)

Can you share some more job details and what's in the log? – Americana 27/7, 2022 at 14:6

I'm having the same issue. Any python script generates this error. The logs are all empty – Gripsack 27/7, 2022 at 16:55

I am having the same error as well – Rowlandson 22/8, 2022 at 23:24

I think Quatermass is right, the jobs started working out of the blue the next day without any changes.

Unstained answered 29/7, 2022 at 6:50 Comment(0)

It happened to me, but the same job works on a different account.

The AWS documentation does not really explain this error:

The input provided was not valid.

I doubt this is an Amazon issue, as mentioned by @Quartermass.

Dropsical answered 11/8, 2022 at 11:6 Comment(1)

and it can come from a lot of Glue functions: github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-glue/src/… – Levitus 11/8, 2022 at 15:15

I too received this super helpful error message.

What worked for me was explicitly setting properties like worker type, number of workers, Glue version and Python version.

In Terraform code:

resource "aws_glue_job" "my_job" {
  name              = "my_job"
  role_arn          = aws_iam_role.glue.arn
  worker_type       = "Standard"
  number_of_workers = 2
  glue_version      = "4.0"

  command {
    script_location = "s3://my-bucket/my-script.py"
    python_version  = "3"
  }

  default_arguments = {
    "--enable-job-insights"       = "true",
    "--additional-python-modules" = "boto3==1.26.52,pandas==1.5.2,SQLAlchemy==1.4.46,requests==2.28.2",
  }
}

Update

After doing some more digging, I realised that what I needed was a Python shell script Glue job, not an ETL (Spark) job. By choosing this flavour of job, setting the Python version to 3.9 and "ticking the box" for Glue's pre-installed analytics libraries, my script, incidentally, had access to all the libraries I needed.

My Terraform code ended up looking like this:

resource "aws_glue_job" "my_job" {
  name         = "my-job"
  role_arn     = aws_iam_role.glue.arn
  glue_version = "1.0"
  max_capacity = 1

  connections = [
    aws_glue_connection.redshift.name
  ]

  command {
    name            = "pythonshell"
    script_location = "s3://my-bucket/my-script.py"
    python_version  = "3.9"
  }

  default_arguments = {
    "--enable-job-insights" = "true",
    "--library-set"         = "analytics",
  }
}

Note that I have switched to using Glue version 1.0. I arrived at this after some trial and error, and could not find this explicitly stated as the compatible version for pythonshell jobs… but it works!
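For readers who create jobs with boto3 instead of Terraform, the same Python shell definition might look roughly like the sketch below. It only mirrors the values from the Terraform block above (`pythonshell`, Python 3.9, Glue version 1.0, `max_capacity = 1`, the two default arguments). The helper names `python_shell_job_kwargs` and `create_python_shell_job` are made up for illustration, and the role ARN, script location and connection names are whatever you pass in.

```python
def python_shell_job_kwargs(name, role_arn, script_location, connections=()):
    """Build keyword arguments for glue.create_job() describing a Python shell job.

    Hypothetical helper: the values mirror the Terraform example, nothing more.
    """
    kwargs = {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "1.0",  # the version the answer settled on by trial and error
        "MaxCapacity": 1.0,    # Python shell jobs are sized with MaxCapacity
        "Command": {
            "Name": "pythonshell",  # selects a Python shell job rather than Spark ETL
            "ScriptLocation": script_location,
            "PythonVersion": "3.9",
        },
        "DefaultArguments": {
            "--enable-job-insights": "true",
            "--library-set": "analytics",
        },
    }
    if connections:
        # Glue expects the connection names wrapped in a ConnectionsList structure
        kwargs["Connections"] = {"Connections": list(connections)}
    return kwargs


def create_python_shell_job(**params):
    """Create the job in AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works without AWS access

    glue = boto3.client("glue")
    return glue.create_job(**python_shell_job_kwargs(**params))
```

Keeping the dictionary builder separate from the AWS call makes it easy to print and inspect exactly which properties are being sent before the job is created.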

Kinchinjunga answered 19/1, 2023 at 18:34 Comment(2)

Thanks for this. I ended up just needing to specify the glue_version. – Adagietto 24/1, 2023 at 10:27

@AndrewMoore no worries! Please see my update. I was able to simplify my code a lot after figuring out that I only needed a pythonshell job type. – Kinchinjunga 24/1, 2023 at 12:11

Same issue here in eu-west-2 yesterday, working now. This was only happening with Pythonshell jobs, not Pyspark ones, and job runs weren't getting as far as outputting any log streams. I can only assume it was an AWS issue they've now fixed and not issued a service announcement for.

Valeriavalerian answered 28/7, 2022 at 10:20 Comment(0)

Well, in my case, I get this error from time to time without any clear reason. The only thing that seems to trigger it is modifying a job parameter and saving the change. As soon as I save and try to execute the job, I usually get this error, and the only way to fix it is to destroy the job and re-create it. Has anybody solved this issue by other means? According to the accepted answer, the job simply began to work again without any manual action, which suggests the problem was a bug in AWS that has since been corrected.

Fishbowl answered 5/10, 2022 at 19:55 Comment(0)

I was facing a similar issue. I was invoking my job from a workflow. I could solve it by adding WorkerType, GlueVersion, NumberOfWorkers to the job before adding the job to the workflow. I could see it consistently fail before and succeed after this addition.
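As a sketch of how those three properties could be added to an existing job with boto3: `update_job` overwrites the previous job definition completely, so the example first reads the current definition with `get_job` and merges the new fields into it. The helper names and the list of fields stripped before the update are assumptions for illustration and may need adjusting for your job. This applies to Spark-style jobs; Python shell jobs are sized with MaxCapacity instead.

```python
# Fields returned by get_job that are assumed not to be accepted in JobUpdate.
READ_ONLY_FIELDS = ("Name", "CreatedOn", "LastModifiedOn")


def build_job_update(job, worker_type, number_of_workers, glue_version):
    """Return a JobUpdate dict: the existing definition plus explicit worker settings.

    Hypothetical helper; `job` is the "Job" dict from glue.get_job().
    """
    update = {k: v for k, v in job.items() if k not in READ_ONLY_FIELDS}
    # Capacity settings must not be combined with WorkerType/NumberOfWorkers.
    update.pop("MaxCapacity", None)
    update.pop("AllocatedCapacity", None)
    update["WorkerType"] = worker_type
    update["NumberOfWorkers"] = number_of_workers
    update["GlueVersion"] = glue_version
    return update


def set_worker_settings(job_name, worker_type, number_of_workers, glue_version):
    """Apply the settings in AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so build_job_update can be used offline

    glue = boto3.client("glue")
    job = glue.get_job(JobName=job_name)["Job"]
    job_update = build_job_update(job, worker_type, number_of_workers, glue_version)
    return glue.update_job(JobName=job_name, JobUpdate=job_update)
```

A call such as `set_worker_settings("my_job", "Standard", 2, "4.0")` would reuse the values from the Terraform answer above; run it before the job is attached to the workflow.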

Pellagra answered 15/11, 2022 at 8:51 Comment(0)
