"GlueArgumentError: argument --input_file_path is required"
I have created a PySpark script (a Glue job) and am trying to run it from an EC2 instance with the CLI command aws glue start-job-run --arguments (where I pass the list of arguments). I have tried both the shorthand syntax and the JSON syntax for the arguments, but in both cases I get the error "GlueArgumentError: argument --input_file_path is required" (input_file_path is the argument I am trying to access in the PySpark script, as shown below).

spark = SparkSession.builder.getOrCreate()
args = getResolvedOptions(sys.argv, ['input_file_path', 'CONFIG_FILE_PATH', 'SELECTED_RECORD_FILE_PATH', 'REJECTED_RECORD_FILE_PATH'])

The CLI commands I used to run the job are:

1] aws glue start-job-run --job-name dsb_clng_and_vldtn --arguments input_file_path="s3://dsb-lfnsrn-001/lndg/data/CompanyData_UK.csv"
2] aws glue start-job-run --job-name dsb_clng_and_vldtn --arguments "file://$JSON_FILES_PATH/job_arguments_list.json"
(JSON_FILES_PATH is a shell variable)

In method 2] I used the JSON syntax to run the job. The content of the JSON file is:

{
    "input_file_path":"s3://dsb-lfnsrn-001/lndg/data/CompanyData_UK.csv",
    "CONFIG_FILE_PATH":"s3://htcdsb-dev/wrkspc/src/dsb-lfnsrn-001-config.json",
    "SELECTED_RECORD_FILE_PATH":"s3://dsb-lfnsrn-001/pckpby/processed/Valid_UK.csv",
    "REJECTED_RECORD_FILE_PATH":"s3://dsb-lfnsrn-001/pckpby/processed/Invalid_UK.csv"
}

Please advise; I have been struggling to resolve this issue for several hours.

Deplorable answered 28/11, 2017 at 10:26 Comment(0)
J
11

Somewhat infuriatingly, this issue also appears when a Glue job is run from the console.

Job Parameters must be specified with a '--' prefix, and referenced in the script without the prefix.

args = getResolvedOptions(sys.argv, ['JOB_NAME', 'table_name'])

print(args['table_name'])
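
The prefix rule can be reproduced locally without Glue. The sketch below is a simplified stand-in for getResolvedOptions built on argparse; the names resolve_options and RaisingParser are hypothetical, the real function does more than this, and the assumption that Glue hands each parameter to the script as a "--name value" pair in sys.argv is mine. It only illustrates why a key supplied without '--' is never matched, while the script still looks the value up without the prefix.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """argparse parser that raises instead of exiting, so failures are easy to inspect."""

    def error(self, message):
        raise ValueError(message)


def resolve_options(argv, names):
    """Hypothetical stand-in for getResolvedOptions: each name is a required --option."""
    parser = RaisingParser()
    for name in names:
        parser.add_argument("--" + name, required=True)
    # Unknown tokens are ignored, so a key passed without '--' is simply not seen.
    known, _unknown = parser.parse_known_args(argv[1:])
    return vars(known)


# Parameter supplied with the '--' prefix, looked up without it.
args = resolve_options(["script.py", "--table_name", "orders"], ["table_name"])
print(args["table_name"])  # orders

# Same parameter supplied without the prefix: the required option is reported missing.
try:
    resolve_options(["script.py", "table_name", "orders"], ["table_name"])
except ValueError as exc:
    print("failed:", exc)
```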
Jeanniejeannine answered 9/7, 2018 at 2:22 Comment(1)
The fact that the parameters are configurable from the console via "Edit Job" and must be prefixed with a double hyphen makes it nearly unforgivable that AWS fails to mention that the end user must provide it. Their documentation for adding jobs and calling the APIs does show the double hyphen, but it would make much more sense for it to be called out in the GUI. Scrubland
A
8

getResolvedOptions expects each parameter to be passed with a double-hyphen prefix in the job call.

aws glue start-job-run --job-name dsb_clng_and_vldtn --arguments='--input_file_path="s3://dsb-lfnsrn-001/lndg/data/CompanyData_UK.csv"'

And in your job:

args = getResolvedOptions(sys.argv, ['input_file_path'])
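
If the job is started from Python instead of the CLI, the same prefix can be added programmatically. This is a minimal sketch, assuming boto3 is installed and credentials are configured; with_glue_prefix and start_job are hypothetical helper names, not part of boto3 or Glue. The same prefixed dictionary, dumped as JSON, could also serve as the file passed via file:// on the command line.

```python
import json


def with_glue_prefix(params):
    """Return a copy of params in which every key starts with '--'."""
    return {
        (key if key.startswith("--") else "--" + key): value
        for key, value in params.items()
    }


def start_job(job_name, params):
    """Start a Glue job run with prefixed arguments (needs boto3 and AWS credentials)."""
    import boto3  # imported here so with_glue_prefix works without boto3 installed

    glue = boto3.client("glue")
    return glue.start_job_run(JobName=job_name, Arguments=with_glue_prefix(params))


params = {"input_file_path": "s3://dsb-lfnsrn-001/lndg/data/CompanyData_UK.csv"}

# JSON text suitable for a file referenced with --arguments "file://..."
print(json.dumps(with_glue_prefix(params), indent=4))
```

Calling start_job("dsb_clng_and_vldtn", params) would then submit the run; it is left out of the snippet because it contacts AWS.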
Acarpous answered 18/12, 2017 at 21:2 Comment(0)
K
4

To pass multiple arguments to the Glue job, separate them with commas. This worked for me:

aws glue start-job-run --job-name "example-my-glue-job" --arguments="--input_first_day=2013-01-01","--input_last_day=2013-01-31","--run_timestamp=20200803211121"
Kearse answered 4/8, 2020 at 16:24 Comment(0)
A
3

comfytoday's answer really helped me out. I'd like to add that you can't use hyphens in parameter names either.

For example, I tried:

ARGUMENTS = {
    '--s3-source':   's3://cs3-bucket-here/'
    }

response = glue.start_job_run(JobName=JOB_NAME, Arguments=ARGUMENTS)

And I received KeyErrors. When I replaced 's3-source' with 's3_source' in the API call and within the Glue script, it ran successfully.
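
A likely explanation for the KeyError, assuming getResolvedOptions parses the arguments with Python's argparse (treat that as an assumption about Glue's internals): argparse stores an option named --s3-source under the key s3_source, so a lookup with the hyphenated name fails. The snippet below uses plain argparse only, not Glue.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--s3-source")

# argparse derives the stored key by replacing '-' with '_' in the option name.
args = vars(parser.parse_args(["--s3-source", "s3://cs3-bucket-here/"]))

print("s3-source" in args)  # False
print(args["s3_source"])    # s3://cs3-bucket-here/
```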

Atomicity answered 13/1, 2020 at 22:32 Comment(0)
N
0

Another thing that is worth mentioning is that multiple arguments need to be added separately, like below. Also notice the specification of an argument without value.

aws glue start-job-run --job-name Ivan-Air-ETL --arguments="--job-bookmark-option=job-bookmark-enable" --arguments="--enable-metrics="
Nerves answered 22/1, 2020 at 7:44 Comment(0)
M
0

This is what works for me in Glue 3.0

aws glue start-job-run --job-name 'NameOfJob' --arguments='inputS3Key="S3://bucketname/path/",outputS3Key="S3://bucketname/path/"'

Make sure that you've no extra spaces anywhere in this.

Mina answered 18/11, 2022 at 8:1 Comment(0)
