Optional job parameter in AWS Glue?
D

9

22

How can I implement an optional parameter to an AWS Glue Job?

I have created a job that currently has a string parameter (an ISO 8601 date string) as an input that is used in the ETL job. I would like to make this parameter optional, so that the job uses a default value if it is not provided (e.g. using datetime.now and datetime.isoformat in my case). I have tried using getResolvedOptions:

import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['ISO_8601_STRING'])

However, when I am not passing an --ISO_8601_STRING job parameter I see the following error:

awsglue.utils.GlueArgumentError: argument --ISO_8601_STRING is required

Durango answered 4/9, 2018 at 8:27 Comment(0)
D
19

Porting Yuriy's answer to Python solved my problem:

import datetime

if '--ISO_8601_STRING' in sys.argv:
    args = getResolvedOptions(sys.argv, ['ISO_8601_STRING'])
else:
    # Parameter not supplied: fall back to the current time
    args = {'ISO_8601_STRING': datetime.datetime.now().isoformat()}
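
For local testing outside Glue, the same check can be wrapped in a small function. This is only a sketch: the name iso_string_or_now and the argv and resolver parameters are hypothetical, and the awsglue import is deferred so the function can be called with a stand-in resolver when the awsglue package is not installed.

```python
from datetime import datetime


def iso_string_or_now(argv, key="ISO_8601_STRING", resolver=None):
    """Return the value of --key when it is in argv, else the current time as ISO 8601."""
    if f"--{key}" in argv:
        if resolver is None:
            # Deferred import: only needed when the parameter is actually resolved.
            from awsglue.utils import getResolvedOptions as resolver
        return resolver(argv, [key])[key]
    # Parameter not supplied: use the default described in the question.
    return datetime.now().isoformat()
```

Inside a Glue job this would be called as iso_string_or_now(sys.argv).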
Durango answered 12/9, 2018 at 14:53 Comment(0)
S
22

matsev's and Yuriy's solutions are fine if you have only one optional field.

I wrote a wrapper function for Python that is more generic and handles different corner cases (mandatory fields and/or optional fields with default values).

import sys
from awsglue.utils import getResolvedOptions

def get_glue_args(mandatory_fields, default_optional_args):
    """
    Wrapper around the Glue function getResolvedOptions that handles the following cases:
    * Optional arguments and/or mandatory arguments
    * Optional arguments with a default value
    NOTE:
        * Do not use '-' in argument names, because getResolvedOptions will replace it with '_'
        * getResolvedOptions returns every field as a string

    Arguments:
        mandatory_fields {list} -- list of mandatory fields for the job
        default_optional_args {dict} -- dict of optional fields with their default values

    Returns:
        dict -- the given args, plus the default value of every optional arg that was not supplied
    """
    # Glue passes each argument name in sys.argv with a leading '--'
    given_optional_fields_key = [key for key in default_optional_args if '--' + key in sys.argv]

    args = getResolvedOptions(sys.argv,
                            mandatory_fields+given_optional_fields_key)

    # Overwrite default value if optional args are provided
    default_optional_args.update(args)

    return default_optional_args

Usage :

# Defining mandatory/optional args
mandatory_fields = ['my_mandatory_field_1','my_mandatory_field_2']
default_optional_args = {'optional_field_1':'myvalue1', 'optional_field_2':'myvalue2'}
# Retrieve args
args = get_glue_args(mandatory_fields, default_optional_args)
# Access elements as in a dict, with args['key']
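
Because getResolvedOptions returns every value as a string while the defaults can be of any type, it may help to convert the merged dict in one place. The helpers below are a hypothetical sketch (cast_args and to_bool are not part of awsglue):

```python
def to_bool(text):
    """Interpret common textual spellings of true; everything else is False."""
    return str(text).strip().lower() in ("1", "true", "yes")


def cast_args(args, casters):
    """Return a copy of args with the listed keys converted by their caster."""
    converted = dict(args)
    for key, caster in casters.items():
        if key in converted:
            converted[key] = caster(converted[key])
    return converted


# Stand-in for the dict returned by the wrapper above.
raw_args = {"limit": "10", "dry_run": "False", "table": "events"}
typed_args = cast_args(raw_args, {"limit": int, "dry_run": to_bool})
```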
Saturninasaturnine answered 24/9, 2019 at 15:19 Comment(2)
Thanks for your wrapper function, @mehdio. I found that when I supplied optional args to the job in the form OPTION_ARG=value, the supplied value was ignored in favor of the default from the default_optional_args dictionary. I tweaked your wrapper as follows:
    supplied = set([i[2:].split('=')[0] for i in sys.argv])
    given_optional_fields_key = list((supplied).intersection([i for i in default_optional_args]))
- Mcfarlin
The intersection is basically just the set of arguments passed from outside that override the defaults. I would suggest renaming given_optional_fields_key to something like overridden_optional_fields_key. - Mend
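
Putting the two comments together, a revised wrapper might look like the sketch below. It accepts both the "--key value" and "--key=value" forms and leaves the caller's defaults dict untouched. The argv and resolver parameters are hypothetical additions so the function can be exercised without the awsglue package; in a Glue job both can be left at their defaults.

```python
import sys


def get_glue_args(mandatory_fields, default_optional_args, argv=None, resolver=None):
    """Resolve mandatory args plus any optional args supplied; fill the rest from defaults."""
    argv = sys.argv if argv is None else argv
    if resolver is None:
        # Deferred import so the function can be tested with a stand-in resolver.
        from awsglue.utils import getResolvedOptions as resolver

    # Names supplied on the command line, with the leading '--' and any '=value' removed.
    supplied = {arg[2:].split("=", 1)[0] for arg in argv if arg.startswith("--")}
    overridden_optional_fields_key = [key for key in default_optional_args if key in supplied]

    resolved = resolver(argv, list(mandatory_fields) + overridden_optional_fields_key)

    # Build a new dict: defaults first, then the resolved values on top.
    return {**default_optional_args, **resolved}
```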
C
8

There is a workaround to have optional parameters. The idea is to examine arguments before resolving them (Scala):

val argName = "ISO_8601_STRING"
// Stays null when the argument is not supplied
var argValue: String = null
if (sysArgs.contains(s"--$argName"))
   argValue = GlueArgParser.getResolvedOptions(sysArgs, Array(argName))(argName)
Chausses answered 11/9, 2018 at 8:49 Comment(0)
D
4

I don't see a way to have optional parameters, but you can specify default parameters on the job itself, and then if you don't pass that parameter when you run the job, your job will receive the default value (note that the default value can't be blank).
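
Since the job-level default cannot be blank, one way to use this is to configure a placeholder as the default and translate it inside the script. The sketch below assumes a placeholder of "NONE"; that value and the names UNSET and value_or_default are hypothetical, not something Glue defines.

```python
from datetime import datetime

# Hypothetical placeholder configured as the parameter's default value on the job.
UNSET = "NONE"


def value_or_default(args, key, fallback):
    """Return args[key], or fallback() when the job-level placeholder was received."""
    value = args[key]
    return fallback() if value == UNSET else value


# Stand-in for the dict returned by getResolvedOptions when the run passes no value.
args = {"ISO_8601_STRING": UNSET}
iso_string = value_or_default(args, "ISO_8601_STRING", lambda: datetime.now().isoformat())
```

With this arrangement getResolvedOptions always finds the parameter, so the script never hits the "argument is required" error.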

Denature answered 7/9, 2018 at 23:7 Comment(0)
S
4

Wrapping matsev's answer in a function:

def get_glue_env_var(key, default="none"):
    if f'--{key}' in sys.argv:
        return getResolvedOptions(sys.argv, [key])[key]
    else:
        return default
Shimberg answered 1/2, 2021 at 16:49 Comment(0)
B
1

It's possible to create a Step Function that starts the same Glue job with different parameters. The state machine starts with a Choice state that checks whether the optional input is present and routes to a task that either passes it to the job or omits it.

stepFunctions:
  stateMachines:
    taskMachine:
      role:
        Fn::GetAtt: [ TaskExecutor, Arn ]
      name: ${self:service}-${opt:stage}
      definition:
        StartAt: DefaultOrNot
        States:

          DefaultOrNot:
            Type: Choice
            Choices:
              - Variable: "$.optional_input"
                IsPresent: false
                Next: DefaultTask
              - Variable: "$.optional_input"
                IsPresent: true
                Next: OptionalTask

          OptionalTask:
            Type: Task
            Resource:  "arn:aws:states:::glue:startJobRun.sync"
            Parameters:
              JobName: ${self:service}-${opt:stage}
              Arguments:
                '--log_group.$': "$.specs.log_group"
                '--log_stream.$': "$.specs.log_stream"
                '--optional_input.$': "$.optional_input"

            Catch:
              - ErrorEquals: [ 'States.TaskFailed' ]
                ResultPath: "$.errorInfo"
                Next: TaskFailed
            Next: ExitExecution


          DefaultTask:
            Type: Task
            Resource:  "arn:aws:states:::glue:startJobRun.sync"
            Parameters:
              JobName: ${self:service}-${opt:stage}
              Arguments:
                '--log_group.$': "$.specs.log_group"
                '--log_stream.$': "$.specs.log_stream"


            Catch:
              - ErrorEquals: [ 'States.TaskFailed' ]
                ResultPath: "$.errorInfo"
                Next: TaskFailed
            Next: ExitExecution

          TaskFailed:
            Type: Fail
            Error: "Failure"

          ExitExecution:
            Type: Pass
            End: True
Bonus answered 15/1, 2022 at 1:27 Comment(0)
B
1

I came across the same issue when using CloudFormation to create jobs and pass in values retrieved from Fn::FindInMap, where I was declaring an empty string schemaSuffix : ''. Unfortunately when the Glue Job was created from this template, the argument appeared in the Job properties but with a blank value.

I've found a much simpler solution that doesn't require defining a function. It relies on two pieces:

  1. A conditional list comprehension, used to construct the argument list before passing it to getResolvedOptions()
  2. Once the args dict is created, the .get() method of a dict, which avoids raising a missing-key exception.

### In place of where you would use: 
### args = getResolvedOptions(sys.argv, ['l1', 'l2'])

argList = [ i for i in ['JOB_NAME', 'hostConn', 'schemaSuffix'] if f"--{i}" in sys.argv]

args = getResolvedOptions(sys.argv, argList)

schema_suffix = args.get("schemaSuffix", '') # Second arg's default is None

This got me the results I was wanting... a way to make the job argument optional, where I could conditionally retrieve and/or set it regardless of whether CloudFormation passed in an empty string (''), or no value at all.

print( f"Data type of schema_suffix arg: {type(schema_suffix)}")

print( f"suffix value is: {schema_suffix}...")
Bluebill answered 23/5 at 19:33 Comment(0)
R
0

An AWS Glue job expects every parameter it resolves to be supplied; optional parameters are not currently supported.

The solution I tried for optional parameters is to pass all of the parameters as a single JSON string:

 --glue_params = {"var1":"value1", "var2":"value2", "optional_var1":"value3", "optional_var2": null}

In your code, handle the parameters accordingly:

import sys
from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv,['JOB_NAME','glue_params'])
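
The value of glue_params arrives as a plain string, so it still has to be decoded. A minimal sketch of that step, assuming a JSON null means "not provided" (parse_glue_params and its defaults parameter are hypothetical names):

```python
import json


def parse_glue_params(raw, defaults=None):
    """Decode the JSON string passed as --glue_params and fill in defaults."""
    params = dict(defaults or {})
    for key, value in json.loads(raw).items():
        # JSON null is decoded as None; treat it as "not provided" and keep the default.
        if value is not None:
            params[key] = value
    return params


# Stand-in for args['glue_params'] as returned by getResolvedOptions.
raw = '{"var1":"value1", "var2":"value2", "optional_var1":"value3", "optional_var2": null}'
params = parse_glue_params(raw, {"optional_var2": "fallback"})
```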

Hope this helps.

Refine answered 7/6, 2023 at 6:46 Comment(0)
F
-2

If you're using the console interface, you must provide your parameter names starting with "--", like "--TABLE_NAME" rather than "TABLE_NAME". You can then use them as in the following Python code:

args = getResolvedOptions(sys.argv, ['JOB_NAME', 'TABLE_NAME'])
table_name = args['TABLE_NAME']
Fruitage answered 17/10, 2018 at 13:10 Comment(0)
