My experience with AWS Batch "parameters"
I have been working on a project for about four months now. One of my tasks was to connect several AWS services so that the most recently uploaded file, placed by an application in an S3 bucket, gets processed in the cloud.
What I needed
Here is how it works. Through a website, a user uploads a file that is sent to a back-end server and then to an S3 bucket. This event triggers an AWS Lambda function, which submits an AWS Batch job from a previously registered job definition (based on a Docker image). The job retrieves the file from the S3 bucket, processes it, and then saves some results in a database. By the way, all the code I am using is written in Python.
Everything worked like a charm until I found it really hard to pass the name of the file that generated the S3 event as a parameter to the Python script running inside the Docker container launched by the AWS Batch job.
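As a minimal sketch of the Lambda side, the function below pulls the bucket name and object key out of an S3 event notification. The helper name extract_s3_object, the bucket name and the sample event are hypothetical; the Records/s3/bucket/object layout follows the S3 event notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def extract_s3_object(event):
    """Return (bucket, key) for the first record of an S3 event notification."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # S3 URL-encodes object keys in event notifications (spaces arrive as '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    return bucket, key


# Hypothetical, trimmed-down event used only for illustration.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-upload-bucket"},
                "object": {"key": "uploads/my+picture.png"},
            }
        }
    ]
}

if __name__ == "__main__":
    print(extract_s3_object(sample_event))
```

Inside a real Lambda handler, the same call would receive the event argument that AWS passes in, and the returned key is the file name that later has to reach the Batch job.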
What I did
After a lot of research and experimentation, I came up with a solution to my problem. The issue was that the word "parameters", for AWS Batch jobs, does not mean what a user may expect. Instead, we need to use containerOverrides, the way I show below: we define an environment variable inside the running container by providing the name and value of that variable.
# At some point we had defined aws_batch like this:
#
# import boto3
# aws_batch = boto3.client(
#     service_name="batch",
#     region_name='<OurRegion>',
#     aws_access_key_id='<AWS_ID>',
#     aws_secret_access_key='<AWS_KEY>',
# )
aws_batch.submit_job(
    jobName='TheJobNameYouWant',
    jobQueue='NameOfThePreviouslyDefinedQueue',
    jobDefinition='NameOfThePreviouslyDefinedJobDefinition',
    # THIS DID NOT WORK FOR ME: parameters only fill Ref:: placeholders in the
    # job definition's command; they do not become environment variables.
    # parameters={
    #     'FILENAME': FILENAME
    # },
    containerOverrides={
        'environment': [
            {
                # Name of the environment variable inside the container.
                'name': 'filename',
                # Hard-coded here; in practice, the file name from the S3 event.
                'value': 'name_of_the_file.png'
            },
        ],
    },
)
This way, from my Python script, inside the Docker container, I could access the environment variable value using the well-known os.getenv('<ENV_VAR_NAME>') function.
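On the container side, reading the value could look like the sketch below. The variable name filename matches the one used in the submit_job call above; the helper get_filename and its error handling are my own assumptions, not part of the original script.

```python
import os


def get_filename(env=os.environ):
    """Read the file name that the Batch job received via containerOverrides."""
    # Environment variable names are case-sensitive on Linux containers,
    # so this must match the 'name' given in containerOverrides exactly.
    filename = env.get("filename")
    if not filename:
        raise RuntimeError("environment variable 'filename' is not set")
    return filename
```

Failing early when the variable is missing makes a misconfigured job show up in the Batch logs immediately, rather than as a confusing S3 error later on.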
You can also check in the AWS console, under the Batch menu, both the Job configuration and Container details tabs, to make sure everything makes sense. The container that the job runs does not see the job parameters as such: as far as I can tell, they are only substituted into Ref:: placeholders in the job definition's command. Environment variables, on the other hand, are visible inside the container.
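For comparison, here is a rough local illustration of what Batch job parameters are meant for: filling Ref:: placeholders in the command of the job definition. The command, the script name process.py and the function resolve_command are hypothetical, and the function only imitates the substitution for demonstration purposes; it is not AWS code.

```python
# Hypothetical command from a job definition: the last argument is a
# placeholder that refers to a job parameter named "filename".
job_definition_command = ["python", "process.py", "Ref::filename"]

# What would be passed to submit_job(parameters=...).
job_parameters = {"filename": "name_of_the_file.png"}


def resolve_command(command, parameters):
    """Imitate how Ref::<name> placeholders are replaced (illustration only)."""
    resolved = []
    for token in command:
        for name, value in parameters.items():
            token = token.replace(f"Ref::{name}", value)
        resolved.append(token)
    return resolved


if __name__ == "__main__":
    print(resolve_command(job_definition_command, job_parameters))
```

With a job definition written this way, the script would receive the file name as a command-line argument (sys.argv[1]) instead of an environment variable. I have not used this route in my project, so treat it as a sketch.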
Final notes
I do not know if there is a better way to solve this. For now, I am sharing with the community something that does work.
I have tested it myself, and the main idea came from reading the links that I list below:
I honestly hope this helps you, and I wish you happy coding!