Currently I'm using an AWS Glue job to load data into Redshift, but after that load I need to run some data cleansing tasks, probably using an AWS Lambda function. Is there any way to trigger a Lambda function at the end of a Glue job? Lambda functions can be triggered using SNS messages, but I couldn't find a way to send an SNS message at the end of the Glue job.
No. Currently you can't trigger a Lambda function directly at the end of a Glue job, because AWS has not yet provided Glue as a Lambda trigger. If you look at the list of triggers after you create a Lambda function, you will see most AWS services there but not AWS Glue. So, for now, it is not possible, though it may be in the future.
But I would like to mention that you can control the flow of Glue jobs from a Lambda function. (I did it in Python; I am sure other languages support this too.) My use case was that whenever I uploaded an object to an S3 bucket, it triggered a Lambda function, which read the object and started my Glue job. Once the Glue job's status was complete, I would write my file back to the S3 bucket linked to this Lambda function.
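The S3-to-Lambda-to-Glue flow described above could look roughly like the sketch below. It is not the answerer's actual code: the job name `my-glue-job` and the `--source_bucket` / `--source_key` job arguments are made-up placeholders, and the optional `glue_client` parameter exists only so the handler can be exercised without AWS access.

```python
import urllib.parse


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context, glue_client=None):
    if glue_client is None:
        import boto3  # imported lazily so the helper above works offline
        glue_client = boto3.client("glue")

    run_ids = []
    for bucket, key in extract_s3_objects(event):
        response = glue_client.start_job_run(
            JobName="my-glue-job",  # hypothetical job name
            # Hypothetical job arguments; the Glue script would read them
            # with getResolvedOptions.
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return {"started": run_ids}
```

The Lambda execution role would need permission to call `glue:StartJobRun` for this to work.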
@oreoluwa is right, this can be done using CloudWatch Events.
From the CloudWatch dashboard:
- Click on 'Rules' from the left menu
- For 'Event Source', choose 'Event Pattern' and in 'Service Name' choose 'Glue'
- For 'Event Type' choose 'Glue Job State Change'
- On the right side of the page, in the 'Targets' section, click 'Add Target' -> 'Lambda Function' and then choose your function.
The event you'll get in Lambda will be of the format:
{
    'version': '0',
    'id': 'a9bc90be-xx00-03e0-9bc5-a0a0a0a0a0a0',
    'detail-type': 'Glue Job State Change',
    'source': 'aws.glue',
    'account': 'xxxxxxxxxx',
    'time': '2018-05-10T16:17:03Z',
    'region': 'us-east-2',
    'resources': [],
    'detail': {
        'jobName': 'xxxx_myjobname_yyyy',
        'severity': 'INFO',
        'state': 'SUCCEEDED',
        'jobRunId': 'jr_565465465446788dfdsdf546545454654546546465454654',
        'message': 'Job run succeeded'
    }
}
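A handler that consumes an event of this shape might look like the following minimal sketch. `run_cleansing` is a hypothetical placeholder for whatever cleansing work you need; only the `source`, `detail.jobName`, `detail.state` and `detail.jobRunId` fields from the event above are assumed.

```python
def run_cleansing(job_name, job_run_id):
    """Hypothetical placeholder for the real data cleansing work."""
    print(f"Cleansing after {job_name} run {job_run_id}")


def lambda_handler(event, context):
    detail = event.get("detail", {})
    job_name = detail.get("jobName")
    state = detail.get("state")

    # Ignore anything that is not a successful Glue job run.
    if event.get("source") != "aws.glue" or state != "SUCCEEDED":
        return {"action": "skipped", "jobName": job_name, "state": state}

    run_cleansing(job_name, detail.get("jobRunId"))
    return {"action": "cleansed", "jobName": job_name, "state": state}
```

Checking the state inside the handler is a safety net; if the rule itself only matches one state, the check simply never skips.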
Since AWS Glue supports Python, you can probably take the following approach to achieve what you want. The sample script below shows how to do that:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3 ## Step-2
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## Do all ETL stuff here
## Once the ETL completes
lambda_client = boto3.client('lambda') ## Step-3
response = lambda_client.invoke(FunctionName='string') ## Step-4
- Create a Python-based Glue job (to perform ETL on Redshift).
- In the job script, import boto3 (this package needs to be available to the job as a script library).
- Create a Lambda client using boto3.
- Invoke the Lambda function using the boto3 Lambda invoke() call once the ETL completes.
Please make sure that the role you use when creating the Glue job has permission to invoke Lambda functions.
Refer to the Boto3 documentation for Lambda.
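As a rough sketch of step 4, the invoke call can pass a JSON payload and run asynchronously, so the Glue job does not wait for the cleansing to finish. The function name `data-cleansing` and the payload fields below are assumptions for illustration; the optional `lambda_client` parameter is only there to allow testing without AWS.

```python
import json


def notify_lambda(function_name, payload, lambda_client=None):
    """Fire-and-forget invocation of a Lambda function; returns the status code."""
    if lambda_client is None:
        import boto3  # imported lazily so the function can be tested offline
        lambda_client = boto3.client("lambda")

    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: do not wait for the result
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return response["StatusCode"]
```

At the end of the Glue script this could be called as `notify_lambda("data-cleansing", {"jobName": args["JOB_NAME"]})`. The Glue job's role needs the `lambda:InvokeFunction` permission on the target function.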
@ace and @adeel have part of the solution, but you can get this resolved by creating the CloudWatch rule with the following event pattern:
{
    "source": [
        "aws.glue"
    ],
    "detail-type": [
        "Glue Job State Change"
    ],
    "detail": {
        "jobName": [
            "<YourJobName>"
        ],
        "state": [
            "SUCCEEDED"
        ]
    }
}
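If you would rather create the rule from code than from the console, something along these lines should work with boto3. This is a sketch under assumptions: the rule name, the target id `cleansing-lambda` and the statement id are made up, and the client parameters exist only so the function can be tested with stand-ins.

```python
import json


def build_pattern(job_name, states=("SUCCEEDED",)):
    """Build the event pattern shown above for one job and a set of states."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {"jobName": [job_name], "state": list(states)},
    }


def create_rule(rule_name, job_name, function_arn,
                events_client=None, lambda_client=None):
    if events_client is None or lambda_client is None:
        import boto3  # imported lazily so build_pattern works offline
        events_client = events_client or boto3.client("events")
        lambda_client = lambda_client or boto3.client("lambda")

    rule_arn = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_pattern(job_name)),
        State="ENABLED",
    )["RuleArn"]

    # Point the rule at the Lambda function ("cleansing-lambda" is a made-up id).
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "cleansing-lambda", "Arn": function_arn}],
    )

    # Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

The `add_permission` call is the part the console does for you silently; without it the rule matches but the function is never invoked.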
You can orchestrate your AWS Glue Jobs and AWS Lambda functions by using AWS Step Functions. Here is a blog post that explains how to do it and gives an example: https://aws.amazon.com/blogs/big-data/orchestrate-multiple-etl-jobs-using-aws-step-functions-and-aws-lambda/
In essence, when a Glue job finishes (whether it succeeds or fails), your Step Functions workflow can catch the result and invoke your Lambda function.
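For illustration, a two-state workflow of that kind could be defined as below. This is not the definition from the blog post, just a minimal sketch: the state names and the `$.error` result path are arbitrary choices, and the job and function names are whatever you pass in. It relies on the Step Functions service integrations for Glue (`startJobRun.sync`, which waits for the job to finish) and Lambda (`lambda:invoke`).

```python
import json


def build_definition(glue_job_name, lambda_function_name):
    """Return a state machine definition: run the Glue job, then the Lambda."""
    return {
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # ".sync" makes the state wait until the job run completes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                # On failure, still move on and pass the error along.
                "Catch": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "ResultPath": "$.error",
                        "Next": "RunCleansing",
                    }
                ],
                "Next": "RunCleansing",
            },
            "RunCleansing": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": lambda_function_name,
                    "Payload.$": "$",  # hand the whole state input to the function
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition("my-glue-job", "data-cleansing"), indent=2))
```

The printed JSON can be pasted into the Step Functions console or passed as the `definition` argument of boto3's `stepfunctions` client `create_state_machine` call. Drop the `Catch` block if the Lambda should only run after a successful job.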
Yes, it is possible to trigger a Lambda function, but you need EventBridge to do it. Please follow the steps below:
- Go to EventBridge. Under "Events" you will find "Rules"; click on it, then click "Create rule".
- Give your rule a suitable name, make sure the "Rule with an event pattern" radio button is selected, then click "Next".
- For the event source, choose "AWS events or EventBridge partner events".
- For the creation method, select "Use pattern form".
- In the event pattern, select "AWS service" as the event source and Glue as the AWS service. A new drop-down will be enabled; select "Glue Job State Change" there.
- The event pattern is shown on the right side; click "Edit pattern" and change it as needed:
{
    "detail-type": ["Glue Job State Change"],
    "source": ["aws.glue"],
    "detail": {
        "jobName": ["Your glue Name"],
        "state": ["FAILED"]
    }
}
For state, you can choose from STARTING, RUNNING, STOPPING, STOPPED, SUCCEEDED, FAILED, ERROR, WAITING and TIMEOUT.
Don't use any other field unless you are working with an EC2 instance; in that case you have to use the resources field, which you can place next to source.
- Click "Next". For the target type select "AWS service", select "Lambda function" as the target, and then pick your Lambda function name in the new drop-down that appears after selecting the target. Then click "Next", "Next" again, and save.
Congrats, you have successfully created the configuration to trigger a Lambda function based on a Glue job.
Lambda can be triggered on an S3 put. You can write a dummy file to S3 as the last step of the Glue job, which would in turn trigger the Lambda function. I have tested this.
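A minimal sketch of the marker-file idea, to be called at the very end of the Glue script. The `markers/<job name>/_SUCCESS` key layout is an assumption made up for this example, and the bucket is whichever one your Lambda's S3 trigger is attached to; the optional `s3_client` parameter only exists for testing.

```python
def write_done_marker(bucket, job_name, s3_client=None):
    """Write an empty marker object whose creation triggers the Lambda."""
    if s3_client is None:
        import boto3  # imported lazily so the function can be tested offline
        s3_client = boto3.client("s3")

    # Hypothetical key layout; match it to the prefix of your S3 trigger.
    key = f"markers/{job_name}/_SUCCESS"
    s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
    return key
```

Restricting the S3 trigger to the marker prefix keeps other uploads to the same bucket from starting the cleansing function.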