Is there any way to trigger an AWS Lambda function at the end of an AWS Glue job?

Currently I'm using an AWS Glue job to load data into RedShift, but after that load I need to run some data cleansing tasks probably using an AWS Lambda function. Is there any way to trigger a Lambda function at the end of a Glue job? Lambda functions can be triggered using SNS messages, but I couldn't find a way to send an SNS at the end of the Glue job.

Obvert answered 28/2, 2018 at 16:43 Comment(2)
I've not worked with AWS Glue before, but all AWS services emit some sort of event to CloudWatch. From CloudWatch I believe you can trigger SNS to invoke your Lambda function. – Ulrick
If the cleanup can be done by Lambda, why can't it be done with Glue? A Glue Python shell job could be equivalent. – Megillah

No. Currently you can't trigger a Lambda function directly at the end of a Glue job, because AWS does not provide Glue as a Lambda trigger. If you look at the list of available triggers after you create a Lambda function, you will see most AWS services there, but not AWS Glue. So for now it is not possible, though that may change in the future.

That said, you can control the flow of Glue scripts from a Lambda function (I did it in Python; other languages may support this too). My use case was: whenever an object is uploaded to an S3 bucket, a Lambda function is triggered; it reads the object and starts my Glue job. Once the Glue job's status was complete, I would write my file back to the S3 bucket linked to this Lambda function. A minimal sketch of the triggering function is below.
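
A minimal sketch of the triggering Lambda (assuming an S3 event notification is already configured on the bucket; the job name my-glue-job and the argument names are placeholders):

import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    # Pull the uploaded object's location out of the S3 event notification
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    # Start the Glue job, passing the object location as job arguments
    response = glue.start_job_run(
        JobName='my-glue-job',
        Arguments={'--s3_bucket': bucket, '--s3_key': key}
    )
    return response['JobRunId']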

Monger answered 1/3, 2018 at 17:0 Comment(5)
I was able to orchestrate the tasks from a Python function. The cleanest way, I guess, is to create a function that triggers the Glue job, waits for the job to end, and then triggers the data cleansing tasks. But the problem is the execution time limit of Lambda (<=300s); my Glue jobs run for much longer than that. I know I could do it differently, for example with a Lambda function that checks every n minutes whether there's a new successful run of the Glue job, but I don't like that idea, it seems very hard to monitor. Isn't there a better way to orchestrate ETL tasks? – Obvert
@dd. You can split your Lambda function into multiple functions and trigger one after the previous one completes. Now, you can't directly invoke a Lambda function after another Lambda function, but you can trigger it through other components like S3: when the first Lambda completes, make some update inside an S3 object and let that trigger the second Lambda, and so on. I can think of this as a preliminary approach; if I find a better way, I will let you know. – Monger
Hi CodeHunter, can you please provide some sample Lambda code to call a Glue job? When any object is uploaded to an S3 bucket/folder, I have a Lambda function listening to the S3 location; the Lambda function should trigger and start my Glue job. I searched for some references but couldn't find one. – Montoya
@Yuva: I trigger the Glue job as soon as a file is uploaded to S3 by letting my upload service push a message into a Kafka queue (or maybe an SNS event). Then I listen for that message with a Kafka consumer, and as soon as I read a message I spawn the Glue job. I found this better than using Lambda triggers: firstly, you cannot trigger a Glue job directly based on an S3 upload, and secondly, the Kafka queue makes it cloud agnostic. – Monger
Hi @CodeHunter, I am trying to do what you did: when a file arrives in my S3 bucket, start an ETL. But I haven't succeeded yet; I am a little bit confused by Lambda. Can you tell me how you managed to do it, or where there is a tutorial or something like that? Yesterday I even posted the question here: #55367822 – Campestral

@oreoluwa is right, this can be done using CloudWatch Events.

From the CloudWatch dashboard:

  • Click on 'Rules' from the left menu
  • For 'Event Source', choose 'Event Pattern' and in 'Service Name' choose 'Glue'
  • For 'Event Type' choose 'Glue Job State Change'
  • On the right side of the page, in the 'Targets' section, click 'Add Target' -> 'Lambda Function' and then choose your function.

The event you'll get in Lambda will be of the format:

{
    "version": "0",
    "id": "a9bc90be-xx00-03e0-9bc5-a0a0a0a0a0a0",
    "detail-type": "Glue Job State Change",
    "source": "aws.glue",
    "account": "xxxxxxxxxx",
    "time": "2018-05-10T16:17:03Z",
    "region": "us-east-2",
    "resources": [],
    "detail": {
        "jobName": "xxxx_myjobname_yyyy",
        "severity": "INFO",
        "state": "SUCCEEDED",
        "jobRunId": "jr_565465465446788dfdsdf546545454654546546465454654",
        "message": "Job run succeeded"
    }
}
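
A minimal handler sketch that consumes this event (run_cleanup is a placeholder for your own data-cleansing logic):

def run_cleanup(job_name, job_run_id):
    # Placeholder: your Redshift data-cleansing logic goes here
    print(f'Cleaning up after {job_name} run {job_run_id}')

def lambda_handler(event, context):
    # Glue delivers the job run details under the 'detail' key
    detail = event['detail']
    if detail['state'] == 'SUCCEEDED':
        run_cleanup(detail['jobName'], detail['jobRunId'])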
Bounds answered 10/5, 2018 at 16:23 Comment(0)

Since AWS Glue has started supporting Python, you can probably follow the path below to achieve what you want. The sample script shows how (see the numbered steps after the code):

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3   ## Step-2

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

## Do all your ETL work here

## Once the ETL completes
lambda_client = boto3.client('lambda')  ## Step-3
response = lambda_client.invoke(FunctionName='my-cleanup-function')  ## Step-4 (the function name is a placeholder)

  1. Create a Python-based Glue job (to perform the ETL on Redshift).
  2. In the job script, import boto3 (you need to place this package as a script library).
  3. Create a connection to Lambda using boto3.
  4. Invoke the Lambda function using boto3's invoke() once the ETL completes.

Please make sure that the role you use when creating the Glue job has permission to invoke Lambda functions.
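
A minimal sketch of the policy statement to attach to the Glue job's role (the region, account ID, and function name are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-cleanup-function"
    }
  ]
}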

Refer to the Boto3 documentation for Lambda for the details of invoke().

Wareroom answered 26/5, 2018 at 12:52 Comment(0)

@ace and @adeel have part of the solution; you can tie it together by creating the CloudWatch rule with the following event pattern:

{
  "source": [
    "aws.glue"
  ],
  "detail-type": [
    "Glue Job State Change"
  ],
  "detail": {
    "jobName": [
      "<YourJobName>"
    ],
    "state": [
      "SUCCEEDED"
    ]
  }
}
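
A sketch of creating the same rule programmatically with boto3 (the rule name, function name, and ARN are placeholders):

import json
import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

pattern = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"jobName": ["<YourJobName>"], "state": ["SUCCEEDED"]}
}

# Create (or update) the rule with the event pattern above
events.put_rule(Name='glue-job-succeeded', EventPattern=json.dumps(pattern))

# Point the rule at the cleanup function
function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:my-cleanup-function'
events.put_targets(Rule='glue-job-succeeded',
                   Targets=[{'Id': '1', 'Arn': function_arn}])

# Allow CloudWatch Events to invoke the function
lambda_client.add_permission(
    FunctionName='my-cleanup-function',
    StatementId='glue-job-succeeded',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com')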
Apostasy answered 1/4, 2020 at 18:33 Comment(0)

You can orchestrate your AWS Glue Jobs and AWS Lambda functions by using AWS Step Functions. Here is a blog post that explains how to do it and gives an example: https://aws.amazon.com/blogs/big-data/orchestrate-multiple-etl-jobs-using-aws-step-functions-and-aws-lambda/

In essence, when a Glue job finishes (success or failure), your Step Functions workflow can catch the event and invoke your Lambda function. A minimal state machine sketch is below.
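
A minimal state machine sketch in Amazon States Language (the job and function names are placeholders; the .sync suffix makes Step Functions wait for the Glue job to finish before moving on):

{
  "StartAt": "RunGlueJob",
  "States": {
    "RunGlueJob": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": { "JobName": "my-glue-job" },
      "Next": "RunCleanupLambda"
    },
    "RunCleanupLambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": { "FunctionName": "my-cleanup-function" },
      "End": true
    }
  }
}

To also invoke the Lambda when the Glue job fails, you would add a Catch clause to the Glue task that routes to the Lambda state.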

Leapfrog answered 15/8, 2022 at 11:35 Comment(0)

Yes, it is possible to trigger a Lambda, but you have to take the help of EventBridge. Follow these instructions:

  • Go to EventBridge. Under 'Events' you will find 'Rules'; click on it, then click 'Create rule'.
  • Give your rule a suitable name, making sure the radio button is set to 'Rule with an event pattern', then click 'Next'.
  • The event source will be 'AWS events or EventBridge partner events'. Under creation method, select 'Use pattern form'.
  • In the event pattern, select 'AWS services' as the event source and 'Glue' as the AWS service; a new drop-down appears, where you select 'Glue Job State Change'.

Then, on the right side, the event pattern is shown; click 'Edit pattern' and make changes as per your need:

{
  "detail-type": ["Glue Job State Change"],
  "source": ["aws.glue"],
  "detail": {
    "jobName": ["<YourJobName>"],
    "state": ["FAILED"]
  }
}

For state you can choose any of STARTING, RUNNING, STOPPING, STOPPED, SUCCEEDED, FAILED, ERROR, WAITING and TIMEOUT.

Don't use any other fields; the exception is when you are matching EC2 instances, in which case you have to use the 'resources' field, which you can place next to 'source'.

Then click 'Next', select 'AWS service' as the target type, select 'Lambda function', and choose your Lambda function's name in the drop-down that appears; then 'Next', 'Next', and save.

Congratulations, you have successfully created the configuration to trigger a Lambda function based on a Glue job state change.

Janessajanet answered 23/11, 2022 at 4:4 Comment(0)

Lambda can be triggered on an S3 put. You can write a dummy file to S3 as the last step of the Glue job, which in turn triggers the Lambda. I have tested this; a sketch of that final step is below.
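
A minimal sketch of the marker write at the end of the Glue job (the bucket name and key are placeholders; an S3 event notification on the glue-done/ prefix is assumed to invoke the Lambda):

import boto3

s3 = boto3.client('s3')

# Write a zero-byte marker object as the job's last action;
# the S3 event notification on this prefix then triggers the Lambda.
s3.put_object(Bucket='my-bucket', Key='glue-done/_SUCCESS', Body=b'')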

Kristinkristina answered 3/4, 2018 at 14:37 Comment(0)
