How to get job_id from within the Python script of an AWS Glue Python shell job?

I am trying to access the job run ID of an AWS Glue Python shell job from within that job's own script. This is the Run ID shown in the first column of the AWS Glue console, something like jr_5fc6d4ecf0248150067f2. How do I get it programmatically within an AWS Glue Python shell job?

Note: Python shell jobs are not the same as PySpark jobs in AWS Glue.

Griffey asked 31/3, 2022 at 21:17

Yeah, it will sound crazy, but I added a job parameter called JOB_NAME, set its value to the job's name, and then, inside the script, used boto3 to query the job's runs and get its run ID. It's probably not the best approach, but it's the only way I found. If anyone has a better solution, I will change the accepted answer.

import boto3
from botocore.exceptions import ClientError


def get_running_job_id(job_name):
    """Return the ID of the first run of job_name in the RUNNING state, or None."""
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        # Note: get_job_runs is paginated; only the first page of runs is checked here
        response = glue_client.get_job_runs(JobName=job_name)
        for res in response['JobRuns']:
            print("Job Run id is: " + res.get("Id"))
            print("status is: " + res.get("JobRunState"))
            if res.get("JobRunState") == "RUNNING":
                return res.get("Id")
        # No run of this job is currently RUNNING
        return None
    except ClientError as e:
        raise Exception("boto3 client error in get_running_job_id: " + str(e)) from e
    except Exception as e:
        raise Exception("Unexpected error in get_running_job_id: " + str(e)) from e
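A somewhat more defensive variant of the same idea is sketched below, under a few assumptions. It assumes the job has a job parameter with the key `--JOB_NAME` (Glue hands job parameters to the script as command-line arguments in `sys.argv`), it follows every page of `get_job_runs` through a boto3 paginator, and it returns all run IDs in the `RUNNING` state rather than only the first, because with concurrent runs of the same job this lookup cannot tell which run is the current one. The helper names `read_job_name` and `running_job_run_ids` are hypothetical; the Glue client is passed in so it can be a real `boto3.client("glue")` or a test stub.

```python
import argparse
import sys
from typing import Any, List, Optional, Sequence


def read_job_name(argv: Optional[Sequence[str]] = None) -> str:
    """Return the value of the --JOB_NAME job parameter (assumed to be set on the job)."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--JOB_NAME", required=True)
    # parse_known_args puts everything else Glue passes (e.g. --scriptLocation) into the ignored extras
    args, _extras = parser.parse_known_args(list(sys.argv[1:] if argv is None else argv))
    return args.JOB_NAME


def running_job_run_ids(glue_client: Any, job_name: str) -> List[str]:
    """Return the IDs of every run of job_name that is currently RUNNING."""
    run_ids: List[str] = []
    # get_job_runs returns results in pages; with boto3 the paginator follows NextToken
    paginator = glue_client.get_paginator("get_job_runs")
    for page in paginator.paginate(JobName=job_name):
        for run in page.get("JobRuns", []):
            if run.get("JobRunState") == "RUNNING":
                run_ids.append(run["Id"])
    return run_ids
```

Inside the job this would be called as `running_job_run_ids(boto3.client("glue"), read_job_name())`. If the returned list has more than one entry, the job has concurrent runs and the boto3 lookup alone cannot identify the current one.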
Griffey answered 7/6, 2022 at 18:10
Yeah, that's one hacky way to do it! Thanks! - Conquian

I could not find a solution for this. There is no official documentation on it, and sys.argv (the command-line arguments) does not include a JOB_RUN_ID parameter when the Python script runs as an AWS Glue Python Shell job.

In my tests, I found that the arguments passed to a Python Shell job are:

  • job-bookmark-option
  • scriptLocation

However, when running an AWS Glue Spark job, the following arguments are passed:

  • JOB_ID
  • JOB_NAME
  • JOB_RUN_ID
  • job-bookmark-option
  • TempDir

Hence, there is no official or obvious way to find JOB_RUN_ID from inside a Python script running as a Python Shell job on AWS Glue. I will update this answer if AWS fixes this in the future. Thanks!
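The argument check described in this answer can be reproduced with a small sketch like the one below, which could be dropped into either job type. It collects the `--key value` and `--key=value` tokens from `sys.argv` into a dict (handy for logging what Glue actually passed) and returns `JOB_RUN_ID` only when it is present. The helper names `glue_args_as_dict` and `job_run_id_from_args` are hypothetical, and the parsing assumes every argument arrives in one of those two token shapes.

```python
import sys
from typing import Dict, Optional, Sequence


def glue_args_as_dict(argv: Sequence[str]) -> Dict[str, Optional[str]]:
    """Collect '--key value' and '--key=value' pairs from an argv-style list."""
    args: Dict[str, Optional[str]] = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        if token.startswith("--"):
            key, sep, value = token[2:].partition("=")
            if sep:
                # '--key=value' form
                args[key] = value
            elif i + 1 < len(argv) and not argv[i + 1].startswith("--"):
                # '--key value' form: consume the next token as the value
                args[key] = argv[i + 1]
                i += 1
            else:
                # '--key' followed by another option (or nothing): no value
                args[key] = None
        i += 1
    return args


def job_run_id_from_args(argv: Optional[Sequence[str]] = None) -> Optional[str]:
    """Return JOB_RUN_ID if it was passed on the command line, else None."""
    args = glue_args_as_dict(sys.argv[1:] if argv is None else argv)
    return args.get("JOB_RUN_ID")
```

For example, `print(glue_args_as_dict(sys.argv[1:]))` at the top of the script logs every argument the run received. Based on the observations above, `job_run_id_from_args()` would return the run ID in a Spark job and `None` in a Python Shell job.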

Conquian answered 7/6, 2022 at 6:26
Yeah, it will sound crazy, but I used boto3 to query the job to see what its run ID is. Probably not the best approach, but it's the only way I found. - Griffey
