I am fairly new to AWS Glue. I have created some jobs and they work fine; now I want to take it a step further. Say we have several developers working on the same jobs and need a way to distinguish between the changes each developer makes to a job or job script (that is, to manage changes to the code). Is it possible to have something similar to the versioning of Informatica mappings and workflows for AWS Glue jobs and job scripts? I can see there is versioning on objects in the Data Catalog, but there isn't enough information on this in the AWS documentation. Any help is appreciated. Thanks.
This blog post by Amazon describes how to implement source control, testing, and CI/CD for Glue applications.
2022-10-10.
It seems that Glue jobs have been updated to support GitHub and AWS CodeCommit integration for version control. There is a new tab on the job configuration page, named "Version Control", beside the "Schedules" one.
As of today I couldn't find any announcement with more information about this feature, and I haven't used it myself, but it seems pretty straightforward.
Edit: here's an official tutorial-like blog post from AWS. https://aws.amazon.com/pt/blogs/big-data/code-versioning-using-aws-glue-studio-and-github/
Since Glue job scripts are stored in S3, I see two options:
- Adding the git commit SHA1 in the filename
- Using S3's own versioning capabilities
Adding the git commit SHA1 in the filename
The following suggestion by Tyler Treat might work to achieve this:
You’ll also notice that I append the Git commit SHA to the name of the file uploaded to S3. This way, you’ll know exactly what version of the code the script contains and the bucket will retain a history of each script. This is useful when you need to debug a job or revert to a previous version.
Source: https://blog.realkinetic.com/continuous-deployment-for-aws-glue-c8abd50d7d58
He uses this in the context of a continuous deployment setup, where each Glue job script deployed to S3 gets the Git Commit SHA1 hash embedded in its filename. In that way he builds up the history in the storage bucket.
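A minimal sketch of that naming scheme in Python, in case it helps. The function names, the `scripts` prefix, and the `<name>-<sha>.py` key layout are my own assumptions for illustration, not taken from the linked article; `s3_client` is expected to be a boto3 S3 client (`boto3.client("s3")`).

```python
import subprocess
from pathlib import Path


def current_commit_sha(repo_dir="."):
    """Return the full SHA-1 of HEAD for the Git repository at repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def versioned_script_key(script_name, commit_sha, prefix="scripts"):
    """Build an S3 key that embeds the commit SHA (hypothetical layout)."""
    path = Path(script_name)
    return f"{prefix}/{path.stem}-{commit_sha}{path.suffix}"


def upload_versioned_script(s3_client, local_path, bucket, commit_sha,
                            prefix="scripts"):
    """Upload local_path under a SHA-stamped key and return that key."""
    key = versioned_script_key(Path(local_path).name, commit_sha, prefix)
    # upload_file(Filename, Bucket, Key) is the boto3 S3 client call.
    s3_client.upload_file(local_path, bucket, key)
    return key


# Example usage (needs boto3, AWS credentials, and a Git checkout):
#   import boto3
#   s3 = boto3.client("s3")
#   key = upload_versioned_script(s3, "jobs/my_job.py", "my-glue-bucket",
#                                 current_commit_sha())
```

Every commit then leaves its own object in the bucket. The Glue job's script location still has to be pointed at the new key by your deployment step, which this sketch does not cover.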
Using S3's own versioning capabilities
Another way might be to enable versioning on the storage bucket itself. I have not tested how this plays out with Glue, but to try it, follow these steps:
Enable versioning on the bucket itself using the following AWS CLI command:
aws s3api put-bucket-versioning --bucket DOC-EXAMPLE-BUCKET1 --versioning-configuration Status=Enabled
Source: https://docs.aws.amazon.com/AmazonS3/latest/userguide/manage-versioning-examples.html
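If you would rather do this from a deployment script than from the CLI, the same setting can be applied with boto3. This is only a sketch: the function name is mine, and `s3_client` is assumed to be a boto3 S3 client whose credentials are allowed to change the bucket's versioning configuration.

```python
def enable_bucket_versioning(s3_client, bucket):
    """Turn on S3 versioning for bucket and report whether it is enabled."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # Read the setting back; "Status" is absent on buckets that have
    # never had versioning configured.
    response = s3_client.get_bucket_versioning(Bucket=bucket)
    return response.get("Status") == "Enabled"


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   enable_bucket_versioning(boto3.client("s3"), "DOC-EXAMPLE-BUCKET1")
```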
Versioning will happen automatically from then on. If you need to roll back to an earlier version, you can:
a) Copy a previous version of the object into the same bucket. The copied object becomes the current version of that object and all object versions are preserved.
b) Permanently delete the current version of the object. When you delete the current object version, you, in effect, turn the previous version into the current version of that object.
Source: https://docs.aws.amazon.com/AmazonS3/latest/userguide/RestoringPreviousVersions.html
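Option (a) could be scripted roughly as follows. Treat it as an untested sketch under a few assumptions: the function name is hypothetical, `s3_client` is a boto3 S3 client, the bucket already has versioning enabled, and the script has few enough versions to fit in one `list_object_versions` response (no pagination is handled).

```python
def restore_previous_version(s3_client, bucket, key):
    """Copy the second-newest version of key on top, making it current.

    Returns the VersionId that was restored.
    """
    response = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix matching can return other keys that start with the same
    # string, so keep only exact matches.
    versions = [v for v in response.get("Versions", []) if v["Key"] == key]
    # Newest first, so index 0 is the current version.
    versions.sort(key=lambda v: v["LastModified"], reverse=True)
    if len(versions) < 2:
        raise ValueError(f"No previous version found for {key!r}")
    previous = versions[1]
    s3_client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={
            "Bucket": bucket,
            "Key": key,
            "VersionId": previous["VersionId"],
        },
    )
    return previous["VersionId"]


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   restore_previous_version(boto3.client("s3"), "DOC-EXAMPLE-BUCKET1",
#                            "scripts/my_job.py")
```

Copying rather than deleting keeps the full history, so the rollback itself can be undone the same way.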