I have an ETL job written in Python, which consists of multiple scripts with the following directory structure:
my_etl_job
|
|-- services
|   |
|   |-- __init__.py
|   |-- dynamoDB_service.py
|
|-- __init__.py
|-- main.py
|-- logger.py
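For context, the imports at the top of main.py look roughly like this (get_logger and DynamoDBService are placeholder names I'm using for illustration, not the real ones):

# main.py -- sketch of the imports, with placeholder names
from logger import get_logger                           # defined in logger.py
from services.dynamoDB_service import DynamoDBService   # defined in services/dynamoDB_service.py

log = get_logger(__name__)

def main():
    service = DynamoDBService()
    log.info("ETL run started")

if __name__ == "__main__":
    main()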
main.py is the entry-point script that imports the other scripts from the directories above. The job runs perfectly fine on a dev endpoint, after uploading it to the ETL cluster created by the dev endpoint. Since I now want to run it in production, I want to create a proper Glue job for it. But when I compress the whole my_etl_job directory into a .zip, upload it to the artifacts S3 bucket, and set the job's script location to the .zip file's path as follows:
s3://<bucket_name>/etl_jobs/my_etl_job.zip
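Concretely, the packaging and upload step amounts to roughly the following (a minimal sketch using shutil and boto3; the bucket name is a placeholder, as above):

import shutil
import boto3

# Build my_etl_job.zip with the my_etl_job/ directory at the root of the archive
shutil.make_archive("my_etl_job", "zip", root_dir=".", base_dir="my_etl_job")

# Upload the archive to the artifacts bucket
boto3.client("s3").upload_file(
    "my_etl_job.zip", "<bucket_name>", "etl_jobs/my_etl_job.zip"
)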
After doing that, this is the "code" I see on the Glue job UI dashboard:
PK
���P__init__.pyUX�'�^"�^A��)PK#7�P logger.pyUX��^1��^A��)]�Mk�0����a�&v+���A�B���`x����q��} ...AND A LOT MORE...
It seems like the Glue job doesn't accept the .zip format? If so, then what compression format should I use?
UPDATE:
I found that the Glue job has an option to take in extra files, Referenced files path, where I provided a comma-separated list of the S3 paths of all the above files, and changed the script location to point only to the main.py file path. But that also didn't work: the Glue job throws the error No module named logger (and I defined this module inside the logger.py file).
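In case the exact configuration matters, what I set up in the console is roughly equivalent to this boto3 call (the role name and the individual file paths are placeholders; as far as I understand, the console's "Referenced files path" field maps to the --extra-files argument):

import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="my_etl_job",
    Role="MyGlueServiceRole",  # placeholder role name
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://<bucket_name>/etl_jobs/main.py",  # now only main.py here
        "PythonVersion": "3",
    },
    DefaultArguments={
        # "Referenced files path" in the console -> --extra-files
        # (the individual S3 keys below are placeholders for where I uploaded each file)
        "--extra-files": ",".join([
            "s3://<bucket_name>/etl_jobs/__init__.py",
            "s3://<bucket_name>/etl_jobs/logger.py",
            "s3://<bucket_name>/etl_jobs/services/__init__.py",
            "s3://<bucket_name>/etl_jobs/services/dynamoDB_service.py",
        ]),
    },
)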