How to pass environment variables to AWS Glue
I'm using PySpark to write to a Kafka broker. A JAAS security mechanism is set up for it, so we need to pass the username and password as environment variables:

    data_frame \
        .selectExpr('CAST(id AS STRING) AS key', "to_json(struct(*)) AS value") \
        .write \
        .format('kafka') \
        .option('topic', topic)\
        .option('kafka.ssl.endpoint.identification.algorithm', 'https') \
        .option('kafka.bootstrap.servers', os.environ['BOOTSTRAP_SERVER']) \
        .option('kafka.sasl.jaas.config', 
                 f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{os.environ["USERNAME"]}" password="{os.environ["PASSWORD"]}";')\
        .option('kafka.sasl.mechanism', 'PLAIN')\
        .option('kafka.security.protocol', 'SASL_SSL')\
        .mode('append') \
        .save()

Locally I used Python's os.environ[""] to retrieve the environment variables. How can I pass them to an AWS Glue job?

Mateya answered 21/3, 2022 at 15:18 Comment(0)

You could use job parameters:

    import sys
    from awsglue.utils import getResolvedOptions

    # Resolve the job parameters passed to the Glue job as --NAME value pairs
    args = getResolvedOptions(sys.argv,
                              ['JOB_NAME',
                               'BOOTSTRAP_SERVER',
                               'USERNAME',
                               'PASSWORD'])

    # data_frame and topic are defined elsewhere in the job, as in the question
    data_frame \
        .selectExpr('CAST(id AS STRING) AS key', "to_json(struct(*)) AS value") \
        .write \
        .format('kafka') \
        .option('topic', topic) \
        .option('kafka.ssl.endpoint.identification.algorithm', 'https') \
        .option('kafka.bootstrap.servers', args['BOOTSTRAP_SERVER']) \
        .option('kafka.sasl.jaas.config',
                # Double quotes inside the braces: reusing single quotes here is a syntax error before Python 3.12
                f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{args["USERNAME"]}" password="{args["PASSWORD"]}";') \
        .option('kafka.sasl.mechanism', 'PLAIN') \
        .option('kafka.security.protocol', 'SASL_SSL') \
        .mode('append') \
        .save()

You can then set BOOTSTRAP_SERVER, USERNAME and PASSWORD in the Glue job console, or pass them when starting the job with something like boto3 (note the -- prefix on each key):

    import boto3

    client = boto3.client('glue')

    response = client.start_job_run(
        JobName='myGlueJob',
        Arguments={
            '--BOOTSTRAP_SERVER': 'myServer',
            '--USERNAME': 'myUsername',
            '--PASSWORD': 'myPassword'})

Note: you should consider storing the credentials in something like AWS Secrets Manager and retrieving them in your Glue script, rather than passing them as plain job parameters.
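A minimal sketch of that Secrets Manager lookup, in case it helps. It assumes the secret is stored as a JSON string with username and password keys, and the secret name kafka/credentials in the usage note is hypothetical. The Glue job's IAM role would also need permission to read the secret.

```python
import json


def load_kafka_credentials(secret_id, client=None):
    """Fetch a JSON secret and return (username, password).

    Assumes the secret value is a JSON object with "username" and
    "password" keys; adjust the keys to match how the secret is stored.
    """
    if client is None:
        # Imported lazily so the parsing logic can be tried without AWS access
        import boto3
        client = boto3.client("secretsmanager")

    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

In the Glue script this could be called as username, password = load_kafka_credentials("kafka/credentials") and the two values used in the kafka.sasl.jaas.config option instead of args['USERNAME'] and args['PASSWORD'].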

Triley answered 21/3, 2022 at 20:19 Comment(4)
Mateya: Still a good solution, but I want to keep my Python code that uses os.environ so that it stays the same across environments (I would only need to change the env variables on the server).
Triley: Yeah, it's not a direct replacement for env vars. I should mention that you can set job parameters once when you create your job. You don't have to pass them at run time.
Mateya: I want to use env vars. Is there a way to do so?
Triley: Not that I'm aware of.
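A possible workaround, not something confirmed in this thread: copy the resolved job parameters into os.environ at the top of the script, so the rest of the code can keep its os.environ lookups unchanged. The helper name export_to_environ is made up for illustration, and a plain dict stands in for the result of getResolvedOptions so the sketch runs outside Glue. This only affects the Python process that runs the script (the driver), which is where the .option(...) values in the question are read.

```python
import os


def export_to_environ(args, names):
    """Copy the selected job parameters into os.environ for this process."""
    for name in names:
        os.environ[name] = args[name]


# In a Glue script, args would come from getResolvedOptions(sys.argv, [...]).
# A plain dict with placeholder values stands in here.
args = {
    "BOOTSTRAP_SERVER": "myServer",
    "USERNAME": "myUsername",
    "PASSWORD": "myPassword",
}
export_to_environ(args, ["BOOTSTRAP_SERVER", "USERNAME", "PASSWORD"])

# Existing code can now keep reading os.environ["BOOTSTRAP_SERVER"] and so on.
```

The values still have to reach the job as parameters (or from a secret store), so this only keeps the os.environ calls in the script unchanged across environments.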
