I am trying to truncate a postgres destination table prior to insert, and in general, trying to fire external functions utilizing the connections already created in GLUE.
Has anyone been able to do so?
I am trying to truncate a postgres destination table prior to insert, and in general, trying to fire external functions utilizing the connections already created in GLUE.
Has anyone been able to do so?
I've tried the DROP/ TRUNCATE
scenario, but have not been able to do it with connections already created in Glue, but with a pure Python PostgreSQL driver, pg8000.
__init__.py
in the root folderPython lib path
of the job --
). Tick the "Server-side encryption" box.Then you can simply create a connection and execute SQL.
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
import pg8000
args = getResolvedOptions(sys.argv, [
'JOB_NAME',
'PW',
'HOST',
'USER',
'DB'
])
# ...
# Create Spark & Glue context
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# ...
config_port = 5432
conn = pg8000.connect(
database=args['DB'],
user=args['USER'],
password=args['PW'],
host=args['HOST'],
port=config_port
)
query = "TRUNCATE TABLE {0};".format(".".join([schema, table]))
cur = conn.cursor()
cur.execute(query)
conn.commit()
cur.close()
conn.close()
pg8000
module with a Glue job is now simpler. Go to 'Security configuration, script libraries, and job parameters (optional)', scroll down to 'Job Parameters' and enter --additional-python-modules
for the key and pg8000
for the value.. and you're done :) –
Polarize
data=spark.sql(sql)
conf = glueContext.extract_jdbc_conf("jdbc-commerce")
data.write \
.mode('overwrite') \
.format("jdbc") \
.option("url", conf['url']) \
.option("database", 'Pacvue_Commerce') \
.option("dbtable", "dbo.glue_1") \
.option("user", conf['user']) \
.option('truncate','true') \
.option("password", conf['password']) \
.save()
glue api not support , but spark api support.
jdbc-commerce is your connection name at crawl. use extract_jdbc_conf to get url、username and password.
.option("connschema", "<database>")
to the options. –
Slenderize After following step (4) of @thenaturalist's response,
sc.addPyFile("/home/glue/downloads/python/pg8000.zip")
import pg8000
worked for me in a development endpoint (zeppelin notebook)
More info: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html
To clarify @thenaturalist instructions for the zip as I still struggled with this
Download the tar.gz of pg8000 from pypi.org and extract.
Zip the contents so you have the below structure
pg8000-1.15.3.zip
|
| -- pg8000 <dir>
| -- __init__.py
| -- _version.py <optional>
| -- core.py
Upload to s3 and then you should be able to just do a simple import pg8000
.
NOTE: scramp is also required at the moment so follow the same procedure as above to include the scramp module. But you don't need to import it.
© 2022 - 2024 — McMap. All rights reserved.