Glue Job Succeeded but no data inserted into the target table (Aurora Mysql)
I created a Glue job using the visual editor as shown below. First I connected to a MySQL table as the data source, which is already in my Data Catalog. Then, in the transform node, I wrote a custom SQL query to select only one column from the source table. I validated it with the data preview feature and the transform node works fine. Now I want to write the data to an existing database table that has only one column with a 'string' data type. The Glue job succeeded, but I don't see any data in the target table.

Below is the script automatically generated by the Glue job visual editor.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue import DynamicFrame


def sparkSqlQuery(glueContext, query, mapping, transformation_ctx) -> DynamicFrame:
    for alias, frame in mapping.items():
        frame.toDF().createOrReplaceTempView(alias)
    result = spark.sql(query)
    return DynamicFrame.fromDF(result, glueContext, transformation_ctx)


args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node MySQL
MySQL_node1650299412376 = glueContext.create_dynamic_frame.from_catalog(
    database="glue_rds_test",
    table_name="test_customer",
    transformation_ctx="MySQL_node1650299412376",
)

# Script generated for node SQL
SqlQuery0 = """
select CUST_CODE from customer
"""
SQL_node1650302847690 = sparkSqlQuery(
    glueContext,
    query=SqlQuery0,
    mapping={"customer": MySQL_node1650299412376},
    transformation_ctx="SQL_node1650302847690",
)

# Script generated for node MySQL
MySQL_node1650304163076 = glueContext.write_dynamic_frame.from_catalog(
    frame=SQL_node1650302847690,
    database="glue_rds_test",
    table_name="test_customer2",
    transformation_ctx="MySQL_node1650304163076",
)

job.commit()
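
One way to narrow down where the rows go missing (a minimal debugging sketch, not part of the generated script, reusing the variable names from it) is to log the row count and schema of the transformed frame just before the write node runs; the printed output appears in the job's CloudWatch logs:

# Debugging sketch: confirm the SQL transform actually produced rows
# before handing them to the sink (printed output goes to CloudWatch logs).
row_count = SQL_node1650302847690.count()
print(f"Rows produced by SQL transform: {row_count}")
SQL_node1650302847690.printSchema()

If the count is zero, the problem is upstream of the sink; if it is positive, the write step (connection, target table mapping, or query syntax) is the place to look.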


Sicard answered 18/4, 2022 at 18:44

For me, the problem was the double quotes around the selected field names in the SQL query; dropping the double quotes solved it. There is no mention of this in the Spark SQL syntax documentation.

For example, I "wrongly" used this query syntax:

select "CUST_CODE" from customer

instead of this "correct" one:

select CUST_CODE from customer
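
As a rough illustration of why the quoted version misbehaves (a minimal PySpark sketch, not taken from the job above): with Spark's default parser settings, double-quoted text is treated as a string literal rather than a column identifier, so the first query below returns the constant value "CUST_CODE" for every row, while the second returns the actual column values.

# Minimal sketch: double quotes make Spark SQL return a string literal,
# not the values of the CUST_CODE column.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("quote-demo").getOrCreate()
spark.createDataFrame([("C001",), ("C002",)], ["CUST_CODE"]) \
    .createOrReplaceTempView("customer")

# Every row comes back as the literal string 'CUST_CODE'
spark.sql('select "CUST_CODE" from customer').show()

# Rows come back as C001, C002 -- the actual column values
spark.sql("select CUST_CODE from customer").show()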

Your shared sample code does not seem to have this syntax issue, but I thought posting the answer here might be of help to others.

Obturate answered 9/9, 2022 at 6:28
