Getting the 'Exception thrown in awaitResult:' error when trying to copy a table from Glue to Redshift

I have been trying to copy a table from the Glue Data Catalog into a table in Redshift. I created a job with the following code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [TempDir, JOB_NAME]
args = getResolvedOptions(sys.argv, ['TempDir','JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "booker", table_name = "relationalized_parquet_itemdetailsnew_output_itemdetails", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "booker", table_name = "relationalized_parquet_itemdetailsnew_output_itemdetails", transformation_ctx = "datasource0")
## @type: ApplyMapping
## @args: [mapping = [("id", "long", "id", "long"), ("index", "int", "index", "int"), ("`itemdetails.val.calculatedweight`", "double", "calculated_weight", "double"), ("`itemdetails.val.weightsigma`", "double", "weight_sigma", "double"), ("`itemdetails.val.itemaggregationid.long`", "long", "item_aggregation_id_long", "long"), ("`itemdetails.val.quantity`", "int", "quantity", "int"), ("`itemdetails.val.expectedweight`", "double", "expected_weight", "double"), ("`itemdetails.val.weightoverridden`", "boolean", "weight_overridden", "boolean"), ("`itemdetails.val.itemid`", "string", "item_id", "string"), ("`itemdetails.val.itemaggregationid.int`", "int", "item_aggregation_id_int", "int"), ("`itemdetails.val.type`", "int", "type", "int")], transformation_ctx = "applymapping1"]
## @return: applymapping1
## @inputs: [frame = datasource0]
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("id", "long", "id", "long"), ("index", "int", "index", "int"), ("`itemdetails.val.calculatedweight`", "double", "calculated_weight", "double"), ("`itemdetails.val.weightsigma`", "double", "weight_sigma", "double"), ("`itemdetails.val.itemaggregationid.long`", "long", "item_aggregation_id_long", "long"), ("`itemdetails.val.quantity`", "int", "quantity", "int"), ("`itemdetails.val.expectedweight`", "double", "expected_weight", "double"), ("`itemdetails.val.weightoverridden`", "boolean", "weight_overridden", "boolean"), ("`itemdetails.val.itemid`", "string", "item_id", "string"), ("`itemdetails.val.itemaggregationid.int`", "int", "item_aggregation_id_int", "int"), ("`itemdetails.val.type`", "int", "type", "int")], transformation_ctx = "applymapping1")
## @type: SelectFields
## @args: [paths = ["expected_weight", "quantity", "item_aggregation_id_int", "item_id", "length", "index", "type", "gift_option", "weight_overridden", "width", "id", "weight_sigma", "calculated_weight", "item_aggregation_id_long", "height"], transformation_ctx = "selectfields2"]
## @return: selectfields2
## @inputs: [frame = applymapping1]
selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["expected_weight", "quantity", "item_aggregation_id_int", "item_id", "length", "index", "type", "gift_option", "weight_overridden", "width", "id", "weight_sigma", "calculated_weight", "item_aggregation_id_long", "height"], transformation_ctx = "selectfields2")
## @type: ResolveChoice
## @args: [choice = "MATCH_CATALOG", database = "delphi_redshift", table_name = "delphi_shajeec_sd_item_details", transformation_ctx = "resolvechoice3"]
## @return: resolvechoice3
## @inputs: [frame = selectfields2]
resolvechoice3 = ResolveChoice.apply(frame = selectfields2, choice = "MATCH_CATALOG", database = "delphi_redshift", table_name = "delphi_shajeec_sd_item_details", transformation_ctx = "resolvechoice3")
## @type: ResolveChoice
## @args: [choice = "make_cols", transformation_ctx = "resolvechoice4"]
## @return: resolvechoice4
## @inputs: [frame = resolvechoice3]
resolvechoice4 = ResolveChoice.apply(frame = resolvechoice3, choice = "make_cols", transformation_ctx = "resolvechoice4")
## @type: DataSink
## @args: [database = "delphi_redshift", table_name = "delphi_shajeec_sd_item_details", redshift_tmp_dir = TempDir, transformation_ctx = "datasink5"]
## @return: datasink5
## @inputs: [frame = resolvechoice4]
datasink5 = glueContext.write_dynamic_frame.from_catalog(frame = resolvechoice4, database = "delphi_redshift", table_name = "delphi_shajeec_sd_item_details", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink5")
job.commit()

After running this job, I get the error 'An error occurred while calling o116.pyWriteDynamicFrame. Exception thrown in awaitResult:'. The error log specifically says:

2020-07-24 22:00:47,493 WARN  [Thread-9] redshift.RedshiftWriter (RedshiftWriter.scala:retry$1(132)) - Exception thrown while running copy query. Exception message: Exception thrown in awaitResult: .Retrying 2 more times.
2020-07-24 22:00:47,524 WARN  [Thread-9] redshift.RedshiftWriter (RedshiftWriter.scala:retry$1(135)) - Sleeping 30000 milliseconds before proceeding to retry redshift copy
2020-07-24 22:00:49,236 INFO  [dispatcher-event-loop-2] cluster.YarnSchedulerBackend$YarnDriverEndpoint (Logging.scala:logInfo(54)) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.31.1.131:44498) with ID 4
2020-07-24 22:00:49,237 INFO  [spark-listener-group-executorManagement] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - New executor 4 has registered (new total is 3)
2020-07-24 22:00:49,353 INFO  [dispatcher-event-loop-0] storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(54)) - Registering block manager ip-172-31-1-131.ec2.internal:43747 with 2.8 GB RAM, BlockManagerId(4, ip-172-31-1-131.ec2.internal, 43747, None)
2020-07-24 22:00:50,043 INFO  [dispatcher-event-loop-2] cluster.YarnSchedulerBackend$YarnDriverEndpoint (Logging.scala:logInfo(54)) - Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.31.2.115:56504) with ID 3
2020-07-24 22:00:50,044 INFO  [spark-listener-group-executorManagement] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - New executor 3 has registered (new total is 4)
2020-07-24 22:00:50,144 INFO  [dispatcher-event-loop-0] storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(54)) - Registering block manager ip-172-31-2-115.ec2.internal:42197 with 2.8 GB RAM, BlockManagerId(3, ip-172-31-2-115.ec2.internal, 42197, None)
2020-07-24 22:01:18,950 WARN  [Thread-9] redshift.RedshiftWriter (RedshiftWriter.scala:retry$1(132)) - Exception thrown while running copy query. Exception message: Exception thrown in awaitResult: .Retrying 1 more times.
2020-07-24 22:01:18,988 WARN  [Thread-9] redshift.RedshiftWriter (RedshiftWriter.scala:retry$1(135)) - Sleeping 30000 milliseconds before proceeding to retry redshift copy
2020-07-24 22:01:45,785 INFO  [spark-dynamic-executor-allocation] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - Request to remove executorIds: 2
2020-07-24 22:01:45,786 INFO  [spark-dynamic-executor-allocation] cluster.YarnClusterSchedulerBackend (Logging.scala:logInfo(54)) - Requesting to kill executor(s) 2
2020-07-24 22:01:45,788 INFO  [spark-dynamic-executor-allocation] cluster.YarnClusterSchedulerBackend (Logging.scala:logInfo(54)) - Actual list of executor(s) to be killed is 2
2020-07-24 22:01:45,789 INFO  [dispatcher-event-loop-2] yarn.ApplicationMaster$AMEndpoint (Logging.scala:logInfo(54)) - Driver requested to kill executor(s) 2.
2020-07-24 22:01:45,790 INFO  [spark-dynamic-executor-allocation] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - Removing executor 2 because it has been idle for 60 seconds (new desired total will be 3)
2020-07-24 22:01:45,891 INFO  [spark-dynamic-executor-allocation] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - Request to remove executorIds: 1
2020-07-24 22:01:45,891 INFO  [spark-dynamic-executor-allocation] cluster.YarnClusterSchedulerBackend (Logging.scala:logInfo(54)) - Requesting to kill executor(s) 1
2020-07-24 22:01:45,891 INFO  [spark-dynamic-executor-allocation] cluster.YarnClusterSchedulerBackend (Logging.scala:logInfo(54)) - Actual list of executor(s) to be killed is 1
2020-07-24 22:01:45,891 INFO  [dispatcher-event-loop-3] yarn.ApplicationMaster$AMEndpoint (Logging.scala:logInfo(54)) - Driver requested to kill executor(s) 1.
2020-07-24 22:01:45,892 INFO  [spark-dynamic-executor-allocation] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - Removing executor 1 because it has been idle for 60 seconds (new desired total will be 2)
2020-07-24 22:01:49,290 INFO  [dispatcher-event-loop-3] cluster.YarnSchedulerBackend$YarnDriverEndpoint (Logging.scala:logInfo(54)) - Disabling executor 1.
2020-07-24 22:01:49,293 INFO  [dag-scheduler-event-loop] scheduler.DAGScheduler (Logging.scala:logInfo(54)) - Executor lost: 1 (epoch 1)
2020-07-24 22:01:49,294 INFO  [dispatcher-event-loop-2] storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(54)) - Trying to remove executor 1 from BlockManagerMaster.
2020-07-24 22:01:49,295 INFO  [dispatcher-event-loop-2] storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(54)) - Removing block manager BlockManagerId(1, ip-172-31-7-142.ec2.internal, 44815, None)
2020-07-24 22:01:49,295 INFO  [dag-scheduler-event-loop] storage.BlockManagerMaster (Logging.scala:logInfo(54)) - Removed 1 successfully in removeExecutor
2020-07-24 22:01:49,297 INFO  [spark-dynamic-executor-allocation] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - Request to remove executorIds: 4
2020-07-24 22:01:49,297 INFO  [spark-dynamic-executor-allocation] cluster.YarnClusterSchedulerBackend (Logging.scala:logInfo(54)) - Requesting to kill executor(s) 4
2020-07-24 22:01:49,297 INFO  [spark-dynamic-executor-allocation] cluster.YarnClusterSchedulerBackend (Logging.scala:logInfo(54)) - Actual list of executor(s) to be killed is 4
2020-07-24 22:01:49,297 INFO  [dispatcher-event-loop-1] yarn.ApplicationMaster$AMEndpoint (Logging.scala:logInfo(54)) - Driver requested to kill executor(s) 4.
2020-07-24 22:01:49,298 INFO  [spark-dynamic-executor-allocation] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - Removing executor 4 because it has been idle for 60 seconds (new desired total will be 1)
2020-07-24 22:01:49,302 INFO  [dispatcher-event-loop-3] cluster.YarnClusterScheduler (Logging.scala:logInfo(54)) - Executor 1 on ip-172-31-7-142.ec2.internal killed by driver.
2020-07-24 22:01:49,303 INFO  [spark-listener-group-executorManagement] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - Existing executor 1 has been removed (new total is 3)
2020-07-24 22:01:49,818 INFO  [dispatcher-event-loop-2] cluster.YarnSchedulerBackend$YarnDriverEndpoint (Logging.scala:logInfo(54)) - Disabling executor 2.
2020-07-24 22:01:49,819 INFO  [dag-scheduler-event-loop] scheduler.DAGScheduler (Logging.scala:logInfo(54)) - Executor lost: 2 (epoch 1)
2020-07-24 22:01:49,819 INFO  [dispatcher-event-loop-2] cluster.YarnClusterScheduler (Logging.scala:logInfo(54)) - Executor 2 on ip-172-31-11-12.ec2.internal killed by driver.
2020-07-24 22:01:49,819 INFO  [dispatcher-event-loop-3] storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(54)) - Trying to remove executor 2 from BlockManagerMaster.
2020-07-24 22:01:49,819 INFO  [dispatcher-event-loop-3] storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(54)) - Removing block manager BlockManagerId(2, ip-172-31-11-12.ec2.internal, 35007, None)
2020-07-24 22:01:49,820 INFO  [dag-scheduler-event-loop] storage.BlockManagerMaster (Logging.scala:logInfo(54)) - Removed 2 successfully in removeExecutor
2020-07-24 22:01:49,820 INFO  [spark-listener-group-executorManagement] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - Existing executor 2 has been removed (new total is 2)
2020-07-24 22:01:50,099 INFO  [spark-dynamic-executor-allocation] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - Request to remove executorIds: 3
2020-07-24 22:01:51,044 INFO  [dispatcher-event-loop-3] cluster.YarnSchedulerBackend$YarnDriverEndpoint (Logging.scala:logInfo(54)) - Disabling executor 4.
2020-07-24 22:01:51,044 INFO  [dag-scheduler-event-loop] scheduler.DAGScheduler (Logging.scala:logInfo(54)) - Executor lost: 4 (epoch 1)
2020-07-24 22:01:51,045 INFO  [dispatcher-event-loop-1] storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(54)) - Trying to remove executor 4 from BlockManagerMaster.
2020-07-24 22:01:51,045 INFO  [dispatcher-event-loop-3] cluster.YarnClusterScheduler (Logging.scala:logInfo(54)) - Executor 4 on ip-172-31-1-131.ec2.internal killed by driver.
2020-07-24 22:01:51,045 INFO  [dispatcher-event-loop-1] storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(54)) - Removing block manager BlockManagerId(4, ip-172-31-1-131.ec2.internal, 43747, None)
2020-07-24 22:01:51,045 INFO  [dag-scheduler-event-loop] storage.BlockManagerMaster (Logging.scala:logInfo(54)) - Removed 4 successfully in removeExecutor
2020-07-24 22:01:51,046 INFO  [spark-listener-group-executorManagement] spark.ExecutorAllocationManager (Logging.scala:logInfo(54)) - Existing executor 4 has been removed (new total is 1)
2020-07-24 22:01:52,841 ERROR [Thread-9] redshift.RedshiftWriter (RedshiftWriter.scala:retry$1(142)) - SQLException thrown while running COPY query; will attempt to retrieve more information by querying the STL_LOAD_ERRORS table
java.sql.SQLException: Exception thrown in awaitResult: 
    at com.databricks.spark.redshift.JDBCWrapper.com$databricks$spark$redshift$JDBCWrapper$$executeInterruptibly(RedshiftJDBCWrapper.scala:133)
    at com.databricks.spark.redshift.JDBCWrapper.executeInterruptibly(RedshiftJDBCWrapper.scala:109)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1$$anonfun$apply$mcV$sp$2.apply(RedshiftWriter.scala:218)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1$$anonfun$apply$mcV$sp$2.apply(RedshiftWriter.scala:215)
    at scala.Option.foreach(Option.scala:257)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1.apply$mcV$sp(RedshiftWriter.scala:215)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1.apply(RedshiftWriter.scala:195)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1.apply(RedshiftWriter.scala:195)
    at scala.util.Try$.apply(Try.scala:192)
    at com.databricks.spark.redshift.RedshiftWriter.retry$1(RedshiftWriter.scala:129)
    at com.databricks.spark.redshift.RedshiftWriter.doRedshiftLoad(RedshiftWriter.scala:195)
    at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:437)
    at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:122)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at com.amazonaws.services.glue.util.RedshiftWrapper.writeDF(JDBCUtils.scala:975)
    at com.amazonaws.services.glue.RedshiftDataSink.writeDynamicFrame(RedshiftDataSink.scala:199)
    at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:55)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: [Amazon](500310) Invalid operation: Load into table 'sd_item_details' failed.  Check 'stl_load_errors' system table for details.;
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.handleMessage(Unknown Source)
    at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.getErrorResponse(Unknown Source)
    at com.amazon.redshift.client.PGClient.handleErrorsScenario2ForPrepareExecution(Unknown Source)
    at com.amazon.redshift.client.PGClient.handleErrorsPrepareExecute(Unknown Source)
    at com.amazon.redshift.client.PGClient.executePreparedStatement(Unknown Source)
    at com.amazon.redshift.dataengine.PGQueryExecutor.executePreparedStatement(Unknown Source)
    at com.amazon.redshift.dataengine.PGQueryExecutor.execute(Unknown Source)
    at com.amazon.jdbc.common.SPreparedStatement.executeWithParams(Unknown Source)
    at com.amazon.jdbc.common.SPreparedStatement.execute(Unknown Source)
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:109)
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:109)
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$2.apply(RedshiftJDBCWrapper.scala:127)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Caused by: com.amazon.support.exceptions.ErrorException: [Amazon](500310) Invalid operation: Load into table 'sd_item_details' failed.  Check 'stl_load_errors' system table for details.;
    ... 20 more
2020-07-24 22:01:52,962 ERROR [Thread-9] redshift.RedshiftWriter (RedshiftWriter.scala:saveToRedshift(442)) - Exception thrown during Redshift load; will roll back transaction
java.sql.SQLException: Exception thrown in awaitResult: 
    at com.databricks.spark.redshift.JDBCWrapper.com$databricks$spark$redshift$JDBCWrapper$$executeInterruptibly(RedshiftJDBCWrapper.scala:133)
    at com.databricks.spark.redshift.JDBCWrapper.executeInterruptibly(RedshiftJDBCWrapper.scala:109)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1$$anonfun$apply$mcV$sp$2.apply(RedshiftWriter.scala:218)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1$$anonfun$apply$mcV$sp$2.apply(RedshiftWriter.scala:215)
    at scala.Option.foreach(Option.scala:257)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1.apply$mcV$sp(RedshiftWriter.scala:215)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1.apply(RedshiftWriter.scala:195)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1.apply(RedshiftWriter.scala:195)
    at scala.util.Try$.apply(Try.scala:192)
    at com.databricks.spark.redshift.RedshiftWriter.retry$1(RedshiftWriter.scala:129)
    at com.databricks.spark.redshift.RedshiftWriter.doRedshiftLoad(RedshiftWriter.scala:195)
    at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:437)
    at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:122)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at com.amazonaws.services.glue.util.RedshiftWrapper.writeDF(JDBCUtils.scala:975)
    at com.amazonaws.services.glue.RedshiftDataSink.writeDynamicFrame(RedshiftDataSink.scala:199)
    at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:55)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: [Amazon](500310) Invalid operation: Load into table 'sd_item_details' failed.  Check 'stl_load_errors' system table for details.;
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.handleMessage(Unknown Source)
    at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.getErrorResponse(Unknown Source)
    at com.amazon.redshift.client.PGClient.handleErrorsScenario2ForPrepareExecution(Unknown Source)
    at com.amazon.redshift.client.PGClient.handleErrorsPrepareExecute(Unknown Source)
    at com.amazon.redshift.client.PGClient.executePreparedStatement(Unknown Source)
    at com.amazon.redshift.dataengine.PGQueryExecutor.executePreparedStatement(Unknown Source)
    at com.amazon.redshift.dataengine.PGQueryExecutor.execute(Unknown Source)
    at com.amazon.jdbc.common.SPreparedStatement.executeWithParams(Unknown Source)
    at com.amazon.jdbc.common.SPreparedStatement.execute(Unknown Source)
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:109)
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:109)
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$2.apply(RedshiftJDBCWrapper.scala:127)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Caused by: com.amazon.support.exceptions.ErrorException: [Amazon](500310) Invalid operation: Load into table 'sd_item_details' failed.  Check 'stl_load_errors' system table for details.;
    ... 20 more
Traceback (most recent call last):
  File "script_2020-07-24-21-59-59.py", line 45, in <module>
    datasink5 = glueContext.write_dynamic_frame.from_catalog(frame = resolvechoice4, database = "delphi_redshift", table_name = "delphi_shajeec_sd_item_details", redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink5")
  File "/mnt/yarn/usercache/root/appcache/application_1595627243772_0003/container_1595627243772_0003_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 657, in from_catalog
    return self._glue_context.write_dynamic_frame_from_catalog(frame, db, table_name, redshift_tmp_dir, transformation_ctx, additional_options, catalog_id)
  File "/mnt/yarn/usercache/root/appcache/application_1595627243772_0003/container_1595627243772_0003_01_000001/PyGlue.zip/awsglue/context.py", line 296, in write_dynamic_frame_from_catalog
    return DataSink(j_sink, self).write(frame)
  File "/mnt/yarn/usercache/root/appcache/application_1595627243772_0003/container_1595627243772_0003_01_000001/PyGlue.zip/awsglue/data_sink.py", line 35, in write
    return self.writeFrame(dynamic_frame_or_dfc, info)
  File "/mnt/yarn/usercache/root/appcache/application_1595627243772_0003/container_1595627243772_0003_01_000001/PyGlue.zip/awsglue/data_sink.py", line 31, in writeFrame
    return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame.name + "_errors")
  File "/mnt/yarn/usercache/root/appcache/application_1595627243772_0003/container_1595627243772_0003_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/mnt/yarn/usercache/root/appcache/application_1595627243772_0003/container_1595627243772_0003_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/mnt/yarn/usercache/root/appcache/application_1595627243772_0003/container_1595627243772_0003_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o116.pyWriteDynamicFrame.
: java.sql.SQLException: Exception thrown in awaitResult: 
    at com.databricks.spark.redshift.JDBCWrapper.com$databricks$spark$redshift$JDBCWrapper$$executeInterruptibly(RedshiftJDBCWrapper.scala:133)
    at com.databricks.spark.redshift.JDBCWrapper.executeInterruptibly(RedshiftJDBCWrapper.scala:109)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1$$anonfun$apply$mcV$sp$2.apply(RedshiftWriter.scala:218)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1$$anonfun$apply$mcV$sp$2.apply(RedshiftWriter.scala:215)
    at scala.Option.foreach(Option.scala:257)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1.apply$mcV$sp(RedshiftWriter.scala:215)
    at com.databricks.spark.redshift.RedshiftWriter$$anonfun$doRedshiftLoad$1.apply(RedshiftWriter.scala:195)
    ... 20 more

How can I correct this error?

Retinitis answered 24/7, 2020 at 23:19 Comment(1)
Actual error is (taken from your stacktrace): [Amazon](500310) Invalid operation: Load into table 'sd_item_details' failed. Check 'stl_load_errors' system table for details.; Could you check stl_load_errors, maybe there are some details there? – Aby

I was hit with a similar error, "o132.pyWriteDynamicFrame. Exception thrown in awaitResult:", in a job that moves data from RDS to Redshift. I checked the write-dynamic-frame call multiple times and found no issues there.

I then reviewed the data type mapping and made a few changes. For example, the mapping before the fix was:

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("client", "string", "client", "string"), ("id", "int", "id", "long"), ...

and the mapping after the fix:

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("client", "string", "client", "string"), ("id", "int", "id", "numeric(20,0)"), ...

What was the mapping issue? Redshift does not support an integer type with a declared precision such as INT(11,0), so the job failed while that column was being mapped to BIGINT. Changing the target to a numeric type with explicit precision resolves the issue.

The following provides a guideline on how to handle data type mapping: https://docs.aws.amazon.com/redshift/latest/dg/federated-data-types.html

The 'pyWriteDynamicFrame. Exception thrown in awaitResult:' error might not be caused by the write-dynamic-frame call itself; it often surfaces other issues elsewhere in your script.

Bangui answered 3/2, 2022 at 12:9 Comment(1)
I'm getting it when putting single quotes around my stored proc parameters. When I remove the single quotes I get an error that no such column exists. Have you called a stored proc from Glue using a DynamicFrame? ...,"postactions":"call stored_proc('"+myString+"','"+myDate+"')" – Printing

Note that Glue is not as robust as one might think: column order plays a major role. Check the column order of the target table as well as the input frame, and make sure the order and the data types are identical.
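
A minimal sketch of one way to check and enforce that, reusing the applymapping1 frame and glueContext names from the question's script; the column list here is hypothetical and should be replaced with the Redshift table's actual column order:

from awsglue.dynamicframe import DynamicFrame

# Print the schema Glue is about to write and compare it with the target table definition
applymapping1.printSchema()

# Reorder the columns to the target table's order before writing
ordered_df = applymapping1.toDF().select("id", "index", "quantity")
ordered = DynamicFrame.fromDF(ordered_df, glueContext, "ordered")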

In addition, make sure you have disabled 'Job bookmark' in the 'Job details' tab; for development or generic jobs this is a major source of headaches.
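
If you prefer to leave the job definition untouched, the bookmark can also be disabled for a single run; a minimal boto3 sketch (the job name is a placeholder):

import boto3

glue = boto3.client("glue")

# Start a one-off run with bookmarks disabled via the standard Glue special parameter
glue.start_job_run(
    JobName="your-glue-job-name",
    Arguments={"--job-bookmark-option": "job-bookmark-disable"})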

Also, when copying from RDS to Redshift, note this part of the dev guide:
Currently, an ETL job can use JDBC connections within only one subnet. If you have multiple data stores in a job, they must be on the same subnet.

See the AWS Glue Developer Guide for more info.

Footie answered 22/3, 2022 at 9:39 Comment(0)

I got a similar error while doing a copy from S3 to a Redshift table. Things that worked:

  • Making sure the S3 location referenced by RedshiftTempDir is accessible to Redshift. RedshiftTempDir contains a manifest file listing the S3 object paths that need to be loaded into Redshift. Further information can be found here: COPY from Amazon S3

COPY command in Redshift returns an error if the specified manifest file isn't found or the manifest file isn't properly formed. The COPY command needs authorization to access data in another AWS resource, including in Amazon S3, Amazon EMR, Amazon DynamoDB, and Amazon EC2.

  • Command used in the AWS Glue script to copy data from S3 to Redshift:
val redshiftOutput = glueContext.getJDBCSink(
    catalogConnection = "your-connection-name",
    options = JsonOptions("{\"database\" : \"yourDB\", \"dbtable\" : \"your_table\" }"),
    redshiftTmpDir = "s3://yourRedshiftTmpDirPath/"
  ).writeDynamicFrame(yourframetocopy)
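
Since the question's job is written in Python, the rough PySpark equivalent is write_dynamic_frame.from_jdbc_conf. This is only a sketch, with placeholder connection, database, table, and IAM role names; the aws_iam_role connection option is what authorizes the underlying COPY to read the temp files in S3:

datasink = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame = yourframetocopy,
    catalog_connection = "your-connection-name",
    connection_options = {
        "database": "yourDB",
        "dbtable": "your_table",
        "aws_iam_role": "arn:aws:iam::123456789012:role/your-redshift-copy-role"  # placeholder ARN
    },
    redshift_tmp_dir = "s3://yourRedshiftTmpDirPath/",
    transformation_ctx = "datasink")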
Uxorial answered 23/12, 2020 at 11:50 Comment(0)

I had the same issue; it can be caused by any of the problems below.

Your job and the S3 bucket are in different regions.

While the job runs, temporary files are created in a folder in the S3 bucket. You need full permission on that bucket, because data gets loaded into Redshift from there.

When creating the table in Redshift, I had set the varchar limit to 30, but my data exceeded that limit. This kind of problem shows up when you run select * from stl_load_errors.
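
A minimal sketch of pulling those error details from inside the Glue job, reusing the spark session from the question's script; the cluster endpoint, database, and credentials are placeholders, and it assumes the Redshift JDBC driver is on the Glue classpath:

errors = (spark.read.format("jdbc")
    .option("url", "jdbc:redshift://your-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com:5439/yourdb")
    .option("dbtable", "(select starttime, filename, line_number, colname, err_reason "
                       "from stl_load_errors order by starttime desc limit 20) q")
    .option("user", "your_user")
    .option("password", "your_password")
    .load())
errors.show(truncate=False)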

After resolving these three issues, the job ran successfully.

Ashelman answered 4/7, 2022 at 4:55 Comment(0)
