aws-glue - 4

2

AWS Glue job hangs when calling the AWS Glue client API using boto3 from the context of a running AWS Glue Job?

I'm trying to create a Glue Job that enumerates all tables in a database in my catalog. In order to do so I use the following code snippet: session = boto3.Session(region_name='us-east-2') glue = ...

amazon-web-services boto3 aws-glue

Hallie asked 13/6, 2018 at 22:27

1

Glue Job Succeeded but no data inserted into the target bucket

I have used the new AWS Glue Studio visual tool just to try running a very simple SQL query, with Source as a Catalog Table, Transform as a simple SparkSQL, and Target as CSV file(s) in an S3 bucket...

amazon-web-services apache-spark-sql aws-glue

Piperine asked 7/9, 2022 at 7:10

1

Glue Job Succeeded but no data inserted into the target table (Aurora Mysql)

I created a Glue job using the visual tab like below. First I connected to a MySQL table as the data source, which is already in my Data Catalog. Then in the transform node, I wrote a custom SQL query t...

mysql amazon-web-services aws-glue amazon-aurora aws-glue-data-catalog

Sicard asked 18/4, 2022 at 18:44

0

Has anyone figured out how to use KSQLDB with AWS GLUE Schema Registry?

We're trying to use AWS MSK (managed Kafka), and we want to use AWS GLUE Schema registry with AVRO rather than Confluent Schema Registry. We have brought up KSQLDB, and connected that to MSK, but i...

amazon-web-services apache-kafka aws-glue ksqldb aws-msk

Heimer asked 11/8, 2022 at 15:54

2

Solved

Converting PySpark dataframe to a Delta Table

I am working in AWS Glue environment. I read the data from Glue catalog as a Dynamic dataframe and convert it to Pyspark dataframe for my custom transformations. To do an upsert of the new/updated ...

apache-spark pyspark aws-glue delta-lake

Bard asked 30/8, 2021 at 8:12

2

Import failure of s3fs library in AWS Glue

AWS Glue is not importing the s3fs module. I run import s3fs and expect the library to be imported, but AWS Glue reports ImportError: No module named s3fs

python amazon-s3 aws-glue

Preachy asked 9/4, 2019 at 8:53

3

Specify a SerDe serialization lib with AWS Glue Crawler

Every time I run a Glue crawler on existing data, it changes the SerDe serialization lib to LazySimpleSerDe, which doesn't classify correctly (e.g. for quoted fields containing commas). I then need ...

amazon-web-services amazon-athena aws-glue aws-glue-data-catalog

Obovoid asked 14/8, 2019 at 16:4

3

Solved

aws glue job dependency in step function

I've created 2 Glue jobs (gluejob1, gluejob2). I want to create a dependency so that gluejob2 runs only after gluejob1 has completed. To orchestrate this, I created a step function with the below defini...

amazon-web-services aws-glue aws-step-functions

Thearchy asked 16/1, 2019 at 1:55

4

Getting the 'Exception thrown in awaitResult:' error when trying to copy table in glue to redshift

I have been trying to copy a table over from glue to one in redshift. I created a job with the following code import sys from awsglue.transforms import * from awsglue.utils import getResolvedOption...

python apache-spark pyspark aws-glue

Retinitis asked 24/7, 2020 at 23:19

4

Solved

AWS Glue - Truncate destination postgres table prior to insert

I am trying to truncate a postgres destination table prior to insert, and in general, trying to fire external functions utilizing the connections already created in GLUE. Has anyone been able to ...

python postgresql pyspark aws-glue

Depression asked 2/11, 2017 at 17:16

3

How do I import JSON data from S3 using AWS Glue?

I have a whole bunch of data in AWS S3 stored in JSON format. It looks like this: s3://my-bucket/store-1/20190101/sales.json s3://my-bucket/store-1/20190102/sales.json s3://my-bucket/store-1/20190...

amazon-web-services amazon-s3 etl aws-glue

Riordan asked 20/3, 2019 at 14:1

3

AWS Glue RDD.saveAsTextFile() raises Class org.apache.hadoop.mapred.DirectOutputCommitter not found

I'm creating a simple ETL that reads a billion files and repartitions them (in other words, compacts them into a smaller number of files for further processing). Simple AWS Glue application: import org.apache.sp...

scala apache-spark rdd aws-glue

Rider asked 22/12, 2020 at 13:26

2

Solved

Use pyarrow in Glue pythonshell - ModuleNotFoundError: No module named 'pyarrow.lib'

Created an egg and a whl file of pyarrow and put them on S3 to call from a pythonshell job. Received this message: Job code: import pyarrow raises an error, same structure for whl: Traceback (most...

python python-3.x aws-glue egg pyarrow

Minimus asked 3/3, 2020 at 17:47

2

Solved

AWS Cloudformation: is there a way to capture Glue ARN for use in a step function?

My task is to create a cloudformation template that produces a glue job and then sets that glue job up as the first step function task. I have the two pieces working separately, but I don't seem t...

amazon-web-services aws-cloudformation aws-glue

Truculent asked 7/5, 2020 at 5:39

7

Solved

How do I write messages to the output log on AWS Glue?

AWS Glue jobs log output and errors to two different CloudWatch logs, /aws-glue/jobs/error and /aws-glue/jobs/output by default. When I include print() statements in my scripts for debugging, they ...

pyspark aws-glue

Aurora asked 21/2, 2018 at 19:51

2

Solved

How to get job_id from within the python script using AWS Glue python shell job?

I am trying to access the AWS ETL Glue Python shell job id from the script of that job. This is the RunID that you can see in the first column in the AWS Glue Console, something like jr_5fc6d4ecf02...

python amazon-web-services aws-glue

Griffey asked 31/3, 2022 at 21:17

2

AWS Athena - GENERIC_INTERNAL_ERROR: Number of partition values does not match number of filters

I'm querying a table in Athena that is giving the error: GENERIC_INTERNAL_ERROR: Number of partition values does not match number of filters I was able to query it earlier, but added another parti...

amazon-web-services aws-glue presto amazon-athena

Sleeping asked 10/7, 2019 at 22:10

2

Solved

AWS Glue vs EMR Serverless

Recently, AWS announced Amazon EMR Serverless (Preview) https://aws.amazon.com/blogs/big-data/announcing-amazon-emr-serverless-preview-run-big-data-applications-without-managing-servers/ - new very...

amazon-web-services amazon-emr aws-glue emr-serverless

Azaleah asked 12/12, 2021 at 8:10

2

AWS Glue: ETL to read S3 CSV files

I want to use ETL to read data from S3, since with ETL jobs I can set the DPU to hopefully speed things up. But how do I do it? I tried import sys from awsglue.transforms import * from awsglue.util...

amazon-web-services amazon-s3 pyspark etl aws-glue

Romeliaromelle asked 1/11, 2018 at 15:10

13

Solved

Use AWS Glue Python with NumPy and Pandas Python Packages

What is the easiest way to use packages such as NumPy and Pandas within the new ETL tool on AWS called Glue? I have a completed script within Python I would like to run in AWS Glue that utilizes Nu...

python pandas amazon-web-services aws-lambda aws-glue

Beheld asked 20/9, 2017 at 18:42

3

Solved

AWS GLUE import xls/xlsx file

I'm setting up an AWS Glue job for my customers. Their files are Excel files with xls/xlsx extensions and multiple sheets, and they don't want to do any conversion before uploading. How do I extract d...

amazon-web-services aws-glue

Ieshaieso asked 12/8, 2019 at 6:35

3

Load data from S3 into Aurora Serverless using AWS Glue

According to Moving data from S3 -> RDS using AWS Glue I found that an instance is required to add a connection to a data target. However, my RDS is a serverless, so there is no instance availa...

amazon-web-services amazon-s3 aws-glue aws-aurora-serverless

Coth asked 2/12, 2019 at 5:13

1

Solved

AWS User not authorized to perform PassRole

I'm trying to create a job in AWS Glue using the Windows AWS Client and I'm receiving that I'm not authorized to perform: iam:PassRole as you can see: Console>aws glue create-job --name "aw...

amazon-web-services amazon-s3 terraform aws-cli aws-glue

Puke asked 2/5, 2022 at 18:57

4

Solved

AWS glueContext read doesn't allow a sql query

I want to read filtered data from a MySQL instance using an AWS Glue job. Since a Glue JDBC connection doesn't allow me to push down a predicate, I am trying to explicitly create a JDBC connection in my ...

aws-glue mssql-jdbc

Sidestroke asked 8/1, 2019 at 14:55

1

Solved

Using S3 folder structure as meta data in AWS Glue

I am currently running an AWS Glue job that converts CSVs to Parquet files. The source and target of the data is an S3 bucket and this all works fine. However I would like to include information f...

apache-spark aws-glue

Nerine asked 4/4, 2022 at 14:54
