aws-glue Questions

2

I'm trying to create a Glue Job that enumerates all tables in a database in my catalog. In order to do so I use the following code snippet: session = boto3.Session(region_name='us-east-2') glue = ...
Hallie asked 13/6, 2018 at 22:27

1

I have used the new AWS Glue Studio visual tool to try running a very simple SQL query, with the Source as a Catalog Table, the Transform as a simple SparkSQL, and the Target as CSV file(s) in an S3 bucket....
Piperine asked 7/9, 2022 at 7:10

1

I created a Glue job using the visual tab as shown below. First, I connected to a MySQL table as the data source, which is already in my Data Catalog. Then, in the transform node, I wrote a custom SQL query t...

0

We're trying to use AWS MSK (managed Kafka), and we want to use the AWS Glue Schema Registry with Avro rather than the Confluent Schema Registry. We have brought up ksqlDB and connected it to MSK, but i...
Heimer asked 11/8, 2022 at 15:54

2

Solved

I am working in an AWS Glue environment. I read the data from the Glue catalog as a DynamicFrame and convert it to a PySpark DataFrame for my custom transformations. To do an upsert of the new/updated ...
Bard asked 30/8, 2021 at 8:12

2

AWS Glue is not importing the s3fs module: import s3fs. I expect the library to be imported, but AWS Glue says ImportError: No module named s3fs
Preachy asked 9/4, 2019 at 8:53

3

Every time I run a Glue crawler on existing data, it changes the SerDe serialization lib to LazySimpleSerDe, which doesn't classify correctly (e.g. for quoted fields containing commas). I then need ...

3

Solved

I've created two Glue jobs (gluejob1, gluejob2). I want to create a dependency so that gluejob2 runs only after gluejob1 has completed. To orchestrate this, I created a step function with the below defini...
Thearchy asked 16/1, 2019 at 1:55

4

I have been trying to copy a table over from Glue to one in Redshift. I created a job with the following code: import sys from awsglue.transforms import * from awsglue.utils import getResolvedOption...
Retinitis asked 24/7, 2020 at 23:19

4

Solved

I am trying to truncate a Postgres destination table prior to insert and, in general, trying to fire external functions using the connections already created in Glue. Has anyone been able to ...
Depression asked 2/11, 2017 at 17:16

3

I have a whole bunch of data in AWS S3 stored in JSON format. It looks like this: s3://my-bucket/store-1/20190101/sales.json s3://my-bucket/store-1/20190102/sales.json s3://my-bucket/store-1/20190...
Riordan asked 20/3, 2019 at 14:1

3

I'm creating a simple ETL job that reads a billion files and repartitions them (in other words, compacts them into a smaller number for further processing). Simple AWS Glue application: import org.apache.sp...
Rider asked 22/12, 2020 at 13:26

2

Solved

I created an egg and a whl file of pyarrow and put them on S3 to call them in a Python shell job. I received this message: Job code: import pyarrow raise Error, same structure for whl: Traceback (most...
Minimus asked 3/3, 2020 at 17:47

2

Solved

My task is to create a CloudFormation template that produces a Glue job and then sets that Glue job up as the first step function task. I have the two pieces working separately, but I don't seem t...
Truculent asked 7/5, 2020 at 5:39

7

Solved

AWS Glue jobs log output and errors to two different CloudWatch logs, /aws-glue/jobs/error and /aws-glue/jobs/output by default. When I include print() statements in my scripts for debugging, they ...
Aurora asked 21/2, 2018 at 19:51

2

Solved

I am trying to access the AWS ETL Glue Python shell job id from the script of that job. This is the RunID that you can see in the first column in the AWS Glue Console, something like jr_5fc6d4ecf02...
Griffey asked 31/3, 2022 at 21:17

2

I'm querying a table in Athena that is giving the error: GENERIC_INTERNAL_ERROR: Number of partition values does not match number of filters I was able to query it earlier, but added another parti...
Sleeping asked 10/7, 2019 at 22:10

2

Solved

Recently, AWS announced Amazon EMR Serverless (Preview) https://aws.amazon.com/blogs/big-data/announcing-amazon-emr-serverless-preview-run-big-data-applications-without-managing-servers/ - new very...
Azaleah asked 12/12, 2021 at 8:10

2

I want to use an ETL job to read data from S3, since with ETL jobs I can set the DPU to hopefully speed things up. But how do I do it? I tried: import sys from awsglue.transforms import * from awsglue.util...
Romeliaromelle asked 1/11, 2018 at 15:10

13

Solved

What is the easiest way to use packages such as NumPy and Pandas within the new ETL tool on AWS called Glue? I have a completed script in Python I would like to run in AWS Glue that utilizes Nu...
Beheld asked 20/9, 2017 at 18:42

3

Solved

I'm setting up an AWS Glue job for my customers. Their files are Excel files with xls/xlsx extensions and multiple sheets, and they don't want to do any conversion before uploading. How do I extract d...
Ieshaieso asked 12/8, 2019 at 6:35

3

According to "Moving data from S3 -> RDS using AWS Glue", I found that an instance is required to add a connection to a data target. However, my RDS is serverless, so there is no instance availa...

1

Solved

I'm trying to create a job in AWS Glue using the Windows AWS Client and I'm receiving an error that I'm not authorized to perform: iam:PassRole, as you can see: Console>aws glue create-job --name "aw...

4

Solved

I want to read filtered data from a MySQL instance using an AWS Glue job. Since a Glue JDBC connection doesn't allow me to push down predicates, I am trying to explicitly create a JDBC connection in my ...
Sidestroke asked 8/1, 2019 at 14:55

1

Solved

I am currently running an AWS Glue job that converts CSVs to Parquet files. The source and target of the data is an S3 bucket and this all works fine. However, I would like to include information f...
Nerine asked 4/4, 2022 at 14:54
