aws-glue Questions

3

Solved

I use dynamic frames to write a Parquet file in S3, but if a file already exists my program appends a new file instead of replacing it. The statement I use is: glueContext.write_dynamic_frame...
Military asked 24/8, 2018 at 9:47

1

I am unable to run a newly created AWS Glue crawler. I followed the IAM role guide at https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html?icmpid=docs_glue_console and created a new crawler role...
Lithesome asked 8/1, 2023 at 6:04

6

Solved

I have a Glue job written as Python shell code. When I try to run it, I get the error below. Job Name : xxxxx Job Run Id : yyyyyy failed to execute with exception Internal service error : Invalid i...

2

Solved

I'm trying to update a CSV table definition that has been created by the Glue Data Crawler. One of the columns contains decimal data that is currently being classified as double precision. I'm fin...
Archaeological asked 8/3, 2020 at 4:29

2

Solved

In my main.tf I have the following: data "template_file" "lambda_script_temp_file" { template = "${file("../../../fn/lambda_script.py")}" } data "template_file" "library_temp_file" { template =...
Maishamaisie asked 26/11, 2018 at 8:50

3

We're evaluating AWS Glue for a big data project, with some ETL. We added a crawler, which is correctly picking up a CSV file from S3. Initially, we simply want to transform that CSV to JSON, and d...
Limber asked 12/12, 2017 at 19:58

6

Solved

According to an article by Databricks, it is possible to integrate Delta Lake with AWS Glue. However, I am not sure whether this is also possible outside the Databricks platform. Has someone done...
Availability asked 2/10, 2019 at 6:00

6

I am unable to connect AWS Glue to RDS. VPC S3 endpoint validation failed for SubnetId: subnet-7e8a2. VPC: vpc-4d2d25. Reason: Could not find S3 endpoint or NAT gateway for subnetId: subnet-7ea32...
Counterproductive asked 3/5, 2019 at 15:25

2

I am running a simple Python shell job that tries to read a config file from a folder in an S3 bucket. The Glue service role has bucket object read/write permission. I have set --extra-files s...
Camembert asked 4/8, 2019 at 3:10

4

I used an AWS Glue job with PySpark to read data from S3 Parquet files totaling more than 10 TB, but the job failed while executing a Spark SQL query, with the error ...
Felipe asked 28/12, 2020 at 13:38

3

Solved

I have a gzip-compressed file in an S3 bucket, and the client uploads files to the bucket daily. When uncompressed, the gzip will contain 10 files in CSV format, but with the same sch...
Tetrachloride asked 23/2, 2018 at 18:01

2

Solved

I'm very new to AWS Glue, and I want to use it to unzip a huge file in an S3 bucket and write the contents back to S3. I couldn't find anything while trying to google this requirement...
Futrell asked 21/5, 2021 at 5:24

4

Solved

I'm trying to create a workflow where an AWS Glue ETL job pulls JSON data from an external REST API instead of S3 or any other AWS-internal source. Is that even possible? Has anyone done it? Please ...
Barner asked 13/1, 2020 at 9:55

7

Solved

Currently I'm using an AWS Glue job to load data into Redshift, but after that load I need to run some data cleansing tasks, probably using an AWS Lambda function. Is there any way to trigger a Lamb...
Obvert asked 28/2, 2018 at 16:43

2

I'm setting up a new Jupyter notebook in AWS Glue as a dev endpoint in order to test some code for running an ETL script. So far I have created a basic ETL script using AWS Glue but, for some reason...
Oleaceous asked 9/7, 2019 at 18:43

6

I have created a PySpark script (Glue job) and am trying to run it from an EC2 instance with the CLI command aws glue start-job-run --arguments (here I am passing a list of arguments). I have tried both t...
Deplorable asked 28/11, 2017 at 10:26

1

I am looking for something like the AWS Glue "bookmark" in Spark. I know Spark has checkpointing, which works well on an individual data source. In Glue we could use bookmark ...

1

Solved

I'm using this command to write a dataframe to S3: df.write.option("delimiter","|").option("header",True).option("compression", "gzip").mode("...
Zecchino asked 26/10, 2022 at 16:59

1

Does AWS Schema Registry support being used as a SQL catalog within Flink SQL applications? For instance, the documentation shows an example of using a Hive catalog: CREATE CATALOG hive WITH ( 't...
Hermit asked 3/4, 2022 at 15:33

4

I have a Kinesis Firehose configuration in Terraform, which reads JSON data from a Kinesis stream, converts it to Parquet using Glue, and writes it to S3. There is something wrong with data format con...

3

I am fairly new to AWS Glue. I have created some jobs and they work fine; now I want to take it a step further. Say we have other developers working, and we need to find a way to distinguish betw...
Infatuation asked 4/2, 2020 at 15:58

2

I have a Glue table with a column tlc whose data type is bigint. I am trying to do the following using PySpark: read the Glue table into a DataFrame, join it with another table, and write the re...
Fortuitous asked 24/3, 2020 at 3:34

4

Solved

Is it possible to trigger an AWS Glue crawler on new files that get uploaded into an S3 bucket, given that the crawler is "pointed" to that bucket? In other words: a file upload generates an event,...
Zena asked 16/2, 2018 at 13:47

7

Solved

I am trying to run a crawler across an S3 data store in my account, which contains two CSV files. However, when I try to run the crawler, no tables are loaded, and I see the following errors in cloud...
Sleepyhead asked 17/8, 2018 at 16:19

8

I have a crawler I created in AWS Glue that does not create a table in the Data Catalog after it successfully completes. The crawler takes roughly 20 seconds to run and the logs show it successfu...
Fluoroscope asked 1/11, 2017 at 17:02

© 2022 - 2024 — McMap. All rights reserved.