aws-glue - 3

3

Solved

Overwrite parquet files from dynamic frame in AWS Glue

I use dynamic frames to write a parquet file to S3, but if a file already exists my program appends a new file instead of replacing it. The statement that I use is this: glueContext.write_dynamic_frame...

amazon-web-services parquet aws-glue

Military asked 24/8, 2018 at 9:47

1

Unable to run AWS Glue Crawler due to IAM Permissions

I am unable to run a newly created AWS Glue Crawler. I followed the IAM Role guide at https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html?icmpid=docs_glue_console. Created new Crawler Role...

amazon-web-services etl amazon-iam aws-glue

Lithesome asked 8/1, 2023 at 6:4

6

Solved

AWS Glue error - Invalid input provided while running python shell program

I have a Glue job that runs Python shell code. When I try to run it, I get the error below. Job Name : xxxxx Job Run Id : yyyyyy failed to execute with exception Internal service error : Invalid i...

python amazon-web-services amazon-s3 aws-glue aws-glue-spark

Unstained asked 27/7, 2022 at 11:5

2

Solved

Setting the number of decimal places when updating Glue Table Schema

I'm trying to update a CSV table definition that has been created by the Glue Data Crawler. One of the columns contains decimal data that is currently being classified as double precision. I'm fin...

amazon-web-services aws-glue aws-glue-data-catalog

Archaeological asked 8/3, 2020 at 4:29

2

Solved

terraform does not detect changes to lambda source files

In my main.tf I have the following: data "template_file" "lambda_script_temp_file" { template = "${file("../../../fn/lambda_script.py")}" } data "template_file" "library_temp_file" { template =...

amazon-web-services aws-lambda terraform aws-glue

Maishamaisie asked 26/11, 2018 at 8:50

3

Simple ETL job in AWS Glue says "File Already Exists"

We're evaluating AWS Glue for a big data project, with some ETL. We added a crawler, which is correctly picking up a CSV file from S3. Initially, we simply want to transform that CSV to JSON, and d...

apache-spark aws-glue

Limber asked 12/12, 2017 at 19:58

6

Solved

Can AWS Glue crawl Delta Lake table data?

According to the article by Databricks, it is possible to integrate Delta Lake with AWS Glue. However, I am not sure if it is also possible to do it outside of the Databricks platform. Has someone done...

apache-spark amazon-s3 aws-glue delta-lake

Availability asked 2/10, 2019 at 6:0

6

Could not find S3 endpoint or NAT gateway for subnetId

I am unable to connect AWS Glue with RDS. VPC S3 endpoint validation failed for SubnetId: subnet-7e8a2. VPC: vpc-4d2d25. Reason: Could not find S3 endpoint or NAT gateway for subnetId: subnet-7ea32...

amazon-web-services apache-spark amazon-rds amazon-iam aws-glue

Counterproductive asked 3/5, 2019 at 15:25

2

Extra files are not copied to job run directory

I am running a simple Python shell job that tries to read a config file stored in an S3 bucket folder. The Glue service role has bucket object read/write permission. I have set --extra-files s...

amazon-web-services aws-glue

Camembert asked 4/8, 2019 at 3:10

4

How to overcome Spark "No Space left on the device" error in AWS Glue Job

I used an AWS Glue job with PySpark to read more than 10 TB of data from Parquet files in S3, but the job failed during the execution of the Spark SQL query with the error ...

amazon-s3 pyspark aws-glue

Felipe asked 28/12, 2020 at 13:38

3

Solved

Decompress a zip file in AWS Glue

I have a compressed gzip file in an S3 bucket. The files will be uploaded to the S3 bucket daily by the client. The gzip when uncompressed will contain 10 files in CSV format, but with the same sch...

amazon-web-services aws-glue

Tetrachloride asked 23/2, 2018 at 18:1

2

Solved

AWS Glue job to unzip a file from S3 and write it back to S3

I'm very new to AWS Glue, and I want to use AWS Glue to unzip a huge file in an S3 bucket and write the contents back to S3. I couldn't find anything while trying to google this requirement...

amazon-web-services amazon-s3 aws-glue

Futrell asked 21/5, 2021 at 5:24

4

Solved

AWS Glue job consuming data from external REST API

I'm trying to create a workflow where an AWS Glue ETL job pulls JSON data from an external REST API instead of S3 or any other AWS-internal source. Is that even possible? Has anyone done it? Please ...

aws-glue aws-glue-data-catalog

Barner asked 13/1, 2020 at 9:55

7

Solved

Is there any way to trigger an AWS Lambda function at the end of an AWS Glue job?

Currently I'm using an AWS Glue job to load data into Redshift, but after that load I need to run some data cleansing tasks, probably using an AWS Lambda function. Is there any way to trigger a Lamb...

aws-lambda etl aws-glue

Obvert asked 28/2, 2018 at 16:43

2

I have an error "java.io.FileNotFoundException: No such file or directory" while trying to create a dynamic frame using a notebook in AWS Glue

I'm setting up a new Jupyter Notebook in AWS Glue as a dev endpoint in order to test out some code for running an ETL script. So far I created a basic ETL script using AWS Glue but, for some reason...

amazon-s3 pyspark etl aws-glue

Oleaceous asked 9/7, 2019 at 18:43

6

"GlueArgumentError: argument --input_file_path is required"

I have created a PySpark script (Glue job) and am trying to run it from an EC2 instance with the CLI command aws glue start-job-run --arguments (here I am passing a list of arguments). I have tried both t...

aws-glue

Deplorable asked 28/11, 2017 at 10:26

1

Is there something like Glue "Bookmark" feature in spark which keeps track at job level?

I am looking to see if there is something like the AWS Glue "bookmark" in Spark. I know there is checkpointing in Spark, which works well on an individual data source. In Glue we could use bookmark ...

apache-spark pyspark spark-streaming aws-glue incremental-load

Kaolin asked 14/9, 2021 at 6:59

1

Solved

File already exists error while writing Spark dataframe to S3 using AWS Glue

I'm using this command to write a dataframe to S3: df.write.option("delimiter","|").option("header",True).option("compression", "gzip").mode("...

apache-spark amazon-s3 pyspark apache-spark-sql aws-glue

Zecchino asked 26/10, 2022 at 16:59

1

Does AWS Glue Schema Registry support being used as Flink SQL Catalog?

Does AWS Schema Registry support being used as an SQL Catalog within Flink SQL applications? For instance, the documentation shows an example of using a Hive Catalog: CREATE CATALOG hive WITH ( 't...

amazon-web-services apache-flink aws-glue flink-sql

Hermit asked 3/4, 2022 at 15:33

4

Invalid Schema error in AWS Glue created via Terraform

I have a Kinesis Firehose configuration in Terraform, which reads data from Kinesis stream in JSON, converts it to Parquet using Glue and writes to S3. There is something wrong with data format con...

amazon-web-services terraform aws-glue amazon-kinesis amazon-kinesis-firehose

Enclose asked 25/6, 2021 at 4:36

3

Is version control possible in AWS Glue ETL jobs?

I am fairly new to AWS Glue. I have tried creating some jobs and they work fine; now I want to take it a step further. Say we have other developers working and need to find a way to distinguish betw...

version-control aws-glue

Infatuation asked 4/2, 2020 at 15:58

2

Parquet column cannot be converted in file, Expected: bigint, Found: INT32

I have a Glue table with column tlc and its datatype is Bigint. I am trying to do the following using PySpark: read the Glue table and write it in a Dataframe; join with another table; write the re...

apache-spark pyspark amazon-emr parquet aws-glue

Fortuitous asked 24/3, 2020 at 3:34

4

Solved

Event based trigger of AWS Glue Crawler after a file is uploaded into an S3 Bucket?

Is it possible to trigger an AWS Glue crawler on new files that get uploaded into an S3 bucket, given that the crawler is "pointed" to that bucket? In other words: a file upload generates an event,...

amazon-web-services amazon-s3 aws-glue

Zena asked 16/2, 2018 at 13:47

7

Solved

AWS Glue Access denied for crawler with administrator policy attached

I am trying to run a crawler across an s3 datastore in my account which contains two csv files. However, when I try to run the crawler, no tables are loaded, and I see the following errors in cloud...

amazon-s3 aws-glue

Sleepyhead asked 17/8, 2018 at 16:19

8

AWS Glue Crawler Not Creating Table

I have a crawler I created in AWS Glue that does not create a table in the Data Catalog after it successfully completes. The crawler takes roughly 20 seconds to run and the logs show it successfu...

amazon-web-services aws-glue

Fluoroscope asked 1/11, 2017 at 17:2
