aws-glue - 2

2

AWS Glue - GlueContext: read partitioned data from S3, add partitions as columns of DynamicFrame

I have some data stored in an S3 bucket in Parquet format, following a Hive-like partitioning style, with these partition keys: retailer, year, month, day. E.g. my-bucket/ retailer=a/ year=202...

pyspark aws-glue

Leukocyte asked 26/2, 2020 at 11:50

1

Push to repository button is disabled in AWS Glue job for Jupyter Notebook job even after configuring Git in Version Control

I have configured my Jupyter Notebook job in AWS Glue ETL with a Git PAT and repo names, but the Push to repository button is still disabled. I have another visual ETL job for which the button is enabled...

git jupyter-notebook aws-glue

Byline asked 25/9, 2023 at 6:2

2

How to convert JSON files stored in S3 to CSV using Glue?

I have some JSON files stored in S3, and I need to convert them, in the folder where they are, to CSV format. Currently I'm using Glue to map them to Athena, but, as I said, now I need to map the...

amazon-web-services amazon-s3 aws-glue

Lissie asked 21/5, 2019 at 18:30

5

Solved

AWS Glue: How to add a column with the source filename in the output?

Does anyone know of a way to add the source filename as a column in a Glue job? We created a flow where we crawled some files in S3 to create a schema. We then wrote a job that transforms the file...

amazon-web-services apache-spark pyspark aws-glue

Nowicki asked 10/5, 2018 at 16:35

2

Solved

AWS Glue: Removing quote character from a CSV file while writing

I have a CSV file in S3 that does not have any quotes, e.g. dVsdfsCcn7j6,r:werwerwerwerwerwerwerwer,_User$SSSSSBFwJ,login,password,false,2011-10-27 10:46:55,d24c2465e-9945645c5-4645509-a74574...

aws-glue

Perspicacious asked 27/3, 2018 at 4:8

1

How to set "zstd" compression level in AWS Glue job?

Background: the "zstd" compression codec has 22 compression levels. I read this Uber blog. Regarding compression time and file size, I verified using df.to_parquet with our data and got the same e...

amazon-web-services apache-spark aws-glue delta-lake zstd

Cymophane asked 29/9, 2023 at 19:12

1

Solved

Optimizing pyspark code by calculating Dataframe size

I'm using the following function (partly from a code snippet I got from this post: Compute size of Spark dataframe - SizeEstimator gives unexpected results and adding my calculations according to w...

amazon-web-services pyspark optimization aws-glue

Incisive asked 28/9, 2023 at 22:28

5

Is AWS Lambda preferred over AWS Glue Job?

In an AWS Glue job, we can write a script and execute it via the job. In AWS Lambda too, we can write the same script and execute the same logic as in the job above. So, my query is not whats...

amazon-web-services aws-lambda aws-glue

Conspicuous asked 26/8, 2020 at 14:29

2

Solved

How does AWS Athena deal with single-line JSON?

I am currently using Athena along with Kinesis Firehose and a Glue crawler. Kinesis Firehose is saving JSON to single-line files as below: {"name": "Jone Doe"}{"name": "Jane Doe"}{"name": "Jack Doe"} ...

aws-glue amazon-athena amazon-kinesis-firehose

Filial asked 7/6, 2020 at 16:4

3

Solved

Is there a way to run an AWS Glue crawler after a job is finished?

For example, I run an ETL job and new fields or columns may be added to the target table. To detect table changes a crawler should be run, but it can only be run manually or on a schedule. Can a crawler be triggered afte...

amazon-web-services aws-glue

Alien asked 11/1, 2018 at 5:46

7

Solved

How to run arbitrary / DDL SQL statements or stored procedures using AWS Glue

Is it possible to execute arbitrary SQL commands like ALTER TABLE from an AWS Glue Python job? I know I can use it to read data from tables, but is there a way to execute other database-specific comman...

pyspark aws-glue py4j

Keyway asked 10/11, 2020 at 19:46

2

AWS Glue: transform a struct into a DynamicFrame

I am a little new to AWS Glue. I am working on transforming raw CloudWatch JSON output into CSV with AWS Glue. The transformation script is pretty straightforward, however documentation and example does...

python amazon-web-services aws-glue

Speedwriting asked 13/12, 2017 at 5:20

5

AWS Glue Jupyter Notebook Failed to authenticate user

When I started a job with the IAM role AWSGlueServiceNotebookRoleDefault, I got this error: Failed to authenticate user due to missing information in request. There is no information in the docs about this error. I d...

amazon-web-services aws-glue

Sollars asked 31/3, 2022 at 10:2

1

Solved

Spark dynamic frame show method yields nothing

So I am using AWS Glue auto-generated code to read a CSV file from S3 and write it to a table over a JDBC connection. Seems simple: the job runs successfully with no error, but it writes nothing. When I c...

python pyspark apache-spark-sql aws-glue

Demarche asked 6/5, 2019 at 22:51

3

Solved

AWS Athena - duplicate columns due to partitioning

We have a Glue crawler that reads Avro files in S3 and creates a table in the Glue catalog accordingly. The thing is that we have a column named 'foo' that comes from the Avro schema and we also have some...

amazon-web-services amazon-s3 avro aws-glue amazon-athena

Resplendent asked 10/12, 2019 at 13:47

4

AWS Glue: delete all partitions

I defined several tables in AWS Glue. Over the past few weeks, I've had different issues with the table definition which I had to fix manually - I want to change column names, or types, or change ...

amazon-web-services aws-glue amazon-athena aws-glue-data-catalog

Yeung asked 30/3, 2020 at 9:35

8

Solved

Why is input_file_name() empty for S3 catalog sources in pyspark?

I'm trying to get the input file name (or path) for every file loaded through an S3 data catalog in AWS Glue. I've read in a few places that input_file_name() should provide this information (tho...

amazon-web-services apache-spark amazon-s3 pyspark aws-glue

Gillmore asked 28/6, 2019 at 16:58

4

AWS Glue and update duplicating data

I'm using AWS Glue to move multiple files to an RDS instance from S3. Each day I get a new file into S3 which may contain new data, but can also contain a record I have already saved with some upda...

python amazon-web-services pyspark etl aws-glue

Suzysuzzy asked 22/11, 2018 at 19:21

4

Solved

How to list all databases and tables in AWS Glue Catalog?

I created a Development Endpoint in the AWS Glue console and now I have access to SparkContext and SQLContext in gluepyspark console. How can I access the catalog and list all databases and tables...

apache-spark-sql aws-glue

Alum asked 6/9, 2017 at 16:45

4

Solved

Problems when writing parquet with timestamps prior to 1900 in AWS Glue 3.0

When switching from Glue 2.0 to 3.0, which means also switching from Spark 2.4 to 3.1.1, my jobs start to fail when processing timestamps prior to 1900 with this error: An error occurred while call...

amazon-web-services apache-spark pyspark aws-glue

Addie asked 23/8, 2021 at 10:51

3

Solved

Cross-account access to S3 for AWS Glue in another account

I want to set up cross account access to an S3 bucket for AWS Glue in another account to crawl. We have two accounts in our environment (A & B): AccountA has an S3 bucket with ACL permissions ...

amazon-s3 amazon-iam aws-glue

Kosaka asked 2/10, 2020 at 16:49

5

How to set the name for a crawled table?

The AWS crawler has a prefix property for adding new tables. So if I leave the prefix empty and point the crawler at s3://my-bucket/some-table-backup, it creates a table named some-table-backup. Is there a way ...

amazon-web-services aws-glue

Athene asked 18/1, 2018 at 13:18

7

AWS Glue Job getting Access Denied when writing to S3

I have a Glue ETL job, created by CloudFormation. This job extracts data from RDS Aurora and writes it to S3. When I run this job, I get the error below. The job has an IAM service role. This servic...

amazon-web-services amazon-s3 aws-glue

Curtal asked 28/6, 2019 at 19:14

1

Solved

Glue dynamic frame is not populating from S3 bucket

I have a Glue job that is not working because the dynamic frame is not populating from a Parquet file in S3. I have pointed it directly to an object that has data in it, but the dynamic frame is still b...

dataframe amazon-s3 pyspark aws-glue parquet

Unesco asked 22/3, 2023 at 16:52

6

Spark Catalog w/ AWS Glue: database not found

I've created an EMR cluster with the Glue Data Catalog. When I invoke the spark-shell, I am able to successfully list tables stored within a Glue database via spark.catalog.setCurrentDatabase("test...

apache-spark amazon-emr aws-glue

Dewain asked 19/9, 2017 at 3:29
