aws-glue Questions

1

I'm trying to export a table I crawled from a Postgres (RDS) database into Glue. There's one field with a decimal(10, 2) type. Now I have several problems. Exporting the table from Glue (using Spark...

1

I'm using PySpark to write to a Kafka broker; a JAAS security mechanism is set up for it, so we need to pass the username and password as environment variables. data_frame \ .selectExpr('CAST(id AS STRING)...
Mateya asked 21/3, 2022 at 15:18
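For the JAAS setup described above, a common pattern is to build the JAAS config string from environment variables and pass it through the Kafka writer's `kafka.sasl.jaas.config` option. A minimal sketch (the env var names `KAFKA_USER`/`KAFKA_PASS`, the broker address, topic, and SASL_SSL/PLAIN mechanism are all assumptions, not from the question):

```python
import os

def jaas_config(username: str, password: str) -> str:
    # JAAS entry for the PLAIN login module; note the trailing semicolon is required.
    return (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )

# Read credentials from the environment (placeholder variable names).
conf = jaas_config(os.environ.get("KAFKA_USER", ""), os.environ.get("KAFKA_PASS", ""))

# In the PySpark writer this would be used roughly as (sketch, assuming SASL_SSL + PLAIN):
# data_frame.selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value") \
#     .write.format("kafka") \
#     .option("kafka.bootstrap.servers", "broker:9092") \
#     .option("kafka.security.protocol", "SASL_SSL") \
#     .option("kafka.sasl.mechanism", "PLAIN") \
#     .option("kafka.sasl.jaas.config", conf) \
#     .option("topic", "my_topic") \
#     .save()
```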

3

I am running an AWS Glue job to load a pipe-delimited file on S3 into an RDS Postgres instance, using the auto-generated PySpark script from Glue. Initially, it complained about NULL values in som...
Koontz asked 20/12, 2017 at 23:25

3

Solved

I have an ETL job written in Python, which consists of multiple scripts with the following directory structure: my_etl_job | |--services | | | |-- __init__.py | |-- dynamoDB_service.py | |-- __i...
Americana asked 14/4, 2020 at 21:50

1

I'm getting an error while running this query on Athena: SELECT * FROM "db"."thermostat" WHERE id='95686' AND "date" = '2022/03/07' AND hour = 13 Projection Partition D...
Leodora asked 7/3, 2022 at 14:2
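With partition projection, a frequent cause of errors or empty results on queries like the one above is a mismatch between the literal in the query (`'2022/03/07'`) and the `projection.<column>.format` table property. A hedged sketch of the relevant properties, assuming the partition column is `date` and the range values are illustrative:

```sql
ALTER TABLE db.thermostat SET TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.date.type' = 'date',
  'projection.date.format' = 'yyyy/MM/dd',   -- must match the query literal's format
  'projection.date.range' = '2021/01/01,NOW'
);
```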

4

We are designing a big data solution for one of our dashboard applications and are seriously considering Glue for our initial ETL. Currently Glue supports JDBC and S3 as the target, but our downstream ...
Existence asked 2/3, 2018 at 5:58

2

I'm trying to copy parquet data from another s3 bucket to my s3 bucket. I want to limit the size of each partition to a max of 128 MB. I thought by default spark.sql.files.maxPartitionBytes would h...
Elson asked 30/6, 2020 at 0:36
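Regarding the entry above: `spark.sql.files.maxPartitionBytes` only caps *read* splits; output file size is governed by how many partitions exist at write time. One common workaround is to compute a partition count from the input size and repartition explicitly before writing. A sketch (the PySpark call is only in the comment; `total_size` would come from listing the source objects):

```python
import math

TARGET_BYTES = 128 * 1024 * 1024  # 128 MB per output file, the goal in the question

def partitions_for(total_bytes: int, target: int = TARGET_BYTES) -> int:
    """Smallest partition count such that each partition is at most `target` bytes."""
    return max(1, math.ceil(total_bytes / target))

# PySpark side (sketch):
# df.repartition(partitions_for(total_size)).write.parquet(dest)
```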

2

I'm starting with AWS Glue, and want to connect to my on premise mysql server via JDBC. I follow the documentation, create for glue the IAM Role, policy, security group and connection with correct...
Kendakendal asked 8/6, 2019 at 0:46

4

Solved

Hi, I have a bunch of CSVs located in S3 and a crawler set up via AWS Glue. This crawler builds about 10 tables as it scans 10 folders, and in only 1 of them are the headers not being detected. The st...
Diminutive asked 17/5, 2020 at 18:53

2

The scenario is this: our Snowflake instance will only be accessible from whitelisted IP addresses. If we plan to use AWS Glue, what IP address can we use so that it will be allowed to connect to Snowflake? I ne...
Ovum asked 18/10, 2020 at 5:24

2

I am very new to AWS Glue. I am working on a small project, and the task is to read a file from an S3 bucket, transpose it, and load it into a MySQL table. The source data in the S3 bucket looks as below: +---...
Lianaliane asked 11/11, 2019 at 20:1
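PySpark has no direct transpose; a common trick is to unpivot with SQL's `stack()` and then `pivot()` back on the other axis. The helper below only builds the `stack()` expression string (the surrounding pivot is sketched in the comment, with hypothetical column names):

```python
def melt_expr(cols):
    """Build a stack() expression turning the given columns into (key, value) rows."""
    n = len(cols)
    pairs = ", ".join(f"'{c}', `{c}`" for c in cols)
    return f"stack({n}, {pairs}) as (key, value)"

# PySpark usage (sketch; "id", "jan", "feb", "mar" are hypothetical columns):
# df.selectExpr("id", melt_expr(["jan", "feb", "mar"])) \
#   .groupBy("key").pivot("id").agg(F.first("value"))
```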

1

Solved

I am new to PySpark, and my objective is to use a PySpark script in AWS Glue for: reading a dataframe from an input file in Glue => done changing columns of some rows which satisfy a condition => ...
Elis asked 27/1, 2022 at 16:21

6

Solved

I found that AWS Glue sets up executor instances with a memory limit of 5 GB (--conf spark.executor.memory=5g), and sometimes, on big datasets, it fails with java.lang.OutOfMemoryError. The same is fo...
Josephjosepha asked 28/2, 2018 at 16:21
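Note on the entry above: the fixed executor memory applied to the original Standard worker type; newer Glue versions let you choose larger workers when creating or updating the job, with G.2X workers having roughly twice the memory of G.1X. A hedged sketch of the relevant fields in a boto3 `create_job`/`update_job` call (values illustrative):

```json
{
  "GlueVersion": "3.0",
  "WorkerType": "G.2X",
  "NumberOfWorkers": 10
}
```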

2

Solved

I have a dataset registered in Glue / Athena; call it my_db.table. I'm able to query it via Athena, and everything generally seems to be in order. I'm trying to use this table in a Glue job, but am...
Elimination asked 7/9, 2017 at 21:59

2

Solved

I need to do a grouping job on a Source DynamoDB table, then write each resulting Item to another Target DynamoDB table (or a secondary index of the Source one). Here I see that DynamoDB can ...
Auberge asked 13/4, 2020 at 19:39
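On writing grouped items back to DynamoDB: `BatchWriteItem` accepts at most 25 items per call, so results are usually written in chunks. The chunking itself is plain Python; the boto3 call is sketched in comments (the table name and `grouped_items` are hypothetical):

```python
def chunks(items, size=25):
    """Yield successive slices of at most `size` items (DynamoDB's batch limit)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# boto3 side (sketch) -- batch_writer handles the 25-item limit and retries itself:
# import boto3
# table = boto3.resource("dynamodb").Table("target_table")
# with table.batch_writer() as writer:
#     for item in grouped_items:
#         writer.put_item(Item=item)
```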

1

I'm using jobs from AWS Glue for the very first time, so it is normal that my job does not work, but I can't see any detailed log about what is wrong, because when I click the "Error Logs" link, o...
Bumkin asked 7/8, 2020 at 12:53

9

Solved

At my wits' end here... I have 15 CSV files that I am generating from a beeline query like: beeline -u CONN_STR --outputformat=dsv -e "SELECT ... " > data.csv I chose dsv because some ...
Durman asked 25/1, 2019 at 21:57

4

Solved

I am still starting out with AWS Glue, and I am trying to connect it to my publicly accessible MySQL database hosted on RDS Aurora to get its data. So I start by creating a crawler, and in the data ...
Impossibly asked 17/7, 2018 at 6:10

5

When running the AWS Glue crawler, it does not recognize timestamp columns. I have correctly formatted ISO 8601 timestamps in my CSV file. At first I expected Glue to automatically classify these as ti...
Taipan asked 16/5, 2019 at 23:12

2

How can we write user-defined functions in an AWS Glue script using PySpark (Python), on either a DynamicFrame or a DataFrame?
Conlon asked 21/9, 2018 at 9:26
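The core of a PySpark UDF is a plain Python function; registration happens on the DataFrame side, so a DynamicFrame is usually converted first. A sketch (the function name and column are illustrative; the pyspark calls are commented because they only run inside a Spark/Glue environment):

```python
def normalize_name(s):
    """Example UDF body: trim whitespace and title-case a name; pass None through."""
    return s.strip().title() if s is not None else None

# DataFrame registration (sketch):
# from pyspark.sql import functions as F
# from pyspark.sql.types import StringType
# normalize_udf = F.udf(normalize_name, StringType())
# df = df.withColumn("name", normalize_udf(df["name"]))
#
# For a DynamicFrame, convert and back:
# df = dynamic_frame.toDF()
# ... apply the UDF ...
# dynamic_frame = DynamicFrame.fromDF(df, glueContext, "out")
```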

3

Solved

I have a successfully running AWS Glue job that transforms data for predictions. I would like to stop processing and output a status message (which is working) if I reach a specific condition: if spec...
Racoon asked 9/4, 2021 at 21:14
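One way to end a Glue job early while still reporting success is to commit the job (so bookmarks advance) and then exit with code 0, which is normally treated as a successful run; raising an exception instead marks the run as failed. A sketch (the `job` object would be the `awsglue.job.Job` instance, only available inside Glue):

```python
import sys

def stop_if(condition, job=None, message="nothing to process"):
    """Exit the script cleanly when `condition` holds, after committing the Glue job."""
    if condition:
        if job is not None:
            job.commit()   # awsglue Job object; only meaningful inside a Glue run
        print(message)
        sys.exit(0)        # exit code 0 -> run normally shows as SUCCEEDED
```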

3

Solved

In the documentation, I cannot find any way of checking the run status of a crawler. The only way I am doing it currently is by repeatedly checking AWS to see if the file/table has been created. Is ...
Nystatin asked 25/10, 2018 at 19:18
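Crawler status can be polled programmatically: boto3's `get_crawler` returns a `State` of `READY`, `RUNNING`, or `STOPPING`, and the crawler is done when it is `READY` again (its `LastCrawl` field then holds the outcome). A sketch (boto3 is imported lazily since it only works with AWS credentials; the poll interval is arbitrary):

```python
import time

def crawler_idle(state: str) -> bool:
    """A crawler has finished (or never started) when its state is READY."""
    return state == "READY"

def wait_for_crawler(name, poll_seconds=30):
    import boto3  # lazy import: only needed when actually talking to AWS
    glue = boto3.client("glue")
    while True:
        crawler = glue.get_crawler(Name=name)["Crawler"]
        if crawler_idle(crawler["State"]):
            # LastCrawl contains Status (e.g. SUCCEEDED/FAILED) and timestamps.
            return crawler.get("LastCrawl", {})
        time.sleep(poll_seconds)
```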

3

There are a lot of methods in the API that receive this with a default value of "". Is it just a string marker, and again, what is its purpose?
Protective asked 17/1, 2018 at 12:2

2

I'm getting this error from AWS Athena: HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'i...
Tenorio asked 26/9, 2019 at 17:38

4

I have an AWS Glue job that reads from a data source like so: datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "dev-data", table_name = "contacts", transformation_ctx = "data...
Consols asked 30/5, 2018 at 18:3
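On the `transformation_ctx` in the snippet above: it is the key Glue's job bookmarks use to track processed data, so it only has an effect when bookmarks are enabled on the job. A sketch of the job parameter that turns them on (as it would appear in the job's DefaultArguments):

```json
{
  "--job-bookmark-option": "job-bookmark-enable"
}
```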

© 2022 - 2024 — McMap. All rights reserved.