aws-glue Questions

1

I'm trying to export a table I crawled from a Postgres (RDS) database into Glue. There's one field with a decimal(10, 2) type. Now I have several problems. Exporting the table from Glue (using Spark...

1

I'm using PySpark to write to a Kafka broker; a JAAS security mechanism is set up for it, so we need to pass the username and password as environment variables. data_frame \ .selectExpr('CAST(id AS STRING)...
Mateya asked 21/3, 2022 at 15:18
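For the JAAS setup described above, a common pattern is to build the JAAS config string from environment variables and pass it through the Kafka writer's `kafka.sasl.jaas.config` option. A minimal sketch (the env var names `KAFKA_USER`/`KAFKA_PASS`, the broker address, topic, and SASL_SSL/PLAIN mechanism are all assumptions, not from the question):

```python
import os

def jaas_config(username: str, password: str) -> str:
    # JAAS entry for the PLAIN login module; note the trailing semicolon is required.
    return (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )

# Read credentials from the environment (placeholder variable names).
conf = jaas_config(os.environ.get("KAFKA_USER", ""), os.environ.get("KAFKA_PASS", ""))

# In the PySpark writer this would be used roughly as (sketch, assuming SASL_SSL + PLAIN):
# data_frame.selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value") \
#     .write.format("kafka") \
#     .option("kafka.bootstrap.servers", "broker:9092") \
#     .option("kafka.security.protocol", "SASL_SSL") \
#     .option("kafka.sasl.mechanism", "PLAIN") \
#     .option("kafka.sasl.jaas.config", conf) \
#     .option("topic", "my_topic") \
#     .save()
```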

3

I am running an AWS Glue job to load a pipe-delimited file on S3 into an RDS Postgres instance, using the auto-generated PySpark script from Glue. Initially, it complained about NULL values in som...
Koontz asked 20/12, 2017 at 23:25

3

Solved

I have an ETL job written in Python, which consists of multiple scripts with the following directory structure: my_etl_job | |--services | | | |-- __init__.py | |-- dynamoDB_service.py | |-- __i...
Americana asked 14/4, 2020 at 21:50

1

I'm getting an error while running this query on Athena: SELECT * FROM "db"."thermostat" WHERE id='95686' AND "date" = '2022/03/07' AND hour = 13 Projection Partition D...
Leodora asked 7/3, 2022 at 14:2
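With partition projection, a frequent cause of errors or empty results on queries like the one above is a mismatch between the literal in the query (`'2022/03/07'`) and the `projection.<column>.format` table property. A hedged sketch of the relevant properties, assuming the partition column is `date` and the range values are illustrative:

```sql
ALTER TABLE db.thermostat SET TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.date.type' = 'date',
  'projection.date.format' = 'yyyy/MM/dd',   -- must match the query literal's format
  'projection.date.range' = '2021/01/01,NOW'
);
```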

4

We are designing a big data solution for one of our dashboard applications and are seriously considering Glue for our initial ETL. Currently Glue supports JDBC and S3 as the target, but our downstream ...
Existence asked 2/3, 2018 at 5:58

2

I'm trying to copy parquet data from another s3 bucket to my s3 bucket. I want to limit the size of each partition to a max of 128 MB. I thought by default spark.sql.files.maxPartitionBytes would h...
Elson asked 30/6, 2020 at 0:36
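Regarding the entry above: `spark.sql.files.maxPartitionBytes` only caps *read* splits; output file size is governed by how many partitions exist at write time. One common workaround is to compute a partition count from the input size and repartition explicitly before writing. A sketch (the PySpark call is only in the comment; `total_size` would come from listing the source objects):

```python
import math

TARGET_BYTES = 128 * 1024 * 1024  # 128 MB per output file, the goal in the question

def partitions_for(total_bytes: int, target: int = TARGET_BYTES) -> int:
    """Smallest partition count such that each partition is at most `target` bytes."""
    return max(1, math.ceil(total_bytes / target))

# PySpark side (sketch):
# df.repartition(partitions_for(total_size)).write.parquet(dest)
```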

2

I'm starting with AWS Glue, and want to connect to my on premise mysql server via JDBC. I follow the documentation, create for glue the IAM Role, policy, security group and connection with correct...
Kendakendal asked 8/6, 2019 at 0:46

4

Solved

Hi, I have a bunch of CSVs located in S3 and a crawler set up via AWS Glue. This crawler builds about 10 tables as it scans 10 folders, and in only 1 of them are the headers not being detected. The st...
Diminutive asked 17/5, 2020 at 18:53

2

The scenario is this: our Snowflake instance will only be accessible from whitelisted IP addresses. If we plan to use AWS Glue, what IP address can we use so that it will be allowed to connect to Snowflake? I ne...
Ovum asked 18/10, 2020 at 5:24

2

I am very new to AWS Glue. I am working on a small project, and the task is to read a file from an S3 bucket, transpose it, and load it into a MySQL table. The source data in the S3 bucket looks as below: +---...
Lianaliane asked 11/11, 2019 at 20:1
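PySpark has no direct transpose; a common trick is to unpivot with SQL's `stack()` and then `pivot()` back on the other axis. The helper below only builds the `stack()` expression string (the surrounding pivot is sketched in the comment, with hypothetical column names):

```python
def melt_expr(cols):
    """Build a stack() expression turning the given columns into (key, value) rows."""
    n = len(cols)
    pairs = ", ".join(f"'{c}', `{c}`" for c in cols)
    return f"stack({n}, {pairs}) as (key, value)"

# PySpark usage (sketch; "id", "jan", "feb", "mar" are hypothetical columns):
# df.selectExpr("id", melt_expr(["jan", "feb", "mar"])) \
#   .groupBy("key").pivot("id").agg(F.first("value"))
```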

1

Solved

I am new to PySpark, and my objective is to use a PySpark script in AWS Glue for: reading a dataframe from an input file in Glue => done changing columns of some rows which satisfy a condition => ...
Elis asked 27/1, 2022 at 16:21

6

Solved

I found that AWS Glue sets up executor instances with a memory limit of 5 GB (--conf spark.executor.memory=5g), and sometimes, on big datasets, it fails with java.lang.OutOfMemoryError. The same is fo...
Josephjosepha asked 28/2, 2018 at 16:21
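Note on the entry above: the fixed executor memory applied to the original Standard worker type; newer Glue versions let you choose larger workers when creating or updating the job, with G.2X workers having roughly twice the memory of G.1X. A hedged sketch of the relevant fields in a boto3 `create_job`/`update_job` call (values illustrative):

```json
{
  "GlueVersion": "3.0",
  "WorkerType": "G.2X",
  "NumberOfWorkers": 10
}
```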

2

Solved

I have a dataset registered in Glue / Athena; call it my_db.table. I'm able to query it via Athena, and everything generally seems to be in order. I'm trying to use this table in a Glue job, but am...
Elimination asked 7/9, 2017 at 21:59

2

Solved

I need to do a grouping job on a Source DynamoDB table, then write each resulting Item to another Target DynamoDB table (or a secondary index of the Source one). Here I see that DynamoDB can ...
Auberge asked 13/4, 2020 at 19:39
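On writing grouped items back to DynamoDB: `BatchWriteItem` accepts at most 25 items per call, so results are usually written in chunks. The chunking itself is plain Python; the boto3 call is sketched in comments (the table name and `grouped_items` are hypothetical):

```python
def chunks(items, size=25):
    """Yield successive slices of at most `size` items (DynamoDB's batch limit)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# boto3 side (sketch) -- batch_writer handles the 25-item limit and retries itself:
# import boto3
# table = boto3.resource("dynamodb").Table("target_table")
# with table.batch_writer() as writer:
#     for item in grouped_items:
#         writer.put_item(Item=item)
```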

1

I'm using jobs from AWS Glue for the very first time, so it is normal that my job does not work, but I can't see any detailed log about what is wrong, because when I click the "Error Logs" link, o...
Bumkin asked 7/8, 2020 at 12:53

9

Solved

At my wits' end here... I have 15 CSV files that I am generating from a beeline query like: beeline -u CONN_STR --outputformat=dsv -e "SELECT ... " > data.csv I chose dsv because some ...
Durman asked 25/1, 2019 at 21:57

4

Solved

I am still starting out with AWS Glue, and I am trying to connect it to my publicly accessible MySQL database hosted on RDS Aurora to get its data. So I start by creating a crawler, and in the data ...
Impossibly asked 17/7, 2018 at 6:10

5

When running the AWS Glue crawler, it does not recognize timestamp columns. I have correctly formatted ISO 8601 timestamps in my CSV file. At first I expected Glue to automatically classify these as ti...
Taipan asked 16/5, 2019 at 23:12

2

How can we write user-defined functions in an AWS Glue script using PySpark (Python), on either a DynamicFrame or a DataFrame?
Conlon asked 21/9, 2018 at 9:26
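The core of a PySpark UDF is a plain Python function; registration happens on the DataFrame side, so a DynamicFrame is usually converted first. A sketch (the function name and column are illustrative; the pyspark calls are commented because they only run inside a Spark/Glue environment):

```python
def normalize_name(s):
    """Example UDF body: trim whitespace and title-case a name; pass None through."""
    return s.strip().title() if s is not None else None

# DataFrame registration (sketch):
# from pyspark.sql import functions as F
# from pyspark.sql.types import StringType
# normalize_udf = F.udf(normalize_name, StringType())
# df = df.withColumn("name", normalize_udf(df["name"]))
#
# For a DynamicFrame, convert and back:
# df = dynamic_frame.toDF()
# ... apply the UDF ...
# dynamic_frame = DynamicFrame.fromDF(df, glueContext, "out")
```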

3

Solved

I have a successfully running AWS Glue job that transforms data for predictions. I would like to stop processing and output a status message (which is working) if I reach a specific condition: if spec...
Racoon asked 9/4, 2021 at 21:14
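One way to end a Glue job early while still reporting success is to commit the job (so bookmarks advance) and then exit with code 0, which is normally treated as a successful run; raising an exception instead marks the run as failed. A sketch (the `job` object would be the `awsglue.job.Job` instance, only available inside Glue):

```python
import sys

def stop_if(condition, job=None, message="nothing to process"):
    """Exit the script cleanly when `condition` holds, after committing the Glue job."""
    if condition:
        if job is not None:
            job.commit()   # awsglue Job object; only meaningful inside a Glue run
        print(message)
        sys.exit(0)        # exit code 0 -> run normally shows as SUCCEEDED
```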

3

Solved

In the documentation, I cannot find any way of checking the run status of a crawler. The only way I am doing it currently is by repeatedly checking AWS to see if the file/table has been created. Is ...
Nystatin asked 25/10, 2018 at 19:18
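Crawler status can be polled programmatically: boto3's `get_crawler` returns a `State` of `READY`, `RUNNING`, or `STOPPING`, and the crawler is done when it is `READY` again (its `LastCrawl` field then holds the outcome). A sketch (boto3 is imported lazily since it only works with AWS credentials; the poll interval is arbitrary):

```python
import time

def crawler_idle(state: str) -> bool:
    """A crawler has finished (or never started) when its state is READY."""
    return state == "READY"

def wait_for_crawler(name, poll_seconds=30):
    import boto3  # lazy import: only needed when actually talking to AWS
    glue = boto3.client("glue")
    while True:
        crawler = glue.get_crawler(Name=name)["Crawler"]
        if crawler_idle(crawler["State"]):
            # LastCrawl contains Status (e.g. SUCCEEDED/FAILED) and timestamps.
            return crawler.get("LastCrawl", {})
        time.sleep(poll_seconds)
```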

3

There are a lot of methods in the API that receive this with a default value of "". Is it just a string marker, and again, what is its purpose?
Protective asked 17/1, 2018 at 12:2

2

I'm getting this error from AWS Athena: HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'i...
Tenorio asked 26/9, 2019 at 17:38

4

I have an AWS Glue job that reads from a data source like so: datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "dev-data", table_name = "contacts", transformation_ctx = "data...
Consols asked 30/5, 2018 at 18:3
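On the `transformation_ctx` in the snippet above: it is the key Glue's job bookmarks use to track processed data, so it only has an effect when bookmarks are enabled on the job. A sketch of the job parameter that turns them on (as it would appear in the job's DefaultArguments):

```json
{
  "--job-bookmark-option": "job-bookmark-enable"
}
```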

© 2022 - 2024 — McMap. All rights reserved.