aws-glue Questions
2
I have some data stored in an S3 bucket in parquet format, following a hive-like partitioning style, with these partition keys: retailer - year - month - day.
E.g.
my-bucket/
retailer=a/
year=202...
1
I have configured my Jupyter Notebook job in AWS Glue ETL with a Git PAT and repo names, but the Push to repository button is still disabled. I have another visual ETL job for which the button is enabled...
Byline asked 25/9, 2023 at 6:2
2
I have some JSON files stored in S3, and I need to convert them, in the folder where they are, to CSV format.
Currently I'm using Glue to map them to Athena, but, as I said, now I need to map the...
Lissie asked 21/5, 2019 at 18:30
5
Solved
Does anyone know of a way to add the source filename as a column in a Glue job?
We created a flow where we crawled some files in S3 to create a schema. We then wrote a job that transforms the file...
Nowicki asked 10/5, 2018 at 16:35
2
Solved
I have a csv file in S3, which does not have any quotes.
e.g.
dVsdfsCcn7j6,r:werwerwerwerwerwerwerwer,_User$SSSSSBFwJ,login,password,false,2011-10-27
10:46:55,d24c2465e-9945645c5-4645509-a74574...
Perspicacious asked 27/3, 2018 at 4:8
1
Background
The "zstd" compression codec has 22 compression levels. I read this Uber blog. Regarding compression time and file size, I verified using df.to_parquet with our data and got same e...
Cymophane asked 29/9, 2023 at 19:12
1
Solved
I'm using the following function (partly from a code snippet I got from this post: Compute size of Spark dataframe - SizeEstimator gives unexpected results
and adding my calculations according to w...
Incisive asked 28/9, 2023 at 22:28
5
In an AWS Glue job, we can write a script and execute it via the job.
In AWS Lambda, too, we can write the same script and execute the same logic as in the job above.
So, my query is not whats...
Conspicuous asked 26/8, 2020 at 14:29
2
Solved
I am currently using Athena along with Kinesis Firehose and Glue Crawler. Kinesis Firehose is saving JSON to single-line files as below
{"name": "Jone Doe"}{"name": "Jane Doe"}{"name": "Jack Doe"}
...
Filial asked 7/6, 2020 at 16:4
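The excerpt above is truncated, so what the asker ultimately needed is not known. As a minimal sketch, assuming the goal is to turn the concatenated single-line objects into one JSON object per line, the standard library's `json.JSONDecoder.raw_decode` can walk the string object by object. The function name `split_concatenated_json` is hypothetical and not part of any AWS API.

```python
import json


def split_concatenated_json(text):
    """Split a string of back-to-back JSON values into a list of parsed objects."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return records


# Sample line copied from the question excerpt.
line = '{"name": "Jone Doe"}{"name": "Jane Doe"}{"name": "Jack Doe"}'
records = split_concatenated_json(line)

# Re-serialize as newline-delimited JSON, one object per line.
ndjson = "\n".join(json.dumps(record) for record in records)
print(ndjson)
```

This runs outside Glue or Firehose entirely; where such a step would sit in a real pipeline (for example in a transformation step before the files land in S3) is left as an assumption.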
3
Solved
For example, I run an ETL job and new fields or columns may be added to the target table. To detect table changes a crawler should be run, but it can only be run manually or on a schedule.
Can a crawler be triggered afte...
Alien asked 11/1, 2018 at 5:46
7
Solved
Is it possible to execute arbitrary SQL commands like ALTER TABLE from an AWS Glue Python job? I know I can use it to read data from tables but is there a way to execute other database specific comman...
2
I am a little new to AWS Glue. I am working on transforming raw CloudWatch JSON output into CSV with AWS Glue. The transformation script is pretty straightforward, however documentation and example does...
Speedwriting asked 13/12, 2017 at 5:20
5
When I start a job with the IAM role AWSGlueServiceNotebookRoleDefault, I get this error:
Failed to authenticate user due to missing information in request.
No information in docs about this error.
I d...
Sollars asked 31/3, 2022 at 10:2
1
Solved
So I am using AWS Glue auto-generated code to read a CSV file from S3 and write it to a table over a JDBC connection. Seems simple: the job runs successfully with no error, but it writes nothing. When I c...
Demarche asked 6/5, 2019 at 22:51
3
Solved
We have a Glue crawler that reads Avro files in S3 and creates a table in the Glue catalog accordingly.
The thing is that we have a column named 'foo' that came from the avro schema and we also have some...
Resplendent asked 10/12, 2019 at 13:47
4
I defined several tables in AWS Glue.
Over the past few weeks, I've had different issues with the table definition which I had to fix manually - I want to change column names, or types, or change ...
Yeung asked 30/3, 2020 at 9:35
8
Solved
I'm trying to get the input file name (or path) for every file loaded through an S3 data catalog in AWS Glue.
I've read in a few places that input_file_name() should provide this information (tho...
Gillmore asked 28/6, 2019 at 16:58
4
I'm using AWS Glue to move multiple files to an RDS instance from S3. Each day I get a new file into S3 which may contain new data, but can also contain a record I have already saved with some upda...
Suzysuzzy asked 22/11, 2018 at 19:21
4
Solved
I created a Development Endpoint in the AWS Glue console and now I have access to SparkContext and SQLContext in gluepyspark console.
How can I access the catalog and list all databases and tables...
Alum asked 6/9, 2017 at 16:45
4
Solved
When switching from Glue 2.0 to 3.0, which means also switching from Spark 2.4 to 3.1.1,
my jobs start to fail when processing timestamps prior to 1900 with this error:
An error occurred while call...
Addie asked 23/8, 2021 at 10:51
3
Solved
I want to set up cross account access to an S3 bucket for AWS Glue in another account to crawl. We have two accounts in our environment (A & B):
AccountA has an S3 bucket with ACL permissions ...
Kosaka asked 2/10, 2020 at 16:49
5
The AWS crawler has a prefix property for adding new tables. So if I leave the prefix empty and point the crawler at s3://my-bucket/some-table-backup, it creates a table named some-table-backup. Is there a way ...
Athene asked 18/1, 2018 at 13:18
7
I have a Glue ETL job, created by CloudFormation. This job extracts data from RDS Aurora and writes it to S3.
When I run this job, I get the error below.
The job has an IAM service role.
This servic...
Curtal asked 28/6, 2019 at 19:14
1
Solved
I have a Glue job that is not working because the dynamic frame is not populating from a Parquet file in S3.
I have pointed it directly to an object that has data in it, but the dynamic frame is still b...
6
I've created an EMR cluster with the Glue Data Catalog. When I invoke the spark-shell, I am able to successfully list tables stored within a Glue database via
spark.catalog.setCurrentDatabase("test...
Dewain asked 19/9, 2017 at 3:29