aws-glue Questions
2
Solved
I have a bucket that is used as the destination for a Kinesis Firehose stream.
Firehose automatically creates date-based prefixes on that bucket using the yyyy/mm/dd/HH format.
Then I created a crawle...
Arielle asked 6/4, 2018 at 19:22
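For readers unfamiliar with the layout mentioned in this excerpt, here is a minimal sketch of the date-based prefix, assuming UTC timestamps and zero-padded fields. The function names `firehose_style_prefix` and `parse_prefix` are hypothetical helpers for illustration and are not part of any AWS SDK.

```python
from datetime import datetime, timezone


def firehose_style_prefix(ts: datetime) -> str:
    """Build a yyyy/mm/dd/HH/ prefix for a timezone-aware timestamp, in UTC."""
    return ts.astimezone(timezone.utc).strftime("%Y/%m/%d/%H/")


def parse_prefix(key: str) -> dict:
    """Split an object key that starts with yyyy/mm/dd/HH/ into its date parts."""
    year, month, day, hour = key.split("/")[:4]
    return {"year": year, "month": month, "day": day, "hour": hour}
```

Because the path segments carry no `name=value` labels, any tool that infers partitions from them has to name the columns itself; the keys chosen in `parse_prefix` are only one possible naming.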
1
Having a very weird problem with Glue. Using it to run some ETL on data I'm moving from MySQL RDS to Redshift. Using the same code I used on another table, where it worked fine and copied all the d...
Arianaariane asked 31/1, 2019 at 19:39
0
I have a data lake built on top of AWS S3. I'm using the Glue catalog to store the metadata of the data lake tables. These tables will be queried using Athena and Spark for various purposes.
While de...
Melidamelilot asked 14/8, 2021 at 3:9
3
I developed a pandas ETL script locally and it works fine.
I prepared a wheel file and uploaded it to S3. All packages are installed properly.
However, when the script runs, it shows ImportError: cannot i...
Pritchard asked 18/9, 2020 at 8:58
1
headersAPI = {
    'Content-Type': 'application/json',
    'accept': 'application/json',
    'Authorization': 'Bearer XXXXXXXXXXXXXXXXXXXXXXXXXX',
}
skill_response=requests.get("XXXXXX",headers=h...
Jespersen asked 27/7, 2021 at 20:17
2
Solved
I have been using AWS Glue Python shell jobs to build simple ETL jobs. I have only used Spark jobs once or twice, for converting to ORC format or executing Spark SQL on JDBC data. So wondering ...
Rus asked 7/2, 2020 at 16:34
1
For supporting schema registry on my MSK topic, I found two options -
AWS Glue Schema Registry; and
Confluent Schema Registry
Since Glue SR is fully managed by AWS, I would prefer to use that. H...
Cesaro asked 28/1, 2021 at 3:19
1
I'm trying to kick off an AWS Glue ETL job in my Python script and check the status of it until the job finishes.
Initially I just did a simple while loop, which waits for 1 minute and checks the ...
Salpingotomy asked 29/8, 2018 at 18:3
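The polling approach described in this excerpt can be sketched as follows. This is a minimal sketch, not the asker's code: `wait_for_job_run` is a hypothetical helper, and it assumes the object passed in behaves like a boto3 Glue client, whose `get_job_run` call reports the run's state under `JobRun.JobRunState`. The timeout and the injectable `sleep` are added assumptions so the loop cannot run forever and can be exercised without waiting.

```python
import time

# Terminal JobRunState values reported by Glue's GetJobRun API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def wait_for_job_run(glue, job_name, run_id, poll_seconds=60,
                     timeout_seconds=3600, sleep=time.sleep):
    """Poll a Glue job run until it reaches a terminal state or the timeout expires."""
    waited = 0
    while True:
        response = glue.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"{job_name} run {run_id} still {state} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (needs boto3 and AWS credentials; "my-etl-job" is a placeholder name):
#   import boto3
#   glue = boto3.client("glue")
#   run_id = glue.start_job_run(JobName="my-etl-job")["JobRunId"]
#   final_state = wait_for_job_run(glue, "my-etl-job", run_id)
```

Returning the final state instead of raising on `FAILED` leaves it to the caller to decide how to react to an unsuccessful run.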
5
Solved
I am following the tutorial steps as shown in https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-tutorial-local-notebook.html
There's no issue connecting local Zeppelin to AWS Glue. Howeve...
Louise asked 24/9, 2019 at 4:59
1
Solved
Have searched the AWS Glue documents, but could not find the pricing details for AWS Glue worker types G.1X and G.2X. Can someone please explain if there is no cost difference between Standard, G.1...
Farcy asked 17/6, 2021 at 5:19
0
I have successfully read a zipped shapefile from my S3 bucket with geopandas, but I get an error when trying to write the same GeoDataFrame back as a shapefile to the same S3 bucket.
The code be...
Nikolas asked 20/6, 2021 at 18:9
3
I have been searching for an example of how to set up CloudFormation for a Glue workflow which includes triggers, jobs, and crawlers, but I haven't been able to find much information on it.
This ...
Mobcap asked 8/10, 2019 at 21:30
2
Exactly like in this AWS forum question I was running 2 Jobs concurrently. The Job was configured with Max concurrency: 10 but when executing job.commit() I receive this error message:
py4j.protoc...
Mandibular asked 2/6, 2020 at 9:16
1
Solved
I created an AWS Glue job using Glue Studio.
It takes data from a Glue Data Catalog, does some transformations, and writes to a different Data Catalog.
When configuring the target node, I enabled th...
Subdebutante asked 18/3, 2021 at 19:32
1
I am trying to transform a JSON dataset from S3 into a Glue table schema and query it in Redshift Spectrum for data analysis. While creating external tables, how do I transform the DATE fields?
Need to highli...
Locoism asked 19/3, 2019 at 21:59
2
I am trying to set up an AWS Glue environment on my Ubuntu VirtualBox VM by following the AWS documentation.
I have completed the required steps, such as downloading the AWS Glue libs and the Spark package and setting up Spark home a...
2
Solved
I am using AWS Glue to join two tables. By default, it performs an INNER JOIN. I want to do a LEFT OUTER JOIN. I referred to the AWS Glue documentation but there is no way to pass the join type to the Jo...
5
I am using AWS Glue to create metadata tables.
AWS Glue Crawler data store path: s3://bucket-name/
Bucket structure in S3 is like
├── bucket-name
│ ├── pt=2011-10-11-01
│ │ ├── file1
│ │ ├...
Toogood asked 9/1, 2018 at 10:27
0
I'm new to Glue jobs and I'm trying to use Glue 2.0 to run PySpark jobs (Python 3) that require the following Python libraries as defined in my requirements.txt. I'm sort of at a loss on ho...
Indiscernible asked 21/3, 2021 at 1:43
0
I am creating Glue Workflow using CDK as shown below. It is composed of Glue jobs and crawlers. Is it possible to mark the status of the Workflow as Error when any of the components fail? Currently...
Eagleeyed asked 12/3, 2021 at 11:6
1
Solved
I would like to do the following in pyspark (for AWS Glue jobs):
JOIN a and b ON a.name = b.name AND a.number = b.number AND a.city LIKE b.city
So for example:
Table a:
Number
Name
City
100...
Elyse asked 11/3, 2021 at 11:54
6
How do I add a current timestamp (extra column) in the Glue job so that the output data has an extra column? In this case:
Schema Source Table:
Col1, Col2
After Glue job.
Schema of Destination:
...
Sager asked 22/1, 2018 at 18:46
2
I have partitioned data in CSV files on S3:
s3://bucket/dataset/p=1/*.csv (partition #1)
...
s3://bucket/dataset/p=100/*.csv (partition #100)
I run a classifier over s3://bucket/dataset/ and th...
Lon asked 11/9, 2019 at 13:25
1
I'm trying to copy some files over to the tmp folder using boto3 in a Glue job. Here's my code:
import pandas as pd
import numpy as np
import boto3
bucketname = "<bucket_name>"
s3 ...
Wollastonite asked 25/2, 2021 at 20:50
2
Solved
Is it possible to crawl S3 files encrypted using CSE-KMS in AWS Glue? I know that Athena can do that, but I haven't found similar functionality in the Glue crawler.
Quincuncial asked 14/2, 2018 at 14:6