aws-glue - McMap

2

Unable to read json files in AWS Glue using Apache Spark

For our use case we need to load JSON files from an S3 bucket. We are using AWS Glue as our processing tool. But because we will soon be migrating to Amazon EMR, we are already developing our Glue j...

scala apache-spark amazon-s3 aws-glue

Weinrich asked 24/1, 2023 at 15:38

7

Solved

What is the difference between AWS Glue ETL Job and AWS EMR?

If I had to perform ETL on a huge dataset (say 1 TB) stored in S3 as CSV files, both an AWS Glue ETL job and AWS EMR steps could be used. How, then, is AWS Glue different from AWS EMR? And which is the bett...

amazon-web-services amazon-s3 etl amazon-emr aws-glue

Circassia asked 7/6, 2020 at 20:19

3

Solved

Glue crawler exclude patterns

I have an s3 bucket that I'm trying to crawl and catalog. The format is something like this, where the SQL files are DDL queries (CREATE TABLE statements) that match the schema of the different dat...

aws-glue

Gaikwar asked 15/2, 2018 at 16:55

4

Solved

What permission am I missing for AWS Glue and Development Endpoint?

I'm getting the following error when I try to create a development endpoint for AWS Glue. { "service":"AWSGlue", "statusCode":400, "errorCode":"Validati...

amazon-web-services amazon-iam aws-glue

Soudan asked 12/2, 2018 at 19:30

2

Solved

AWS Glue python install - Could not find a version

I am trying to use the AWSGlue module in Python, but cannot install the module in the terminal. sh-4.2$ pip install awsglue Collecting awsglue Could not find a version that satisfies the requireme...

python amazon-web-services aws-glue

Unutterable asked 28/3, 2019 at 15:28

5

Solved

AWS Glue cannot create database from crawler: permission denied

I am trying to use an AWS Glue crawler on an S3 bucket to populate a Glue database. I run the Create Crawler wizard, select my datasource (the S3 bucket with the avro files), have it create the IAM...

amazon-web-services amazon-athena aws-glue

Clarence asked 20/8, 2019 at 20:54

4

Solved

Why does AWS Glue say "Max concurrent runs exceeded", when there are no jobs running?

I have an AWS Glue job, with max concurrent runs set to 1. The job is currently not running. But when I try to run it, I keep getting the error: "Max concurrent runs exceeded". Deleting a...

amazon-web-services aws-glue

Essary asked 18/3, 2021 at 12:03

2

get list of tables in database using boto3

I’m trying to get a list of the tables from a database in my aws data catalog. I’m trying to use boto3. I’m running the code below on aws, in a sagemaker notebook. It runs forever (like over 30 min...

python-3.x boto3 aws-glue aws-glue-data-catalog

Heisser asked 7/8, 2019 at 20:01

9

Solved

Optional job parameter in AWS Glue?

How can I implement an optional parameter for an AWS Glue job? I have created a job that currently has a string parameter (an ISO 8601 date string) as an input that is used in the ETL job. I would...

python amazon-web-services aws-glue

Durango asked 4/9, 2018 at 8:27

8

AWS Glue psycopg2 installation

I'm trying to run code that uses psycopg2 to manipulate a Redshift instance. I have tried importing a wheel file, as I see they are supported in Glue Python jobs. I see the library is installed...

python amazon-web-services psycopg2 aws-glue

Communistic asked 4/8, 2020 at 11:34

10

AWS Athena Returning Zero Records from Tables Created from GLUE Crawler input csv from S3

Part One: I ran a Glue crawler on a dummy CSV file loaded in S3. It created a table, but when I try to view the table in Athena and query it, it shows zero records returned. But the demo data of ELB in At...

amazon-web-services csv amazon-redshift amazon-athena aws-glue

Juni asked 13/11, 2017 at 14:41

5

How can I use an external python library in AWS Glue?

First stack overflow question here. Hope I do this correctly: I need to use an external python library in AWS glue. "Openpyxl" is the name of the library. I follow these directions: https://docs....

python amazon-web-services openpyxl aws-glue

Avraham asked 2/10, 2019 at 16:55

1

Performance of PySpark DataFrames vs Glue DynamicFrames

So I recently started using Glue and PySpark for the first time. The task was to create a Glue job that does the following: load data from Parquet files residing in an S3 bucket; apply a filter to ...

pyspark aws-glue

Hoick asked 20/4, 2022 at 13:53

4

Solved

Is there a temporary folder that I can access while using AWS Glue?

Is there a temporary folder that I can access to hold files temporarily while running processes within AWS Glue? For example, in Lambda we have access to a /tmp directory as long as the process is ...

amazon-web-services pyspark aws-glue

Symphony asked 12/1, 2018 at 18:29

2

Solved

What options can be passed to AWS Glue DynamicFrame.toDF()?

The documentation on toDF() method specifies that we can pass an options parameter to this method. But it does not specify what those options can be (https://docs.aws.amazon.com/glue/latest/dg/aws-...

amazon-web-services aws-glue aws-glue-spark

Malpighi asked 5/10, 2020 at 19:54

1

AWS Glue Job: SchemaColumnConvertNotSupportedException when trying to write parquet file to S3

I have a table in the AWS Glue catalog that has datatypes of all strings and the files are stored as parquet files in S3. I want to create a Glue job that will simply read the data in from that cat...

python apache-spark amazon-s3 pyspark aws-glue

Reduplication asked 8/8, 2019 at 13:28

5

Solved

How to create AWS Glue table where partitions have different columns? ('HIVE_PARTITION_SCHEMA_MISMATCH')

As per this AWS Forum Thread, does anyone know how to use AWS Glue to create an AWS Athena table whose partitions contain different schemas (in this case different subsets of columns from the table...

amazon-web-services amazon-s3 amazon-athena aws-glue

Hymn asked 15/9, 2017 at 13:44

2

AWS Glue Job Input Parameters

I am relatively new to AWS, and this may be a somewhat less technical question, but at present AWS Glue notes a maximum of 25 jobs permitted to be created. We are loading in a series of tables that each ...

amazon-web-services aws-glue

Blacking asked 13/9, 2018 at 15:08

2

Logger output in AWS Sagemaker Jupyter notebook

I would like to see the custom logs that I create inside an AWS Sagemaker JupyterLab notebook (that uses a Glue development endpoint). I want to see them as the output of a notebook cell. I tried ...

python jupyter-notebook aws-glue amazon-sagemaker jupyter-lab

Valentino asked 28/2, 2020 at 12:31

3

convert spark dataframe to aws glue dynamic frame

I tried converting my Spark DataFrames to dynamic frames to output as glueparquet files, but I'm getting the error "'DataFrame' object has no attribute 'fromDF'". My code relies heavily on Spark DataFrames....

apache-spark pyspark aws-glue

Bregma asked 24/11, 2019 at 4:25

2

Access AWS Glue from local Spark

Is there any way to run local master Spark SQL queries against AWS Glue? Launch this code on my local PC: SparkSession.builder() .master("local") .enableHiveSupport() .config("hive.metastore.c...

amazon-web-services apache-spark apache-spark-sql aws-glue

Distant asked 15/9, 2018 at 12:49

2

AWS Glue - Don't know how to save NullType to REDSHIFT

I have the below simple script for AWS Glue. I have a text file with empty cells and a table which accepts NULL values. When I run the glue job it fails with the exception, "Don't know how to save ...

python-3.x amazon-redshift etl aws-glue

Mizuki asked 28/11, 2017 at 0:24

1

Glue Dynamic Frame is way slower than regular Spark

In the image below we have the same Glue job run with three different configurations in terms of how we write to S3: we used a dynamic frame to write to S3; we used a pure Spark frame to write to S...

amazon-web-services apache-spark amazon-s3 aws-glue

Delgado asked 21/12, 2021 at 8:25

2

How to include AWS Glue crawler in Step Function

This is my requirement: I have a crawler and a PySpark job in AWS Glue. I have to set up the workflow using Step Functions. Questions: How can I add the crawler as the first state? What are the paramete...

amazon-web-services aws-glue aws-step-functions

Superintendency asked 29/1, 2020 at 11:20

4

(AWS) Athena: Query Results seem too short

The results of my Athena queries appear to be too short, and I'm trying to figure out why. Setup: Glue catalogs (118.6 GB in size). Data: stored in S3 in both CSV and JSON format. Athena query: Wh...

amazon-web-services amazon-s3 amazon-athena aws-glue

Must asked 18/1, 2018 at 19:26
