amazon-emr - 2

2

Solved

Is there a way to send EMR logs to CloudWatch instead of S3? We would like to have all our services' logs in one location. It seems like the only thing you can do is set up alarms for monitoring but th...

amazon-web-services amazon-emr amazon-cloudwatch amazon-cloudwatchlogs

Bejarano asked 2/12, 2019 at 22:7

3

Solved

How can I use graphframes with pyspark on AWS EMR?

I'm trying to use the graphframes package in pyspark in Jupyter Notebook (using Sagemaker and sparkmagic) on AWS EMR. I've tried adding a configuration option when creating the EMR cluster in the A...

apache-spark pyspark jupyter-notebook amazon-emr graphframes

Glassworks asked 4/6, 2019 at 14:47

2

Solved

AWS Glue vs EMR Serverless

Recently, AWS announced Amazon EMR Serverless (Preview) https://aws.amazon.com/blogs/big-data/announcing-amazon-emr-serverless-preview-run-big-data-applications-without-managing-servers/ - new very...

amazon-web-services amazon-emr aws-glue emr-serverless

Azaleah asked 12/12, 2021 at 8:10

2

AWS EMR perform "bootstrap" script on all the already running machines in cluster

I have one EMR cluster which is running 24/7. I can't turn it off and launch a new one. What I would like to do is perform something like a bootstrap action on the already running cluster, pre...

python amazon-web-services boto emr amazon-emr

Oralee asked 26/10, 2014 at 17:18

3

How to fix error on pyspark EMR Notebook - AnalysisException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

I am trying to run SQL queries using the spark.sql() or sqlContext.sql() method (here spark is the variable for SparkSession object available to us when we start EMR Notebook) on a public dataset u...

apache-spark hadoop pyspark amazon-emr hive-metastore

Benefic asked 4/9, 2019 at 0:56

3

What is the difference between AWS Elastic MapReduce and AWS Kinesis Data Analytics?

I'm executing a Flink job with these tools. I think both can do exactly the same thing with the proper configuration. Does Kinesis Data Analytics do something that EMR cannot do, or vice versa? Amazon Ki...

amazon-web-services apache-flink amazon-emr amazon-kinesis-firehose

Brosine asked 17/5, 2019 at 12:26

5

Solved

AWS EMR: Error parsing parameter: Expected: '=', received: 'EOF' for input:

I'm trying to create a cluster from inside one of my EC2 instances. I'm typing the following command to start my cluster: aws emr create-cluster --release-label emr-5.20.0 --instance-groups instance-g...

amazon-web-services amazon-ec2 aws-cli amazon-emr

Leavis asked 15/3, 2019 at 20:35

2

Solved

How to add functions from custom JARs to EMR cluster?

I created an EMR cluster on AWS with Spark and Livy. I submitted a custom JAR with some additional libraries (e.g. datasources for custom formats) as a custom JAR step. However, the stuff from the ...

apache-spark amazon-emr livy

Vociferation asked 19/6, 2019 at 11:18

2

Solved

Amazon Elastic MapReduce - mass insert from S3 to DynamoDB is incredibly slow

I need to perform an initial upload of roughly 130 million items (5+ Gb total) into a single DynamoDB table. After I faced problems with uploading them using the API from my application, I decided ...

amazon-s3 hive amazon-dynamodb amazon-emr

Woolpack asked 21/5, 2012 at 9:58

0

Render HTML in Jupyter notebook on EMR

I need to render HTML from a cell in a Jupyter Notebook on an EMR cluster. Things that have not worked so far: using IPython display from IPython.core.display import display, HTML example = '<htm...

python html amazon-s3 amazon-emr jupyter-lab

Calistacalisthenics asked 25/11, 2021 at 15:55

3

spark execution - a single way to access file contents in both the driver and executors

According to this question ("--files option in pyspark not working"), the sc.addFiles option should work for accessing files in both the driver and executors. But I cannot get it to work on the execut...

apache-spark pyspark amazon-emr

Efficient asked 27/1, 2021 at 15:42

2

AWS EMR: Pyspark: Rdd: mappartitions: Could not find valid SPARK_HOME while searching: Spark closures

I have a pyspark job that runs without any issues when run locally, but when it runs on the AWS cluster, it gets stuck when it reaches the code below. The job just processes 100 r...

apache-spark pyspark apache-spark-sql python-requests amazon-emr

Ladykiller asked 16/10, 2021 at 2:14

1

amazon emr jupyterhub and spark cluster; notebook has no autocomplete

The pyspark3, pyspark, and spark kernels in jupyterhub docker on amazon emr do not seem to allow autocomplete of function names or the docstring (shift-tab). Has anyone else noticed this behaviou...

pyspark jupyter-notebook amazon-emr

Trimmer asked 9/9, 2018 at 14:10

4

AWS VPC identify private and public subnet

I have a VPC in an AWS account and there are 5 subnets associated with that VPC. Subnets are of 2 types: public and private. How can I identify which subnet is public and which is private? Each subnet ...

amazon-web-services amazon-emr amazon-vpc subnet

Itemize asked 16/2, 2018 at 16:17

2

Solved

EMR ignores spark submit parameters (memory/cores/etc)

I'm trying to use all resources on my EMR cluster. The cluster itself is 4 m4.4xlarge machines (1 driver and 3 workers) with 16 vCore, 64 GiB memory, EBS Storage: 128 GiB. When launching the cluster ...

amazon-web-services apache-spark amazon-emr

Astonish asked 22/9, 2021 at 14:46

2

AWS EMR pandas conflict with numpy in pyspark after bootstrapping

After launching a cluster with the bootstrap code below and getting the stdout below, when I try to import pandas in pyspark, I get the following error due to a conflict with a different numpy version ...

pandas amazon-web-services numpy pyspark amazon-emr

Malvina asked 16/7, 2021 at 9:31

2

Install pandas on EMR cluster

TLDR - I want to run the command sudo yes | sudo pip3 uninstall numpy twice in EMR bootstrap actions but it runs only once. I will first say that my goal is to run a Pyspark-enabled EMR managed not...

python amazon-web-services pyspark amazon-emr

Transposition asked 10/8, 2021 at 9:14

0

Show EMR stdout logs in console

How can I make stdout logs appear in the EMR Step tabs? The logs are in the S3 bucket, but only the stdout won't show.

amazon-web-services amazon-emr

Ebon asked 27/8, 2021 at 19:2

1

Install more Python package/library to each cluster after creating an AWS EMR

I'm new to using Spark with PySpark on JupyterHub. I understand that before creating an EMR cluster I can set the bootstrap to set up the environment in each cluster, like Python packages/libraries. But if I alre...

apache-spark pyspark amazon-emr jupyterhub

Scorecard asked 22/5, 2020 at 12:9

2

Solved

When creating notebook, got 'Service role does not have permission to access the LocationUri {}' error

When I create an AWS EMR Notebook, I get the error below. The service role is EMR_Notebook_DefaultRole. "Service role does not have permission to access the LocationUri {}" What would be the root caus...

amazon-web-services jupyter-notebook amazon-emr

Vauntcourier asked 28/1, 2021 at 16:48

4

Solved

What is the correct syntax for running a bash script as a step in EMR?

I am trying to run a bash script as a step after EMR completes bootstrapping. Following is my terraform code: step { action_on_failure = "CONTINUE" name = "Setup Hadoop configuration" hadoop_jar...

bash amazon-emr

Chloe asked 11/8, 2018 at 21:42

1

Solved

Running EMR job from ECS Docker container

I have containerized ML job code written in Python into a Docker container and am able to run it as a Docker service using Amazon ECS. I would like to run it in a distributed way using Spark - Pyspark and deplo...

amazon-emr amazon-ecs

Graybeard asked 25/5, 2017 at 12:5

6

How can I get Zeppelin to restart cleanly on an EMR cluster?

I am running an EMR cluster and trying to use a Zeppelin notebook for data analysis. Versions: Release label:emr-5.2.1 Hadoop distribution: Amazon 2.7.3 Hive 2.1.0 Spark 2.0.2 Zeppelin 0.6.2 I ...

amazon-web-services hadoop pyspark amazon-emr apache-zeppelin

Pudendum asked 3/2, 2017 at 20:26

4

Automatic AWS DynamoDB to S3 export failing with "role/DataPipelineDefaultRole is invalid"

Precisely following the step-by-step instructions on this page I am trying to export contents of one of my DynamoDB tables to an S3 bucket. I create a pipeline exactly as instructed but it fails to...

export amazon-dynamodb amazon-emr amazon-iam amazon-data-pipeline

Kellene asked 6/3, 2015 at 20:21

3

Solved

Pyspark - Load file: Path does not exist

I am a newbie to Spark. I'm trying to read a local csv file within an EMR cluster. The file is located in: /home/hadoop/. The script that I'm using is this one: spark = SparkSession \ .builder \ ...

apache-spark pyspark emr amazon-emr apache-spark-sql

Cigarette asked 7/2, 2017 at 13:51
