elastic-map-reduce

10

Solved

It has been suggested in the Amazon docs (http://aws.amazon.com/dynamodb/), among other places, that you can back up your DynamoDB tables using Elastic MapReduce. I have a general understanding of how thi...

amazon-s3 backup amazon-dynamodb elastic-map-reduce

Leonaleonanie asked 29/11, 2012 at 16:49

3

Installing Git on EMR

1) I have been told that Git comes preinstalled on EMR. Is this true? I believe not, as I can confirm that "git" is not found in my elastic-mapreduce SSH terminal. See: https://raw.github.com/g...

git elastic-map-reduce

Podiatry asked 25/7, 2012 at 15:59

1

Elastic MapReduce: Map output lost

I'm running a large series of MapReduce jobs (more than 100 nodes) on Amazon Elastic MapReduce. In the reduce phase, already-completed map tasks keep failing with Map output lost, rescheduling: g...

hadoop amazon-web-services jetty elastic-map-reduce amazon-emr

Pennoncel asked 19/4, 2012 at 6:39

2

Slow Performance with Apache Spark Gradient Boosted Tree training runs

I'm experimenting with the Gradient Boosted Trees learning algorithm from the ML library of Spark 1.4. I'm solving a binary classification problem where my input is ~50,000 samples and ~500,000 features. M...

amazon-web-services machine-learning apache-spark elastic-map-reduce

Boyles asked 21/9, 2015 at 19:22

4

Copying files from Amazon S3 to HDFS using S3DistCp fails

I am trying to copy files from S3 to HDFS using a workflow in EMR. When I run the command below, the job flow starts successfully but gives me an error when it tries to copy the file to HDFS. Do I n...

hadoop amazon-s3 hdfs elastic-map-reduce

Coacervate asked 31/1, 2013 at 17:00

2

Solved

Why does Yarn on EMR not allocate all nodes to running Spark jobs?

I'm running a job on Apache Spark on Amazon Elastic Map Reduce (EMR). Currently I'm running on emr-4.1.0 which includes Amazon Hadoop 2.6.0 and Spark 1.5.0. When I start the job, YARN correctly ha...

apache-spark hadoop-yarn emr amazon-emr elastic-map-reduce

Horrorstruck asked 26/11, 2015 at 14:16

3

How to write data in Elasticsearch from PySpark?

I have integrated ELK with PySpark and saved an RDD as ELK data on the local file system: rdd.saveAsTextFile("/tmp/ELKdata") logData = sc.textFile('/tmp/ELKdata/*') errors = logData.filter(lambda line: "r...

elasticsearch apache-spark pyspark elastic-map-reduce

Lunetta asked 19/1, 2016 at 6:19

3

Solved

How to register S3 Parquet files in a Hive Metastore using Spark on EMR

I am using Amazon Elastic Map Reduce 4.7.1, Hadoop 2.7.2, Hive 1.0.0, and Spark 1.6.1. Use case: I have a Spark cluster used for processing data. That data is stored in S3 as Parquet files. I want...

apache-spark hive elastic-map-reduce apache-spark-1.6

Man asked 21/7, 2016 at 0:36

0

How to upgrade Data Pipeline definition from EMR 3.x to 4.x/5.x?

I would like to upgrade my AWS data pipeline definition to EMR 4.x or 5.x, so I can take advantage of Hive's latest features (version 2.0+), such as CURRENT_DATE and CURRENT_TIMESTAMP, etc. The c...

amazon-web-services amazon-emr elastic-map-reduce amazon-data-pipeline

Brooklynese asked 17/12, 2017 at 18:17

2

Solved

Use S3DistCp to copy file from S3 to EMR

I am struggling to find a way to use S3DistCp in my AWS EMR cluster. Some old examples that show how to add s3distcp as an EMR step use the elastic-mapreduce command, which is no longer used. Some ...

amazon-s3 aws-sdk amazon-emr elastic-map-reduce s3distcp

Glassworker asked 8/9, 2016 at 11:38

7

Solved

Scheduling A Job on AWS EC2

I have a website running on AWS EC2. I need to create a nightly job that generates a sitemap file and uploads the files to the various browsers. I'm looking for a utility on AWS that allows this fu...

amazon-ec2 amazon-web-services cron jobs elastic-map-reduce

Challis asked 10/1, 2012 at 23:21

1

Am I fully utilizing my EMR cluster?

Total instances: I have created an EMR cluster with 11 nodes total (1 master instance, 10 core instances). Job submission: spark-submit myApplication.py. Graph of containers: Next, I've got these gra...

amazon-web-services apache-spark pyspark elastic-map-reduce

Subcartilaginous asked 22/1, 2017 at 1:08

3

Solved

How to read a file from s3 in EMR?

I would like to read a file from S3 in my EMR Hadoop job. I am using the Custom JAR option. I have tried two solutions: org.apache.hadoop.fs.S3FileSystem: throws a NullPointerException. com.amaz...

java hadoop amazon-s3 elastic-map-reduce

Gyatt asked 12/6, 2014 at 12:43

5

Solved

Drop all partitions from a hive table?

How can I drop all partitions currently loaded in a Hive table? I can drop a single partition with alter table <table> drop partition(a=, b=...); I can load all partitions with the recover ...

hive elastic-map-reduce

Cheery asked 19/3, 2013 at 5:52

4

Solved

In Hadoop, where can I change the default URL ports 50070 and 50030 for the NameNode and JobTracker web pages?

There must be a way to change the ports 50070 and 50030 so that the following URLs display the cluster statuses on the ports I pick: NameNode - http://localhost:50070/ JobTracker - http://localhost:...

hadoop nosql mapreduce hbase elastic-map-reduce

Sadick asked 16/11, 2012 at 19:01

1

Solved

Get a YARN configuration value from the command line

In EMR, is there a way to get a specific configuration value, given the configuration key, using the yarn command? For example, I would like to do something like this: yarn get-config yarn.sch...

hadoop hadoop-yarn hadoop2 emr elastic-map-reduce

Millican asked 7/1, 2016 at 22:31

3

Spark + EMR using Amazon's "maximizeResourceAllocation" setting does not use all cores/vcores

I'm running an EMR cluster (version emr-4.2.0) for Spark using the Amazon-specific maximizeResourceAllocation flag as documented here. According to those docs, "this option calculates the maximum c...

apache-spark hadoop-yarn emr amazon-emr elastic-map-reduce

Lovesome asked 30/11, 2015 at 16:51

3

Solved

Exporting a Hive table to an S3 bucket

I've created a Hive Table through an Elastic MapReduce interactive session and populated it from a CSV file like this: CREATE TABLE csvimport(id BIGINT, time STRING, log STRING) ROW FORMAT DELIMIT...

amazon-s3 hive elastic-map-reduce emr

Tremulant asked 28/2, 2012 at 20:48

2

How to mute Apache ZooKeeper debug messages (AWS EMR)?

How to mute DEBUG messages on the AWS Elastic MapReduce master node? hbase(main):003:0> list TABLE mydb 1 row(s) in 0.0510 seconds hbase(main):004:0> 00:25:17.104 [main-SendThread(ip-172-31-1...

hadoop amazon-web-services apache-zookeeper elastic-map-reduce mute

Quintuplet asked 23/10, 2014 at 0:28

7

Solved

Deleting file/folder from Hadoop

I'm running an EMR Activity inside a Data Pipeline analyzing log files and I get the following error when my Pipeline fails: Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsEx...

hadoop amazon-web-services amazon-s3 elastic-map-reduce

Enphytotic asked 28/5, 2013 at 16:47

2

ColumnFamilyInputFormat - Could not get input splits

I am getting a weird exception when I try to access Cassandra from Hadoop using the ColumnFamilyInputFormat class. In my Hadoop process, this is how I connect to Cassandra, after including cassand...

hadoop nosql cassandra elastic-map-reduce

Keeper asked 26/11, 2012 at 14:33

3

How can I wait for completion of an Elastic MapReduce job flow in a Java application?

Recently I've been working with Amazon Web Services (AWS) and I've noticed there is not much documentation on the subject, so I added my solution. I was writing an application using Amazon Elastic...

java amazon-web-services elastic-map-reduce amazon-emr

Electrostriction asked 25/5, 2012 at 16:47

3

Solved

Spark/Hadoop throws exception for large LZO files

I'm running an EMR Spark job on some LZO-compressed log-files stored in S3. There are several logfiles stored in the same folder, e.g.: ... s3://mylogfiles/2014-08-11-00111.lzo s3://mylogfiles/201...

hadoop apache-spark elastic-map-reduce lzo

Gebhardt asked 11/8, 2014 at 16:37

2

Solved

Parallel generation of random forests using scikit-learn

Main question: How do I combine different randomForests in Python and scikit-learn? I am currently using the randomForest package in R to generate randomForest objects using Elastic MapReduce. Th...

python r scikit-learn random-forest elastic-map-reduce

Boomerang asked 18/9, 2014 at 13:39

1

Solved

AWS EMR and Spark 1.0.0

I've been running into some issues recently while trying to use Spark on an AWS EMR cluster. I am creating the cluster using something like: ./elastic-mapreduce --create --alive \ --name "ll_Sp...

amazon-web-services apache-spark elastic-map-reduce

Caryloncaryn asked 21/8, 2014 at 7:43
