elastic-map-reduce Questions
10
Solved
It has been suggested in the Amazon docs (http://aws.amazon.com/dynamodb/), among other places, that you can back up your DynamoDB tables using Elastic MapReduce.
I have a general understanding of how thi...
Leonaleonanie asked 29/11, 2012 at 16:49
3
1) I have been told that git comes pre-installed on EMR. Is this true? I believe not, as I can confirm that "git" is not found in my elastic-mapreduce ssh terminal. See: https://raw.github.com/g...
Podiatry asked 25/7, 2012 at 15:59
1
I'm running a large (more than 100 nodes) series of MapReduce jobs on Amazon Elastic MapReduce.
In the reduce phase, already-completed map tasks keep failing with
Map output lost, rescheduling: g...
Pennoncel asked 19/4, 2012 at 6:39
2
I'm experimenting with the Gradient Boosted Trees learning algorithm from the ML library of Spark 1.4. I'm solving a binary classification problem where my input is ~50,000 samples and ~500,000 features. M...
Boyles asked 21/9, 2015 at 19:22
4
I am trying to copy files from S3 to HDFS using a workflow in EMR. When I run the command below, the job flow starts successfully but gives me an error when it tries to copy the file to HDFS. Do I n...
Coacervate asked 31/1, 2013 at 17:00
2
Solved
I'm running a job on Apache Spark on Amazon Elastic Map Reduce (EMR). Currently I'm running on emr-4.1.0 which includes Amazon Hadoop 2.6.0 and Spark 1.5.0.
When I start the job, YARN correctly ha...
Horrorstruck asked 26/11, 2015 at 14:16
3
I have integrated ELK with PySpark.
I saved the RDD as ELK data on the local file system:
rdd.saveAsTextFile("/tmp/ELKdata")
logData = sc.textFile('/tmp/ELKdata/*')
errors = logData.filter(lambda line: "r...
Lunetta asked 19/1, 2016 at 6:19
3
Solved
I am using Amazon Elastic Map Reduce 4.7.1, Hadoop 2.7.2, Hive 1.0.0, and Spark 1.6.1.
Use case: I have a Spark cluster used for processing data. That data is stored in S3 as Parquet files. I want...
Man asked 21/7, 2016 at 0:36
0
I would like to upgrade my AWS data pipeline definition to EMR 4.x or 5.x, so I can take advantage of Hive's latest features (version 2.0+), such as CURRENT_DATE and CURRENT_TIMESTAMP, etc.
The c...
Brooklynese asked 17/12, 2017 at 18:17
2
Solved
I am struggling to find a way to use S3DistCp in my AWS EMR Cluster.
Some old examples that show how to add s3distcp as an EMR step use the elastic-mapreduce command, which is no longer used.
Some ...
Glassworker asked 8/9, 2016 at 11:38
7
Solved
I have a website running on AWS EC2. I need to create a nightly job that generates a sitemap file and uploads the files to the various browsers. I'm looking for a utility on AWS that allows this fu...
Challis asked 10/1, 2012 at 23:21
1
Total instances: I have created an EMR cluster with 11 nodes in total (1 master instance, 10 core instances).
Job submission: spark-submit myApplication.py
Graph of containers: Next, I've got these gra...
Subcartilaginous asked 22/1, 2017 at 1:08
3
Solved
I would like to read a file from S3 in my EMR Hadoop job. I am using the Custom JAR option.
I have tried two solutions:
org.apache.hadoop.fs.S3FileSystem: throws a NullPointerException.
com.amaz...
Gyatt asked 12/6, 2014 at 12:43
5
Solved
How can I drop all partitions currently loaded in a Hive table?
I can drop a single partition with alter table <table> drop partition(a=, b=...);
I can load all partitions with the recover ...
Cheery asked 19/3, 2013 at 5:52
4
Solved
In Hadoop, where can I change the default URL ports 50070 and 50030 for the NameNode and JobTracker web pages?
There must be a way to change ports 50070 and 50030 so that the following URLs display the cluster statuses on the ports I pick:
NameNode - http://localhost:50070/
JobTracker - http://localhost:...
Sadick asked 16/11, 2012 at 19:01
1
Solved
In EMR, is there a way to get the value of a specific configuration key using the yarn command?
For example, I would like to do something like this:
yarn get-config yarn.sch...
Millican asked 7/1, 2016 at 22:31
3
I'm running an EMR cluster (version emr-4.2.0) for Spark using the Amazon-specific maximizeResourceAllocation flag as documented here. According to those docs, "this option calculates the maximum c...
Lovesome asked 30/11, 2015 at 16:51
3
Solved
I've created a Hive Table through an Elastic MapReduce interactive session and populated it from a CSV file like this:
CREATE TABLE csvimport(id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMIT...
Tremulant asked 28/2, 2012 at 20:48
2
How to mute DEBUG messages on AWS Elastic MapReduce Master node?
hbase(main):003:0> list
TABLE
mydb
1 row(s) in 0.0510 seconds
hbase(main):004:0> 00:25:17.104 [main-SendThread(ip-172-31-1...
Quintuplet asked 23/10, 2014 at 0:28
7
Solved
I'm running an EMR Activity inside a Data Pipeline analyzing log files and I get the following error when my Pipeline fails:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsEx...
Enphytotic asked 28/5, 2013 at 16:47
2
I am getting a weird exception when I try to access Cassandra from Hadoop using the ColumnFamilyInputFormat class.
In my Hadoop process, this is how I connect to Cassandra, after including cassand...
Keeper asked 26/11, 2012 at 14:33
3
Recently I've been working with Amazon Web Services (AWS) and I've noticed there is not much documentation on the subject, so I added my solution.
I was writing an application using Amazon Elastic...
Electrostriction asked 25/5, 2012 at 16:47
3
Solved
I'm running an EMR Spark job on some LZO-compressed log-files stored in S3. There are several logfiles stored in the same folder, e.g.:
...
s3://mylogfiles/2014-08-11-00111.lzo
s3://mylogfiles/201...
Gebhardt asked 11/8, 2014 at 16:37
2
Solved
Main question: How do I combine different randomForests in Python and scikit-learn?
I am currently using the randomForest package in R to generate randomforest objects using elastic map reduce. Th...
Boomerang asked 18/9, 2014 at 13:39
1
Solved
I've been running into some issues recently while trying to use Spark on an AWS EMR cluster.
I am creating the cluster using something like:
./elastic-mapreduce --create --alive \
--name "ll_Sp...
Caryloncaryn asked 21/8, 2014 at 7:43