elastic-map-reduce - 2

3

Solved

Broken Pipe Error causes streaming Elastic MapReduce job on AWS to fail

Everything works fine locally when I do as follows: cat input | python mapper.py | sort | python reducer.py However, when I run the streaming MapReduce job on AWS Elastic MapReduce, the job does...

python hadoop amazon-web-services mapreduce elastic-map-reduce

Toney asked 26/3, 2012 at 23:15

1

How to find the right portion between hadoop instance types

I am trying to find out how many MASTER, CORE, TASK instances are optimal for my jobs. I couldn't find any tutorial that explains how to figure it out. How do I know if I need more than 1 core i...

hadoop elastic-map-reduce instancetype

Uella asked 29/4, 2014 at 9:29

2

Solved

Getting data in and out of Elastic MapReduce HDFS

I've written a Hadoop program which requires a certain layout within HDFS, and afterwards I need to get the files out of HDFS. It works on my single-node Hadoop setup and I'm eager to get it...

hadoop elastic-map-reduce

Backhand asked 9/10, 2011 at 5:42

1

Solved

Trouble using hbase from java on Amazon EMR

I'm trying to query my HBase cluster on Amazon EC2 using a custom jar I launch as a MapReduce step. In my jar (inside the map function) I call HBase like so: public void map( Text key, BytesWritab...

hadoop amazon-web-services hbase apache-zookeeper elastic-map-reduce

Semifinal asked 28/2, 2014 at 20:22

5

Solved

Are there any distributed machine learning libraries for using Python with Hadoop? [closed]

I have set myself up with Amazon Elastic MapReduce in order to perform various standard machine learning tasks. I have used Python extensively for local machine learning in the past and I do ...

python hadoop mapreduce hadoop-streaming elastic-map-reduce

Purvey asked 9/1, 2013 at 11:3

2

AWS DynamoDB and MapReduce in Java

I have a huge DynamoDB table that I want to analyze to aggregate data that is stored in its attributes. The aggregated data should then be processed by a Java application. While I understand the re...

java amazon-web-services mapreduce amazon-dynamodb elastic-map-reduce

Embroidery asked 8/4, 2012 at 23:5

2

Amazon Elastic Map Reduce - Creating a job flow

I'm very new to Amazon services. I'm facing problems in creating job flows. Every time I create a job flow it fails or shuts down. Input, output or mapper function upload techniques are not clear...

hadoop amazon-s3 amazon-ec2 elastic-map-reduce emr

Cuckoopint asked 22/1, 2013 at 11:57

2

Solved

Python Dependency Management on EMR

I'm sending code to Amazon's EMR via the mrjob/boto modules. I've got some external Python dependencies (e.g. numpy, boto, etc.) and currently have to download the source of the Python packages, and ...

python virtualenv pip elastic-map-reduce mrjob

Occlusive asked 9/7, 2013 at 21:24

1

Solved

Getting "No space left on device" for approx. 10 GB of data on EMR m1.large instances

I am getting an error "No space left on device" when I am running my Amazon EMR jobs using m1.large as the instance type for the hadoop instances to be created by the jobflow. The job generates app...

hadoop amazon-web-services amazon-ec2 elastic-map-reduce diskspace

Reinhardt asked 24/10, 2013 at 9:7

1

Solved

AWS Elastic mapreduce doesn't seem to be correctly converting the streaming to jar

I have a mapper and reducer that work fine when I run them in the piped version: cat data.csv | ./mapper.py | sort -k1,1 | ./reducer.py I used the Elastic MapReduce wizard, loaded inputs, outpu...

python hadoop amazon-web-services hadoop-streaming elastic-map-reduce

Stoffel asked 1/9, 2013 at 7:34

1

Solved

create hive table from tab separated file in s3 using interactive mode

I've loaded tab-separated files into S3 with this folder structure under the bucket: bucket --> se --> y=2013 --> m=07 --> d=14 --> h=00. Each subfolder has 1 file that represents one hour of my ...

amazon-web-services amazon-s3 hive elastic-map-reduce

Contributory asked 14/7, 2013 at 13:33

2

Solved

Map Reduce output to CSV or do I need Key Values?

My map function produces a Key\tValue Value = List(value1, value2, value3) then my reduce function produces: Key\tCSV-Line Ex. 2323232-2322 fdsfs,sdfs,dfsfs,0,0,0,2,fsda,3,23,3,s, 2323555...

hadoop mapreduce hadoop-streaming elastic-map-reduce

Strive asked 26/6, 2013 at 23:38

2

Configuring external data source for Elastic MapReduce

We want to use Amazon Elastic MapReduce on top of our current DB (we are using Cassandra on EC2). Looking at the Amazon EMR FAQ, it should be possible: Amazon EMR FAQ: Q: Can I load my data from th...

amazon-web-services cassandra elastic-map-reduce

Lorenzoloresz asked 29/8, 2012 at 12:0

2

Solved

Using s3distcp with Amazon EMR to copy a single file

I want to copy just a single file to HDFS using s3distcp. I have tried using the srcPattern argument but it didn't help and it keeps on throwing java.lang.Runtime exception. It is possible that the...

hadoop amazon-s3 mapreduce elastic-map-reduce emr

Vander asked 21/11, 2012 at 13:38

2

jar containing org.apache.hadoop.hive.dynamodb

I was trying to programmatically load a DynamoDB table into HDFS (via Java, and not Hive). I couldn't find examples online on how to do it, so thought I'd download the jar containing org.apache.hado...

mapreduce amazon-dynamodb elastic-map-reduce emr

Villous asked 13/6, 2013 at 1:5

1

Hive -- split data across files

Is there a way to instruct Hive to split data into multiple output files? Or maybe cap the size of the output files. I'm planning to use Redshift, which recommends splitting data into multiple fil...

amazon-web-services hive elastic-map-reduce amazon-redshift

Lapierre asked 8/5, 2013 at 20:28

2

Solved

Amazon Elastic MapReduce - SIGTERM

I have an EMR streaming job (Python) which normally works fine (e.g. 10 machines processing 200 inputs). However, when I run it against large data sets (12 machines processing a total of 6000 input...

python hadoop-streaming elastic-map-reduce amazon-emr

Myogenic asked 15/8, 2012 at 13:59

1

Understanding a mapreduce algorithm for overlap calculation

I want help understanding the algorithm. I've pasted the algorithm explanation first and then my doubts. Algorithm: (For calculating the overlap between record pairs) Given a user defined paramet...

java hadoop mapreduce elastic-map-reduce hadoop-partitioning

Tucson asked 10/3, 2013 at 6:5

3

Amazon Elastic Map Reduce for analyzing s3 logs

I am using EMR to analyze nginx web logs. But I need to process the logs so that they fall into rows and columns in order to make them easy to query. Thus I made two tables - rawlog, processedl...

amazon-s3 amazon-web-services transform hive elastic-map-reduce

Beefsteak asked 8/6, 2012 at 10:21

1

Solved

Elastic Map Reduce: difference between CANCEL_AND_WAIT and CONTINUE?

I just found that using Amazon's Elastic Map Reduce, I can specify a step to have one of three ActionOnFailure choices: TERMINATE_JOB_FLOW, CANCEL_AND_WAIT, CONTINUE. TERMINATE_JOB_FLOW is the def...

boto elastic-map-reduce amazon-emr

Gratia asked 7/3, 2013 at 21:19

2

Solved

The reduce fails due to Task attempt failed to report status for 600 seconds. Killing! Solution?

The reduce phase of the job fails with: # of failed Reduce Tasks exceeded allowed limit. The reason why each task fails is: Task attempt_201301251556_1637_r_000005_0 failed to report status for 60...

java eclipse hadoop mapreduce elastic-map-reduce

Adina asked 7/3, 2013 at 20:42

3

How to use external data with Elastic MapReduce

From Amazon's EMR FAQ: Q: Can I load my data from the internet or somewhere other than Amazon S3? Yes. Your Hadoop application can load the data from anywhere on the internet or from other AWS ser...

elastic-map-reduce

Promote asked 6/6, 2012 at 16:41

1

Solved

Loading data with Hive, S3, EMR, and Recover Partitions

SOLVED: See Update #2 below for the 'solution' to this issue. ~~~~~~~ In s3, I have some log*.gz files stored in a nested directory structure like: s3://($BUCKET)/y=2012/m=11/d=09/H=10/ I'm at...

hadoop amazon-s3 amazon-web-services hive elastic-map-reduce

Pina asked 10/11, 2012 at 3:53

2

Solved

How to specify mapred configurations & java options with custom jar in CLI using Amazon's EMR?

I would like to know how to specify mapreduce configurations such as mapred.task.timeout, mapred.min.split.size, etc., when running a streaming job using a custom jar. We can use the following way ...

java hadoop mapreduce elastic-map-reduce emr

Litotes asked 14/2, 2012 at 20:45

2

Solved

DynamoDB InputFormat for Hadoop

I have to process some data which is persisted in Amazon DynamoDB using Hadoop MapReduce. I was searching the internet for a Hadoop InputFormat for DynamoDB and couldn't find one. I'm not famili...

hadoop amazon-web-services mapreduce amazon-dynamodb elastic-map-reduce

Marlea asked 22/10, 2012 at 21:22
