Re-use Amazon Elastic MapReduce instance

3

11

I tried a simple Map/Reduce task using Amazon Elastic MapReduce, and it took just 3 minutes to complete. Is it possible to re-use the same instance to run another task?

Even though I used the instance for only 3 minutes, Amazon will charge for a full hour, so I want to use the remaining 57 minutes to run several other tasks.

Poundfoolish answered 30/7, 2011 at 0:27 Comment(1)
Did we help to answer your question? - Kneedeep
K
14

The answer is yes.

Here's how you do it using the command line client:

When you create the job flow, pass the --alive flag. This tells EMR to keep the cluster around after your job has run.
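
As a rough sketch of that creation step: the helper below only prints the command, so you can inspect it before running it. It assumes the client's `--create` and `--name` options, neither of which is named in this answer, so check them against `elastic-mapreduce --help`. The function name `emr_create_cmd` and the job flow name are made up for illustration.

```shell
#!/bin/sh
# Print (not run) a command that starts a job flow which stays up after its steps finish.
# Assumption: --create and --name are options of the client; only --alive comes from the answer.
emr_create_cmd() {
    name="$1"   # hypothetical job flow name
    printf 'elastic-mapreduce --create --alive --name "%s"\n' "$name"
}

emr_create_cmd "reusable-cluster"
```

Keep in mind that a job flow started this way keeps running, and keeps being billed, until you terminate it yourself.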

Then you can submit more tasks to the cluster:

elastic-mapreduce --jobflow <job-id> --stream --input <s3dir> --output <s3dir> --mapper <script1> --reducer <script2>
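
If you have several tasks to fit into the hour you are already paying for, a small loop can queue them on the same job flow. This is only a sketch: it uses just the flags shown in the command above, prints the commands instead of running them, and the job flow ID, bucket, and script paths are placeholders I made up.

```shell
#!/bin/sh
# Print one streaming-step command per input directory, all aimed at the same job flow.
# Only flags from the answer are used; the job flow ID, bucket and script names are placeholders.
emr_stream_step_cmd() {
    jobflow="$1"; input="$2"; output="$3"; mapper="$4"; reducer="$5"
    printf 'elastic-mapreduce --jobflow %s --stream --input %s --output %s --mapper %s --reducer %s\n' \
        "$jobflow" "$input" "$output" "$mapper" "$reducer"
}

JOBFLOW="j-EXAMPLE"          # hypothetical job flow ID
BUCKET="s3n://my-bucket"     # hypothetical bucket

for day in 2011-07-28 2011-07-29; do
    emr_stream_step_cmd "$JOBFLOW" "$BUCKET/input/$day" "$BUCKET/output/$day" \
        "$BUCKET/scripts/mapper.py" "$BUCKET/scripts/reducer.py"
done
```

Once the printed commands look right, piping the script's output to `sh` would submit them for real.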

To terminate the cluster later, simply run:

elastic-mapreduce --jobflow <job-id> --terminate

Try running elastic-mapreduce --help to see all the commands you can run.

If you don't have the command line client, get it here.

Kneedeep answered 10/8, 2011 at 19:23 Comment(4)
Wasn't there a limitation of 255 steps or something for alive clusters? So you can reuse it 255 times, because you need to add "steps" each time you run a job? It's been a long time since I looked into this, so please let me know if you have enlightening updates on this. - Kensell
How do I do the same using the AWS Java SDK? - Beneficent
@Kensell - that 256 steps limit has been removed: docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/… - Pembrook
@Alex Dean as per the documentation, "Beginning with AMI 3.1.1 (Hadoop 2.x) and AMI 2.4.8 (Hadoop 1.x), you can submit an unlimited number of steps over the lifetime of a long-running cluster, but only 256 can be active or pending at any given time" -> does that mean Amazon EMR by default maintains a job queue for incoming job submissions? - Gambier
R
2

Using:

elastic-mapreduce --jobflow job-id \
    --jar s3n://some-path/x.jar \
    --step-name "New step name" \
    --args ...

you can also add non-streaming steps to your cluster. (Just so you don't have to try it yourself ;-) )
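
For what it's worth, here is a sketch of filling in the `--args` part. It assumes `--args` takes a single comma-separated list, which is how I remember the client's help describing it, so please confirm with `elastic-mapreduce --help`. The helper name `emr_jar_step_cmd`, the job flow ID, and the bucket paths are invented for the example, and the command is only printed, not run.

```shell
#!/bin/sh
# Print a command that adds a custom-JAR step to an existing job flow.
# Assumption: --args accepts a single comma-separated list; confirm with elastic-mapreduce --help.
emr_jar_step_cmd() {
    jobflow="$1"; jar="$2"; step_name="$3"
    shift 3
    # Join the remaining parameters with commas.
    args=""
    for a in "$@"; do
        args="${args:+$args,}$a"
    done
    printf 'elastic-mapreduce --jobflow %s --jar %s --step-name "%s" --args %s\n' \
        "$jobflow" "$jar" "$step_name" "$args"
}

emr_jar_step_cmd "j-EXAMPLE" "s3n://some-path/x.jar" "New step name" \
    "s3n://my-bucket/input" "s3n://my-bucket/output"
```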

Ryals answered 16/8, 2011 at 11:46 Comment(0)
B
0

http://aws.amazon.com/elasticmapreduce/faqs/#dev-6

Q: Can I run a persistent job flow? Yes. Amazon Elastic MapReduce job flows that are started with the --alive flag will continue until explicitly terminated. This allows customers to add steps to a job flow on demand. You may want to use this to debug your job flow logic without having to repeatedly wait for job flow startup. You may also use a persistent job flow to run a long-running data warehouse cluster. This can be combined with data warehouse and analytics packages that run on top of Hadoop, such as Hive and Pig.

Buckjump answered 30/7, 2011 at 0:31 Comment(0)
