How do I run Spark jobs concurrently in the same AWS EMR cluster?

Is it possible to submit and run Spark jobs concurrently in the same AWS EMR cluster? If so, could you please elaborate?

Scrip answered 9/5, 2018 at 5:13 Comment(0)

You should use the --deploy-mode cluster option, which lets you submit multiple applications to your cluster at the same time. YARN then handles the resource allocation and the queues for you.

The full example:

# --deploy-mode can also be "client" for client mode
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

More details here.
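
To make the concurrent part concrete, here is a minimal sketch of submitting several applications back to back. It assumes you run it on the master node (for example over SSH) rather than as an EMR step, and that spark-submit is on the PATH. The function names submit_job and submit_all and the SPARK_SUBMIT override variable are made up for illustration; spark.yarn.submit.waitAppCompletion is a standard Spark-on-YARN setting that makes the client return once the application has been submitted.

```shell
#!/usr/bin/env bash
# Sketch only: submit several Spark applications to YARN without blocking on each one.
# SPARK_SUBMIT is a hypothetical override so the command can be swapped (e.g. for a dry run).

submit_job() {
  # $1 = application jar, $2 = application argument (both placeholders)
  "${SPARK_SUBMIT:-spark-submit}" \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.yarn.submit.waitAppCompletion=false \
    "$1" "$2"
}

submit_all() {
  # $1 = application jar, remaining arguments = one application argument per job
  jar="$1"
  shift
  for arg in "$@"; do
    # Launch each submission in the background so they do not wait on each other.
    submit_job "$jar" "$arg" &
  done
  # Waits for the spark-submit client processes, not for the YARN applications.
  wait
}
```

For example, submit_all /path/to/examples.jar 1000 2000 would submit two SparkPi applications. Whether they actually run at the same time depends on YARN having enough free resources for both; otherwise the second one waits in the queue.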

Benempt answered 9/5, 2018 at 5:45 Comment(2)
Hello Thiago, could you please let me know the equivalent command for doing the same in AWS? - Scrip
I ask because spark-submit is always treated as a step in EMR, and steps are executed sequentially. Please read docs.aws.amazon.com/emr/latest/ReleaseGuide/… - Scrip

Currently, EMR doesn't support running multiple steps in parallel. As far as I know, an experimental feature for this has already been implemented, but it has not been released because of some issues.

Pentachlorophenol answered 10/5, 2018 at 3:8 Comment(0)
