Is it possible to submit and run Spark jobs concurrently in the same AWS EMR cluster? If so, could you please elaborate?
How do I run Spark jobs concurrently in the same AWS EMR cluster?
You should use the flag --deploy-mode cluster
That allows you to run multiple executions on your cluster at the same time, and it lets YARN handle the resources and the queues for you.
The full example:
# --deploy-mode can be "client" for client mode
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
More details here.
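To make the "multiple executions" part concrete, here is a minimal sketch of launching several of these submissions at once from the master node. The helper names (submit_pi, submit_all), the SPARK_SUBMIT override, and the smaller executor settings are my own assumptions for illustration, not anything EMR or Spark provides; size the executors so that all jobs fit in the cluster together.

```shell
#!/usr/bin/env bash
# Sketch only: launch several Spark applications at the same time and let YARN schedule them.
# SPARK_SUBMIT is a hypothetical override (e.g. SPARK_SUBMIT=echo for a dry run).
SPARK_SUBMIT="${SPARK_SUBMIT:-spark-submit}"

# Submit one SparkPi application in cluster mode.
# $1 = argument passed to SparkPi (number of partitions)
submit_pi() {
  # Executor sizes are assumptions, kept small so concurrent jobs can share the cluster.
  $SPARK_SUBMIT \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --executor-memory 2G \
    --num-executors 5 \
    /path/to/examples.jar \
    "$1"
}

# Start one background submission per argument, then wait for all of them.
# Returns non-zero if any submission fails.
submit_all() {
  pids=""
  for n in "$@"; do
    submit_pi "$n" &
    pids="$pids $!"
  done
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

Calling submit_all 100 1000 starts two submissions in parallel. Each one should show up as its own YARN application, and YARN decides how the cluster resources are split between them.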
Hello Thiago, could you please let me know the equivalent command for doing the same in AWS? –
Scrip
The reason I ask is that spark-submit is always treated as a step in EMR, and steps are executed sequentially. Please read docs.aws.amazon.com/emr/latest/ReleaseGuide/… –
Scrip
When this answer was written, EMR did not support running multiple steps in parallel. As far as I know, such a feature had already been implemented experimentally but had not been released because of some issues.
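As far as I know, that limitation has since been lifted: EMR releases 5.28.0 and later let you set a step concurrency level on the cluster so that several steps run at once. Below is a rough sketch with the AWS CLI. The function names, the AWS_CLI override, the placeholder cluster ID, and the step names are assumptions of mine for illustration; check the flags against the CLI version you have installed.

```shell
#!/usr/bin/env bash
# Sketch only: let an EMR cluster run steps in parallel, then add Spark steps to it.
# AWS_CLI is a hypothetical override (e.g. AWS_CLI=echo for a dry run).
AWS_CLI="${AWS_CLI:-aws}"
# Placeholder cluster ID; replace it with your own.
CLUSTER_ID="${CLUSTER_ID:-j-XXXXXXXXXXXXX}"

# Allow up to $1 steps to run at the same time on the cluster.
allow_parallel_steps() {
  $AWS_CLI emr modify-cluster \
    --cluster-id "$CLUSTER_ID" \
    --step-concurrency-level "$1"
}

# Add one SparkPi step.
# $1 = step name (no spaces or commas), $2 = argument passed to SparkPi
add_spark_step() {
  $AWS_CLI emr add-steps \
    --cluster-id "$CLUSTER_ID" \
    --steps "Type=Spark,Name=$1,ActionOnFailure=CONTINUE,Args=[--class,org.apache.spark.examples.SparkPi,--deploy-mode,cluster,/path/to/examples.jar,$2]"
}
```

For example, allow_parallel_steps 2 followed by add_spark_step pi-a 100 and add_spark_step pi-b 1000 should leave both steps running together instead of one after the other, provided the cluster has enough resources for both.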