Spark - How many Executors and Cores are allocated to my spark job
Spark's architecture revolves entirely around the concepts of executors and cores. I would like to see, in practice, how many executors and cores are running for my Spark application in a cluster.

I was trying to use the snippet below in my application, but with no luck.

val conf = new SparkConf().setAppName("ExecutorTestJob")
val sc = new SparkContext(conf)
conf.get("spark.executor.instances")
conf.get("spark.executor.cores")

Is there any way to get those values using the SparkContext object, the SparkConf object, etc.?

Rebozo answered 26/8, 2016 at 8:44 Comment(6)
You can look in the Spark UI. Go to http://<driver_ip>:4040 and press the "Executors" tab. This varies between cluster managers.Morpheme
Krishna, were you able to get it? Feel free to ask questions.Sobersided
Were you able to test?Sobersided
Thanks a lot @RamPrasad. It helps a lot. I tried different datasets of different sizes and was able to get the executor nodes.Rebozo
@yuval-itzchakov: Thanks Yuval. It works, but when the Spark application finishes, the web UI at the driver IP shuts down, so I could only track through the driver IP while the application was running. As an alternative, I tried the JobTracker and was able to track the executor history. Thanks again.Rebozo
@KrishnaReddy You can use the history server for that.Morpheme
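As the comments above note, the driver UI at port 4040 disappears once the application finishes. One way to keep executor details available afterwards is to write event logs that the Spark history server can replay. A minimal PySpark sketch, assuming an example log directory (the path is illustrative, not required):

from pyspark import SparkConf, SparkContext

# Sketch only: enable event logging so executor details remain visible in the
# Spark history server after the application ends. The log directory is an
# assumed example path, not a required value.
conf = (SparkConf()
        .setAppName("ExecutorTestJob")
        .set("spark.eventLog.enabled", "true")
        .set("spark.eventLog.dir", "hdfs:///tmp/spark-events"))

sc = SparkContext(conf=conf)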

Scala (Programmatic way) :

getExecutorStorageStatus and getExecutorMemoryStatus both return all executors, including the driver, so the driver has to be filtered out, as in the example snippet below.

/** Returns the currently active/registered executors, excluding the driver.
  * @param sc The Spark context used to retrieve the registered executors.
  * @return a list of executors, each in the form host:port.
  */
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(executor => !executor.split(":")(0).equals(driverHost)).toList
}

sc.getConf.getInt("spark.executor.instances", 1)

Similarly, you can get all properties and print them as below; that includes the cores information as well:

sc.getConf.getAll.mkString("\n")

OR

sc.getConf.toDebugString

Typically, spark.executor.cores holds the cores per executor and spark.driver.cores holds the cores for the driver.

Python :

The getExecutorStorageStatus and getExecutorMemoryStatus methods above are not implemented in the Python API.

EDIT: But they can be accessed through the Py4J bindings exposed by PySpark's SparkContext:

sc._jsc.sc().getExecutorMemoryStatus()
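
A hedged PySpark sketch of the same idea, assuming the Py4J-wrapped Scala map supports a zero-argument size() call; the configuration reads mirror the Scala snippets above and fall back to an assumed default of "1" when the properties were not set explicitly:

# Sketch only: count registered executors (driver included) via the Py4J
# binding, and read the configured instance/core counts from the SparkConf.
status = sc._jsc.sc().getExecutorMemoryStatus()   # Scala Map keyed by host:port
executors_including_driver = status.size()

instances = sc.getConf().get("spark.executor.instances", "1")
cores_per_executor = sc.getConf().get("spark.executor.cores", "1")

print(executors_including_driver, instances, cores_per_executor)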

Sobersided answered 26/8, 2016 at 9:36 Comment(3)
This is an old answer at this point, but I'm wondering how to accomplish this in R using sparklyr. Any advice?Staggs
Please ask another question with respect to sparklyr.Sobersided
Regarding python - It doesn't seem to work for me. I asked a question and included a minimal example for that. I'd appreciate some help if you can.Salinasalinas

This is an old question, but this is my code for figuring this out on Spark 2.3.0:

executor_count = len(spark.sparkContext._jsc.sc().statusTracker().getExecutorInfos()) - 1
cores_per_executor = int(spark.sparkContext.getConf().get('spark.executor.cores', '1'))
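
For example, a small wrapper around the two lines above (the helper name is mine, not a Spark API), which also derives a total core count; note that the second line only reflects an explicitly set spark.executor.cores and otherwise falls back to the assumed default of 1:

# Sketch: combine executor count and cores per executor into a total core count.
def executor_resources(spark):
    # Registered executors, minus one for the driver entry.
    executor_count = len(spark.sparkContext._jsc.sc().statusTracker().getExecutorInfos()) - 1
    # Configured cores per executor; '1' is an assumed fallback when the property is unset.
    cores_per_executor = int(spark.sparkContext.getConf().get('spark.executor.cores', '1'))
    return executor_count, cores_per_executor, executor_count * cores_per_executor

executors, cores_each, total_cores = executor_resources(spark)
print(executors, "executors x", cores_each, "cores =", total_cores, "total cores")
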
Crispation answered 19/9, 2018 at 17:10 Comment(2)
Thanks, confirmed this today on pyspark 2.4 and it works.Wraparound
Cannot confirm that the second one works: it simply outputs whatever default you pass to get()! Example: sc.getConf().get('anytexthere', '152') will output 152...Puberty

This is a Python example to get the number of cores (including the master's):

def workername():
    import socket
    return str(socket.gethostname())

anrdd = sc.parallelize(['', ''])
namesRDD = anrdd.flatMap(lambda e: (1, workername()))
namesRDD.count()

Stewart answered 18/10, 2016 at 10:7 Comment(1)
This snippet would only return the number of executors that happened to run the flatMap lambda (and even that only after some corrections: using countByKey and swapping the constant 1 with the call to the method), which in general can be very different from the number of executors assigned to the application.Salinasalinas
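
For reference, a hedged sketch of the correction this comment describes: emit (hostname, 1) pairs from the tasks and aggregate with countByKey. Even corrected, it only counts the distinct hosts that happened to run these tasks, not the executors allocated to the application:

import socket

# Sketch only: count distinct worker hostnames seen by this job's tasks.
def worker_hostname(_):
    # Runs on the executor that processes the element.
    return (socket.gethostname(), 1)

rdd = sc.parallelize(range(1000), 100)   # many partitions, to touch more executors
host_counts = rdd.map(worker_hostname).countByKey()
print(len(host_counts), "distinct hosts:", sorted(host_counts))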
