Spark Driver Memory and Executor Memory

I am a beginner to Spark, and I am running my application to read 14 KB of data from a text file, do some transformations and actions (collect, collectAsMap), and save the data to a database.

I am running it locally on my MacBook with 16 GB of memory and 8 logical cores.

The Java max heap is set to 12 GB.

Here is the command I use to run the application.

bin/spark-submit --class com.myapp.application --master local[*] --executor-memory 2G --driver-memory 4G /jars/application.jar

I am getting the following warning

2017-01-13 16:57:31.579 [Executor task launch worker-8hread] WARN org.apache.spark.storage.MemoryStore - Not enough space to cache rdd_57_0 in memory! (computed 26.4 MB so far)

Can anyone guide me on what is going wrong here and how I can improve performance? Also, how can I optimize the shuffle spill? Here is a view of the spill that happens on my local system:

[screenshot: shuffle spill metrics observed on the local system]

Alexandraalexandre answered 14/1, 2017 at 0:59 Comment(3)
In local mode, spark.executor.memory has no effect, so just try setting spark.driver.memory to more than 6g, since you have 16g of RAM.Halvah
What is the size of the file you are trying to read?Halvah
@RajatMishra I tried with 6g driver memory and an 8g Java max heap. I still get the same message.Alexandraalexandre

Running executors with too much memory often results in excessive garbage-collection delays, so assigning more memory is not a good idea. Since you have only 14 KB of data, 2 GB of executor memory and 4 GB of driver memory are more than enough; there is no benefit in assigning this much. You could run this job with even 100 MB of memory and performance would be better than with 2 GB.

Driver memory matters more when you run the application in yarn-cluster mode, because the application master runs the driver there. Since you are running your application in local mode, driver-memory is not necessary and you can remove this configuration from your job.
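For comparison, a cluster submission where driver-memory does matter might look roughly like this (the class name and jar path are taken from the question; the master, deploy mode, and memory values below are purely illustrative, not settings verified for this job):

bin/spark-submit --class com.myapp.application --master yarn --deploy-mode cluster --driver-memory 2G --executor-memory 1G --num-executors 2 /jars/application.jar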

In your application you have assigned:

Java Max heap is set at: 12G.
executor-memory: 2G 
driver-memory: 4G

Total memory allotment = 16 GB, and your MacBook has only 16 GB of memory. Here you have allocated the whole of your RAM to your Spark application.

This is not good. The operating system itself consumes roughly 1 GB of memory, and you probably have other applications running that also consume RAM. So you are actually allocating more memory than you have, and that is the root cause of your application throwing the error Not enough space to cache the RDD.

  1. There is no point in setting the Java heap to 12 GB. Reduce it to 4 GB or less.
  2. Reduce the executor memory to executor-memory 1G or less.
  3. Since you are running locally, remove driver-memory from your configuration.

Submit your job and it will run smoothly.
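Putting those suggestions together, the adjusted submit command could look like this (the class name and jar path are copied from the question; the 1G value is just one reasonable choice, not a verified setting, and per the comments executor-memory can likely be dropped entirely in local mode):

bin/spark-submit --class com.myapp.application --master local[*] --executor-memory 1G /jars/application.jar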

If you are keen to learn about Spark memory-management techniques, refer to this useful article:

Spark on yarn executor resource allocation

Wisniewski answered 14/1, 2017 at 3:26 Comment(3)
Since the application is being run in local mode, don't you think executor memory has no effect, as the worker lives within the driver JVM process?Halvah
@RajatMishra Yeah!! You are right, it seems there is no use for executor-memory in local mode. I will do some more tests and update my answer accordingly :)Wisniewski
Does anybody have a source on memory management in Spark 2.0+? I'm not finding anything similar to the great source you provided. ThanksFinery

In local mode, you don't need to specify a master; using the default arguments is fine. The official website says: "The spark-submit script in Spark’s bin directory is used to launch applications on a cluster. It can use all of Spark’s supported cluster managers through a uniform interface so you don’t have to configure your application specially for each one." So you'd better use spark-submit on a cluster; locally you can use spark-shell.
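For example, a local interactive session could be started like this (the local[*] master and the 4G value are only illustrative; spark-shell also works with no arguments at all):

bin/spark-shell --master local[*] --driver-memory 4G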

Myatt answered 14/1, 2017 at 3:25 Comment(1)
With spark-shell, you can debug your application to find out which step is wrong.Myatt
