How to prevent a Spark executor from getting lost and its YARN container from being killed for exceeding memory limits?

I have the following code, which mostly issues hiveContext.sql() calls. My task is to create a few tables and, after processing, insert values into them for every Hive table partition.

So I first run SHOW PARTITIONS and, looping over its output, call a few methods that create each table (if it doesn't exist) and insert into it using hiveContext.sql.

Now, we can't use hiveContext inside an executor, so I have to run this loop in the driver program, serially, one partition at a time. When I submit this Spark job to a YARN cluster, my executors almost always get lost because of a shuffle-not-found exception.

This happens because YARN kills my executors for exceeding their memory limit. I don't understand why, since each Hive partition holds a very small data set, yet YARN still kills my executors.

Will the following code do everything in parallel and try to hold all Hive partition data in memory at the same time?

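import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public class PartitionLoader { // hypothetical class name; the original showed only main()

    public static void main(String[] args) throws IOException {
        SparkConf conf = new SparkConf();
        SparkContext sc = new SparkContext(conf);
        HiveContext hc = new HiveContext(sc);
        // HDFS handle for the existence check below (undeclared in the original)
        FileSystem fs = FileSystem.get(sc.hadoopConfiguration());

        // Single quotes around the partition value so the Java string literal compiles
        DataFrame partitionFrame = hc.sql("SHOW PARTITIONS dbdata PARTITION(date='2015-08-05')");

        // collect() brings the (small) list of partition specs back to the driver
        Row[] rowArr = partitionFrame.collect();
        for (Row row : rowArr) {
            // Each row is a partition spec such as "<server-column>=<value>/date=<value>"
            String[] splitArr = row.getString(0).split("/");
            String server = splitArr[0].split("=")[1];
            String date = splitArr[1].split("=")[1];
            String csvPath = "hdfs:///user/db/ext/" + server + ".csv";
            if (fs.exists(new Path(csvPath))) {
                hc.sql("ADD FILE " + csvPath);
            }
            // Helpers defined elsewhere; this loop is plain driver code, so their SQL runs
            // statement by statement. `entity` in the original is assumed to be `server`.
            createInsertIntoTableABC(hc, server, date);
            createInsertIntoTableDEF(hc, server, date);
            createInsertIntoTableGHI(hc, server, date);
            createInsertIntoTableJKL(hc, server, date);
            createInsertIntoTableMNO(hc, server, date);
        }
    }
}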
Jaquenette asked 5/8, 2015 at 18:48

Generally, you should always dig into the logs to get the real exception (at least in Spark 1.3.1).

tl;dr
Safe config for Spark under YARN:
spark.shuffle.memoryFraction=0.5 - lets the shuffle use more of the allocated memory
spark.yarn.executor.memoryOverhead=1024 - set in MB. YARN kills an executor when its memory usage is larger than (executor-memory + executor.memoryOverhead)
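A minimal sketch of the kill rule in the last line, assuming all sizes are in MB and ignoring YARN's rounding of container requests up to its minimum allocation increment. The 3 GB executor and the 3500 MB usage figure are purely illustrative, not taken from the question:

```java
// Sketch of the YARN limit described above: a container is killed once its
// physical memory use exceeds executor memory + memoryOverhead (both in MB).
public class YarnContainerLimit {

    // Size of the container YARN grants for one executor, in MB.
    static long containerLimitMb(long executorMemoryMb, long memoryOverheadMb) {
        return executorMemoryMb + memoryOverheadMb;
    }

    // True when the executor's total (heap + off-heap) usage would get it killed.
    static boolean wouldBeKilled(long usedMb, long executorMemoryMb, long memoryOverheadMb) {
        return usedMb > containerLimitMb(executorMemoryMb, memoryOverheadMb);
    }

    public static void main(String[] args) {
        long executorMb = 3072; // e.g. --executor-memory 3g (illustrative)
        long usedMb = 3500;     // hypothetical heap + off-heap usage
        System.out.println("limit with 384 MB overhead:  " + containerLimitMb(executorMb, 384)
                + " MB, killed = " + wouldBeKilled(usedMb, executorMb, 384));
        System.out.println("limit with 1024 MB overhead: " + containerLimitMb(executorMb, 1024)
                + " MB, killed = " + wouldBeKilled(usedMb, executorMb, 1024));
    }
}
```

With a 384 MB overhead the container limit is 3456 MB, so 3500 MB of usage gets the executor killed; with 1024 MB the limit becomes 4096 MB and the same usage fits.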

A little more info

Your question mentions a shuffle-not-found exception.

In case of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle, you should increase spark.shuffle.memoryFraction, for example to 0.5.
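To see why raising the fraction helps, here is a rough estimate of the shuffle memory pool under the Spark 1.x legacy memory manager (before 1.6, or with spark.memory.useLegacyMode=true; otherwise spark.shuffle.memoryFraction is ignored). It assumes the pool is heap × spark.shuffle.memoryFraction × spark.shuffle.safetyFraction, with defaults 0.2 and 0.8. The 3 GB heap is illustrative, and Spark actually uses the JVM's reported max heap, which is slightly smaller than -Xmx:

```java
// Rough Spark 1.x (legacy memory manager) estimate of the heap available to
// shuffle: heap * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction.
public class LegacyShuffleMemory {

    static final double DEFAULT_MEMORY_FRACTION = 0.2; // spark.shuffle.memoryFraction default
    static final double DEFAULT_SAFETY_FRACTION = 0.8; // spark.shuffle.safetyFraction default

    // Approximate shuffle pool in MB, truncated to a whole number.
    static long shuffleMemoryMb(long heapMb, double memoryFraction, double safetyFraction) {
        return (long) (heapMb * memoryFraction * safetyFraction);
    }

    public static void main(String[] args) {
        long heapMb = 3072; // illustrative executor heap
        System.out.println("default (0.2): "
                + shuffleMemoryMb(heapMb, DEFAULT_MEMORY_FRACTION, DEFAULT_SAFETY_FRACTION) + " MB");
        System.out.println("raised  (0.5): "
                + shuffleMemoryMb(heapMb, 0.5, DEFAULT_SAFETY_FRACTION) + " MB");
    }
}
```

For a 3 GB heap this gives about 491 MB at the default fraction and about 1228 MB at 0.5.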

The most common reason for YARN killing off my executors was memory usage beyond what it expected. To avoid that, increase spark.yarn.executor.memoryOverhead; I've set it to 1024, even though my executors use only 2-3 GB of memory.
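For comparison, a small sketch of the default overhead this setting replaces. It assumes the max(384 MB, 10% of executor memory) formula documented for Spark 1.4 and later (the Spark 1.3 docs list 7% instead), so check it against your version; the executor sizes mirror the 2-3 GB mentioned above:

```java
// Compares YARN's default executor memory overhead with an explicit 1024 MB.
// Assumed default: max(384 MB, 10% of executor memory), per Spark 1.4+ docs.
public class MemoryOverhead {

    static final long MIN_OVERHEAD_MB = 384;
    static final double OVERHEAD_FACTOR = 0.10;

    // Default overhead in MB for a given executor memory in MB.
    static long defaultOverheadMb(long executorMemoryMb) {
        return Math.max(MIN_OVERHEAD_MB, (long) (executorMemoryMb * OVERHEAD_FACTOR));
    }

    public static void main(String[] args) {
        for (long executorMb : new long[] {2048, 3072}) {
            System.out.println(executorMb + " MB executor: default overhead "
                    + defaultOverheadMb(executorMb) + " MB vs. explicit 1024 MB");
        }
    }
}
```

For both a 2 GB and a 3 GB executor the 10% term falls below the floor, so the assumed default is 384 MB, well under the 1024 MB used here.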

Fulford answered 14/10, 2015 at 6:52
District: Hmm Barak, what about repartitioning the dataset so that every partition holds less data?
Fulford: @District Data resides in a different memory area, and in Spark 1.3.1 it wasn't dynamic, so you wouldn't actually "free" some memory on the executor for the shuffle. You have to explicitly increase the shuffle area. That said, you might have smaller shuffle memory needs on the map side if you decrease the data per partition, so it might help somewhat. Bear in mind that repartitioning has other effects on the process, so I wouldn't use it as a solution to this specific problem. It might be a good idea, but it is a bigger subject :)

My assumption: you probably have a limited number of executors on your cluster, and the job might be running in a shared environment.

As you said, your files are small, so you can use fewer executors with more cores each; setting the memoryOverhead property is important here.

  1. Set number of executors = 5
  2. Set number of executor cores = 4
  3. Set memory overhead = 2G
  4. Set shuffle partitions = 20 (5 executors × 4 cores, to use maximum parallelism)
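One way to express the four settings above as spark-submit flags. The mapping to configuration keys is an assumption: "shuffle partition" is taken to mean spark.sql.shuffle.partitions, since the job runs Hive SQL, and the overhead key is the pre-2.3 spark.yarn.executor.memoryOverhead (renamed spark.executor.memoryOverhead in Spark 2.3). The class name is hypothetical, and the printed command still needs your application jar and main class appended:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapping of the four settings above onto Spark configuration keys.
public class SmallFilesConfig {

    static Map<String, String> settings(int executors, int coresPerExecutor, int overheadMb) {
        Map<String, String> conf = new LinkedHashMap<>(); // keeps insertion order for printing
        conf.put("spark.executor.instances", String.valueOf(executors));
        conf.put("spark.executor.cores", String.valueOf(coresPerExecutor));
        conf.put("spark.yarn.executor.memoryOverhead", String.valueOf(overheadMb)); // in MB
        // One shuffle partition per task slot that can run at the same time.
        conf.put("spark.sql.shuffle.partitions", String.valueOf(executors * coresPerExecutor));
        return conf;
    }

    public static void main(String[] args) {
        StringBuilder cmd = new StringBuilder("spark-submit --master yarn");
        for (Map.Entry<String, String> e : settings(5, 4, 2048).entrySet()) {
            cmd.append(" --conf ").append(e.getKey()).append('=').append(e.getValue());
        }
        System.out.println(cmd);
    }
}
```

Deriving the partition count from executors × cores keeps the fourth setting consistent if you later change the first two.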

With the above properties, I am sure you will avoid executor out-of-memory issues without compromising performance.

Goran answered 22/6, 2019 at 21:36

© 2022 - 2024 — McMap. All rights reserved.