How to optimize shuffle spill in an Apache Spark application

Asked 12/6, 2015 at 7:36 Answered 24/10, 2019 at 19:21

Solved apache-spark spark-streaming apache-spark-1.4

I am running a Spark Streaming application with 2 workers. The application has a join and a union operation.

All batches complete successfully, but I noticed that the shuffle spill metrics are not consistent with the input or output data size (spill memory is more than 20 times larger).

Please find the Spark stage details in the image below: [image]

After researching this, I found that:

Shuffle spill happens when there is not enough memory for the shuffle data.

Shuffle spill (memory) - size of the deserialized form of the data in memory at the time of spilling

Shuffle spill (disk) - size of the serialized form of the data on disk after spilling

Since deserialized data occupies more space than serialized data, Shuffle spill (memory) is larger than Shuffle spill (disk).

I also noticed that the spill memory size becomes extremely large with big input data.

My questions are:

Does this spilling impact performance considerably?

How can I optimize this spilling, both memory and disk?

Are there any Spark properties that can reduce or control this huge spilling?

Amari asked 12/6, 2015 at 7:36 Comment(1)

@mitchus Partially, yes. I just increased the number of tasks and allocated a larger memory fraction to shuffle. I also optimized my code to make the data structures more compact... – Amari 30/7, 2015 at 5:40

Learning to performance-tune Spark takes quite a bit of investigation. There are a few good resources, including this video. Spark 1.4 has better diagnostics and visualisation in the web interface, which can help you.

In summary, you spill when the size of the RDD partitions at the end of the stage exceeds the amount of memory available for the shuffle buffer.

You can:

Manually repartition() your prior stage so that you have smaller partitions from input.
Increase the shuffle buffer by increasing the memory in your executor processes (spark.executor.memory).
Increase the shuffle buffer by increasing the fraction of executor memory allocated to it (spark.shuffle.memoryFraction) from the default of 0.2. To compensate, you need to reduce spark.storage.memoryFraction.
Increase the shuffle buffer per thread by reducing the ratio of worker threads (SPARK_WORKER_CORES) to executor memory.
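The effect of the last three options can be estimated with some back-of-the-envelope arithmetic. The sketch below is an approximation, not Spark's actual accounting: it assumes the legacy memory model of the Spark 1.4 era, where the shuffle pool is roughly the executor heap times spark.shuffle.memoryFraction times spark.shuffle.safetyFraction (default 0.8), and it assumes the pool is split evenly among concurrently running tasks. The function name and the example sizes are hypothetical.

```python
def shuffle_memory_per_task_mb(executor_memory_mb, cores_per_executor,
                               shuffle_fraction=0.2, safety_fraction=0.8):
    """Rough upper bound on the shuffle buffer per concurrent task, in MB.

    Assumption (legacy memory model): the shuffle pool is
    heap * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction,
    shared evenly by the tasks running at the same time in one executor.
    """
    if cores_per_executor < 1:
        raise ValueError("cores_per_executor must be at least 1")
    pool_mb = executor_memory_mb * shuffle_fraction * safety_fraction
    return pool_mb / cores_per_executor


# Hypothetical executor: 4 GB heap, 4 concurrent tasks.
baseline = shuffle_memory_per_task_mb(4096, 4)
# Option: double spark.executor.memory.
more_heap = shuffle_memory_per_task_mb(8192, 4)
# Option: raise spark.shuffle.memoryFraction from 0.2 to 0.4.
bigger_fraction = shuffle_memory_per_task_mb(4096, 4, shuffle_fraction=0.4)
# Option: halve the number of concurrent tasks per executor.
fewer_threads = shuffle_memory_per_task_mb(4096, 2)

for name, value in [("baseline", baseline), ("more heap", more_heap),
                    ("bigger fraction", bigger_fraction),
                    ("fewer threads", fewer_threads)]:
    print(f"{name}: about {value:.0f} MB per task")
```

Under these assumptions each of the three changes doubles the per-task buffer, so which one to pick mostly depends on what you can afford to give up: total memory, cached RDD storage, or parallelism.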

If there is an expert listening, I would love to know more about how the memoryFraction settings interact and their reasonable range.

Plunder answered 12/6, 2015 at 11:24 Comment(4)

repartition can shuffle data unnecessarily; use coalesce instead, which merges partitions internally and so minimizes shuffling. – Masha 8/6, 2016 at 13:27

@VenuAPositive I think he was suggesting repartitioning to more partitions, not fewer. If he were going to fewer partitions, then coalesce would make sense. – Foetus 14/7, 2016 at 0:14

spark.shuffle.memoryFraction is no longer used since Spark 1.6 unless you enable legacy mode. See: spark.apache.org/docs/latest/configuration.html – Abbacy 1/11, 2017 at 22:1

This answer (while useful) doesn't really address the question of why the shuffle spill is so much larger than the shuffle read. – Quizmaster 12/11, 2019 at 7:29

To add to the above answer, you may also consider increasing the number of partitions used when a shuffle occurs (spark.sql.shuffle.partitions) from its default of 200 to a number that results in partitions close to the HDFS block size (i.e. 128 MB to 256 MB).
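As a rough sketch of that sizing rule, the partition count is the shuffle data size divided by the target partition size, rounded up. The helper name and the 100 GiB figure below are hypothetical; in practice the shuffle size would be read from the stage metrics in the Spark UI.

```python
def suggested_shuffle_partitions(shuffle_bytes,
                                 target_partition_bytes=128 * 1024 * 1024):
    """Partition count that keeps each shuffle partition near the target size."""
    if shuffle_bytes < 0 or target_partition_bytes <= 0:
        raise ValueError("sizes must be non-negative and the target positive")
    # Integer ceiling division; always return at least one partition.
    return max(1, (shuffle_bytes + target_partition_bytes - 1)
               // target_partition_bytes)


# Hypothetical job that shuffles 100 GiB, aiming for 128 MiB partitions.
partitions = suggested_shuffle_partitions(100 * 1024 ** 3)
print(partitions)  # 800
```

The resulting number can then be supplied as the value of spark.sql.shuffle.partitions, for example with `--conf spark.sql.shuffle.partitions=800` on spark-submit.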

If your data is skewed, try tricks like salting the keys to increase parallelism.
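A minimal illustration of salting, written in plain Python rather than against the Spark API so that it runs anywhere: a random suffix spreads one hot key over several salted keys, the aggregation runs once on the salted keys, and a second pass strips the salt and combines the partial results. The helper names, the salt count of 8, and the sample data are all hypothetical. In Spark the two passes would typically be two aggregations (for example two reduceByKey calls); salting a join additionally requires replicating the other side once per salt value.

```python
import random
from collections import Counter


def salt_key(key, num_salts, rng):
    """Append a random suffix so one hot key becomes up to num_salts keys."""
    return f"{key}_{rng.randrange(num_salts)}"


def unsalt_key(salted_key):
    """Recover the original key by dropping the salt suffix."""
    return salted_key.rsplit("_", 1)[0]


# Hypothetical skewed input: one hot key and two rare ones.
records = [("hot", 1)] * 10_000 + [("rare_a", 1)] * 10 + [("rare_b", 1)] * 10
rng = random.Random(42)  # fixed seed only to make the sketch repeatable

# Pass 1: aggregate on salted keys, so the hot key's work is spread out.
partial = Counter()
for key, value in records:
    partial[salt_key(key, 8, rng)] += value

# Pass 2: strip the salt and combine the much smaller partial results.
final = Counter()
for salted, subtotal in partial.items():
    final[unsalt_key(salted)] += subtotal

hot_groups = [k for k in partial if unsalt_key(k) == "hot"]
print(len(hot_groups), "salted groups for the hot key")
print("largest single group:", max(partial.values()))
print("final count for hot:", final["hot"])
```

The final totals match an unsalted aggregation, but no single group in the first pass has to hold all records of the hot key, which is what reduces the per-task shuffle pressure.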

Read these to understand Spark memory management:

https://0x0fff.com/spark-memory-management/

https://www.tutorialdocs.com/article/spark-memory-management.html

Fleet answered 24/10, 2019 at 19:21 Comment(0)
