Spark Indefinite Waiting with "Asked to send map output locations for shuffle"

My jobs often hang with this kind of message:

14/09/01 00:32:18 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@*:37619

It would be great if someone could explain what Spark is doing when it emits this message. What does the message mean? What could the user be doing wrong to cause it? Which configuration settings should be tuned?

It's really hard to debug because the job doesn't OOM and doesn't produce a stack trace; it just sits there indefinitely.

This has been an issue since at least Spark 1.0.0 and is still present in Spark 1.5.0.

Harpp answered 1/9, 2014 at 7:41 Comment(2)
Maybe a deadlock? Could you paste the thread stack traces captured with jstack?Killoran
Can you reproduce it, or it just happens sometimes?Killoran

Based on this thread, more recent versions of Spark have gotten better at shuffling (and at reporting errors when it fails anyway). The following tips were also mentioned:

This is very likely because the serialized map output locations buffer exceeds the akka frame size. Please try setting "spark.akka.frameSize" (default 10 MB) to some higher number, like 64 or 128.

In the newest version of Spark, this would throw a better error, for what it's worth.
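For reference, one way to raise that setting is through SparkConf before the context is created. This is only a sketch: the app name is made up, 128 is just an example value, and the key only applies to the older, Akka-based versions of Spark discussed above.

import org.apache.spark.{SparkConf, SparkContext}

// Raise the Akka frame size; the value is in MB and the default is 10.
val conf = new SparkConf()
  .setAppName("shuffle-frame-size-example") // hypothetical app name
  .set("spark.akka.frameSize", "128")
val sc = new SparkContext(conf)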

A possible workaround:

If the distribution of the keys in your groupByKey is skewed (some keys appear way more often than others) you should consider modifying your job to use reduceByKey instead wherever possible.
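As a sketch of that change (assuming a simple per-key sum; the example RDD is made up for illustration), reduceByKey combines values on each partition before the shuffle, so far less data crosses the network than with groupByKey:

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// groupByKey ships every value for a key to a single reducer before aggregating.
val sumsViaGroup = pairs.groupByKey().mapValues(_.sum)

// reduceByKey pre-aggregates map-side, which helps when keys are skewed.
val sumsViaReduce = pairs.reduceByKey(_ + _)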

And a side track:

The issue was fixed for me by allocating just one core per executor.

maybe your executor-memory config should be divided by executor-cores
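A sketch of what that configuration might look like; the core and memory numbers are illustrative only, not recommendations:

val conf = new SparkConf()
  .set("spark.executor.cores", "1")   // one core per executor, as suggested above
  .set("spark.executor.memory", "4g") // e.g. the previous executor memory divided by the previous core count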

Turbofan answered 31/7, 2020 at 11:46 Comment(0)
