Resolving "Kryo serialization failed: Buffer overflow" Spark exception
I am trying to run Spark (Java) code and getting the error

org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 27.

Other posts have suggested setting the buffer to its maximum value. When I tried that with a maximum buffer value of 512MB, I got the error

java.lang.ClassNotFoundException: org.apache.spark.serializer.KryoSerializer.buffer.max', '512'

How can I solve this problem?

Harp answered 8/6, 2016 at 18:4 Comment(1)
While doing a spark-submit, use --conf "spark.kryoserializer.buffer.max=512m" (Kaykaya)

Try using "spark.kryoserializer.buffer.max.mb", "512" instead of "spark.kryoserializer.buffer.max", "512MB".
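For context (my note, not part of the original answer): the ".mb" spelling is the legacy property name from older Spark releases; since Spark 1.4 the property is spark.kryoserializer.buffer.max and the value carries a size unit. A sketch of the two forms:

```
# Legacy property (pre-1.4 Spark): value is a number of megabytes, no unit
spark.kryoserializer.buffer.max.mb  512

# Current property (Spark 1.4+): value includes a size unit
spark.kryoserializer.buffer.max  512m
```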

Lashay answered 6/12, 2016 at 7:42 Comment(0)

The property name is correct, spark.kryoserializer.buffer.max, but the value should include the unit, so in your case it is 512m.

Also, depending on where you are setting up the configuration, you might have to write --conf spark.kryoserializer.buffer.max=512m. For instance, with spark-submit or within the <spark-opts>...</spark-opts> of an Oozie workflow action.
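As a sketch of the Oozie case mentioned above, the setting would sit inside the Spark action like this (the surrounding element is abbreviated; only the spark-opts part is from the answer):

```xml
<!-- Hedged sketch of an Oozie Spark action fragment -->
<spark-opts>--conf spark.kryoserializer.buffer.max=512m</spark-opts>
```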

Bibulous answered 12/6, 2018 at 16:3 Comment(0)

This is an old question, but it was the first hit when I googled it, so I am answering here to help others.

For Spark 3.2 (in an Azure Synapse environment, though I am not sure that matters) I tried all of these combinations, but the only one that worked to convert a large Spark DataFrame with toPandas() was spark.kryoserializer.buffer.max=512. No letters after the number, no ".mb" at the end.

Litter answered 20/2, 2023 at 18:19 Comment(2)
Hi 8forty, could you provide a little more detail on what you did? I'm in the same situation with Synapse, and changing the configuration settings in a notebook where the session is already started seems to be a bit different. (Beaner)
@CodyDance I added the setting to the "Apache Spark Configuration" of the Spark pool, so it's already set for all workers/notebooks that use the pool. (Litter)
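In other words, the pool-level configuration entry is just the same key/value pair described in the answer (a sketch; the exact way Synapse accepts the configuration may differ):

```
spark.kryoserializer.buffer.max  512
```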

Either you can set this in the Spark configuration while creating the Spark session:

SparkSession spark = SparkSession.builder()
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryoserializer.buffer.max", "512m")
    .getOrCreate();

or you can pass it with your spark-submit command:

spark-submit \
--verbose \
--name "JOB_NAME" \
--master MASTER_IP \
--conf "spark.kryoserializer.buffer.max=512m" \
main.py 
Silesia answered 18/5, 2022 at 10:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.