How does the Kryo serializer allocate its buffer in Spark?
Please help me understand how the Kryo serializer allocates memory for its buffer.

My Spark app fails on a collect step when it tries to collect about 122 MB of data to the driver from the workers.

com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 57197
    at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
    at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:220)
    at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:206)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:29)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:18)
    at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:312)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:161)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)

This exception appears even after I increased the driver memory to 3 GB and the executor memory to 4 GB, and increased the buffer size for the Kryo serializer (I'm using Spark 1.3):

conf.set('spark.kryoserializer.buffer.mb', '256')
conf.set('spark.kryoserializer.buffer.max', '512')

I think I've set the buffer to be big enough, but my Spark app keeps crashing. How can I check which objects are using the Kryo buffer on an executor? Is there a way to clean it up?

Spagyric answered 11/8, 2015 at 16:37 Comment(4)
Looks like the problem is that Spark 1.3 doesn't have the property spark.kryoserializer.buffer.max; it has spark.kryoserializer.buffer.max.mb. I'm testing the app now with the correct property set. – Spagyric
@AlbertoBonsanto did my answer help you with your issue? – Spagyric
@vvladymyrov It didn't. I get this problem every time I run a NaiveBayes fit on a big dataset where the features are SparseVectors with millions of features. – Monkey
No one is answering the question "how can I check what objects are using Kryo [...]?" :( – Crepuscule
In my case, the problem was using the wrong property name for the max buffer size.

Up to Spark 1.3 the property name is spark.kryoserializer.buffer.max.mb, with ".mb" at the end. But I used the property name from the Spark 1.4 docs: spark.kryoserializer.buffer.max.

As a result, the Spark app was using the default value of 64 MB, which was not enough for the amount of data I was processing.

After I fixed the property name to spark.kryoserializer.buffer.max.mb, my app worked fine.
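For reference, a minimal PySpark sketch of the working Spark 1.3 setup (the 256 MB size is illustrative, taken from the question, and explicitly enabling Kryo via spark.serializer is an assumption about the original app):

from pyspark import SparkConf, SparkContext

conf = SparkConf()
# Spark <= 1.3 takes the max buffer size in megabytes under the ".mb" key
conf.set('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
conf.set('spark.kryoserializer.buffer.max.mb', '256')
sc = SparkContext(conf=conf)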

Spagyric answered 8/9, 2015 at 17:17 Comment(0)
Use conf.set('spark.kryoserializer.buffer.max.mb', 'val') to set the Kryo serializer buffer, and keep in mind that val must be less than 2048; otherwise you will get another error indicating the buffer must be less than 2048 MB.

Utham answered 6/6, 2016 at 15:54 Comment(0)
The solution is to set spark.kryoserializer.buffer.max to 1g in spark-defaults.conf and restart the Spark services.

This at least worked for me.
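For reference, the entry in spark-defaults.conf is a plain whitespace-separated key/value pair; a minimal sketch, assuming a standard $SPARK_HOME layout:

# $SPARK_HOME/conf/spark-defaults.conf
spark.kryoserializer.buffer.max    1g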

Marketplace answered 29/9, 2016 at 14:19 Comment(0)
Note that spark.kryoserializer.buffer.max.mb is now deprecated:

WARN spark.SparkConf: The configuration key 'spark.kryoserializer.buffer.max.mb' has been deprecated as of Spark 1.4 and may be removed in the future. Please use the new key 'spark.kryoserializer.buffer.max' instead.

You should use instead:

import org.apache.spark.SparkConf

val conf = new SparkConf()
// the value is a size string with units, e.g. "512m" or "1g"
conf.set("spark.kryoserializer.buffer.max", "512m")
Scooter answered 9/7, 2018 at 6:55 Comment(0)
This question is old, but for Spark (version 2.4.0), if you're looking to change the 'spark.kryoserializer.buffer.max' property, go to

/etc/spark/conf/spark-defaults.conf

and add/change

spark.kryoserializer.buffer.max    <value you desire, e.g. 512m>

(no equals sign or quotes are needed; the key and value are separated by whitespace).

Reference: Eli's Blog
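If editing the file isn't an option, the same property can also be passed per job on the command line (a sketch; my_app.py stands in for your application):

spark-submit --conf spark.kryoserializer.buffer.max=512m my_app.py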

Goulette answered 20/7, 2020 at 15:46 Comment(0)
I am using Spark 1.5.2 and I had the same issue. Setting spark.kryoserializer.buffer.max.mb to 256 fixed it.

Aloes answered 10/2, 2016 at 6:13 Comment(0)
