Please help to understand how Kryo serializer allocates memory for its buffer.
My Spark app fails on a collect step when it tries to collect about 122Mb of data to a driver from workers.
com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 57197
at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:220)
at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:206)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:29)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:18)
at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:312)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:161)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
This exception is shown after I've increased the driver memory to 3Gb and executor memory to 4Gb and increased buffer size for kryoserializer (I'm using Spark 1.3)
conf.set('spark.kryoserializer.buffer.mb', '256')
conf.set('spark.kryoserializer.buffer.max', '512')
I think I've set buffer to be big enough, but my spark app keeps crashing. How can I check what objects are using Kryo buffer on a executor? Is there way to clean it up?
spark.kryoserializer.buffer.max
- it hasspark.kryoserializer.buffer.max.mb
. I'm testing the app now with the correct property set. – Spagyric