In Spark, Java serialization is the default. If Kryo is that much more efficient, why is it not the default? Are there drawbacks to using Kryo, and in which scenarios should we use Kryo versus Java serialization?
Here is the relevant comment from the documentation:
Kryo is significantly faster and more compact than Java serialization (often as much as 10x), but does not support all Serializable types and requires you to register the classes you’ll use in the program in advance for best performance.
So it is not used by default because:
- Not every java.io.Serializable type is supported out of the box: a custom class that extends Serializable is not guaranteed to serialize correctly with Kryo as-is, and may need to be registered first.
- Custom classes need to be registered in advance to get the best performance.
Note that, according to the documentation:
Spark automatically includes Kryo serializers for the many commonly-used core Scala classes covered in the AllScalaRegistrar from the Twitter chill library.
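As a rough illustration of the registration step mentioned above, here is a minimal sketch of switching a job to Kryo and registering a custom class. The Click class and the application name are hypothetical placeholders; the spark.serializer key and registerKryoClasses come from Spark's SparkConf API, and the snippet assumes Spark is on the classpath.

```scala
import org.apache.spark.SparkConf

// Hypothetical application class, used only for illustration.
case class Click(userId: Long, url: String)

object KryoConfigSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-config-sketch") // placeholder name
      // Use Kryo instead of the default Java serializer.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Register custom classes up front so Kryo can write a compact
      // numeric ID instead of the full class name with each object.
      .registerKryoClasses(Array(classOf[Click]))

    // Print the configured serializer class name.
    println(conf.get("spark.serializer"))
  }
}
```

If you would rather fail fast than silently pay the cost of unregistered classes, the spark.kryo.registrationRequired setting (false by default) can be set to true, in which case Kryo throws an exception when it meets an unregistered class.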
Kryo pros: lower memory consumption, since serialized objects are more compact.
The one time Kryo didn't work for me as-is was when I was dealing with Google protobufs. That was the first time I had to register the proto class; see the kryo-serializers artifact:
https://mvnrepository.com/artifact/de.javakaffee/kryo-serializers/0.45