Does protobuf-net have built-in compression for serialization?

I was comparing BinaryFormatter with the protobuf-net serializer and was quite pleased with what I found. What struck me as strange is that protobuf-net serialized the objects into a smaller byte array than I would get by just writing the value of every property into an array of bytes without any metadata.

I know protobuf-net supports string interning if you set AsReference to true, but I'm not doing that in this case, so does protobuf-net provide some compression by default?

Here's some code you can run to see for yourself:

var simpleObject = new SimpleObject
                       {
                           Id = 10,
                           Name = "Yan",
                           Address = "Planet Earth",
                           Scores = Enumerable.Range(1, 10).ToList()
                       };

using (var memStream = new MemoryStream())
{
    var binaryWriter = new BinaryWriter(memStream);
    // 4 bytes for int
    binaryWriter.Write(simpleObject.Id);
    // 3 bytes + 1 byte for the length prefix (BinaryWriter length-prefixes strings; it does not terminate them)
    binaryWriter.Write(simpleObject.Name);
    // 12 bytes + 1 byte for the length prefix
    binaryWriter.Write(simpleObject.Address);
    // 40 bytes for 10 ints
    simpleObject.Scores.ForEach(binaryWriter.Write);

    // 61 bytes, which is what I expect
    Console.WriteLine("BinaryWriter wrote [{0}] bytes",
      memStream.Length);
}

using (var memStream = new MemoryStream())
{
    ProtoBuf.Serializer.Serialize(memStream, simpleObject);

    // 41 bytes!
    Console.WriteLine("Protobuf serialize wrote [{0}] bytes",
      memStream.Length);
}

EDIT: forgot to add, the SimpleObject class looks like this:

[Serializable]
[DataContract]
public class SimpleObject
{
    [DataMember(Order = 1)]
    public int Id { get; set; }

    [DataMember(Order = 2)]
    public string Name { get; set; }

    [DataMember(Order = 3)]
    public string Address { get; set; }

    [DataMember(Order = 4)]
    public List<int> Scores { get; set; }
}
Unifilar answered 24/8, 2011 at 11:22 Comment(0)

No, it does not. There is no "compression" as such specified in the protobuf spec; however, it does (by default) use "varint encoding", a variable-length encoding for integer data in which small values use less space: 0-127 take 1 byte plus the header. Note that varint by itself goes pretty loopy for negative numbers (a negative int32 always takes 10 bytes), so "zigzag" encoding is also supported. Zigzag keeps small-magnitude numbers small by interleaving positive and negative values: 0, -1, 1, -2, 2 are written as 0, 1, 2, 3, 4.
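
To make the varint and zigzag rules above concrete, here is a minimal sketch in plain C#. The class and method names (VarintSketch, EncodeVarint, and so on) are made up for illustration; this is not protobuf-net's internal code, just the encoding rules from the wire format written out by hand.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper with hypothetical names; not part of protobuf-net.
public static class VarintSketch
{
    // Base-128 varint: 7 payload bits per byte, low-order group first.
    // The high bit is set on every byte except the last one.
    public static byte[] EncodeVarint(ulong value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        bytes.Add((byte)value);
        return bytes.ToArray();
    }

    // ZigZag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    public static uint ZigZag(int value)
    {
        return unchecked((uint)((value << 1) ^ (value >> 31)));
    }

    // A plain (non-zigzag) int32 is sign-extended to 64 bits before encoding,
    // which is why negative values are expensive.
    public static byte[] EncodeInt32(int value)
    {
        return EncodeVarint(unchecked((ulong)(long)value));
    }

    // The zigzag variant keeps small negative values short.
    public static byte[] EncodeSInt32(int value)
    {
        return EncodeVarint(ZigZag(value));
    }
}
```

With these helpers, EncodeVarint(300) gives the two bytes 0xAC 0x02, EncodeInt32(-1) gives 10 bytes, and EncodeSInt32(-1) gives a single byte.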

Actually, in your case for Scores you should also look at "packed" encoding, which requires either [ProtoMember(4, IsPacked = true)] or the equivalent via TypeModel in v2 (v2 supports either approach). This avoids the overhead of a header per value, by writing a single header and the combined length. "Packed" can be used with varint/zigzag. There are also fixed-length encodings for scenarios where you know the values are likely large and unpredictable.
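
To see what "packed" saves, here is a hand-rolled sketch of the two layouts for a list of small non-negative integers. All names here (PackedSketch, EncodeUnpacked, EncodePacked) are hypothetical, and the code only mimics the wire format; it is not how protobuf-net is implemented, and it handles only unsigned values to stay short.

```csharp
using System;
using System.Collections.Generic;

// Hand-rolled illustration of the wire layout; hypothetical helper names.
public static class PackedSketch
{
    const int WireVarint = 0;           // wire type for varint values
    const int WireLengthDelimited = 2;  // wire type for strings and packed lists

    static void WriteVarint(List<byte> output, uint value)
    {
        while (value >= 0x80)
        {
            output.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        output.Add((byte)value);
    }

    // Header = (field number << 3) | wire type, itself written as a varint.
    static void WriteHeader(List<byte> output, int fieldNumber, int wireType)
    {
        WriteVarint(output, (uint)((fieldNumber << 3) | wireType));
    }

    // Default layout: one header per element.
    public static byte[] EncodeUnpacked(int fieldNumber, IEnumerable<uint> values)
    {
        var output = new List<byte>();
        foreach (var v in values)
        {
            WriteHeader(output, fieldNumber, WireVarint);
            WriteVarint(output, v);
        }
        return output.ToArray();
    }

    // Packed layout: one header, the payload length in bytes, then the values.
    public static byte[] EncodePacked(int fieldNumber, IEnumerable<uint> values)
    {
        var payload = new List<byte>();
        foreach (var v in values)
        {
            WriteVarint(payload, v);
        }
        var output = new List<byte>();
        WriteHeader(output, fieldNumber, WireLengthDelimited);
        WriteVarint(output, (uint)payload.Count);
        output.AddRange(payload);
        return output.ToArray();
    }
}
```

For field 4 holding the values 1 to 10, the unpacked form comes to 20 bytes (a header and a value for each element) and the packed form to 12 bytes (header, length, then 10 values). By the same arithmetic the 41-byte message in the question would shrink to 33 bytes with a packed Scores field.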

Note also: if your data has lots of text, you may benefit from additionally running it through gzip or deflate; if it doesn't, then both gzip and deflate could cause it to get bigger.

An overview of the wire format is given in the Protocol Buffers "Encoding" documentation; it isn't very tricky to understand, and may help you plan how best to further optimize.

Ellerey answered 24/8, 2011 at 20:32 Comment(2)
Why does protobuf use 1 byte for only 128 values? 8 bits can represent 256 different values. - Molybdenite
@Molybdenite it uses "varint" encoding for the field number - which means 7 bits payload, and 1 bit "there's another byte to read". You keep reading until the MSB is zero. - Ellerey
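
The "keep reading until the MSB is zero" rule from the comment above can be sketched as a small reader. VarintReaderSketch and ReadVarint are names made up for this example, not a protobuf-net API.

```csharp
using System;
using System.IO;

// Illustrative reader with hypothetical names; not part of protobuf-net.
public static class VarintReaderSketch
{
    public static ulong ReadVarint(Stream input)
    {
        ulong result = 0;
        int shift = 0;
        while (true)
        {
            int b = input.ReadByte();
            if (b < 0) throw new EndOfStreamException("Truncated varint.");
            if (shift >= 64) throw new InvalidDataException("Varint is longer than 10 bytes.");

            // The low 7 bits are payload; groups arrive low-order first.
            result |= (ulong)(b & 0x7F) << shift;

            // A clear high bit marks the last byte of the value.
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }
}
```

Reading the bytes 0xAC 0x02 with this returns 300: the first byte contributes 0x2C (44) and signals a continuation, and the second contributes 2 shifted left by 7 (256).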

At least the C++ library does support reading from and writing to compressed streams:

https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/io/gzip_stream.h

I'm not sure, though, whether that has been ported to the .NET implementation.

Stopper answered 12/9, 2020 at 0:12 Comment(1)
In .NET, you'd just connect any serializer to a GZipStream instance, and you're done. - Ellerey
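
As a rough sketch of that suggestion: the helper below wraps any stream-based serializer call in a GZipStream. GzipSketch, WriteCompressed and ReadCompressed are hypothetical names; the write callback stands in for whatever serializer you use, for example s => ProtoBuf.Serializer.Serialize(s, simpleObject).

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Illustrative helper with hypothetical names.
public static class GzipSketch
{
    // 'write' is any serializer call that targets a Stream.
    public static byte[] WriteCompressed(Action<Stream> write)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                write(gzip);
            } // disposing the GZipStream flushes the final block and the gzip trailer
            return buffer.ToArray();
        }
    }

    // Reverses WriteCompressed and returns the original serialized bytes.
    public static byte[] ReadCompressed(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Keep in mind that gzip adds a fixed header and trailer, so a payload as small as the 41-byte message in the question will most likely come out larger, while long or repetitive text tends to shrink a lot. Measuring both sizes on your own data is the only reliable guide.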
