Binary object graph serialization

Asked 3/8, 2011 at 8:50 Answered 3/8, 2011 at 9:4

Solved c# .net serialization protocol-buffers binaryformatter

I'm looking for advice on serialization in a .net app. The app is a desktop/thick client app and the serialization represents the persisted document format. The requirements for the serializer are:

Must allow serializing fields, not public properties only.
Must not require parameterless constructors.
Must handle general object graphs, i.e. not only DAG but shared/bidirectional references.
Must work with framework classes (e.g. Serialize Dictionaries).

Currently we use the BinaryFormatter which handles all of the above quite well, but size/performance and version tolerance are issues. We use the [OnDeserialized/ing] attributes to provide compatibility, but it does not allow for large refactorings (say a namespace change) without complex use of surrogates and so on.

An ideal solution would be a drop-in replacement for BinaryFormatter that works with our existing [NonSerialized] annotations etc., but performs better, and produces a format that is smaller and easier to maintain.

I have looked at the different protobuf implementations, and even though it seems possible to serialize general object graphs/enums/structs these days, it does not appear trivial to serialize a complex graph with a lot of framework collection types etc. Also, even if we could make it work with fields rather than properties I understand it would still mean having to add parameterless constructors and protobuf annotations to all classes (The domain is around 1000 classes).

So the questions:

Are there any "alternative" binary formatters that provide a well-documented format and perform better?
Are protocol buffers ever suitable for persisting large general object graphs including framework types?

Terefah answered 3/8, 2011 at 8:50 Comment(0)

Protocol buffers as a format has no official support for object graphs, but protobuf-net does provide this, and meets your other requirements. To take the points in turn:

Must allow serializing fields, not public properties only

Sure; protobuf-net can do that for both public and non-public fields; tell it about the fields either at runtime or via attributes.

Must not require parameterless constructors.

That is available in "v2" - again, you can tell it to skip the constructor at runtime or via attributes (SkipConstructor=true on the contract).

Must handle general object graphs, i.e. not only DAG but shared/bidirectional references.

Sure; mark AsReference=true on a member.
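
To make the three points so far concrete, here is a minimal sketch using attributes. It assumes protobuf-net "v2" is referenced; Shape, Drawing and GraphDemo are hypothetical names made up for illustration and are not from the question.

```csharp
using System.IO;
using ProtoBuf;

// Hypothetical domain type: private field, no parameterless constructor.
[ProtoContract(SkipConstructor = true)]
public class Shape
{
    [ProtoMember(1)]
    private string name;                       // serialized as a field, not a property

    public Shape(string name) { this.name = name; }
    public string Name { get { return name; } }
}

// Hypothetical container holding two references that may point at the same Shape.
[ProtoContract(SkipConstructor = true)]
public class Drawing
{
    [ProtoMember(1, AsReference = true)]
    private Shape first;

    [ProtoMember(2, AsReference = true)]
    private Shape second;

    public Drawing(Shape first, Shape second)
    {
        this.first = first;
        this.second = second;
    }

    public Shape First { get { return first; } }
    public Shape Second { get { return second; } }
}

public static class GraphDemo
{
    // Serializes to memory and reads the object back.
    public static Drawing RoundTrip(Drawing drawing)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, drawing);
            stream.Position = 0;
            return Serializer.Deserialize<Drawing>(stream);
        }
    }
}
```

If reference tracking works as described, a Drawing built from one shared Shape should come back with First and Second pointing at the same instance rather than two copies.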

Must work with framework classes (e.g. Serialize Dictionaries).

Standard lists and dictionaries work fine; however, I have an outstanding change request to support AsReference inside a dictionary. This means Dictionary<string, Foo> won't currently run the graph code for Foo, but I can probably find a few moments to look at this if it is causing you significant pain.

We use the [OnDeserialized/ing] attributes to provide compatibility

Serialization callbacks are fully supported.
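
As a rough illustration, the sketch below reuses the same BCL callback attribute that BinaryFormatter honours. It assumes protobuf-net "v2"; Document, TitleLength and CallbackDemo are hypothetical names for illustration only.

```csharp
using System.IO;
using System.Runtime.Serialization;
using ProtoBuf;

// Hypothetical document type with one persisted field and one derived value.
[ProtoContract]
public class Document
{
    [ProtoMember(1)]
    public string Title;

    // Not part of the contract; recomputed after loading.
    public int TitleLength;

    [OnDeserialized]
    public void AfterLoad(StreamingContext context)
    {
        TitleLength = Title == null ? 0 : Title.Length;
    }
}

public static class CallbackDemo
{
    // Serializes to memory and reads the object back, which should trigger AfterLoad.
    public static Document RoundTrip(Document document)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, document);
            stream.Position = 0;
            return Serializer.Deserialize<Document>(stream);
        }
    }
}
```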

but it does not allow for large refactorings (say a namespace change) without complex use of surrogates and so on.

Namespaces etc. are not at all interesting to protobuf-net (unless you are using the DynamicType options).

it would still mean having to add parameterless constructors and protobuf annotations to all classes

Not necessarily; if you can guarantee that you won't change the field names, you can ask it to infer the field numbers internally - and ultimately in "v2" everything can be specified at runtime, so you can often write a small configuration loop that runs at app-startup and uses reflection to configure the system. Then you do not need to change your existing code at all.
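
A minimal sketch of such a startup loop, assuming protobuf-net "v2" and its RuntimeTypeModel API. ModelConfigurator and SamplePoint are hypothetical names, and the alphabetical numbering is only one possible convention: it stays stable only as long as fields are not renamed, added or removed. Inheritance (which would need sub-types registering as well) is left out.

```csharp
using System;
using System.Linq;
using System.Reflection;
using ProtoBuf.Meta;

// Hypothetical un-annotated domain type.
public class SamplePoint
{
    private int y;
    private int x;

    [NonSerialized]
    private int cachedHash;

    public SamplePoint(int x, int y) { this.x = x; this.y = y; cachedHash = x ^ y; }
    public int X { get { return x; } }
    public int Y { get { return y; } }
}

public static class ModelConfigurator
{
    // Registers each type with the model, numbering its own fields alphabetically.
    public static void Configure(RuntimeTypeModel model, params Type[] domainTypes)
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

        foreach (Type type in domainTypes)
        {
            // false: do not read attributes; everything is configured here.
            MetaType meta = model.Add(type, false);
            meta.UseConstructor = false;          // runtime counterpart of SkipConstructor

            var fields = type.GetFields(flags)
                .Where(f => !f.IsNotSerialized)   // honour existing [NonSerialized]
                .OrderBy(f => f.Name, StringComparer.Ordinal);

            int fieldNumber = 1;
            foreach (FieldInfo field in fields)
            {
                meta.AddField(fieldNumber++, field.Name);
            }
        }
    }
}
```

Calling ModelConfigurator.Configure(RuntimeTypeModel.Default, typeof(SamplePoint)) once at startup would be the intended usage. AddField returns the member's configuration object, so AsReference could be switched on there for reference-typed fields.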

Parang answered 3/8, 2011 at 9:4 Comment(9)

Thanks for your quick and detailed response Marc. I'm going to have a look and see what has happened lately in "v2". Could I implement this strategy: Assume an un-annotated class has implicit protobuf annotations such that all serializable fields are included, say in alphabetical order. All reference fields AsReference. Once I change a class, I could add explicit attributes? Also, does ISerializable/IObjectReference/IDeserializationCallback work (Say I have ? – Terefah 3/8, 2011 at 9:50

@Anders - ISerializable; no. IDeserializationCallback/IObjectReference - I'd need to check. Re the "change a class" - if I understand the question correctly, then yes - you could add attributes at a later date to preserve compatibility with an earlier definition. – Parang 3/8, 2011 at 11:58

Marc, I have successfully created a model by reflecting my domain types and adding all fields AsReference. One issue I have with framework classes is the AsReference (perhaps this is what you are saying in your answer): If I have var shapes = new List<Shape> { circle1, circle1, rect1 }; i.e., the same object occurring twice, I'd like to retain that graph through serialization. Could I persuade the writer to skip sequence serialization for collections, and just serialize fields (e.g. the internal array of a List<T>), and be reference-aware for all reference fields including framework classes? – Terefah 4/8, 2011 at 10:33

@Anders the issue of tracking reference inside collections is simply a "damn I need to find a few minutes to tweak that" (and find a way to instruct it to do it). If this is a barrier, I can probably look at that this evening (about 8/9 hours time). – Parang 4/8, 2011 at 10:43

I don't want to push you to bastardize your brainchild Marc. As you might notice, I'm trying to make a drop-in replacement for your nemesis the BinaryFormatter. As such, I'd probably ideally want to completely skip the "repeated data" format in protobuf, except for arrays. That way a List would be serialized like any other object through a proto model, with its two fields "_items" and "_size". If I could do that, and the arrays did refs properly I'd be set. – Terefah 4/8, 2011 at 10:51

@Anders see, I wouldn't ;p I'd still want to treat it as repeated - all I need to do is plug the already existing reference-tracking decorator into the tail. It isn't a big job. To me, a list is interesting but only as a container. This is also important to retain (wherever possible) contract semantics (rather than type-based semantics) – Parang 4/8, 2011 at 10:54

I know, but I'd easily give up some contractness if I could serialize nested collections without having to rewrite domain objects... What happens with a collection type that does not support an "add" operation? – Terefah 4/8, 2011 at 11:4

@Anders I guess that could be handled by the existing "surrogate" code, but that is a bit of a pathological example. There is also another way to handle that via shim properties, that could be quite useful here. – Parang 4/8, 2011 at 11:24

I think I'm starting to understand what would need to be done to my domain model (as well as to protobuf-net), if I want to use it as a binary formatter (in the same sense as the BinaryFormatter class). For example I definitely want to distinguish between null and empty collections. Thanks for all your help Marc. – Terefah 4/8, 2011 at 12:52

Try db4o. It's not really a serializer, but as far as I can tell it meets your requirements (complex types, deep graph, inheritance?, dictionaries etc). You don't have to change anything on your objects, and the API is extremely easy to use.

It supports schema versioning/merging.

Theresita answered 3/8, 2011 at 9:1 Comment(0)
