Backwards compatibility in .NET with BinaryFormatter
We use BinaryFormatter in a C# game to save user game progress, game levels, etc. We are running into backwards-compatibility problems.

The aims:

  • Level designer creates a campaign (levels & rules), we change the code, and the campaign should still work fine. This can happen every day during development before release.
  • User saves a game, we release a game patch, and the user should still be able to load the game
  • The invisible data-conversion process should work no matter how far apart the two versions are. For example, a user can skip our first 5 minor updates and get the 6th directly; their saved games should still load fine.

The solution needs to be completely invisible to users and level designers, and minimally burden coders who want to change something (e.g. rename a field because they thought of a better name).

Some object graphs we serialize are rooted in one class, some in others. Forward compatibility is not needed.

Potentially breaking changes (and what happens when we serialize the old version and deserialize into the new):

  • add field (gets default-initialized)
  • change field type (failure)
  • rename field (equivalent to removing it and adding a new one)
  • change property to field and back (equivalent to a rename)
  • change autoimplemented property to use backing field (equivalent to a rename)
  • add superclass (equivalent to adding its fields to the current class)
  • interpret a field differently (e.g. was in degrees, now in radians)
  • for types implementing ISerializable we may change our implementation of the ISerializable methods (e.g. start using compression within the ISerializable implementation for some really large type)
  • rename a class, rename an enum value
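Many of the "add field" cases can be absorbed without a version bump by combining [OptionalField] with a serialization callback. A minimal sketch (the class and field names here are made up for illustration):

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical save-data class: "Speed" is a newly added field.
// [OptionalField] tells BinaryFormatter not to fail when the field is
// absent from an old payload, and the OnDeserializing callback supplies
// the default, because field initializers and constructors are NOT run
// during binary deserialization.
[Serializable]
class PlayerState
{
    public int Score;

    [OptionalField(VersionAdded = 2)]
    public float Speed = 1.0f;

    [OnDeserializing]
    private void SetDefaults(StreamingContext context)
    {
        // Default for payloads written before the field existed.
        Speed = 1.0f;
    }
}
```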

I have read about:

My current solution:

  • We make as many changes as possible non-breaking, by using stuff like the OnDeserializing callback.
  • We schedule breaking changes for once every 2 weeks, so there's less compatibility code to keep around.
  • Every time before we make a breaking change, we copy all the [Serializable] classes we use into a namespace/folder called OldClassVersions.VersionX (where X is the next ordinal after the last one). We do this even if we aren't going to make a release soon.
  • When writing to file, what we serialize is an instance of this class: class SaveFileData { int version; object data; }
  • When reading from file, we deserialize the SaveFileData and pass it to an iterative "update" routine that does something like this:


for(int i = loadedData.version; i < CurrentVersion; i++)
{
    // Update() takes an instance of OldVersions.VersionX.TheClass
    // and returns an instance of OldVersions.VersionXPlus1.TheClass
    loadedData.data = Update(loadedData.data, i);
}
  • For convenience, the Update() function, in its implementation, can use a CopyOverlappingPart() function that uses reflection to copy as much data as possible from the old version to the new version. This way, the Update() function only has to handle the stuff that actually changed.
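A sketch of what such a CopyOverlappingPart() helper could look like. The matching rule (same field name and an assignable type) is my assumption; the real implementation may differ:

```csharp
using System;
using System.Reflection;

static class VersionUpdater
{
    // Copies every field of 'oldObj' into a fresh TNew whenever the new
    // type has a field with the same name and a compatible type. Fields
    // that were renamed or retyped stay at their defaults and must be
    // handled by the per-version Update() step.
    public static TNew CopyOverlappingPart<TNew>(object oldObj) where TNew : new()
    {
        var result = new TNew();
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        foreach (FieldInfo oldField in oldObj.GetType().GetFields(flags))
        {
            FieldInfo newField = typeof(TNew).GetField(oldField.Name, flags);
            if (newField != null &&
                newField.FieldType.IsAssignableFrom(oldField.FieldType))
            {
                newField.SetValue(result, oldField.GetValue(oldObj));
            }
        }
        return result;
    }
}
```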

Some problems with that:

  • the deserializer deserializes to class Foo rather than to class OldClassVersions.Version5.Foo - because class Foo is what was serialized.
  • almost impossible to test or debug
  • requires keeping around old copies of a lot of classes, which is error-prone, fragile and annoying
  • I don't know what to do when we want to rename a class

This should be a really common problem. How do people usually solve it?

Carleycarli answered 27/8, 2010 at 11:29 Comment(9)
Did you decide to switch to XML serialization, or did you find a better way to do this? XML serialization has limitations that will not work for my program, so I am planning on following your method with some of K.Hoffmann's additions. — Soutache
@i8abug: I switched to XML serialization, yes. If you tell me the limitations that are troubling you, I might tell you a way around them, if I know one. — Carleycarli
Thanks! For starters, I need to serialize private and protected members and generic dictionaries. I tried looking at DataContract serialization, but I need to serialize unknown inherited classes (written by other developers), and this is not possible with DataContracts. Binary serialization seems to be the only thing that works. I also checked out protobuf-net but ran into some limitations as well. When you switched to XML serialization, how did you handle versioning? Did you create a specific interpreter for each version of your XML, or did you still do something like you have described above? — Soutache
About private members - yes, I also needed that, so I'm using DataContractSerializer to do my serialization to XML. Also, I didn't want to mark up my classes with [DataMember] attributes, so I made my classes [Serializable] rather than [DataContract] (but still used DCS to serialize them; it actually works). Apparently this fails in the "unknown inherited classes" case, though, unless you require your inheritors to add themselves at app startup to some static "List<Type> globalKnownTypes" that your serialization code later passes as the "knownTypes" arg to DCS. Come to think of it, you [...] — Carleycarli
[...] could maybe do this automatically: add this code to the ctor of your base class: "Foo.globalKnownTypes.Add(this.GetType());". This way, when the derived class is instantiated, it will auto-register itself as a "known type" (because it calls the base class ctor). But I may be totally off the mark - I guess MS didn't intend you to do this. About versioning: I ended up never implementing it so far. All I did was start using XML, so, in the future, if I really need to make a breaking change, I can hand-edit the XML. In the meantime I've just been using [OptionalField] and [OnDeserialized]. — Carleycarli
@i8abug: Also, if I have to do versioning eventually, it won't be the way I've described it in the question - I've realized that the "OldClassVersions.VersionX" thing would be a mess. Instead, I'd write code that updates the XML before passing it to DCS. AZ's answer gave me the idea, as he mentioned XSLT. This has some problems, though, which I haven't yet solved (e.g. in class Foo I want to rename Field1 to Field2, but when reading the XML I only see that an element is of type <Field1> - I don't know whether it's a Field1 contained in a "Foo" or an unrelated Field1 from some other class). — Carleycarli
Awesome comments, thanks. I did not think of setting the known types at runtime; I may try that. I did run into a couple of other problems using DataContract serialization, but I didn't try too hard because of the known-types issue. I really appreciate the advice. — Soutache
@i8abug: Let me know if you implement it and it works for you. Then I'd be interested to hear how you've handled the problem of updating the XML from version A to version B. — Carleycarli
Any final solution with full source code sample? — Cyd

Tough one. I would dump binary and use XML serialization: it's easier to manage and tolerant of changes that are not too extreme, like adding/removing fields. In more extreme cases it is easier to write a transform (XSLT, perhaps) from one version to another and keep the classes clean. If opacity and a small disk footprint are requirements, you can compress the data before writing it to disk.
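For illustration, here is roughly how that tolerance plays out with XmlSerializer (the class and element names below are made up): unknown elements are skipped during deserialization, and members missing from the XML keep whatever their initializers set.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical level data; "Difficulty" was added in the current build.
public class LevelData
{
    public string Name;
    public int Difficulty = 1;   // default used when the element is absent
}

public static class LevelIo
{
    public static LevelData Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(LevelData));
        return (LevelData)serializer.Deserialize(new StringReader(xml));
    }
}
```

Feeding it XML from an older build, say `<LevelData><Name>Intro</Name><Theme>forest</Theme></LevelData>`, the obsolete `<Theme>` element is silently skipped and the missing `Difficulty` keeps its initializer value of 1.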

Up answered 27/8, 2010 at 11:47 Comment(2)
BinarySerialization is version-tolerant for small changes, including adding/removing fields. What exactly do you mean by "easier to manage"? XSLT sounds like a great solution, actually. And no, file size and performance aren't an issue. — Carleycarli
By "easier to manage" I'm referring to the human-readable nature of the XML format and to the decoupling of the data representation from the class structure (you can control what the serialized XML looks like through attributes, and the structure does not need to mirror the actual class). — Up

We hit the same problem in our application when storing user profile data (grid column arrangement, filter settings, ...).

In our case the problem was the AssemblyVersion.

To solve it, I created a SerializationBinder that reads the current assembly version (all assemblies get a new version number on each deployment) via Assembly.GetExecutingAssembly().GetName().Version.

In the overridden BindToType method, the type info is created with the new assembly version.

The deserialization is implemented 'by hand', which means:

  • Deserialize via a normal BinaryFormatter
  • Get all fields which have to be deserialized (annotated with our own attribute)
  • Fill the object with data from the deserialized object

This has worked with all our data across the last three or four releases.
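A sketch of such a binder, assuming (as in the answer above) that only the assembly version differs between the saved payload and the running code:

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Resolves every serialized type name against the currently loaded
// assembly, ignoring the (older) assembly version recorded in the payload.
sealed class CurrentVersionBinder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        string currentAssembly = Assembly.GetExecutingAssembly().FullName;
        return Type.GetType(typeName + ", " + currentAssembly)
            ?? Type.GetType(typeName); // fall back to default resolution
    }
}
```

It would be installed before deserializing, e.g. `formatter.Binder = new CurrentVersionBinder();` on the BinaryFormatter instance.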

Piscary answered 27/8, 2010 at 11:47 Comment(0)

This is a really old question, but it deserves an up-to-date answer anyway; I know this strays slightly off topic, so bear with me. Today, in 2019, I would suggest that anyone who reads this at a stage in their project where it is reasonably feasible seriously consider using Protobuf instead of BinaryFormatter. It has most of the advantages of a binary format (which it is) but fewer of the disadvantages.

  • It works between different languages and technology stacks with ease (Java, .NET, C++, Go, Python)
  • It has a well-thought-through strategy for handling breaking changes (adding/removing fields, etc) in a way that means it's much easier for "version x" of your software to handle "version y"-generated data and the other way around. Yes, this is actually true: an older version of your app will be able to handle data serialized with a newer version of the Protobuf .proto interface definition. (Non-present fields will simply be ignored when deserializing.)

    By comparison, when running a newer version of the code and deserializing old data, fields not present in the data will be set to their type-specific default value. In that sense, handling old data is not fully automatic, but it is still a lot simpler than with the default binary serialization libraries included with platforms like Java and .NET.
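Concretely, in the .proto interface definition, members are identified by field number rather than name, so renames are free, and a removed member's tag can be reserved so it is never accidentally reused. The message below is purely illustrative:

```proto
syntax = "proto3";

message SaveGame {
  int32 level = 1;
  string player_name = 2;   // renaming this later is a non-breaking change
  // "coins" (tag 3) was removed in a later build; reserving the tag and
  // name prevents a future member from colliding with old payloads.
  reserved 3;
  reserved "coins";
}
```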

If you prefer a non-binary format, JSON is often a suitable choice. For RPC and such scenarios, Protobuf is better though and is even officially being mentioned/endorsed by Microsoft nowadays: Introduction to gRPC on ASP.NET Core. (gRPC is a technology stack built on top of Protobuf)

Swetiana answered 7/9, 2019 at 19:11 Comment(0)
