How does BinaryFormatter.Deserialize create new objects?

Asked 17/8, 2010 at 7:58 Answered 30/11, 2010 at 12:47

c#constructor serialization binary-serialization

When BinaryFormatter deserializes a stream into objects, it appears to create new objects without calling constructors.

How is it doing this? And why? Is there anything else in .NET that does this?

Here's a demo:

[Serializable]
public class Car
{
    public static int constructionCount = 0;

    public Car()
    {
        constructionCount++;
    }
}

public class Test
{
    public static void Main(string[] args)
    {
        // Construct a car
        Car car1 = new Car();

        // Serialize and then deserialize to create a second, identical car
        MemoryStream stream = new MemoryStream();
        BinaryFormatter formatter = new BinaryFormatter();
        formatter.Serialize(stream, car1);
        stream.Seek(0, SeekOrigin.Begin);
        Car car2 = (Car)formatter.Deserialize(stream);

        // Wait, what happened?
        Console.WriteLine("Cars constructed: " + Car.constructionCount);
        if (car2 != null && car2 != car1)
        {
            Console.WriteLine("But there are actually two.");
        }
    }
}

Output:

Cars constructed: 1
But there are actually two.

Ake answered 17/8, 2010 at 7:58 Comment(4)

Good question. To work around this, you will need to do some pointer/reference fixups during deserialization, which might be hard or even impossible. Note the fact, that new Car was only called once. You might want to try this in 2 processes. – Western 17/8, 2010 at 8:3

possible duplicate of DataContractSerializer doesn't call my constructor ?? – Fossilize 17/8, 2010 at 8:5

Note: The other question I linked to is about DataContractSerializer, but the explanation is the same for BinaryFormatter – Fossilize 17/8, 2010 at 8:6

@Thomas: Thanks, that answers it. FormatterServices.GetUninitializedObject() is bizarre. – Ake 18/8, 2010 at 0:25

There are two things calling a constructor does (or at least should do).

One is to set aside a certain amount of memory for the object and does all the housekeeping necessary for it to be an object to the rest of the .NET world (note certain amount of handwaving in this explanation).

The other is to put the object into a valid initial state, perhaps based on parameters - this is what the actual code in the constructor will do.

Deserialisation does much the same thing as the first step by calling FormatterServices.GetUninitializedObject, and then does much the same thing as the second step by setting the values for fields to be equivalent to those that were recorded during serialisation (which may require deserialising other objects to be said values).

Now, the state that deserialisation is putting the object into may not correspond to that possible by any constructor. At best it will be wasteful (all values set by the constructor will be overwritten) and at worse it could be dangerous (constructor has some side-effect). It could also just be impossible (only constructor is one that takes parameters - serialisation has no way of knowing what arguments to use).

You could look at it as a special sort of constructor only used by deserialisation (OO purists will - and should - shudder at the idea of a constructor that doesn't construct, I mean this as an analogy only, if you know C++ think of the way overriding new works as far as memory goes and you've an even better analogy, though still just an analogy).

Now, this can be a problem in some cases - maybe we have readonly fields that can only be set by a constructor, or maybe we have side-effects that we want to happen.

A solution to both is to override serialisation behaviour with ISerializable. This will serialise based on a call to ISerializable.GetObjectData and then call a particular constructor with SerializationInfo and StreamingContext fields to deserialise (said constructor can even be private - meaning most other code won't even see it). Hence if we can deserialise readonly fields and have any side-effects we want (we can also do all manner of things to control just what is serialised and how).

If we just care about ensuring some side-effect happens on deserialisation that would happen on construction, we can implement IDeserializationCallback and we will have IDeserializationCallback.OnDeserialization called when deserialisation is complete.

As for other things that do the same thing as this, there are other forms of serialisation in .NET but that's all I know of. It is possible to call FormatterServices.GetUninitializedObject yourself but barring a case where you have a strong guarantee that subsequent code will put the object produced into a valid state (i.e. precisely the sort of situation you are in when deserialising an object from data produced by serialising the same sort of object) doing such is fraught and a good way to produce a really hard to diagnose bug.

Tubercle answered 30/11, 2010 at 12:40 Comment(1)

+1 - IDeserializationCallback is a great idea. Use it to initialize necessary private fields, etc. Solved my issue! – Velma 28/1, 2011 at 9:10

The thing is, BinaryFormatter isn't really making your particular object. It's putting an object graph back into memory. The object graph is basically the representation of your object in memory; this was created when the object is serialized. Then, the deserialize call basically just sticks that graph back in memory as an object at an open pointer, and then it gets casted to what it actually is by the code. If it's casted wrong, then an exception is thrown.

As to your particular example, you are only really constructing one car; you are just making an exact duplicate of that car. When you serialize it off into the stream, you store an exact binary copy of it. When you deserialize it, you don't have to construct anything. It just sticks the graph in memory at some pointer value as an object and lets you do whatever you want with it.

Your comparison of car1 != car2 is true because of that different pointer location, since Car is a reference type.

Why? Frankly, it's easy to just go pull the binary representation, rather than having to go and pull each property and all that.

I'm not sure whether anything else in .NET uses this same procedure; the most likely candidates would be anything else that uses an object's binary in some format during serialization.

Punishable answered 17/8, 2010 at 20:3 Comment(0)

Not sure why the constructor does not get called but I use IDeserializationCallback as a work around.

also take a look at

OnSerializingAttribute

OnSerializedAttribute

OnDeserializingAttribute

OnDeserializedAttribute

Casta answered 30/11, 2010 at 12:47 Comment(0)

Recommended topics

Hot tags