Binary deserialization without object definition

I'm trying to read a binary serialized object, but I don't have the object definition/source for it. I took a peek into the file and saw property names, so I manually recreated the object (let's call it SomeDataFormat).

I ended up with this :

public class SomeDataFormat // 16 field
{
    public string Name{ get; set; }
    public int Country{ get; set; } 
    public string UserEmail{ get; set; }
    public bool IsCaptchaDisplayed{ get; set; }
    public bool IsForgotPasswordCaptchaDisplayed{ get; set; }
    public bool IsSaveChecked{ get; set; }
    public string SessionId{ get; set; } 
    public int SelectedLanguage{ get; set; } 
    public int SelectedUiCulture{ get; set; } 
    public int SecurityImageRefId{ get; set; } 
    public int LogOnId{ get; set; } 
    public bool BetaLogOn{ get; set; } 
    public int Amount{ get; set; }
    public int CurrencyTo{ get; set; }
    public int Delivery{ get; set; } 
    public bool displaySSN{ get; set; }
}   

Now I'm able to deserialize it like this :

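// Requires: using System.IO; using System.Runtime.Serialization.Formatters;
//           using System.Runtime.Serialization.Formatters.Binary;
BinaryFormatter formatter = new BinaryFormatter();
formatter.AssemblyFormat = FormatterAssemblyStyle.Full; // original uses this
formatter.TypeFormat = FormatterTypeStyle.TypesWhenNeeded; // this reduces size
SomeDataFormat data;
using (FileStream readStream = new FileStream("data.dat", FileMode.Open)) // dispose the stream when done
{
    data = (SomeDataFormat) formatter.Deserialize(readStream);
}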

The first suspicious thing is that only the 2 strings (SessionId & UserEmail) have values in the deserialized data object. The other properties are null or just 0. This might be intended, but I still suspect that something went wrong during the deserialization.

The second suspicious thing is that if I reserialize this object, I end up with a different file size: the original is 695 bytes, the reserialized object is 698 bytes. So there is a 3-byte difference. I should get the same file size as the original.

Taking a look at the original, and the new (reserialized) file:

[Image: hex dump of the originally serialized file]
[Image: hex dump of the reserialized file]

As you can see, after the header section, the data appears to be in a different order. For example, you can see that the email and the session ID are not in the same place.

UPDATE: Will warned me that the byte coming after the "PublicKeyToken=null" is also different. (03 <-> 05)

  • Q1: Why are the values in a different order in the two files?
  • Q2: Why are there 3 extra bytes when comparing the 2 serialized objects?
  • Q3: What am I missing? How could I do this?

Any help is appreciated.



Prithee answered 1/8, 2013 at 14:21 Comment(4)
You should check that Data_reSerialized.dat will deserialize, and report what size serializing it again produces; i.e. what size is Data_reReSerialized.dat? – Fetterlock
You mean what the size of Data_reReSerialized.dat is when I deserialize it? I will report back with the results later today. – Prithee
@MarkHurd I managed to reserialize the object, and now it's only 3 bytes bigger than it should be. I don't manipulate the data at all, so something must be wrong in my object definition, or I'm missing an option somewhere. I'll post pictures soon. – Prithee
I assume you've looked at the first Related Question on the right: How to analyse contents of binary serialization stream? – Fetterlock
C
8

Why are the values in a different order in the two files?

That is because member order is not based on the declaration ordering. http://msdn.microsoft.com/en-us/library/424c79hc.aspx

The GetMembers method does not return members in a particular order, such as alphabetical or declaration order. Your code must not depend on the order in which members are returned, because that order varies.


Why are there 3 extra bytes when comparing the 2 serialized objects?

First, the TypeFormat 'TypesWhenNeeded' should actually be 'TypesAlways'. That is why there are so many differences. For example, the 05 after '=null' becoming 03 is due to that.

Second, you don't have the correct types. Looking at BinaryFormatter in ILSpy and at the hex dump reveals that the members you marked as 'int' are actually 'string'.

public class SomeDataFormat // 16 field
{
    public string Name { get; set; }
    public string Country { get; set; } 
    public string UserEmail{ get; set; }
    public bool IsCaptchaDisplayed{ get; set; }
    public bool IsForgotPasswordCaptchaDisplayed{ get; set; }
    public bool IsSaveChecked{ get; set; }
    public string SessionId{ get; set; } 
    public string SelectedLanguage{ get; set; } 
    public string SelectedUiCulture{ get; set; } 
    public string SecurityImageRefId{ get; set; } 
    public string LogOnId{ get; set; } 
    public bool BetaLogOn{ get; set; } 
    public string Amount{ get; set; }
    public string CurrencyTo{ get; set; }
    public string Delivery{ get; set; } 
    public bool displaySSN{ get; set; }
}

What am I missing? How could I do this?

I don't see a way to do it with the given BinaryFormatter. You could decompile/reverse the way BinaryFormatter works.

Carvajal answered 9/8, 2013 at 14:43 Comment(0)
E
6

Because it may be of interest to someone, I decided to write this post about what the binary format of serialized .NET objects looks like and how we can interpret it correctly.

I have based all my research on the .NET Remoting: Binary Format Data Structure specification.



Example class:

To have a working example, I have created a simple class called A which contains 2 properties, one string and one integer value, they are called SomeString and SomeValue.

Class A looks like this:

[Serializable()]
public class A
{
    public string SomeString
    {
        get;
        set;
    }

    public int SomeValue
    {
        get;
        set;
    }
}

For the serialization I used the BinaryFormatter of course:

BinaryFormatter bf = new BinaryFormatter();
// BinaryFormatter writes raw bytes, so a plain FileStream is sufficient here.
using (FileStream fs = new FileStream("test.txt", FileMode.Create))
{
    bf.Serialize(fs, new A() { SomeString = "abc", SomeValue = 123 });
}

As can be seen, I passed a new instance of class A containing abc and 123 as values.



Example result data:

If we look at the serialized result in a hex editor, we get something like this:

Example result data



Let us interpret the example result data:

According to the above-mentioned specification (here is the direct link to the PDF: [MS-NRBF].pdf), every record within the stream is identified by the RecordTypeEnumeration. Section 2.1.2.1 RecordTypeEnumeration states:

This enumeration identifies the type of the record. Each record (except for MemberPrimitiveUnTyped) starts with a record type enumeration. The size of the enumeration is one BYTE.



SerializationHeaderRecord:

So if we look back at the data we got, we can start interpreting the first byte:

SerializationHeaderRecord_RecordTypeEnumeration

As stated in 2.1.2.1 RecordTypeEnumeration a value of 0 identifies the SerializationHeaderRecord which is specified in 2.6.1 SerializationHeaderRecord:

The SerializationHeaderRecord record MUST be the first record in a binary serialization. This record has the major and minor version of the format and the IDs of the top object and the headers.

It consists of:

  • RecordTypeEnum (1 byte)
  • RootId (4 bytes)
  • HeaderId (4 bytes)
  • MajorVersion (4 bytes)
  • MinorVersion (4 bytes)



With that knowledge we can interpret the record containing 17 bytes:

SerializationHeaderRecord_Complete

00 represents the RecordTypeEnumeration which is SerializationHeaderRecord in our case.

01 00 00 00 represents the RootId

If neither the BinaryMethodCall nor BinaryMethodReturn record is present in the serialization stream, the value of this field MUST contain the ObjectId of a Class, Array, or BinaryObjectString record contained in the serialization stream.

So in our case this should be the ObjectId with the value 1 (because the data is serialized using little-endian) which we will hopefully see again ;-)

FF FF FF FF represents the HeaderId

01 00 00 00 represents the MajorVersion

00 00 00 00 represents the MinorVersion
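
The 17-byte header interpreted above can also be read programmatically. The following is only a minimal sketch: the class name HeaderDemo and the method ReadHeader are made up for illustration, and it relies on BinaryReader reading Int32 values as little-endian.

```csharp
using System;
using System.IO;

public static class HeaderDemo
{
    // Hypothetical helper: reads the five header fields in the order listed above.
    public static (byte RecordType, int RootId, int HeaderId, int Major, int Minor) ReadHeader(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            byte recordType = reader.ReadByte(); // 00 = SerializationHeaderRecord
            int rootId = reader.ReadInt32();     // BinaryReader reads little-endian
            int headerId = reader.ReadInt32();
            int major = reader.ReadInt32();
            int minor = reader.ReadInt32();
            return (recordType, rootId, headerId, major, minor);
        }
    }

    public static void Main()
    {
        // The 17 bytes from the example stream.
        byte[] header =
        {
            0x00,
            0x01, 0x00, 0x00, 0x00,
            0xFF, 0xFF, 0xFF, 0xFF,
            0x01, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00
        };
        var h = ReadHeader(header);
        // Prints: RootId=1, HeaderId=-1, Version=1.0
        Console.WriteLine($"RootId={h.RootId}, HeaderId={h.HeaderId}, Version={h.Major}.{h.Minor}");
    }
}
```

Read as a signed Int32, FF FF FF FF comes out as -1.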



BinaryLibrary:

As specified, each record must begin with the RecordTypeEnumeration. As the last record is complete, we must assume that a new one begins.

Let us interpret the next byte:

BinaryLibraryRecord_RecordTypeEnumeration

As we can see, in our example the SerializationHeaderRecord is followed by the BinaryLibrary record:

The BinaryLibrary record associates an INT32 ID (as specified in [MS-DTYP] section 2.2.22) with a Library name. This allows other records to reference the Library name by using the ID. This approach reduces the wire size when there are multiple records that reference the same Library name.

It consists of:

  • RecordTypeEnum (1 byte)
  • LibraryId (4 bytes)
  • LibraryName (variable number of bytes (which is a LengthPrefixedString))



As stated in 2.1.1.6 LengthPrefixedString...

The LengthPrefixedString represents a string value. The string is prefixed by the length of the UTF-8 encoded string in bytes. The length is encoded in a variable-length field with a minimum of 1 byte and a maximum of 5 bytes. To minimize the wire size, length is encoded as a variable-length field.

In our simple example the length is always encoded using 1 byte. With that knowledge we can continue the interpretation of the bytes in the stream:
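
For strings longer than 127 bytes the length prefix spans more than one byte. A minimal sketch of a decoder follows, assuming the usual 7-bit scheme (low 7 bits of each byte carry data, least significant group first, high bit set means another byte follows). The names LengthPrefixedStringDemo, ReadLength and ReadString are invented for this example; the sample bytes 03 61 62 63 are the "abc" value that appears later in this walkthrough.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixedStringDemo
{
    // Decodes the variable-length prefix (1 to 5 bytes).
    public static int ReadLength(BinaryReader reader)
    {
        int length = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            byte b = reader.ReadByte();
            length |= (b & 0x7F) << shift;           // low 7 bits carry data
            if ((b & 0x80) == 0) return length;      // high bit clear: last byte
        }
        throw new FormatException("Length prefix is longer than 5 bytes.");
    }

    // Reads the prefix, then that many bytes, and decodes them as UTF-8.
    public static string ReadString(BinaryReader reader)
    {
        int length = ReadLength(reader);
        byte[] bytes = reader.ReadBytes(length);
        if (bytes.Length != length) throw new EndOfStreamException();
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        byte[] data = { 0x03, 0x61, 0x62, 0x63 };
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            Console.WriteLine(ReadString(reader)); // prints "abc"
        }
    }
}
```

In practice BinaryReader.ReadString already handles this kind of prefix, so the manual version is mainly useful for seeing what happens byte by byte.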

BinaryLibraryRecord_RecordTypeEnumeration_LibraryId

0C represents the RecordTypeEnumeration which identifies the BinaryLibrary record.

02 00 00 00 represents the LibraryId which is 2 in our case.



Now the LengthPrefixedString follows:

BinaryLibraryRecord_RecordTypeEnumeration_LibraryId_LibraryName

42 represents the length information of the LengthPrefixedString which contains the LibraryName.

In our case the length information of 42 (decimal 66) tells us that we need to read the next 66 bytes and interpret them as the LibraryName.

As already stated, the string is UTF-8 encoded, so the result of the bytes above would be something like: _WorkSpace_, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
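
As a sanity check on the 0x42 length byte, here is a small sketch that rebuilds the BinaryLibrary record from the library name above and reads it back. BinaryLibraryDemo, BuildRecord and ReadRecord are hypothetical names; the sketch leans on BinaryWriter.Write(string) and BinaryReader.ReadString, which use a 7-bit-encoded length prefix with UTF-8 by default.

```csharp
using System;
using System.IO;

public static class BinaryLibraryDemo
{
    public const string LibraryName =
        "_WorkSpace_, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null";

    // Builds: RecordTypeEnum (0C), LibraryId (4 bytes), LibraryName (length-prefixed).
    public static byte[] BuildRecord(int libraryId, string libraryName)
    {
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write((byte)0x0C);
            writer.Write(libraryId);
            writer.Write(libraryName);
            writer.Flush();
            return ms.ToArray();
        }
    }

    public static (int LibraryId, string LibraryName) ReadRecord(byte[] record)
    {
        using (var reader = new BinaryReader(new MemoryStream(record)))
        {
            if (reader.ReadByte() != 0x0C)
                throw new InvalidDataException("Not a BinaryLibrary record.");
            int libraryId = reader.ReadInt32();
            string name = reader.ReadString();
            return (libraryId, name);
        }
    }

    public static void Main()
    {
        byte[] record = BuildRecord(2, LibraryName);
        Console.WriteLine($"Length byte: 0x{record[5]:X2}"); // prints "Length byte: 0x42"
        var parsed = ReadRecord(record);
        Console.WriteLine($"{parsed.LibraryId}: {parsed.LibraryName}");
    }
}
```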



ClassWithMembersAndTypes:

Again, the record is complete so we interpret the RecordTypeEnumeration of the next one:

ClassWithMembersAndTypesRecord_RecordTypeEnumeration

05 identifies a ClassWithMembersAndTypes record. Section 2.3.2.1 ClassWithMembersAndTypes states:

The ClassWithMembersAndTypes record is the most verbose of the Class records. It contains metadata about Members, including the names and Remoting Types of the Members. It also contains a Library ID that references the Library Name of the Class.

It consists of:

  • RecordTypeEnum (1 byte)
  • ClassInfo (variable number of bytes)
  • MemberTypeInfo (variable number of bytes)
  • LibraryId (4 bytes)



ClassInfo:

As stated in 2.3.1.1 ClassInfo the record consists of:

  • ObjectId (4 bytes)
  • Name (variable number of bytes (which is again a LengthPrefixedString))
  • MemberCount(4 bytes)
  • MemberNames (which is a sequence of LengthPrefixedString's where the number of items MUST be equal to the value specified in the MemberCount field.)



Back to the raw data, step by step:

ClassWithMembersAndTypesRecord_RecordTypeEnumeration_ClassInfo_ObjectId

01 00 00 00 represents the ObjectId. We've already seen this one, it was specified as the RootId in the SerializationHeaderRecord.

ClassWithMembersAndTypesRecord_RecordTypeEnumeration_ClassInfo_ObjectId_Name

0F 53 74 61 63 6B 4F 76 65 72 46 6C 6F 77 2E 41 represents the Name of the class which is represented by using a LengthPrefixedString. As mentioned, in our example the length of the string is defined with 1 byte so the first byte 0F specifies that 15 bytes must be read and decoded using UTF-8. The result looks something like this: StackOverFlow.A - so obviously I used StackOverFlow as name of the namespace.

ClassWithMembersAndTypesRecord_RecordTypeEnumeration_ClassInfo_ObjectId_Name_MemberCount

02 00 00 00 represents the MemberCount; it tells us that 2 members, both represented as LengthPrefixedStrings, will follow.

Name of the first member: ClassWithMembersAndTypesRecord_MemberNameOne

1B 3C 53 6F 6D 65 53 74 72 69 6E 67 3E 6B 5F 5F 42 61 63 6B 69 6E 67 46 69 65 6C 64 represents the first MemberName. 1B is again the length of the string, which is 27 bytes long and results in something like this: <SomeString>k__BackingField.

Name of the second member: ClassWithMembersAndTypesRecord_MemberNameTwo

1A 3C 53 6F 6D 65 56 61 6C 75 65 3E 6B 5F 5F 42 61 63 6B 69 6E 67 46 69 65 6C 64 represents the second MemberName, 1A specifies that the string is 26 bytes long. It results in something like this: <SomeValue>k__BackingField.
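
Putting the ClassInfo pieces together, the bytes quoted above can be parsed in one go. This is a sketch only: ClassInfoDemo, FromHex and ReadClassInfo are illustrative names, and the strings are read with BinaryReader.ReadString, which handles the one-byte length prefixes seen here.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ClassInfoDemo
{
    // The ClassInfo bytes from the example stream, as quoted in the text.
    public const string SampleHex =
        "01 00 00 00 " +
        "0F 53 74 61 63 6B 4F 76 65 72 46 6C 6F 77 2E 41 " +
        "02 00 00 00 " +
        "1B 3C 53 6F 6D 65 53 74 72 69 6E 67 3E 6B 5F 5F 42 61 63 6B 69 6E 67 46 69 65 6C 64 " +
        "1A 3C 53 6F 6D 65 56 61 6C 75 65 3E 6B 5F 5F 42 61 63 6B 69 6E 67 46 69 65 6C 64";

    public static byte[] FromHex(string hex) =>
        hex.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
           .Select(h => Convert.ToByte(h, 16))
           .ToArray();

    // ObjectId, Name, MemberCount, then MemberCount member names.
    public static (int ObjectId, string Name, string[] MemberNames) ReadClassInfo(BinaryReader reader)
    {
        int objectId = reader.ReadInt32();
        string name = reader.ReadString();
        int memberCount = reader.ReadInt32();
        var memberNames = new string[memberCount];
        for (int i = 0; i < memberCount; i++)
        {
            memberNames[i] = reader.ReadString();
        }
        return (objectId, name, memberNames);
    }

    public static void Main()
    {
        using (var reader = new BinaryReader(new MemoryStream(FromHex(SampleHex))))
        {
            var info = ReadClassInfo(reader);
            Console.WriteLine($"ObjectId={info.ObjectId}, Name={info.Name}");
            foreach (string member in info.MemberNames)
            {
                Console.WriteLine(member);
            }
        }
    }
}
```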



MemberTypeInfo:

After the ClassInfo the MemberTypeInfo follows.

Section 2.3.1.2 - MemberTypeInfo states that the structure contains:

  • BinaryTypeEnums (variable in length)

A sequence of BinaryTypeEnumeration values that represents the Member Types that are being transferred. The Array MUST:

  • Have the same number of items as the MemberNames field of the ClassInfo structure.

  • Be ordered such that the BinaryTypeEnumeration corresponds to the Member name in the MemberNames field of the ClassInfo structure.

  • AdditionalInfos (variable in length); depending on the BinaryTypeEnum, additional info may or may not be present.

| BinaryTypeEnum | AdditionalInfos |
|----------------+--------------------------|
| Primitive | PrimitiveTypeEnumeration |
| String | None |

So taking that into consideration we are almost there... We expect 2 BinaryTypeEnumeration values (because we had 2 members in the MemberNames).



Again, back to the raw data of the complete MemberTypeInfo record:

ClassWithMembersAndTypesRecord_MemberTypeInfo

01 represents the BinaryTypeEnumeration of the first member, according to 2.1.2.2 BinaryTypeEnumeration we can expect a String and it is represented using a LengthPrefixedString.

00 represents the BinaryTypeEnumeration of the second member, and again, according to the specification, it is a Primitive. As stated above, Primitives are followed by additional information, in this case a PrimitiveTypeEnumeration. That's why we need to read the next byte, which is 08, match it with the table in 2.1.2.3 PrimitiveTypeEnumeration and be surprised to notice that we can expect an Int32, which is represented by 4 bytes, as stated in some other document about basic datatypes.
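
The three MemberTypeInfo bytes (01 00 08) can be read with a short loop. The sketch below only covers the two BinaryTypeEnum values from the table above (0 = Primitive, 1 = String) and throws for anything else; MemberTypeInfoDemo and ReadMemberTypeInfo are names made up for this example.

```csharp
using System;
using System.IO;

public static class MemberTypeInfoDemo
{
    // All BinaryTypeEnum bytes come first; the additional infos follow in member order.
    public static (byte[] BinaryTypes, byte?[] PrimitiveTypes) ReadMemberTypeInfo(
        BinaryReader reader, int memberCount)
    {
        byte[] binaryTypes = reader.ReadBytes(memberCount);
        if (binaryTypes.Length != memberCount) throw new EndOfStreamException();

        var primitiveTypes = new byte?[memberCount];
        for (int i = 0; i < memberCount; i++)
        {
            switch (binaryTypes[i])
            {
                case 0: // Primitive: one PrimitiveTypeEnumeration byte follows
                    primitiveTypes[i] = reader.ReadByte();
                    break;
                case 1: // String: no additional info
                    break;
                default:
                    throw new NotSupportedException(
                        "BinaryTypeEnum " + binaryTypes[i] + " is not handled in this sketch.");
            }
        }
        return (binaryTypes, primitiveTypes);
    }

    public static void Main()
    {
        byte[] data = { 0x01, 0x00, 0x08 };
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            var info = ReadMemberTypeInfo(reader, 2);
            for (int i = 0; i < info.BinaryTypes.Length; i++)
            {
                string primitive = info.PrimitiveTypes[i].HasValue
                    ? info.PrimitiveTypes[i].Value.ToString()
                    : "none";
                Console.WriteLine($"Member {i}: BinaryType={info.BinaryTypes[i]}, PrimitiveType={primitive}");
            }
        }
    }
}
```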



LibraryId:

After the MemberTypeInfo the LibraryId follows; it is represented by 4 bytes:

ClassWithMembersAndTypesRecord_LibraryId

02 00 00 00 represents the LibraryId which is 2.



The values:

As specified in 2.3 Class Records:

The values of the Members of the Class MUST be serialized as records that follow this record, as specified in section 2.7. The order of the records MUST match the order of MemberNames as specified in the ClassInfo (section 2.3.1.1) structure.

That's why we can now expect the values of the members.

Let us look at the last few bytes:

BinaryObjectStringRecord_RecordTypeEnumeration

06 identifies a BinaryObjectString. It represents the value of our SomeString property (the <SomeString>k__BackingField to be exact).

According to 2.5.7 BinaryObjectString it contains:

  • RecordTypeEnum (1 byte)
  • ObjectId (4 bytes)
  • Value (variable length, represented as a LengthPrefixedString)



So knowing that, we can clearly identify that

BinaryObjectStringRecord_RecordTypeEnumeration_ObjectId_MemberOneValue

03 00 00 00 represents the ObjectId.

03 61 62 63 represents the Value where 03 is the length of the string itself and 61 62 63 are the content bytes that translate to abc.

Hopefully you can remember that there was a second member, an Int32. Knowing that the Int32 is represented by using 4 bytes, we can conclude, that

BinaryObjectStringRecord_RecordTypeEnumeration_ObjectId_MemberOneValue_MemberTwoValue

must be the Value of our second member. 7B hexadecimal equals 123 decimal which seems to fit our example code.
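
The two member values can be read back in the same way. In this sketch the trailing 00 00 00 after 7B is an assumption derived from the 4-byte little-endian Int32 layout, since the text above only quotes the first byte; MemberValuesDemo and ReadValues are illustrative names. Note that the Int32 has no record type byte of its own, matching the MemberPrimitiveUnTyped exception quoted earlier.

```csharp
using System;
using System.IO;

public static class MemberValuesDemo
{
    // BinaryObjectString record followed by the untyped Int32 value.
    public static readonly byte[] Sample =
    {
        0x06,                   // RecordTypeEnum: BinaryObjectString
        0x03, 0x00, 0x00, 0x00, // ObjectId = 3
        0x03, 0x61, 0x62, 0x63, // LengthPrefixedString "abc"
        0x7B, 0x00, 0x00, 0x00  // Int32 123 (upper bytes assumed to be zero)
    };

    public static (int ObjectId, string SomeString, int SomeValue) ReadValues(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            if (reader.ReadByte() != 0x06)
                throw new InvalidDataException("Expected a BinaryObjectString record.");
            int objectId = reader.ReadInt32();
            string someString = reader.ReadString();
            int someValue = reader.ReadInt32(); // no record type byte in front of it
            return (objectId, someString, someValue);
        }
    }

    public static void Main()
    {
        var v = ReadValues(Sample);
        // Prints: ObjectId=3, SomeString=abc, SomeValue=123
        Console.WriteLine($"ObjectId={v.ObjectId}, SomeString={v.SomeString}, SomeValue={v.SomeValue}");
    }
}
```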

So here is the complete ClassWithMembersAndTypes record: ClassWithMembersAndTypesRecord_Complete



MessageEnd:

MessageEnd_RecordTypeEnumeration

Finally the last byte 0B represents the MessageEnd record.
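
To summarise, the record type bytes encountered in this walkthrough can be collected in a small lookup. This is a partial sketch: the enum covers only the five values seen in this example, uses the record names from this post, and RecordTypeDemo and Describe are hypothetical names.

```csharp
using System;

public enum RecordType : byte
{
    SerializationHeaderRecord = 0x00,
    ClassWithMembersAndTypes = 0x05,
    BinaryObjectString = 0x06,
    MessageEnd = 0x0B,
    BinaryLibrary = 0x0C
}

public static class RecordTypeDemo
{
    // Returns the record name for a leading byte, or a hex placeholder if it is not listed.
    public static string Describe(byte value)
    {
        var type = (RecordType)value;
        return Enum.IsDefined(typeof(RecordType), type)
            ? type.ToString()
            : "Unknown (0x" + value.ToString("X2") + ")";
    }

    public static void Main()
    {
        // Leading bytes of the records in the order they appeared in the example stream.
        byte[] leadingBytes = { 0x00, 0x0C, 0x05, 0x06, 0x0B };
        foreach (byte b in leadingBytes)
        {
            Console.WriteLine($"0x{b:X2} -> {Describe(b)}");
        }
    }
}
```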

Edmondson answered 4/11, 2016 at 10:8 Comment(1)
B
3

If I'm not mistaken the binary serializer dumps some information about the object type name and namespace. If these values differ from the original class type and your new "SomeDataFormat" it may explain the size difference.

Have you tried comparing the two files with a hex-editor?
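
If a hex editor is not at hand, the first differing offset can be found with a few lines of code. This is just a sketch: HexCompareDemo and FirstDifference are made-up names, and the sample arrays stand in for the two files (in practice you would load them with File.ReadAllBytes).

```csharp
using System;

public static class HexCompareDemo
{
    // Returns the index of the first differing byte, the shorter length if one
    // array is a prefix of the other, or -1 if both are identical.
    public static int FirstDifference(byte[] a, byte[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (a[i] != b[i]) return i;
        }
        return a.Length == b.Length ? -1 : common;
    }

    public static void Main()
    {
        // Stand-in data; replace with File.ReadAllBytes(...) for real files.
        byte[] original = { 0x6C, 0x6C, 0x05, 0x01 };
        byte[] reserialized = { 0x6C, 0x6C, 0x03, 0x01, 0x00 };

        int offset = FirstDifference(original, reserialized);
        if (offset < 0)
        {
            Console.WriteLine("Files are identical.");
        }
        else
        {
            Console.WriteLine($"First difference at offset 0x{offset:X}");
        }
    }
}
```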

Boyfriend answered 1/8, 2013 at 14:41 Comment(2)
That was the first thing to check. I was able to read the object variable names, so I created that class with those properties in it. Then I tried to deserialize it, and it complained about "cant convert int to bool" and so on, so it actually told me what data types it was expecting. I corrected the types and then it deserialized just fine. – Prithee
I tinkered around with the BinaryFormatter properties, and found out that I can exclude type information, which yields 698 bytes at reserialization instead of the previous 672, so now there are only 3 extra bytes. Also, looking at the original and the new serialized objects in a hex editor, I see that the data is in a different order. I'll probably make some pictures. – Prithee
S
2

When you do the deserialization, some things will upcast just fine. For example

public class SomeClass
{
   public short SomeProperty {get;set;}
}

will deserialize into

public class SomeClass
{
   public long SomeProperty {get;set;}
}

But if you serialize the second SomeClass (i.e. the one with long) it will result in a different size than the serialization of SomeClass with a short. In this particular case the difference is 6 bytes.

Update:

Deserialize into a generic object and then use reflection to get at the types. You would probably have to do recursion and special handling for a complex object.

using (var fileStream = new FileStream("TestFormatter.dat", FileMode.Open))
{
    var binaryFormatter = new BinaryFormatter();
    var myObject = binaryFormatter.Deserialize(fileStream);
    var objectProperties = myObject.GetType().GetProperties();
    foreach (var property in objectProperties)
    {
        var propertyTypeName = property.PropertyType.Name; // This will tell you the property type name, e.g. String or Int64 (long)
    }
}
Strephon answered 1/8, 2013 at 15:1 Comment(4)
Can I assume that the deserialized file contains all field names used in the original object? Or did only the non-null, used field names get serialized into that file? Also, is there a way to know for sure what type SomeProperty is? – Prithee
By default BinaryFormatter will include all fields even if they are null. Depending on how complex the object is, you could probably use reflection on the generic object graph. I'll update my answer with a bit on how to do that. – Strephon
I tried your reflection method, and it gave me the same types that I'm using right now. So I guess it means that I'm stuck. – Prithee
I'm out of ideas. Maybe you can figure out a way to get your hands on the original object that was serialized. Do you end up with the same size issue if you just serialize the generic object? – Strephon
L
1

The remaining discrepancies could be from missing attributes on your class. Try this:

[StructLayout(LayoutKind.Sequential, Pack=1)]
public class SomeDataFormat // 16 field
{
   ...
Lasko answered 12/8, 2013 at 6:41 Comment(1)
Will solved the problem, however this was a good idea, thank you :) – Prithee
