XML vs Binary performance for Serialization/Deserialization

I'm working on a Compact Framework application and need to boost performance. The app currently works offline by serializing objects to XML and storing them in a database. Using a profiling tool I could see this was quite a big overhead that was slowing the app down. I thought that if I switched to binary serialization the performance would increase, but because binary serialization is not supported in the Compact Framework I looked at protobuf-net. Serialization seems quicker, but deserialization is much slower, and the app does more deserializing than serializing.

Should binary serialization be faster, and if so, what can I do to speed up the performance? Here's a snippet of how I'm using both XML and binary:

XML serialization:

public string Serialize(T obj)
{
  // Note: a new XmlSerializer is constructed on every call
  XmlSerializer serializer = new XmlSerializer(typeof(T));
  using (MemoryStream stream = new MemoryStream())
  {
    serializer.Serialize(stream, obj);
    return Encoding.UTF8.GetString(stream.ToArray(), 0, (int)stream.Length);
  }
}

public T Deserialize(string xml)
{
  XmlSerializer serializer = new XmlSerializer(typeof(T));
  using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
  {
    return (T)serializer.Deserialize(stream);
  }
}

Protobuf-net Binary serialization:

public byte[] Serialize(T obj)
{
  using (MemoryStream memoryStream = new MemoryStream())
  {
    // protobuf-net writes its compact wire format straight to the stream
    Serializer.Serialize(memoryStream, obj);
    return memoryStream.ToArray();
  }
}

public T Deserialize(byte[] serializedType)
{
  using (MemoryStream memoryStream = new MemoryStream(serializedType))
  {
    return Serializer.Deserialize<T>(memoryStream);
  }
}
Exsiccate answered 7/7, 2009 at 12:28 Comment(1)
I was going to suggest using Red Gate ANTS profiler, but it doesn't work with the Compact Framework (search on Google for "red-gate ants profiler compact"). – Lithotomy

I'm going to correct myself on this: Marc Gravell pointed out that the first iteration has the overhead of building the model, so I've done some tests taking the average of 1000 iterations of serialization and deserialization for both XML and binary (a simplified sketch of the timing loop follows the tables). I tried my tests with the v2 Compact Framework DLL first, and then with the v3.5 DLL. Here's what I got; times are in ms:

.NET 2.0
================================ XML ====== Binary ===
Serialization 1st Iteration      3236       5508
Deserialization 1st Iteration    1501       318
Serialization Average            9.826      5.525
Deserialization Average          5.525      0.771

.NET 3.5
================================ XML ====== Binary ===
Serialization 1st Iteration      3307       5598
Deserialization 1st Iteration    1386       200
Serialization Average            10.923     5.605
Deserialization Average          5.605      0.279
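
For reference, a simplified sketch of the kind of loop behind these averages; the MySerializer<T> wrapper name and the use of Environment.TickCount are illustrative, not the exact test code:

// Illustrative timing harness only - MySerializer<T> stands in for the generic
// helper from the question and is not a real type here.
static double MeasureDeserializeAverage<T>(MySerializer<T> helper, T sample)
{
  const int iterations = 1000;

  // The very first calls carry the one-off model-building cost reported in
  // the "1st Iteration" rows above, so warm up before averaging.
  byte[] blob = helper.Serialize(sample);
  helper.Deserialize(blob);

  int start = Environment.TickCount;
  for (int i = 0; i < iterations; i++)
  {
    helper.Deserialize(blob);
  }

  return (Environment.TickCount - start) / (double)iterations;  // average ms per call
}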
Exsiccate answered 8/7, 2009 at 12:56 Comment(0)

The main expense in your method is constructing the XmlSerializer itself. Creating the serialiser is a time-consuming process which you should only do once for each object type. Try caching the serialisers (for example, as sketched below) and see if that improves performance at all.
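
A minimal sketch of that idea, assuming a helper shaped like the one in the question (the class and member names here are just placeholders):

using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Keeps one XmlSerializer per type and reuses it on every call.
public static class XmlSerializerCache
{
  private static readonly Dictionary<Type, XmlSerializer> cache = new Dictionary<Type, XmlSerializer>();
  private static readonly object sync = new object();

  public static XmlSerializer For(Type type)
  {
    lock (sync)
    {
      XmlSerializer serializer;
      if (!cache.TryGetValue(type, out serializer))
      {
        serializer = new XmlSerializer(type);  // pay the construction cost only once per type
        cache.Add(type, serializer);
      }
      return serializer;
    }
  }
}

Then, in the Serialize/Deserialize methods from the question, replace new XmlSerializer(typeof(T)) with XmlSerializerCache.For(typeof(T)).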

Following this advice I saw a large performance improvement in my app, which allowed me to continue using XML serialisation.

Hope this helps.

Fiona answered 19/11, 2011 at 17:58 Comment(0)

Interesting... thoughts:

  • what version of CF is this; 2.0? 3.5? In particular, CF 3.5 has Delegate.CreateDelegate, which allows protobuf-net to access properties much faster than it can in CF 2.0
  • are you annotating fields or properties? Again, in CF the reflection optimisations are limited; you can get better performance in CF 3.5 with properties, since with a field the only option I have available is FieldInfo.SetValue (see the sketch after this list)
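
To illustrate the second point, here's a minimal sketch of a contract with the attributes on the properties rather than the fields (the type and members are made up for illustration):

using ProtoBuf;

[ProtoContract]
public class Customer
{
  private int id;
  private string name;

  // Annotating the property (not the backing field) lets protobuf-net use the
  // faster delegate-based accessors on CF 3.5; fields fall back to FieldInfo.SetValue.
  [ProtoMember(1)]
  public int Id
  {
    get { return id; }
    set { id = value; }
  }

  [ProtoMember(2)]
  public string Name
  {
    get { return name; }
    set { name = value; }
  }
}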

There are a number of other things that simply don't exist in CF, so it has to make compromises in a few places. For overly complex models there is also a known issue with the generics limitations of CF. A fix is underway, but it is a big change, and is taking "a while".

For info, some metrics on regular (full) .NET comparing various formats (including XmlSerializer and protobuf-net) are here.

Woof answered 7/7, 2009 at 12:34 Comment(8)
I'm using CF 2.0, and I've added attributes to the properties for the objects I need to serialize. – Exsiccate
Is it possible to try it in CF 3.5 (with the CF 3.5 binary) just to see if that fixes it? – Woof
OK, I've just run my test on CF 3.5 and see significant performance increases over CF 2.0; binary performs a lot quicker for both serialization and deserialization. Unfortunately I'm tied to CF 2.0 though, so might have to rethink things. – Exsiccate
Just to clarify my wording above: I mean I see significant performance increases in CF 3.5; CF 2.0 is slower. – Exsiccate
Sorry, scratch that, I read the perf report wrong! Here's what I get testing a simple entity with 3 properties: XML serialize 317 ms, XML deserialize 7 ms, binary serialize 147 ms, binary deserialize 19 ms. – Exsiccate
Is that averaged over a number of iterations? I'm also not sure whether those numbers are CF 2.0 or CF 3.5. – Woof
They're just the results from one test, but it comes out very similar each time. That's on 3.5. – Exsiccate
For info, the first iteration has the overhead of building the model - subsequent calls may be quicker... I'm intrigued that it is slower than XmlSerializer. I'd love to pull it apart ;-( – Woof

Have you tried creating custom serialization classes for your types, instead of using XmlSerializer, which is a general-purpose serializer (it creates a bunch of classes at runtime)? There's a tool for doing this (sgen). You run it during your build process and it generates a custom assembly that can be used in place of XmlSerializer.
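
For example (the assembly name here is purely illustrative), running something like this as a build step:

sgen /assembly:MyApp.Entities.dll

produces MyApp.Entities.XmlSerializers.dll containing the pre-generated serializers; on the full framework, new XmlSerializer(typeof(T)) picks that assembly up automatically instead of generating code at runtime. Whether the Compact Framework picks up pre-generated serializer assemblies the same way is something to verify, though.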

If you have Visual Studio, the option is available under the Build tab of your project's properties.

Lag answered 7/7, 2009 at 12:41 Comment(0)

Is the performance hit in serializing the objects, or writing them to the database? Since writing them is likely hitting some kind of slow storage, I'd imagine it to be a much bigger perf hit than the serialization step.

Keep in mind that the perf measurements posted by Marc Gravell are testing the performance over 1,000,000 iterations.

What kind of database are you storing them in? Are the objects serialized in memory or straight to storage? How are they being sent to the db? How big are the objects? When one is updated, do you send all of the objects to the database, or just the one that has changed? Are you caching anything in memory at all, or re-reading from storage each time?

Exarate answered 8/7, 2009 at 6:36 Comment(1)
The objects are being stored in a SQL CE database, but I can clearly see that the serialization and deserialization is the performance hit, not the database interaction. Stuff is being cached in memory too, but I need to store stuff in a DB so that it can be retrieved between sessions of the app. – Exsiccate

XML is often slow to process and takes up a lot of space. There have been a number of different attempts to tackle this, and the most popular today seems to be to just compress the lot, either by gzipping the stream or by packing it into a ZIP container as the Open Packaging Conventions do.
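
As a rough illustration of the gzip route on the full framework (System.IO.Compression isn't part of the Compact Framework, so there you'd need a third-party compression library; the method name is just an example):

using System.IO;
using System.IO.Compression;
using System.Text;

public static byte[] CompressXml(string xml)
{
  byte[] raw = Encoding.UTF8.GetBytes(xml);
  using (MemoryStream output = new MemoryStream())
  {
    using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
    {
      gzip.Write(raw, 0, raw.Length);
    } // disposing the GZipStream flushes the remaining compressed data
    return output.ToArray(); // ToArray still works after the underlying stream is closed
  }
}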

The W3C has shown the gzip approach to be less than optimal, and they and various other groups have been working on a better binary serialisation of XML that is suitable for fast processing, compression, and transmission.

Carniola answered 10/7, 2009 at 18:56 Comment(0)
