BinaryFormatter alternatives

A BinaryFormatter-serialized array of 128³ doubles takes up 50 MB of space. Serializing an array of 128³ structs with two double fields takes up 150 MB and takes over 20 seconds to process.

Are there fast, simple alternatives that would generate compact files? I would expect the above examples to take up 16 MB and 32 MB, respectively, and under two seconds to process. I took a look at protobuf-net, but it appears that it does not even support struct arrays.

PS: I apologize for making a mistake in recording file sizes. The actual space overhead with BinaryFormatter is not large.

Eastwards answered 4/11, 2009 at 19:2 Comment(0)
C
9

If you use a BinaryWriter instead of a serializer, you will get the desired (minimal) size.
I'm not sure about the speed, but give it a try.

On my system writing 32MB takes less than 0.5 seconds, including Open and Close of the stream.

You will have to write your own for loops to write the data, like this:

struct Pair
{
    public double X, Y;
}

static void WritePairs(string filename, Pair[] data)
{
    using (var fs = System.IO.File.Create(filename))
    using (var bw = new System.IO.BinaryWriter(fs))
    {
        for (int i = 0; i < data.Length; i++)
        {
            bw.Write(data[i].X);
            bw.Write(data[i].Y);
        }
    }
}

static void ReadPairs(string fileName, Pair[] data)
{
    using (var fs = System.IO.File.OpenRead(fileName))
    using (var br = new System.IO.BinaryReader(fs))
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i].X = br.ReadDouble();
            data[i].Y = br.ReadDouble();
        }
    }
}
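
If the per-field loop turns out to be the bottleneck, a bulk variant is possible on newer runtimes. The following is only a sketch, assuming .NET Core 2.1 or later (the Span-based `Stream.Write`/`Stream.Read` overloads and `MemoryMarshal.AsBytes` are not available on the .NET Framework of 2009). The class name `BulkPairIo` is hypothetical, and the file is written in the machine's native byte order, so it is not portable across architectures.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Same shape as the Pair struct above: two doubles, 16 bytes, no references.
[StructLayout(LayoutKind.Sequential)]
public struct Pair
{
    public double X, Y;
}

// Hypothetical helper, not part of the original answer.
public static class BulkPairIo
{
    // Writes the whole array as one block of raw bytes (native byte order).
    public static void WritePairs(string fileName, Pair[] data)
    {
        using (var fs = File.Create(fileName))
        {
            ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(data.AsSpan());
            fs.Write(bytes);
        }
    }

    // Fills an already-sized array. Stream.Read may return fewer bytes
    // than requested, so keep reading until the span is full.
    public static void ReadPairs(string fileName, Pair[] data)
    {
        using (var fs = File.OpenRead(fileName))
        {
            Span<byte> bytes = MemoryMarshal.AsBytes(data.AsSpan());
            while (bytes.Length > 0)
            {
                int n = fs.Read(bytes);
                if (n == 0)
                    throw new EndOfStreamException("File is shorter than the array.");
                bytes = bytes.Slice(n);
            }
        }
    }
}
```

As with the loop version, the caller has to know the array length in advance, because nothing but the raw values is stored.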
Cheerio answered 4/11, 2009 at 19:30 Comment(6)
Manual serialization can indeed be very fast and compact, but it is also prone to error and time-consuming to write. I expect some overhead, but with BinaryFormatter it is often unreasonable. - Eastwards
You can make it a little friendlier with generics and/or interfaces. But start adding metadata and you will quickly approach the overhead of the Formatters. - Cheerio
Spot on, Henk. BinaryFormatter will work with just about anything. You should expect better performance from something that does exactly what you need and only what you need. - Radiopaque
Henk, it is not true that you quickly approach the overhead of BinaryFormatter, since protobuf-net is much faster with similar capabilities, apart from not supporting structs. - Eastwards
Don, I meant the size overhead. Speed will be determined mostly by I/O. - Cheerio
Speed should be determined by I/O. I am glad we are in agreement. But with BinaryFormatter it is not. In this simple example it takes twenty times as long to encode the data as it takes to write it to disk, and it is even slower at decoding. - Eastwards
N
5

Serializing means that metadata is added so that the data can be safely deserialized; that is what causes the overhead. If you serialize the data yourself without any metadata, you end up with 16 MB of data:

foreach (double d in array) {
   byte[] bin = BitConverter.GetBytes(d);
   stream.Write(bin, 0, bin.Length);
}

This of course means that you have to deserialize the data yourself also:

using (BinaryReader reader = new BinaryReader(stream)) {
   for (int i = 0; i < array.Length; i++) {
      byte[] data = reader.ReadBytes(8);
      array[i] = BitConverter.ToDouble(data, 0);
   }
}
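
With no metadata at all, the reader has to know the array length in advance. A middle ground is to store just the element count in front of the data. The following is a minimal sketch of that idea; the class name `DoubleArrayIo` and the layout (one Int32 count followed by the raw doubles) are assumptions made for illustration, not an established format.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class DoubleArrayIo
{
    // Assumed layout: Int32 element count, then 8 bytes per double.
    public static void Write(Stream stream, double[] array)
    {
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, true))
        {
            writer.Write(array.Length);
            foreach (double d in array)
                writer.Write(d);
        }
    }

    public static double[] Read(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, true))
        {
            int count = reader.ReadInt32();
            var array = new double[count];
            for (int i = 0; i < count; i++)
                array[i] = reader.ReadDouble();
            return array;
        }
    }
}
```

The cost is four extra bytes per file, so 128³ doubles would still come to roughly 16 MB.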
Northcliffe answered 4/11, 2009 at 19:29 Comment(0)
R
3

This is more of a comment but it's way too much for one... I'm not able to reproduce your results. There is, however, some additional overhead with the struct.

My testing:

-------------------------------------------------------------------------------
Testing array of structs

Size of double:  8
Size of doubles.bin:  16777244
Size per array item:  8
Milliseconds to serialize:  143
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Testing array of structs

Size of dd struct:  16
Size of structs.bin:  52428991
Size per array item:  25
Milliseconds to serialize:  9678
-------------------------------------------------------------------------------
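
To put the file sizes above in perspective, the overhead can be worked out by subtracting the raw payload (128³ = 2,097,152 items times the item size). This is plain arithmetic on the reported figures, not a statement about how BinaryFormatter lays out its data; the class name `SizeCheck` is hypothetical.

```csharp
using System;

// Hypothetical helper for the arithmetic only.
public static class SizeCheck
{
    public const long Items = 128L * 128L * 128L; // 2,097,152 elements

    // Bytes in the file beyond the raw payload of Items elements.
    public static long Overhead(long fileSize, int rawItemSize)
    {
        return fileSize - Items * rawItemSize;
    }
}
```

`SizeCheck.Overhead(16777244, 8)` gives 28 bytes in total for the double array, while `SizeCheck.Overhead(52428991, 16)` gives 18,874,559 bytes for the struct array, which is about 9 extra bytes per item.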

Code:

using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.IO;
using System.Diagnostics;

namespace ConsoleApplication5
{
    class Program
    {
        static void Main(string[] args)
        {
            TestDoubleArray();
            TestStructArray();
        }

        private static void TestStructArray()
        {

            Stopwatch stopWatch = new Stopwatch();
            stopWatch.Start();

            dd[] d1 = new dd[2097152];
            BinaryFormatter f1 = new BinaryFormatter();
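            // Dispose the stream so the file is flushed and closed before its size is read.
            using (FileStream fs1 = File.Create("structs.bin"))
            {
                f1.Serialize(fs1, d1);
            }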

            stopWatch.Stop();

            Debug.WriteLine("-------------------------------------------------------------------------------");
            Debug.WriteLine("Testing array of structs");
            Debug.WriteLine("");
            Debug.WriteLine("Size of dd struct:  " + System.Runtime.InteropServices.Marshal.SizeOf(typeof(dd)).ToString());
            FileInfo fi = new FileInfo("structs.bin");
            Debug.WriteLine("Size of structs.bin:  " + fi.Length.ToString());
            Debug.WriteLine("Size per array item:  " + (fi.Length / 2097152).ToString());
            Debug.WriteLine("Milliseconds to serialize:  " + stopWatch.ElapsedMilliseconds);
            Debug.WriteLine("-------------------------------------------------------------------------------");
        }

        static void TestDoubleArray()
        {
            Stopwatch stopWatch = new Stopwatch();
            stopWatch.Start();

            double[] d = new double[2097152];
            BinaryFormatter f = new BinaryFormatter();
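            // Dispose the stream so the file is flushed and closed before its size is read.
            using (FileStream fs = File.Create("doubles.bin"))
            {
                f.Serialize(fs, d);
            }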

            stopWatch.Stop();

            Debug.WriteLine("-------------------------------------------------------------------------------");
            Debug.WriteLine("Testing array of structs");
            Debug.WriteLine("");
            Debug.WriteLine("Size of double:  " + sizeof(double).ToString());
            FileInfo fi = new FileInfo("doubles.bin"); // must match the file written above
            Debug.WriteLine("Size of doubles.bin:  " + fi.Length.ToString());
            Debug.WriteLine("Size per array item:  " + (fi.Length / 2097152).ToString());
            Debug.WriteLine("Milliseconds to serialize:  " + stopWatch.ElapsedMilliseconds);
            Debug.WriteLine("-------------------------------------------------------------------------------");
        }

        [Serializable]
        struct dd
        {
            double a;
            double b;
        }
    }
}
Radiopaque answered 4/11, 2009 at 19:40 Comment(3)
Thank you for the correction. My bad. The space overhead is not very large. The time the serializer takes, though, is still very significant. - Eastwards
Like I commented on Henk's post, you're trading generalization and standardization (BinaryFormatter) for the speed of a specialized class doing its one task very well. - Radiopaque
It seems like I am trading way too much speed: an order of magnitude beyond a reasonable amount. It does not have to take that long to generate the code in Henk Holterman's answer. - Eastwards
