How to save & append to a serialized MessagePack binary file in C#?

I'm trying to use MessagePack to save multiple lists of structs because I read that its performance is better than BinaryFormatter serialization.

What I want to do is receive real-time time series data and periodically append it to a file on disk, for example each time a list reaches 100 elements. My questions are:

1) Is it better to serialize lists of structs and save it to disk asynchronously in this scenario?

2) How to simply save it to disk with MessagePack?

public struct struct_realTime
{
    public int indexNum { get; set; }
    public string currentTime { get; set; }
    public string currentType { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        List<struct_realTime> list_temp = new List<struct_realTime>(100000);

        for (int num=0; num < 100000; num++)
        {
            list_temp.Add(new struct_realTime
            {
                indexNum = 1,
                currentTime = "time",
                currentType = "type",
            });
        }

        string filename = "file.bin";

        using (var fileStream = new FileStream(filename, FileMode.Append, FileAccess.Write))
        {
            byte[] bytes = MessagePackSerializer.Serialize(list_temp);
            Console.WriteLine(MessagePackSerializer.ToJson(bytes));
        }
    }
}

When I run this code, it creates file.bin and prints out 100000 structs, but the file is 0 bytes.

When I use BinaryFormatter, I do this:

using (var fileStream = new FileStream("file.bin", FileMode.Append))
{
    BinaryFormatter formatter = new BinaryFormatter();
    formatter.Serialize(fileStream, list_temp);
}

How can I fix the problem?

Calaboose answered 9/11, 2019 at 14:29 Comment(4)
Why do you want to append to a binary file? It makes no sense. - Lilian
@Lilian I'm not a programmer, so please understand me! I want to receive data and save it to disk continuously. I used to use just text files, but someone here told me that binary files are better in terms of performance. But I don't want to make multiple binary files, and that's how I've reached this question. Could you recommend any other ways to do this? - Calaboose
The issue is retrieving the data. If you just keep putting binary chunks into a file, you probably can't separate the data later. Unless each chunk contains a size, you will not be able to get the data back. - Lilian
For reasons not to use BinaryFormatter for this purpose (or any other), see "What are the deficiencies of the built-in BinaryFormatter based .Net serialization?". - Halide

What you are trying to do is to append an object (here List<struct_realTime>) serialized using MessagePackSerializer to a file containing an already-serialized sequence of similar objects, in the same way it is possible with BinaryFormatter, protobuf-net or Json.NET. Later, you presumably want to be able to deserialize the entire sequence into a list or array of objects of the same type.

Your code has three problems, two simple and one fundamental.

The simple problems are as follows:

  • You don't actually write to the fileStream. Instead, do the following:

    // Append each list_temp sequentially
    using (var fileStream = new FileStream(filename, FileMode.Append, FileAccess.Write))
    {
        MessagePackSerializer.Serialize(fileStream, list_temp);
    }
    
  • You haven't marked struct_realTime with [MessagePackObject] attributes. This can be implemented e.g. as follows:

    [MessagePackObject]
    public struct struct_realTime
    {
        [Key(0)]
        public int indexNum { get; set; }
        [Key(1)]
        public string currentTime { get; set; }
        [Key(2)]
        public string currentType { get; set; }
    }
    

Having done that, you can now repeatedly serialize list_temp to the file... but you will not be able to read the appended lists back afterwards! That's because MessagePackSerializer seems to consume the entire stream when deserializing the root object, skipping over any additional data appended after it. Thus code like the following will fail, because only one object gets read from the file:

List<List<struct_realTime>> allItemsInFile = new List<List<struct_realTime>>();
using (var fileStream = File.OpenRead(filename))
{
    while (fileStream.Position < fileStream.Length)
    {
        allItemsInFile.Add(MessagePackSerializer.Deserialize<List<struct_realTime>>(fileStream));                   
    }
}
Assert.IsTrue(allItemsInFile.Count == expectedNumberOfRootItemsInFile);

Demo fiddle #1 here.

And code like the following will fail because the (first) root object in the stream is not an array of arrays of objects, but rather just a single array:

List<List<struct_realTime>> allItemsInFile;
using (var fileStream = File.OpenRead(filename))
{
    allItemsInFile = MessagePackSerializer.Deserialize<List<List<struct_realTime>>>(fileStream);
}
Assert.IsTrue(allItemsInFile.Count == expectedNumberOfRootItemsInFile);

Demo fiddle #2 here.

As MessagePackSerializer seems to lack the ability to deserialize multiple root objects from a stream, what are your options? Firstly, you could deserialize a List<List<struct_realTime>>, append to it, and then serialize the entire thing back to the file. Presumably you don't want to do that for performance reasons.

Secondly, using the MessagePack specification directly, you could manually seek to the beginning of the file to parse and rewrite an appropriate array 32 format header (the type byte 0xdd followed by a 32-bit big-endian element count), then seek to the end of the file and use MessagePackSerializer to serialize and append the new item. The following static helper method does the job:

public static class MessagePackExtensions
{
    const byte Array32 = 0xdd;
    const int Array32HeaderLength = 5;

    public static void AppendToFile<T>(Stream stream, T item)
    {
        if (stream == null)
            throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek)
            throw new ArgumentException("Stream must be seekable.", nameof(stream));

        stream.Position = 0;
        var buffer = new byte[Array32HeaderLength];
        var read = stream.Read(buffer, 0, Array32HeaderLength);
        stream.Position = 0;
        if (read == 0)
        {
            FormatArray32Header(buffer, 1);
            stream.Write(buffer, 0, Array32HeaderLength);
        }
        else
        {
            var count = ParseArray32Header(buffer, read);
            FormatArray32Header(buffer, count + 1);
            stream.Write(buffer, 0, Array32HeaderLength);
        }

        stream.Position = stream.Length;
        MessagePackSerializer.Serialize(stream, item);
    }

    static void FormatArray32Header(byte [] buffer, uint value)
    {
        buffer[0] = Array32;
        buffer[1] = unchecked((byte)(value >> 24));
        buffer[2] = unchecked((byte)(value >> 16));
        buffer[3] = unchecked((byte)(value >> 8));
        buffer[4] = unchecked((byte)value);
    }

    static uint ParseArray32Header(byte [] buffer, int readCount)
    {
        if (readCount < 5 || buffer[0] != Array32)
            throw new ArgumentException("Stream was not positioned on an Array32 header.");
        int i = 1;
        //https://mcmap.net/q/334682/-how-to-get-little-endian-data-from-big-endian-in-c-using-bitconverter-toint32-method
        //https://stackoverflow.com/a/8241127 by https://stackoverflow.com/users/23354/marc-gravell
        var value = unchecked((uint)((buffer[i++] << 24) | (buffer[i++] << 16) | (buffer[i++] << 8) | buffer[i++]));
        return value;
    }
}

It can be used to append your list_temp as follows:

// Append each entry sequentially
using (var fileStream = new FileStream(filename, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    MessagePackExtensions.AppendToFile(fileStream, list_temp);
}

And then later, to deserialize the entire file, do:

List<List<struct_realTime>> allItemsInFile;
using (var fileStream = File.OpenRead(filename))
{
    allItemsInFile = MessagePackSerializer.Deserialize<List<List<struct_realTime>>>(fileStream);
}
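
The single Deserialize call works because the file now begins with one array 32 header whose count covers every appended list. To make the header rewrite concrete, here is a minimal standalone sketch of the same 5-byte layout. The class name Array32HeaderDemo is hypothetical, and it assumes System.Buffers.Binary.BinaryPrimitives is available (.NET Core 2.1 or later, or the System.Memory package) rather than the manual shifts used above:

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not part of MessagePack.
public static class Array32HeaderDemo
{
    const byte Array32 = 0xdd;

    // Builds the 5-byte array 32 header: the 0xdd type byte followed by
    // the element count as a big-endian 32-bit unsigned integer.
    public static byte[] Format(uint count)
    {
        var header = new byte[5];
        header[0] = Array32;
        BinaryPrimitives.WriteUInt32BigEndian(header.AsSpan(1), count);
        return header;
    }

    // Reads the element count back out of a 5-byte array 32 header.
    public static uint Parse(byte[] header)
    {
        if (header == null || header.Length < 5 || header[0] != Array32)
            throw new ArgumentException("Not an array 32 header.", nameof(header));
        return BinaryPrimitives.ReadUInt32BigEndian(header.AsSpan(1));
    }
}
```

For example, BitConverter.ToString(Array32HeaderDemo.Format(3)) gives "DD-00-00-00-03", so after three appends the file starts with those five bytes, followed by the three serialized lists back to back.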

Notes:

Demo fiddle #3 here.
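
If rewriting the header on every append is undesirable, another possibility, in the spirit of the comment about storing a size with each chunk, is to prefix every serialized list with its length. The following is only a sketch under that assumption: the class LengthPrefixedFile and its method names are hypothetical, it works on raw byte arrays, and it uses nothing beyond System.IO. Note that the resulting file is no longer a single valid MessagePack document, so it has to be read back with the matching reader:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only; not part of MessagePack.
public static class LengthPrefixedFile
{
    // Appends one chunk: a 4-byte length (BinaryWriter writes it little-endian)
    // followed by the payload bytes.
    public static void AppendChunk(string path, byte[] payload)
    {
        using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(payload.Length);
            writer.Write(payload);
        }
    }

    // Reads every chunk back in the order it was appended.
    public static List<byte[]> ReadAllChunks(string path)
    {
        var chunks = new List<byte[]>();
        using (var stream = File.OpenRead(path))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                int length = reader.ReadInt32();
                byte[] payload = reader.ReadBytes(length);
                if (payload.Length != length)
                    throw new EndOfStreamException("Truncated chunk at end of file.");
                chunks.Add(payload);
            }
        }
        return chunks;
    }
}
```

Each payload would come from the byte[] overload already used in the question, MessagePackSerializer.Serialize(list_temp), and each chunk returned by ReadAllChunks would then be passed to MessagePackSerializer.Deserialize<List<struct_realTime>>(chunk).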

Halide answered 11/11, 2019 at 21:46 Comment(1)
Thank you for your detailed explanation and code! I've learned a lot from you! Thank you again! - Calaboose
