Binary file format with 1000s of records in C#
I would like to serialize an array of model objects to a binary stream. The model class will mainly have string and integer properties.

I believe that I can mark the class as [Serializable] and use the BinaryFormatter. However, I'd be interested to know whether you think this is the best way, bearing in mind that my priority is to have as small a file as possible for transfer over a low-bandwidth connection (I can zip/unzip the file too).

The file could have 1000s of records, so ideally I'd like to be able to append to disk and read from disk record by record, without ever having to have the entire file in memory at once.

So my priorities are: small file size and efficient memory use.

Maybe there is a pre-written framework for this? It seems easy to do with XML and CSV files! Hopefully it is with a custom binary format too.

thanks

Earsplitting answered 18/3, 2011 at 15:31 Comment(0)
6

I suggest protobuf-net, which is very efficient.

Having said that, this will not be able to handle serialising/deserialising individual objects in your collection. That part you need to implement yourself.

  • One solution is to store the objects as individual files in a folder. Each file name contains a reference, so you can find the object you need by name.

  • Another is to have one data file plus an index file that lists all objects and their positions in the data file. This is a lot more complicated: when you save an object in the middle of the file and its size changes, you have to shift every later address, so a B-tree may be more effective.
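If records are only ever appended and never rewritten in place, the index-file idea stays fairly simple, because no existing offsets have to move. Here is a minimal sketch under that assumption. The class name `IndexedRecordFile`, its methods, and the 4-byte length prefix are all hypothetical choices for illustration, and the payload is just an opaque byte array produced by whichever serializer you pick.

```csharp
using System;
using System.IO;

// Hypothetical helper: one data file plus an index file of 8-byte offsets.
// Assumes append-only usage, so existing offsets never change.
public static class IndexedRecordFile
{
    // Appends a record to the data file and its start offset to the index file.
    // Returns the zero-based record number.
    public static long Append(string dataPath, string indexPath, byte[] payload)
    {
        long offset;
        using (var data = new FileStream(dataPath, FileMode.Append, FileAccess.Write))
        using (var writer = new BinaryWriter(data))
        {
            offset = data.Length;           // current end of file = start of the new record
            writer.Write(payload.Length);   // 4-byte length prefix
            writer.Write(payload);
        }

        using (var index = new FileStream(indexPath, FileMode.Append, FileAccess.Write))
        using (var writer = new BinaryWriter(index))
        {
            long recordNumber = index.Length / sizeof(long);
            writer.Write(offset);           // 8-byte offset into the data file
            return recordNumber;
        }
    }

    // Reads a single record by number without loading the rest of the data file.
    public static byte[] Read(string dataPath, string indexPath, long recordNumber)
    {
        long offset;
        using (var index = File.OpenRead(indexPath))
        using (var reader = new BinaryReader(index))
        {
            index.Seek(recordNumber * sizeof(long), SeekOrigin.Begin);
            offset = reader.ReadInt64();
        }

        using (var data = File.OpenRead(dataPath))
        using (var reader = new BinaryReader(data))
        {
            data.Seek(offset, SeekOrigin.Begin);
            int length = reader.ReadInt32();
            return reader.ReadBytes(length);
        }
    }
}
```

Updating or deleting a record in the middle is where this gets harder, which is the complication described above.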

Maturity answered 18/3, 2011 at 15:36 Comment(1)
Thanks Martinho. I liked FileDB! - Maturity
2

Another option is to just serialize to a fixed-width text file format and let ZIP handle the compression. Fixed-width means you can easily use a MemoryMappedFile to walk through each record without needing to load the entire file into memory.
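To make the fixed-width idea concrete, here is a rough sketch. The layout (a 10-character id followed by a 20-character name, ASCII only, no line terminator) and the names `FixedWidthFile`, `Append` and `ReadRaw` are assumptions made up for the example, not anything prescribed above.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Hypothetical fixed-width layout: 10-char id + 20-char name, ASCII, no newline.
public static class FixedWidthFile
{
    public const int IdWidth = 10;
    public const int NameWidth = 20;
    public const int RecordWidth = IdWidth + NameWidth;

    // Appends one padded record to the end of the file.
    public static void Append(string path, int id, string name)
    {
        string line = id.ToString(CultureInfo.InvariantCulture).PadLeft(IdWidth)
                    + name.PadRight(NameWidth);
        if (line.Length != RecordWidth)
            throw new ArgumentException("Field too wide for the fixed layout.");

        byte[] bytes = Encoding.ASCII.GetBytes(line);
        using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            stream.Write(bytes, 0, bytes.Length);
        }
    }

    // Maps only the bytes of the requested record and returns them as text.
    public static string ReadRaw(string path, long recordNumber)
    {
        var buffer = new byte[RecordWidth];
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(recordNumber * RecordWidth, RecordWidth))
        {
            view.ReadArray(0, buffer, 0, buffer.Length);
        }
        return Encoding.ASCII.GetString(buffer);
    }
}
```

Because every record has the same width, record n always starts at byte n * RecordWidth. Note that this only works on the uncompressed file; a zipped copy has to be unzipped before it can be mapped and walked like this.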

Presbyter answered 18/3, 2011 at 15:43 Comment(0)
1

You can use the BinaryFormatter. It's a good solution if you want a small file, but only you know whether it's the best solution for your domain. I don't think you can read one record at a time, though.

The only example code I have at this time is for a DataSet. These extension methods will (de)serialize a custom DataSet, which, if I recall correctly, was the easiest way to have a type that can use the BinaryFormatter.

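using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class DataSetBinaryExtensions
{
    // Deserializes a DataSet (or a DataSet subclass) previously written by WriteBinary.
    public static TDataSet LoadBinary<TDataSet>(Stream stream) where TDataSet : DataSet
    {
        var formatter = new BinaryFormatter();
        return (TDataSet)formatter.Deserialize(stream);
    }

    // Serializes the DataSet using the binary remoting format instead of the default XML.
    public static void WriteBinary<TDataSet>(this TDataSet dataSet, Stream stream) where TDataSet : DataSet
    {
        dataSet.RemotingFormat = SerializationFormat.Binary;
        var formatter = new BinaryFormatter();
        formatter.Serialize(stream, dataSet);
    }
}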

You might also take a look at the DataContractSerializer, which is .NET's new 'standard' way of dealing with serialization (according to C# 4.0 In A Nutshell, Albahari & Albahari). In that case, you'll also want to read Best Practices: Data Contract Versioning. Below are examples of how to (de)serialize in XML and JSON, even though they wouldn't be directly applicable to your situation (since you wanted small files). But you could compress the files.

/// <summary>
/// Converts this instance to XML using the <see cref="DataContractSerializer"/>.
/// </summary>
/// <typeparam name="TSerializable">
/// A type that is serializable using the <see cref="DataContractSerializer"/>.
/// </typeparam>
/// <param name="value">
/// The object to be serialized to XML.
/// </param>
/// <returns>
/// Formatted XML representing this instance. Does not include the XML declaration.
/// </returns>
public static string ToXml<TSerializable>(this TSerializable value)
{
    var serializer = new DataContractSerializer(typeof(TSerializable));
    var output = new StringWriter();
    using (var writer = new XmlTextWriter(output) { Formatting = Formatting.Indented })
    {
        serializer.WriteObject(writer, value);
    }
    return output.GetStringBuilder().ToString();
}

/// <summary>
/// Converts this instance to XML using the <see cref="DataContractSerializer"/> and writes it to the specified file.
/// </summary>
/// <typeparam name="TSerializable">
/// A type that is serializable using the <see cref="DataContractSerializer"/>.
/// </typeparam>
/// <param name="value">
/// The object to be serialized to XML.
/// </param>
/// <param name="filePath">Path of the file to write to.</param>
public static void WriteXml<TSerializable>(this TSerializable value, string filePath)
{
    var serializer = new DataContractSerializer(typeof(TSerializable));
    using (var writer = XmlWriter.Create(filePath, new XmlWriterSettings { Indent = true }))
    {
        serializer.WriteObject(writer, value);
    }
}

/// <summary>
/// Creates an instance of the specified type from XML.
/// </summary>
/// <typeparam name="TSerializable">The type of the serializable object.</typeparam>
/// <param name="xml">The XML representation of the instance.</param>
/// <returns>An instance created from the XML input.</returns>
public static TSerializable CreateFromXml<TSerializable>(string xml)
{
    var serializer = new DataContractSerializer(typeof(TSerializable));

    using (var stringReader = new StringReader(xml))
    using (var reader = XmlReader.Create(stringReader))
    {
        return (TSerializable)serializer.ReadObject(reader);
    }
}

/// <summary>
/// Creates an instance of the specified type from the specified XML file.
/// </summary>
/// <param name="filePath">
/// Path to the XML file.
/// </param>
/// <typeparam name="TSerializable">
/// The type of the serializable object.
/// </typeparam>
/// <returns>
/// An instance created from the XML input.
/// </returns>
public static TSerializable CreateFromXmlFile<TSerializable>(string filePath)
{
    var serializer = new DataContractSerializer(typeof(TSerializable));

    using (var reader = XmlReader.Create(filePath))
    {
        return (TSerializable)serializer.ReadObject(reader);
    }
}

public static T LoadJson<T>(Stream stream) where T : class
{
    var serializer = new DataContractJsonSerializer(typeof(T));
    object readObject = serializer.ReadObject(stream);
    return (T)readObject;
}

public static void WriteJson<T>(this T value, Stream stream) where T : class
{
    var serializer = new DataContractJsonSerializer(typeof(T));
    serializer.WriteObject(stream, value);
}
Nuclear answered 18/3, 2011 at 15:34 Comment(0)
1

I would recommend using SQL Server Compact to store your objects without serializing them. It's quite lightweight and extremely fast; I have used it under heavy load, serving a lot of requests on a server.

I also don't recommend storing your data in a serialized binary format, because it becomes a terrible pain when you need to change the objects you are storing. It's also painful if you have to inspect what you are storing, because you have to deserialize the whole collection.

As for sending, I prefer XML serialization with zip compression if necessary. The XML format makes debugging much easier if you need to look at what you are sending or run some tests.
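For illustration, here is roughly what XML plus compression could look like. This is only a sketch: it uses XmlSerializer and GZipStream (gzip rather than an actual .zip archive), and the `Person` type and `CompressedXml` helper are made-up names for the example.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Serialization;

// Hypothetical model type with string and integer properties.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper: XML serialization wrapped in gzip compression.
public static class CompressedXml
{
    // Serializes the value to XML and compresses it in one pass.
    public static byte[] Serialize<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                serializer.Serialize(gzip, value);
            }
            // The gzip stream must be closed before the buffer is complete.
            return output.ToArray();
        }
    }

    // Decompresses the bytes and deserializes the XML back into an object.
    public static T Deserialize<T>(byte[] compressed)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        {
            return (T)serializer.Deserialize(gzip);
        }
    }
}
```

For debugging, the same bytes can be decompressed with any gzip tool and read as plain XML.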

Fissiparous answered 18/3, 2011 at 15:38 Comment(0)
0

If you want it to be small, do it yourself. Make sure to store only the data you need. For example, if a field has at most 256 different values, use a byte.

http://msdn.microsoft.com/en-us/library/system.bitconverter.aspx

I almost always use a simple structure like this to store the data

id (ushort)

data_size (uint)

data of size data_size
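The layout above can be sketched as follows. I've used BinaryWriter/BinaryReader here rather than BitConverter because they handle the byte conversion for you. The class name `ChunkFile` and its methods are hypothetical, and the sketch assumes each record's data fits in an int-sized array.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for the id / data_size / data layout.
public static class ChunkFile
{
    // Appends one record: 2-byte id, 4-byte size, then the raw data.
    public static void Append(string path, ushort id, byte[] data)
    {
        using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(id);                  // id (ushort)
            writer.Write((uint)data.Length);   // data_size (uint)
            writer.Write(data);                // data of size data_size
        }
    }

    // Streams records one at a time; only the current record is held in memory.
    public static IEnumerable<KeyValuePair<ushort, byte[]>> ReadAll(string path)
    {
        using (var stream = File.OpenRead(path))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                ushort id = reader.ReadUInt16();
                uint size = reader.ReadUInt32();
                // Assumption: size fits in an int, since ReadBytes takes an int count.
                byte[] data = reader.ReadBytes(checked((int)size));
                yield return new KeyValuePair<ushort, byte[]>(id, data);
            }
        }
    }
}
```

Since ReadAll is an iterator, a foreach over it reads the file record by record, and appending never requires loading what is already on disk.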

Store only the information you must have, and don't think about how it is going to be used. Decide how you want to use the data when you load it.

Evildoer answered 18/3, 2011 at 15:39 Comment(0)
0

I'd be tempted to stick with BinaryFormatter for the objects themselves, or perhaps protobuf-net as suggested elsewhere.

If the random access aspect of this is very important (reading and appending record by record) you might want to look at creating a zip file (or similar) containing an index file and each object serialized to its own file in the zip (or perhaps in small collections).

This way, you can effectively have a mini file system which is compressed and gives you access to your records individually.
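As a rough sketch of that idea, the following uses ZipArchive from System.IO.Compression (.NET 4.5 or later, which postdates this answer). The `ZipRecordStore` class and its methods are hypothetical names, and in this sketch the entry names act as the index rather than a separate index file.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper: each record is its own compressed entry in one zip file.
public static class ZipRecordStore
{
    // Adds or replaces the entry with the given name.
    public static void Put(string zipPath, string name, byte[] payload)
    {
        using (var zip = ZipFile.Open(zipPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry existing = zip.GetEntry(name);
            if (existing != null)
            {
                existing.Delete();
            }

            ZipArchiveEntry entry = zip.CreateEntry(name, CompressionLevel.Optimal);
            using (Stream stream = entry.Open())
            {
                stream.Write(payload, 0, payload.Length);
            }
        }
    }

    // Reads one entry by name; returns null if it does not exist.
    public static byte[] Get(string zipPath, string name)
    {
        using (var zip = ZipFile.OpenRead(zipPath))
        {
            ZipArchiveEntry entry = zip.GetEntry(name);
            if (entry == null)
            {
                return null;
            }

            using (Stream stream = entry.Open())
            using (var buffer = new MemoryStream())
            {
                stream.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }
}
```

One caveat: reading a single entry is cheap, but according to the documentation ZipArchive holds the content of the whole archive in memory while it is open in Update mode, so frequent small appends to a large archive may not meet the memory goal. Batching writes, or keeping records in small collections per entry as suggested above, would help.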

Driskill answered 18/3, 2011 at 15:39 Comment(0)
