Is there a way to compress an object in memory and use it transparently?
I'm currently building an analysis application that handles large amounts of data. A typical case looks like this: the user selects a folder with about 600 measurement files, each containing about 40,000 to 100,000 values. The application reads these values into an object that internally works as a data cache, so that the files don't have to be read on every access.

This works very well, but I noticed that the memory consumption is very high and may eventually become too large. During my tests the application crashed when its memory consumption exceeded 2 GB of RAM.

The data structure that holds the data is as simple as possible: it basically consists of some dictionaries that contain the data in a two-level nested way, nothing complex. I was wondering if there is a convenient way of storing this object in a compressed form in RAM. I know that this would reduce performance, but that is totally acceptable in my case.
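For concreteness, the cache described might be sketched like this (hypothetical names, not the actual implementation):

// Outer key: file name; inner key: channel name; values: 40,000-100,000 measurements
Dictionary<string, Dictionary<string, List<double>>> cache =
    new Dictionary<string, Dictionary<string, List<double>>>();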

Is there a way to do something like that which lets me use my objects as usual? Or do I have to implement compression myself within my object?

Thanks for your thoughts and recommendations!

Ruffle answered 14/5, 2014 at 10:10 Comment(11)
Compile as a 64-bit app to avoid the 2 GB limit. Most computers have 4 GB+ these days, no?Foochow
There is also a single-object 2 GB limit: #1088482Isoleucine
@Isoleucine That limit is per object, not per process. It can easily be avoided by using a custom type instead of an array/List&lt;T&gt;Unconcerned
@nicodemus: The limit can be lifted on 64 bit .NET: msdn.microsoft.com/en-us/library/hh285054%28v=vs.110%29.aspxFineberg
You can avoid that limitation with .NET 4.5 and an x64 process by setting gcAllowVeryLargeObjects in your app.configMoises
You could implement 7-zip compression on the fly, although I really doubt it'll be faster than just reading the data from disk. If you're not using them all at once, you could temporarily cache measurement files with some kind of queue.Reeba
How big is the raw data from the files? If disk IO really is your bottleneck, and the total file size is acceptable, simply caching the files in memory might be an option.Bullyboy
Thanks for your comments and interesting links on that topic - but the question originally was intended to learn about possibilities to reduce the memory consumption by compressing my objects in the first place, not to enable the application to use more RAM.Ruffle
Can the data be compressed? If you have a million different numbers, compression is hard. If you have a million zeroes and two real values, compression seems easy.Unreel
@nvoigt: Good point. I just zipped the folder with standard settings and the result is 1/9th of the original folder size, so my data seems to be quite "zippophilic" ;)Ruffle
Solve your data management problems with a database. Use the right tool for the job.Elliottellipse

It really depends on the type of data you're working with. One possibility is to compress your objects, keeping them as a compressed byte[] instead of in raw object form, using an extension method.

You could combine that with building your process as x64:

// Requires: using System.IO; using System.IO.Compression;
//           using System.Runtime.Serialization.Formatters.Binary;
public static byte[] SerializeAndCompress(this object obj)
{
    using (MemoryStream ms = new MemoryStream()) 
    using (GZipStream zs = new GZipStream(ms, CompressionMode.Compress, true))
    {
        BinaryFormatter bf = new BinaryFormatter();
        bf.Serialize(zs, obj);
        return ms.ToArray();
    }
}

public static T DecompressAndDeserialize<T>(this byte[] data)
{
    using (MemoryStream ms = new MemoryStream(data)) 
    using (GZipStream zs = new GZipStream(ms, CompressionMode.Decompress, true))
    {
        BinaryFormatter bf = new BinaryFormatter();
        return (T)bf.Deserialize(zs);
    }
}
Moises answered 14/5, 2014 at 10:25 Comment(1)
Yes, it did - sorry for the 2 year lag :-)Ruffle
A
4

The code in Yuval Itzchakov's answer has a bug: it calls ms.ToArray() before the GZipStream is closed, so the compressor never flushes its buffered data, and the DecompressAndDeserialize method fails on the truncated output.

This code will work:

public static byte[] SerializeAndCompress(this object obj)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream zs = new GZipStream(ms, CompressionMode.Compress, true))
        {
            BinaryFormatter bf = new BinaryFormatter();
            bf.Serialize(zs, obj);
        }
        return ms.ToArray();
    }
}
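A round-trip usage sketch for these extension methods (illustrative names; the cached type must be serializable by BinaryFormatter):

var cache = new Dictionary<string, double[]>
{
    ["sensor1"] = new double[] { 20.1, 20.2, 20.3 }
};

// Hold the compressed bytes in memory instead of the raw object
byte[] compressed = cache.SerializeAndCompress();

// Inflate on demand when the data is accessed
var restored = compressed.DecompressAndDeserialize<Dictionary<string, double[]>>();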
Agnusago answered 14/3, 2020 at 11:57 Comment(0)

For me it only worked when I changed it this way. With the above example I got a SerializationException ("... is not marked as serializable"). Don't forget to add [Serializable] to your class.

public static byte[] SerializeAndCompress(this object obj) 
{
    var ms = new MemoryStream();
    using (GZipStream zs = new GZipStream(ms, CompressionMode.Compress, true))
    {
        BinaryFormatter bf = new BinaryFormatter();
        bf.Serialize(zs, obj);
    }
    return ms.ToArray();
}

public static T DecompressAndDeserialize<T>(this byte[] data)
{
    using (MemoryStream ms = new MemoryStream(data)) 
    using (GZipStream zs = new GZipStream(ms, CompressionMode.Decompress, true))
    {
        BinaryFormatter bf = new BinaryFormatter();
        return (T)bf.Deserialize(zs);
    }
}
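As noted above, BinaryFormatter only handles types carrying the attribute. A minimal sketch of such a class (hypothetical names):

[Serializable]
public class MeasurementCache
{
    public Dictionary<int, Dictionary<string, double>> Values =
        new Dictionary<int, Dictionary<string, double>>();
}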
Edouard answered 8/9, 2019 at 8:51 Comment(0)

A worthwhile update, I think: since the original answers to this question were posted, the BinaryFormatter type has been declared dangerous and its use is strongly discouraged (https://learn.microsoft.com/en-us/dotnet/standard/serialization/binaryformatter-security-guide). A migration guide has been created to help move away from it (https://learn.microsoft.com/en-us/dotnet/standard/serialization/binaryformatter-migration-guide/).

I found that, in creating a solution which matches the original question posted here, namely to serialise and compress nested Dictionaries, while also maintaining security and recommended practice, the MessagePack library (https://github.com/MessagePack-CSharp/MessagePack-CSharp) was perfect. Without duplicating the MessagePack documentation, I found that a simple usage was sufficient to meet my requirements.

using MessagePack;

// Simple options to include compression and security
var lz4Options = MessagePackSerializerOptions.Standard
    .WithCompression(MessagePackCompression.Lz4BlockArray)
    .WithSecurity(MessagePackSecurity.UntrustedData);

// To compress
byte[] compressedDictionary = MessagePackSerializer.Serialize(myDictionary, lz4Options);

// To decompress
Dictionary<int, Dictionary<string, string>> mySecondDictionary =
    MessagePackSerializer.Deserialize<Dictionary<int, Dictionary<string, string>>>(
        compressedDictionary, lz4Options);

I hope this proves helpful.
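If the cache ever holds custom classes rather than plain dictionaries, MessagePack can also serialize annotated types. A minimal sketch (hypothetical names):

[MessagePackObject]
public class Measurement
{
    [Key(0)] public int Id { get; set; }
    [Key(1)] public double[] Values { get; set; }
}

// Serialized/deserialized the same way as the dictionary above
byte[] bytes = MessagePackSerializer.Serialize(
    new Measurement { Id = 1, Values = new[] { 1.0, 2.0 } }, lz4Options);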

Tadtada answered 20/8 at 9:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.