Large Object Heap friendly IDictionary

We have an application that holds large numbers of objects in several Dictionaries, some of which grow continually over the lifetime of the app (a trading application with lots of instruments and continually growing orders/trades).

We are having problems with OutOfMemoryExceptions due to fragmentation of the large object heap.

To counter this I've tried to write a 'large' dictionary, implemented as a two-level dictionary in which every leaf dictionary is kept small enough that it is never allocated on the LOH. I've used a consistent hashing algorithm to avoid having to rehash the entire dictionary when a single bucket becomes too large. The consistent hashing 'circle' is a TreeDictionary from the C5 collections library.
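For illustration, a minimal sketch of the general shape is below. It uses plain modulo bucketing and a naive full rehash rather than the consistent hashing of the actual implementation (linked in the update), and the leaf-size cap is an assumed number that would have to be tuned to the per-entry size:

```csharp
using System.Collections.Generic;

// Sketch only: split one logical dictionary into many small leaf dictionaries
// so no single leaf's internal arrays ever reach the ~85 KB LOH threshold.
public class TwoLevelDictionary<TKey, TValue>
{
    private const int MaxLeafCount = 4000;   // assumed cap, tune to entry size
    private Dictionary<TKey, TValue>[] _leaves;

    public TwoLevelDictionary(int initialLeaves = 16)
    {
        _leaves = CreateLeaves(initialLeaves);
    }

    private static Dictionary<TKey, TValue>[] CreateLeaves(int count)
    {
        var leaves = new Dictionary<TKey, TValue>[count];
        for (int i = 0; i < count; i++)
            leaves[i] = new Dictionary<TKey, TValue>();
        return leaves;
    }

    private static Dictionary<TKey, TValue> LeafFor(TKey key, Dictionary<TKey, TValue>[] leaves)
    {
        int index = (key.GetHashCode() & 0x7FFFFFFF) % leaves.Length;
        return leaves[index];
    }

    public void Add(TKey key, TValue value)
    {
        var leaf = LeafFor(key, _leaves);
        if (leaf.Count >= MaxLeafCount)
        {
            GrowLeaves();                  // naive: doubles leaf count, rehashes everything
            leaf = LeafFor(key, _leaves);
        }
        leaf.Add(key, value);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return LeafFor(key, _leaves).TryGetValue(key, out value);
    }

    // The consistent-hashing version avoids exactly this full rehash by
    // splitting only the overfull bucket.
    private void GrowLeaves()
    {
        var newLeaves = CreateLeaves(_leaves.Length * 2);
        foreach (var leaf in _leaves)
            foreach (var pair in leaf)
                LeafFor(pair.Key, newLeaves).Add(pair.Key, pair.Value);
        _leaves = newLeaves;
    }
}
```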

My question is, are there any better data structures (or perhaps better implementations of the one I described) for C#?

Update

This is the implementation for the 'large' dictionary: https://gist.github.com/956621

I understand it's not foolproof, since neither the LOH size threshold, nor the size of each Dictionary entry, nor the Dictionary's scaling algorithm is part of the specification. However, this is currently the best I can think of to avoid the application blowing up mid-day.

Westley answered 5/5, 2011 at 4:26 Comment(7)
It might help to tell us what problems you see with your existing implementation.Rewrite
If you are using your dictionary as a key-value store for flat objects you might consider a memory mapped file instead.Pushcart
Have you verified that you aren't 'leaking' memory anywhere (as in held references keeping memory from being collected)?Hissing
@zespri, approximately mid-way through a day of trading the OutOfMemoryExceptions start getting thrown.Westley
@Mitch Wheat, we've spent a lot of time going through a memory profiler (specifically redgate ANTS Memory Profiler). We've reduced all the potential leaks to the point at which the actual memory used is relatively constant. The profiler points out that the LOH is particularly fragmented. At the moment, after running for 3 hours, the LOH has 10.37MB used, but has 435.4MB allocated to it, the largest free fragment is 49.83MB. It was much worse before using the 'large' dictionary implementation.Westley
Why don't you use a local database like SQLite or SQL Server Compact Edition? You will be able to index and query what you need easily, and the database will already manage caching based on the situation. This is the whole reason databases were created: if your app crashes you lose everything, but a database will preserve everything. Even apps like AutoCAD use a database to store their complex 3D structures.Ondrej
So basically the consensus is: don't use .NET for long-running, memory-intensive applications?Westley

A dictionary is an unfortunate data structure when it is the largest one in your application. The hashtable is doubled in size when it becomes too full, and that requires 150% over-allocation during the resize, right at the critical time. The hashtable works superbly when it's gigantic, but it requires contiguous allocation, which stresses the heap algorithms.

You can diminish these disadvantages with multi-level hashtables, for example using a byte of the hashcode as an index into 256 hashtables. This adds some overhead for sure, but more importantly this and other strategies are filled with peril: by fiddling with what randomness the hashcodes give you, you can easily make things much, much worse performance-wise. Using this approach requires a good theoretical foundation and solid empirical testing. But it can work.
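A rough sketch of the "byte of the hashcode into 256 hashtables" idea, with illustrative names (not production code, and keeping the caveat above about hashcode quality in mind):

```csharp
using System.Collections.Generic;

// Sketch: 256 sub-dictionaries selected by the low byte of the key's hashcode.
// Each sub-dictionary stays roughly 1/256th of the total size, so its internal
// arrays avoid the LOH for much longer and resize in much smaller steps.
public class SegmentedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue>[] _segments =
        new Dictionary<TKey, TValue>[256];

    public SegmentedDictionary()
    {
        for (int i = 0; i < _segments.Length; i++)
            _segments[i] = new Dictionary<TKey, TValue>();
    }

    private Dictionary<TKey, TValue> SegmentFor(TKey key)
    {
        return _segments[key.GetHashCode() & 0xFF];   // low byte picks the segment
    }

    public TValue this[TKey key]
    {
        get { return SegmentFor(key)[key]; }
        set { SegmentFor(key)[key] = value; }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return SegmentFor(key).TryGetValue(key, out value);
    }

    public bool Remove(TKey key)
    {
        return SegmentFor(key).Remove(key);
    }
}
```

The caveat applies directly: if the low byte of your keys' hashcodes is poorly distributed, the segments become unbalanced and performance suffers.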

Another strategy is to pre-allocate the biggest data structure for the worst case and allocate it early. No fine-grained allocation is necessary, but now you face the spectre of catastrophic failure if it ever runs out. It's an option.
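For the pre-allocation route, the simplest form is just passing a worst-case capacity to the constructor (the figure here is an assumption, not something from the question):

```csharp
using System.Collections.Generic;

class PreallocatedStore
{
    // Assumed worst-case entry count; in practice pick it from measured peak daily volume.
    private const int WorstCaseEntries = 2000000;

    // Passing the capacity up front makes the Dictionary allocate its bucket and
    // entry arrays once, at startup, instead of repeatedly doubling them mid-day.
    private readonly Dictionary<long, object> _byId =
        new Dictionary<long, object>(WorstCaseEntries);
}
```

The internal arrays are still large objects, but they are allocated once at startup rather than repeatedly resized during trading, which is the failure mode described in the question.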

Recalcitrate answered 5/5, 2011 at 5:15 Comment(3)
I think the multi-level hashing algorithm I have is OK performance-wise. We are pre-allocating arrays where possible, but it's not always feasible/desirable to allocate an array straight off for the worst case scenario.Westley
Would it help to pre-allocate the tables by specifying capacity in the Dictionary constructor?Pushcart
@shmith: If you know there will probably be a million entries, it helps to say so right away. But then the real pain starts when you get to a million and one.Recalcitrate

I think this calls for a change of algorithm.

From what I've heard and understood, the GC is quite good at compacting and defragmenting memory. So your problem stems from the simple fact that you keep too much data in memory.

How much data do you keep in memory?

Have you thought about using a database? A compact one might be enough.

Or simply tell your client that to run your app correctly, they need 16 GB of memory. And if your app really needs all 16 GB, then there is definitely something wrong.

Edit: Looking at your problem from a different side, and after reading your edit, I have a question: how big are your objects? Do they contain long lists or arrays? How often do you remove/add those objects?

I think the problem might not be in the dictionary itself, but in the objects, which are too big and are removed/added too often. Maybe some kind of caching or pooling would help. And if you use lists, create them with a preallocated capacity.

And maybe using immutable structs instead of mutable classes can ease the fragmentation.
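A minimal sketch of the pooling and pre-allocation suggestion, with illustrative names (Order and OrderPool are made up for the example):

```csharp
using System.Collections.Generic;

// Sketch: reuse objects via a simple pool and pre-size any lists they hold,
// to cut down on allocation and collection churn.
class Order
{
    public long Id;
    public List<long> FillIds = new List<long>(32);   // assumed typical fill count

    public void Reset()
    {
        Id = 0;
        FillIds.Clear();   // keep the backing array, just empty it
    }
}

class OrderPool
{
    private readonly Stack<Order> _pool = new Stack<Order>();

    public Order Rent()
    {
        return _pool.Count > 0 ? _pool.Pop() : new Order();
    }

    public void Return(Order order)
    {
        order.Reset();
        _pool.Push(order);   // keep the instance alive for the next caller
    }
}
```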

Wealthy answered 5/5, 2011 at 4:58 Comment(4)
+1 "How much data do you keep in memory?" Rather than tweak your dictionary, re-evaluate your requirement using your recently gained experience.Porty
The long term goal is to reduce the number of trades/orders we have in memory, but it's not feasible in the short term. Unfortunately, neither is replacing all the trader's 32 bit XP machines with a 64 bit OS.Westley
@Akash, that's not really a very good solution, it's just pushing the memory requirements to a different process. What I'm after is any clever data structures that might be used in the database to avoid this sort of situation. Having said that, I doubt there are any RDBMSs written on the CLR specifically due to this issue...Westley
-1 for missing a key point in OP's question, and containing fundamental misinformation re GC and defragmenting. As OP mentioned, "fragmentation of the large object heap". Prior to .NET 4.5.1, there was NO way to relocate large objects -- and Dictionary is one of the worst offenders at creating large objects as it grows, unless the max size is known beforehand. The problem is indeed in the dictionary. I have encountered this exact problem quite a few times over the years (though until today, was always able to resolve it without switching to a custom implementation of IDictionary).Popup
