I'm working on a system that has lists and dictionaries with over five million items, where each item is typically a flat DTO with up to 90 primitive properties. The collections are persisted to disk using protobuf-net for resilience and subsequent processing.
Unsurprisingly, we're hitting the Large Object Heap (LOH) during processing and serialization.
We can avoid the LOH during processing by using ConcurrentBag and the like (its per-thread linked segments avoid the single large backing array that a List<T> of five million items would need), but we still hit the problem when serializing.
Currently, the items in a collection are batched into groups of 1,000 and serialized to MemoryStreams in parallel. Each resulting byte array is placed in a ConcurrentQueue to be written to a FileStream later.
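Concretely, the serialization stage looks roughly like this (a minimal sketch; FlatDto, WriteBatches and the two illustrated properties are stand-ins for the real types):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using ProtoBuf;

[ProtoContract]
public class FlatDto
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public double Value { get; set; }
    // ... the real DTO has up to 90 primitive properties
}

public static class BatchSerializer
{
    public static void WriteBatches(List<FlatDto> items, string path)
    {
        var queue = new ConcurrentQueue<byte[]>();

        // Split the collection into batches of 1,000 items.
        var batches = new List<List<FlatDto>>();
        for (int i = 0; i < items.Count; i += 1000)
            batches.Add(items.GetRange(i, Math.Min(1000, items.Count - i)));

        // Serialize each batch to its own MemoryStream in parallel.
        Parallel.ForEach(batches, batch =>
        {
            using (var ms = new MemoryStream())
            {
                Serializer.Serialize(ms, batch);
                // Each array is typically a few hundred KB, so these
                // (and the MemoryStream buffers behind them) still
                // land on the LOH.
                queue.Enqueue(ms.ToArray());
            }
        });

        // Drain the queue to a single file (done later in the real code).
        using (var fs = File.Create(path))
        {
            byte[] buffer;
            while (queue.TryDequeue(out buffer))
                fs.Write(buffer, 0, buffer.Length);
        }
    }
}
```

A 1,000-item batch of ~90-field messages comes out well above the 85,000-byte LOH threshold, which is why the batching alone doesn't save us.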
While I understand what this is trying to do, it seems overly complicated. It feels like there should be something within protobuf-net itself that handles huge collections without touching the LOH.
I'm hoping I've made a schoolboy error - that there are some settings I've overlooked. Otherwise, I'll be looking to write a custom binary reader/writer.
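For reference, the fallback I have in mind would look something along these lines (again a sketch, not production code, reusing the FlatDto shape from above): write each item's fields straight through a small buffer so no single allocation ever reaches the 85,000-byte LOH threshold.

```csharp
using System.Collections.Generic;
using System.IO;

public static class FlatDtoBinaryStore
{
    public static void Write(IEnumerable<FlatDto> items, string path)
    {
        // 64 KB FileStream buffer: big enough to amortise I/O,
        // safely under the 85,000-byte LOH threshold.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write,
                                       FileShare.None, 64 * 1024))
        using (var writer = new BinaryWriter(fs))
        {
            foreach (var item in items)
            {
                writer.Write(item.Id);
                writer.Write(item.Value);
                // ... one Write call per primitive property, in a fixed order
            }
        }
    }

    public static IEnumerable<FlatDto> Read(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 64 * 1024))
        using (var reader = new BinaryReader(fs))
        {
            while (fs.Position < fs.Length)
            {
                // Fields must be read back in exactly the order they were written.
                yield return new FlatDto
                {
                    Id = reader.ReadInt32(),
                    Value = reader.ReadDouble()
                };
            }
        }
    }
}
```

That loses protobuf-net's versioning and compact encoding, though, so I'd much rather find a supported option.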
I should point out we're on .NET 4.0 and looking to move to .NET 4.5 soon, but I realise the GC improvements there won't overcome this issue.
Any help appreciated.