My data structures knowledge is way rusty, and to be honest it was never my strongest point.
Right now we're about to build a queue-like component that has the following requirements:
- Has to be able to queue, dequeue and find a specific item by key.
- each item will be a structure or class with another class as key, having 5 different properties, category-like. Assume something like: MasterCategoryId, ChildCategoryId, TimeId, PriorityId, GroupId.
- it has to be a sorte collection.
- usually the collection will hold anywhere from 5k to 10k objects, but in order to consider the worst case scenario, we're testing our current prototype to hold about a million objects.
- right now it will not be multithreaded.
- about 90 or 95% of the items in the collection (queueing) will happen when the component is created, but the component is being used as a tree, in the sense that we will be dequeueing the last item on the collection, do calculations on it, and then it will report it's result to its parent, which may or may not be in the collection already. If it isn't, the queue method used to try to find the parent will have to insert the item.
- since the component is like a queue that is processed, the collection will be empty after dequeueing everything.
I think that sums it up. so, obviously a single list or ordered list is out of the question, due to the fact that every time we add or remove objects from the collection it gets sorted again, and doing this in a single collection with a million objects is slow.
We tested a couple of approaches in the past, like linked lists, which proved to be fast for queueing, but slow for finding items (since we do have that scenario).
Right now we've come up with a structure like
SortedDictionary<int, SortedDictionary<int, SortedDictionary<int, SortedDictionary<int, SortedDictionary<int, ..
You get the idea.
It's kind of a sweet spot of grouping levels, keeping each collection relatively small (around 300 items per dictionary).
so, for the first level, we will have a sorteddictionary, for which the keys are the ids of each master category, and the values will be a sorteddictionary of which the key will be the id of the child category...and so on.
Right now we've tested with 100, 1,000, 10,000, 100,000 and 1,000,000 items.
For the smaller range, up to 100k, the solution is fast. it can queue/dequeue/find in less than a second even for up to 300k, which is really above the 80-90% of the scenarios that we will handle.
When it comes to a million, it does become slower, taking about 3-4 seconds to queue the whole thing, and up to 10 seconds to deplete the queue.
So, my questions are:
- Is there a more suited collection or approach for our specific scenario?
- I had never worked with this amount of items in collections before. Are these timing reasonable for such high numbers, or not? I ask because I've read some tweets of people who do 200k operations a second on stuff like MSMQ or NserviceBus (which i know is not related to this, i am just trying to understand and compare my results).
- The objects I am using right now in the prototype are just mock classes, just the composite object key and a single property. Would my results be affected when I use the real classes, or not? I'm guessing not, since all the framework would do is add the reference to the object, but just want to confirm, since like I said, data structures have never been my strongest knowledge.
- As a bit of a separate topic, If i was to want to prepare for this to be multithreaded, what are some considerations I would have to take?
Thanks.