Efficiently find nearest dictionary key
Asked Answered
H

4

40

I have a bunch of pairs of dates and monetary values in a SortedDictionary<DateTime, decimal>, corresponding to loan balances calculated into the future at contract-defined compounding dates. Is there an efficient way to find a date key that is nearest to a given value? (Specifically, the nearest key less than or equal to the target). The point is to store only the data at the points when the value changed, but efficiently answer the question "what was the balance on x date?" for any date in range.

A similar question was asked ( What .NET dictionary supports a "find nearest key" operation? ) and the answer was "no" at the time, at least from the people who responded, but that was almost 3 years ago.

The question How to find point between two keys in sorted dictionary presents the obvious solution of naively iterating through all keys. I am wondering if any built-in framework function exists to take advantage of the fact that the keys are already indexed and sorted in memory -- or alternatively a built-in Framework collection class that would lend itself better to this kind of query.

Halbeib answered 13/9, 2012 at 18:51 Comment(2)
But the keys of a dictionary are NOT sorted in memory. Which is why you can't directly do a "get nearest" or "get less than or equal to" on a dictionary.Kaffir
The linked question has mention about SortedList<K, V> structure. I would go with that.Perspicacity
W
35

Since SortedDictionary is sorted on the key, you can create a sorted list of keys with

var keys = new List<DateTime>(dictionary.Keys);

and then efficiently perform binary search on it:

var index = keys.BinarySearch(key);

As the documentation says, if index is positive or zero then the key exists; if it is negative, then ~index is the index where key would be found at if it existed. Therefore the index of the "immediately smaller" existing key is ~index - 1. Make sure you handle correctly the edge case where key is smaller than any of the existing keys and ~index - 1 == -1.

Of course the above approach really only makes sense if keys is built up once and then queried repeatedly; since it involves iterating over the whole sequence of keys and doing a binary search on top of that there's no point in trying this if you are only going to search once. In that case even naive iteration would be better.

Update

As digEmAll correctly points out, you could also switch to SortedList<DateTime, decimal> so that the Keys collection implements IList<T> (which SortedDictionary.Keys does not). That interface provides enough functionality to perform a binary search on it manually, so you could take e.g. this code and make it an extension method on IList<T>.

You should also keep in mind that SortedList performs worse than SortedDictionary during construction if the items are not inserted in already-sorted order, although in this particular case it is highly likely that dates are inserted in chronological (sorted) order which would be perfect.

Whitehurst answered 13/9, 2012 at 18:56 Comment(3)
OP could also switch to SortedList<> (to avoid the conversion to List<>) and then use this method --> LINKThud
@digEmAll: That's a very good suggestion, even more so because likely the collection is constructed on a very SortedList-friendly manner. I incorporated it into the answer.Whitehurst
Perfect. You are correct that the lists will be populated in order, but will be queried many times and not in order as different permutations of possible loans and contracts are calculated.Halbeib
C
12

So, this doesn't directly answer your question, because you specifically asked for something built-in to the .NET framework, but facing a similar problem, I found the following solution to work best, and I wanted to post it here for other searchers.

I used the TreeDictionary<K, V> from the C5 Collections (GitHub/NuGet), which is an implementation of a red-black tree.

It has Predecessor/TryPredecessor and WeakPredessor/TryWeakPredecessor methods (as well as similar methods for successors) to easily find the nearest items to a key.

More useful in your case, I think, is the RangeFrom/RangeTo/RangeFromTo methods that allow you to retrieve a range of key-value-pairs between keys.

Note that all of these methods can also be applied to the TreeDictionary<K, V>.Keys collection, which allow you to work with only the keys as well.

It really is a very neat implementation, and something like it deserves to be in the BCL.

Coincide answered 23/7, 2013 at 15:2 Comment(1)
Use => Predecessor , (diff1), (Value) , (diff2), Successor => if diff1 < diff2 then use Predecessor method.Thermistor
S
1

It is not possible to efficiently find the nearest key with SortedList, SortedDictionary or any other "built-in" .NET type, if you need to interleave queries with inserts (unless your data arrives pre-sorted, or the collection is always small).

As I mentioned on the other question you referenced, I created three data structures related to B+ trees that provide find-nearest-key functionality for any sortable data type: BList<T>, BDictionary<K,V> and BMultiMap<K,V>. Each of these data structures provide FindLowerBound() and FindUpperBound() methods that work like C++'s lower_bound and upper_bound.

These are available in the Loyc.Collections NuGet package, and BDictionary typically uses about 44% less memory than SortedDictionary.

Seductive answered 26/12, 2014 at 5:12 Comment(3)
Very nice implementation, Can I extend SparseAList to support qword index size? It would be very good If there was long version of this.Paving
Hmm, it seems all the interfaces are designed for 32-bit - but if you'd like to fork it on GitHub and make a 64-bit version, feel free. Unfortunately I don't think one single version is enough, as 64-bit indices would increase the memory usage by up to 50% and not all clients would want that. A bigger problem is that all ALists are derived from AListBase, which uses AListInner/AListInnerBase, which contains 32-bit indexes. So it seems like 64-bit version would have to eliminate the base class and replicate all its code.Seductive
Note: you could use LeMP to avoid physically replicating the code (thus duplicating any remaining bugs). Instead, you could use using to introduce new names for indexes (using index = int; using uindex = uint) and then use LeMP to do find-and-replace operations on the 32-bit version of the code to generate a 64-bit version. If you create an issue we can talk more about how this approach would work. Note: LeMP and SparseAList are defined in the same repo.Seductive
C
-1
    public static DateTime RoundDown(DateTime dateTime)
    {
        long remainingTicks = dateTime.Ticks % PeriodLength.Ticks;
        return dateTime - new TimeSpan(remainingTicks);
    }
Crumbly answered 17/10, 2015 at 22:11 Comment(2)
You should provide some explanation about what your code is doing. Don't just post code.Carrico
I thought it was self-evident. Used rounded DateTimes as your keys, and use the same method for look-ups. PeriodLength can be a week, a day, an hour, 5 minutes, or whatever regular length period you need. If PeriodLength is not a constant for the app, a more generic solution would accept periodLength as a parameter.Crumbly

© 2022 - 2024 — McMap. All rights reserved.