What .NET dictionary supports a "find nearest key" operation?
Asked Answered
C

8

12

I'm converting some C++ code to C# and it calls std::map::lower_bound(k) to find an entry in the map whose key is equal to or greater than k. However, I don't see any way to do the same thing with .NET's SortedDictionary. I suspect I could implement a workaround using SortedList, but unfortunately SortedList is too slow (O(n) for inserting and deleting keys). What can I do?

Note: I found a workaround using that takes advantage of my particular scenario... Specifically, my keys are a dense population of integers starting at just over 0, so I used a List<TValue> as my dictionary with the list index serving as the key, and searching for a key equal or greater than k can be done in only a few loop iterations. But it would still be nice to see the original question answered.

Conatus answered 6/11, 2009 at 22:29 Comment(1)
Same question, but with no restriction on SortedList<K, V>.Strategic
C
1

I created several data structures that provide this functionality for any data type: BList<T> (a sorted list), BDictionary<K,V> (a dictionary whose items are sorted by key), and BMultiMap<K,V> (a dictionary in which more than one value can be associated with a key). See this article for details. Each of these data structures provide FindLowerBound() and FindUpperBound() methods that work like C++'s lower_bound and upper_bound. Internally, these collections are similar to B+ trees, so they have good performance and low memory usage; BDictionary<,> typically uses about 44% less memory than a standard SortedDictionary<,> (which in turn uses, on average, slightly less memory than Dictionary<,>), assuming 64-bit keys and 64-bit values.

I also made a "sparse" collection, SparseAList<T>, which is similar to BDictionary<int,T> except that you can insert and remove "empty space" anywhere in the collection (empty space does not consume any memory). See this article for details.

All of these collections are in the Loyc.Collections NuGet package.

Conatus answered 21/5, 2014 at 22:26 Comment(0)
C
2

I have developed several collection classes that support "find next higher key" and "find next lower key" operations.

First I made a set of Compact Patricia Trie collections. These are sets/dictionaries designed to minimize memory usage; they are especially efficient for large collections of URLs and certain other kinds of data. They only partly solve the problem, because only certain kinds of keys are supported, namely byte[], string, and all primitive integer types (Int8..UInt64). Also, string sorting is case-sensitive. NuGet package: Loyc.Utilities

After publishing this answer, I made more sorted data structures that solve this problem generally: BList<T>, BDictionary<K,V>, BMultiMap<K,V> and SparseAList<T>. See my second answer for details.

Conatus answered 26/2, 2010 at 19:50 Comment(1)
I agree. There are other better solutions out there, e.g. the TreeDictionary<K,V> in the C5 Collections which is a red-black tree implementation, and has WeakSuccessor/TryWeakSuccessor methods already.Terrie
F
1

The problem is that a dictionary/hash table is designed to arrive at a unique memory location based on an input value, so you'll need a data structure that is designed to accommodate a range related to each value you store, and at the same time update each interval correctly

I think skip lists (or balanced binary trees) can help you. Although they cannot perform lookups in O(n), they can do logarithmically and still faster than trees.

I know this is not a proper answer since I cannot say that the .NET BCL already contains such a class, you'll unfortunately have to implement one yourself, or find a 3rd party assembly that supports it for you. There seems to be a nice example over at The CodeProject here, though.

Flaviaflavian answered 6/11, 2009 at 23:55 Comment(1)
SortedDictionary seems to be implemented with a red-black tree; too bad not all its capabilities are made public.Conatus
W
1

You can try the code i wrote below. it using binary search, therefore assuming the list/array is pre-sorted.

public static class ListExtensions
{
    public static int GetAtMostIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer)
    {
        return GetAtMostIndex(list, value, comparer, 0, list.Count);
    }

    public static int GetAtLeastIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer)
    {
        return GetAtLeastIndex(list, value, comparer, 0, list.Count);
    }

    public static int GetAtMostIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer, int index, int count)
    {
        if (count == 0)
        {
            return -1;
        }

        int startIndex = index;
        int endIndex = index + count - 1;
        int middleIndex = 0;
        int compareResult = -1;

        while (startIndex < endIndex)
        {
            middleIndex = (startIndex + endIndex) >> 1; //  / 2
            compareResult = comparer.Invoke(list[middleIndex], value);

            if (compareResult > 0)
            {
                endIndex = middleIndex - 1;
            }
            else if (compareResult < 0)
            {
                startIndex = middleIndex + 1;
            }
            else
            {
                return middleIndex;
            }
        }

        if (startIndex == endIndex)
        {
            compareResult = comparer.Invoke(list[startIndex], value);

            if (compareResult <= 0)
            {
                return startIndex;
            }
            else
            {
                int returnIndex = startIndex - 1;

                if (returnIndex < index)
                {
                    return -1;
                }
                else
                {
                    return returnIndex;
                }
            }
        }
        else
        {
            //todo: test
            return startIndex - 1;
        }
    }

    public static int GetAtLeastIndex<TItem, TValue>(/*this*/ IList<TItem> list, TValue value, Func<TItem, TValue, int> comparer, int index, int count)
    {
        if (count == 0)
        {
            return -1;
        }

        int startIndex = index;
        int endIndex = index + count - 1;
        int middleIndex = 0;
        int compareResult = -1;

        while (startIndex < endIndex)
        {
            middleIndex = (startIndex + endIndex) >> 1; //  / 2
            compareResult = comparer.Invoke(list[middleIndex], value);

            if (compareResult > 0)
            {
                endIndex = middleIndex - 1;
            }
            else if (compareResult < 0)
            {
                startIndex = middleIndex + 1;
            }
            else
            {
                return middleIndex;
            }
        }

        if (startIndex == endIndex)
        {
            compareResult = comparer.Invoke(list[startIndex], value);

            if (compareResult >= 0)
            {
                return startIndex;
            }
            else
            {
                int returnIndex = startIndex + 1;

                if (returnIndex >= index + count)
                {
                    return -1;
                }
                else
                {
                    return returnIndex;
                }
            }
        }
        else
        {
            return endIndex + 1;
        }
    }
}
Wyandotte answered 7/11, 2009 at 17:17 Comment(1)
Thanks for contributing this binary search algorithm but it would not have solved my problem, since it requires a sorted array. In my scenario (sorry for not being clear in the question), key inserts are interleaved with key queries. Maintaining the sort order of an array on EVERY insert (so that binary searches are possible) requires O(N) time. Thus, an array sorted by key would not have good performance. Now, if the array could be built in advance and then be followed by a series of queries, sorting would only have to be done once, which would be efficient. But that wasn't an option for me.Conatus
C
1

I created several data structures that provide this functionality for any data type: BList<T> (a sorted list), BDictionary<K,V> (a dictionary whose items are sorted by key), and BMultiMap<K,V> (a dictionary in which more than one value can be associated with a key). See this article for details. Each of these data structures provide FindLowerBound() and FindUpperBound() methods that work like C++'s lower_bound and upper_bound. Internally, these collections are similar to B+ trees, so they have good performance and low memory usage; BDictionary<,> typically uses about 44% less memory than a standard SortedDictionary<,> (which in turn uses, on average, slightly less memory than Dictionary<,>), assuming 64-bit keys and 64-bit values.

I also made a "sparse" collection, SparseAList<T>, which is similar to BDictionary<int,T> except that you can insert and remove "empty space" anywhere in the collection (empty space does not consume any memory). See this article for details.

All of these collections are in the Loyc.Collections NuGet package.

Conatus answered 21/5, 2014 at 22:26 Comment(0)
L
0

find nearest to K:

dict.Keys.Where(i => i >= K).OrderBy(i => i).First();

or much faster:

public int? GetNearestKey(dict, K) 
{
    int? lowerK = null;
    foreach (int key in dict.Keys)
    {
        if (key == K) 
        {
            lowerK = K;
            break; 
        }
        else if (key >= K && (!lowerK.HasValue || key < lowerK))
        {
            lowerK = key;
        }
    }
    return lowerK;
}
Lambrequin answered 6/11, 2009 at 22:42 Comment(2)
er... now it's up from O(n) to O(n log n).Conatus
I need to do it in O(log n). Theoretically the SortedDictionary is capable of doing this, but I don't see an API for it.Conatus
I
0

There isn't a binary search tree collection implementation in the base framework, so you'll either have to build one or find an implementation. As you noted, SortedList is closest in terms of searching but is slower (due to its underlying array implementation) for insertion/deletion.

Ironbark answered 7/11, 2009 at 2:6 Comment(1)
SortedDictionary IS a binary search tree. Its public API just leaves out the searching functionality.Conatus
T
0

I think there's a mistake in the question about SortedList complexity.

SortedList has O(log(n)) amortized complexity for inserting new item. If you know in advance the capacity it can be done in O(Log(n)) in the worst case.

Tacy answered 7/11, 2009 at 15:47 Comment(3)
Microsoft foolishly doesn't state the big-O complexity in the documentation ( msdn.microsoft.com/en-us/library/… ) but it seems to imply that SortedList stores the keys and values in arrays. Sorted arrays have O(N) insert complexity if the keys being inserted are random.Conatus
It does, in msdn.microsoft.com/en-us/library/… it says: "This method is an O(n) operation for unsorted data, where n is Count. It is an O(log n) operation if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n)."Tacy
Unsorted data is normal. If your data is already sorted, you don't need SortedList in the first place (List<T> has a BinarySearch method.) So your claim that "SortedList has O(log(n)) amortized complexity" is just plain wrong.Conatus
B
0

You can do this for SortedSet<T> with following extension methods:

public static class SortedSetExtensions
{
    public static bool FindLowerOrEqualThan<T>(this SortedSet<T> set, T value, out T first)
    {
        if(set.Count == 0)
        {
            first = default(T);
            return false;
        }

        var minimum = set.Min;

        if(set.Comparer.Compare(minimum, value) > 0)
        {
            first = default(T);
            return false;
        }

        first = set.GetViewBetween(minimum, value).Max;
        return true;
    }

    public static bool FindGreaterOrEqualThan<T>(this SortedSet<T> set, T value, out T first)
    {
        if (set.Count == 0)
        {
            first = default(T);
            return false;
        }

        var maximum = set.Max;

        if (set.Comparer.Compare(maximum, value) < 0)
        {
            first = default(T);
            return false;
        }

        first = set.GetViewBetween(value, maximum).Min;
        return true;
    }
}
Bicuspid answered 3/3, 2016 at 8:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.