Can LINQ use binary search when the collection is ordered?

5

26

Can I somehow "instruct" LINQ to use binary search when the collection that I'm trying to search is ordered? I'm using an ObservableCollection<T>, populated with ordered data, and I'm trying to use Enumerable.First(<Predicate>). In my predicate, I'm filtering by the value of the field my collection is sorted by.

Artimas answered 19/11, 2009 at 20:35 Comment(1)
Hi, I just found and fixed a bug in my implementation, in case you're using it... – Sunless
36

As far as I know, it's not possible with the built-in methods. However, it would be relatively easy to write an extension method that lets you write something like this:

var item = myCollection.BinarySearch(i => i.Id, 42);

(assuming, of course, that your collection implements IList<T>; there's no way to perform a binary search if you can't access the items randomly)

Here's a sample implementation:

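// Finds the item whose key equals 'key' in a list that is sorted by that key.
// Throws InvalidOperationException when no such item exists (like Enumerable.First).
public static T BinarySearch<T, TKey>(this IList<T> list, Func<T, TKey> keySelector, TKey key)
        where TKey : IComparable<TKey>
{
    if (list.Count == 0)
        throw new InvalidOperationException("Item not found");

    int min = 0;
    int max = list.Count;
    while (min < max)
    {
        // Written this way (rather than (min + max) / 2) to avoid int overflow.
        int mid = min + ((max - min) / 2);
        T midItem = list[mid];
        TKey midKey = keySelector(midItem);
        int comp = midKey.CompareTo(key);
        if (comp < 0)
        {
            // midKey is too small: continue in the upper half.
            min = mid + 1;
        }
        else if (comp > 0)
        {
            // midKey is too large: continue in the lower half.
            max = mid - 1;
        }
        else
        {
            return midItem;
        }
    }
    // When the loop ends with min == max, that one candidate has not been
    // compared yet, so test it here. The min < list.Count check is needed
    // because max starts at list.Count, which is not a valid index.
    if (min == max &&
        min < list.Count &&
        keySelector(list[min]).CompareTo(key) == 0)
    {
        return list[min];
    }
    throw new InvalidOperationException("Item not found");
}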

(not tested... a few adjustments might be necessary) Now tested and fixed ;)

The fact that it throws an InvalidOperationException may seem strange, but that's what Enumerable.First does when there's no matching item.
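
If the exception is unwanted, a non-throwing variant in the style of the Try... methods is one option. The following is only a minimal sketch, not part of the answer above: the names BinarySearchSketch, TryBinarySearch and Example are hypothetical, and it assumes the list is already sorted by the selected key.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class BinarySearchSketch
{
    // Hypothetical helper: returns false instead of throwing when no
    // element has the requested key. Assumes 'list' is sorted by that key.
    public static bool TryBinarySearch<T, TKey>(
        this IList<T> list, Func<T, TKey> keySelector, TKey key, out T found)
        where TKey : IComparable<TKey>
    {
        int min = 0;
        int max = list.Count - 1; // closed interval [min, max]
        while (min <= max)
        {
            int mid = min + ((max - min) / 2);
            int comp = keySelector(list[mid]).CompareTo(key);
            if (comp == 0)
            {
                found = list[mid];
                return true;
            }
            if (comp < 0)
                min = mid + 1; // key can only be in the upper half
            else
                max = mid - 1; // key can only be in the lower half
        }
        found = default(T);
        return false;
    }

    // Usage with an ObservableCollection<T>, which implements IList<T>.
    // The sample data is sorted by string length, the key searched on.
    public static string Example()
    {
        var words = new ObservableCollection<string> { "a", "bb", "ccc", "dddd" };
        string match;
        return words.TryBinarySearch(w => w.Length, 3, out match) ? match : null;
    }
}
```

An empty list needs no special case here, because the loop body never runs when max starts at -1.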

Sunless answered 19/11, 2009 at 20:42 Comment(19)
Sweet! Maybe this should be added to ExtensionOverflow: stackoverflow.com/questions/271398 – Machine
Didn't know about that, looks interesting... thanks for the link ;) – Sunless
Thank you for fixing the bug, you should probably also fix your ExtensionOverflow answer. – Artimas
A better solution would be to return the index of the item (and -1 for not-present), avoiding the need to trap exceptions. – Biostatics
Just a note: the index variable is not used in this code. Thanks for the solution. – Tressa
@Thomas, if you don't mind explaining, in what circumstances would the following block of code be hit? (min == max && keySelector(list[min]).CompareTo(key) == 0) { return list[min]; } – Tressa
@AdrianRussell, I'm not sure... I wrote this 2 years ago, so I don't remember exactly why I did it this way ;) – Sunless
This is awesome, I just used this in a project! However, max should be initialized to list.Count - 1 or you'll get an exception when the value sought is higher than the largest item in the list. – Raman
Just a question: does the IList have to be sorted by whatever the key is for this to work? – Franklin
@StefanVasiljevic, yes, binary search relies on the list being sorted. – Sunless
Sorry to bump this old answer, but shouldn't your mid calculation be: int mid = min + ((max - min) / 2);. Check this out: googleresearch.blogspot.ae/2006/06/… – Serrato
@DimitarDimitrov, it's the same thing: min + ((max - min) / 2) is the same as (2 * min + (max - min)) / 2, which reduces to (min + max) / 2 – Sunless
@ThomasLevesque deleted my comment by mistake... Anyway, did you check my link? It's a well-known bug; in your case it can overflow. – Serrato
@DimitarDimitrov, oh, I see... Although mathematically correct, it can cause an overflow for a really large collection. I'll fix it, thanks ;) – Sunless
The empty-list case is not handled; also, this function could be extended to return a list of matching objects. – Angelesangelfish
There's a bug in the implementation! When the list only contains 1 element that is not the one being searched, you'll get an array index error. – Ailurophobe
Replace "if (min == max &&" with "if (min == max && max < list.Count &&" – Ailurophobe
It would be better if it extended IOrderedEnumerable. – Arborescent
@CrouchingKitten why? It has to be a list, so that items can be accessed by index, due to how binary search works. – Sunless
8

The accepted answer is very good.

However, I needed BinarySearch to return an index and, when the item is not found, the bitwise complement of the index of the first item that is larger, as List<T>.BinarySearch() does.

So I looked at its implementation using ILSpy, then modified it to take a selector parameter. I hope it will be as useful to someone as it is for me:

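using System;
using System.Collections.Generic;

public static class ListExtensions
{
    // Returns the index of a matching item, or the bitwise complement of the
    // index of the first larger item (the List<T>.BinarySearch convention).
    public static int BinarySearch<T, U>(this IList<T> tf, U target, Func<T, U> selector)
    {
        var lo = 0;
        var hi = tf.Count - 1;
        var comp = Comparer<U>.Default;

        while (lo <= hi)
        {
            // Midpoint computed without risking int overflow.
            var median = lo + ((hi - lo) >> 1);
            var num = comp.Compare(selector(tf[median]), target);
            if (num == 0)
                return median;
            if (num < 0)
                lo = median + 1;
            else
                hi = median - 1;
        }

        // Not found: lo is the position where the target would be inserted.
        return ~lo;
    }
}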
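
To illustrate what the negative return value is good for, here is a small sketch of a sorted insert. It uses the built-in List<T>.BinarySearch, which follows the same convention; the names InsertionPointDemo and InsertSorted are made up for this example, and the list is assumed to be sorted already.

```csharp
using System;
using System.Collections.Generic;

public static class InsertionPointDemo
{
    // Inserts 'value' into an already-sorted list so that it stays sorted,
    // and returns the index at which the value was inserted.
    public static int InsertSorted(List<int> sorted, int value)
    {
        int index = sorted.BinarySearch(value);
        if (index < 0)
        {
            // Not found: the bitwise complement is the index of the
            // first larger element (or Count if there is none).
            index = ~index;
        }
        sorted.Insert(index, value);
        return index;
    }
}
```

For example, calling InsertSorted on a list containing 10, 20, 40 with the value 30 inserts at index 2.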
Chromosphere answered 15/4, 2014 at 18:55 Comment(5)
That's not median, but mean. – Arborescent
@CrouchingKitten: The center index is the index of the median. – Paymar
@BenVoigt ah yes, I didn't think about that. – Arborescent
Excellent implementation which informs you of all the "between spots". – Nonu
@GabeHalsmer I confessed in my post that I stole it from the actual MS BinarySearch implementation. So it is not surprising it is excellent 😊 – Chromosphere
2

Well, you can write your own extension method over ObservableCollection<T> - but then that will be used for any ObservableCollection<T> where your extension method is available, without knowing whether it's sorted or not.

You'd also have to indicate in the predicate what you wanted to find - which would be better done with an expression tree... but that would be a pain to parse. Basically, the signature of First isn't really suitable for a binary search.

I suggest you don't try to overload the existing signatures, but write a new one, e.g.

public static TElement BinarySearch<TElement, TKey>
    (this IList<TElement> collection, Func<TElement, TKey> keySelector,
     TKey key)

(I'm not going to implement it right now, but I can do so later if you want.)

By providing a function, you can search by the property the collection is sorted by, rather than by the items themselves.

Legwork answered 19/11, 2009 at 20:42 Comment(0)
1

Enumerable.First(predicate) works on an IEnumerable<T>, which only supports enumeration; therefore it does not have random access to the items within.

Also, your predicate contains arbitrary code that eventually results in true or false, and so cannot indicate whether the tested item was too low or too high. This information would be needed in order to do a binary search.

Enumerable.First(predicate) can only test each item in order as it walks through the enumeration.
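
To make the difference concrete: a binary search needs a three-way answer (too low, match, too high) rather than a bool. A rough sketch of what such a method could look like follows; ThreeWaySearchSketch and IndexOfBy are hypothetical names, not anything LINQ provides, and the list is assumed to be sorted consistently with the comparison.

```csharp
using System;
using System.Collections.Generic;

public static class ThreeWaySearchSketch
{
    // 'compare' is a three-way probe: negative when the item sorts before
    // the target, zero on a match, positive when it sorts after the target.
    // Returns the index of a matching item, or -1 when there is none.
    public static int IndexOfBy<T>(this IList<T> sorted, Func<T, int> compare)
    {
        int lo = 0;
        int hi = sorted.Count - 1;
        while (lo <= hi)
        {
            int mid = lo + ((hi - lo) / 2);
            int c = compare(sorted[mid]);
            if (c == 0)
                return mid;
            if (c < 0)
                lo = mid + 1; // item too low: discard the lower half
            else
                hi = mid - 1; // item too high: discard the upper half
        }
        return -1;
    }
}
```

A call might look like ages.IndexOfBy(a => a.CompareTo(42)), where the lambda supplies exactly the "too low or too high" information that a bool predicate cannot.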

Banc answered 19/11, 2009 at 20:42 Comment(0)
1

Keep in mind that all (or at least most) of the extension methods used by LINQ are implemented on IQueryable<T>, IEnumerable<T>, IOrderedEnumerable<T>, or IOrderedQueryable<T>.

None of these interfaces supports random access, and therefore none of them can be used for a binary search. One of the benefits of something like LINQ is that you can work with large datasets without having to return the entire dataset from the database. Obviously you can't binary search something if you don't even have all of the data yet.

But as others have said, there is no reason at all you can't write this extension method for IList<T> or other collection types that support random access.
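
If all you have is a sorted sequence, one way to get random access is to buffer it first. This is only a sketch under that assumption (MaterializeThenSearch and FindIndex are illustrative names); note that ToList() is itself an O(n) pass, so this pays off only when the buffered list is searched many times.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeThenSearch
{
    // Assumes 'sortedSequence' yields its values in ascending order.
    public static int FindIndex(IEnumerable<int> sortedSequence, int target)
    {
        // ToList() enumerates the whole sequence once; only after that is
        // random access, and therefore binary search, possible.
        List<int> buffered = sortedSequence.ToList();

        // Returns the index, or a negative number when not found.
        return buffered.BinarySearch(target);
    }
}
```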

Shalna answered 19/11, 2009 at 21:39 Comment(0)
