Why is there a List<T>.BinarySearch(...)?
Asked Answered
F

9

14

I'm looking at List and I see a BinarySearch method with a few overloads, and I can't help wondering if it makes sense at all to have a method like that in List?

Why would I want to do a binary search unless the list was sorted? And if the list wasn't sorted, calling the method would just be a waste of CPU time. What's the point of having that method on List?

Fisk answered 15/7, 2010 at 13:40 Comment(7)
Actually that is a good question. But is it possible for any type system (not just C#/.NET) to allow a method to work on only for "sorted" items? How would you know if items are sorted on a type system?Substantialize
Why would I want to do a binary search unless the list was sorted? -- You've answered your own question ... you would want to do a binary search if your list was sorted. if the list wasn't sorted, calling the method would just be a waste of CPU time -- so don't call it if your list isn't sorted. Just because a method exists, that doesn't compel you to use it, especially when the prerequisites aren't met.Slipperwort
@SedatKapanoglu Sure ... write a MaybeSortedList<T> class that maintains a flag saying whether the collection is sorted. If the flag is set, Add does binary search to find the insertion point, Find uses binary search to find items, and Remove uses binary search to remove items. If the flag isn't set, add at the end and fall back to the linear search for Find and Remove. An InsertAt operation when the flag is set checks whether the item is being inserted in order and clears the flag if not. But even with such a class, an "unsafe" independent binary search method on Lists and arrays is useful.Slipperwort
@JimBalter I think your idea would work despite being slow for insertion operations. I also thought of inventing a new way of using Sort() which would return a new copy of list as an immutable ISortedList<T>. The whole copying thing would be an overhead but can be mitigated with some copy-on-modify semantics.Substantialize
@SedatKapanoglu It's only slow for insertion operations if you insert out of order into a previously sorted list ... so don't do that. You should maintain either a sorted or unsorted list, not mix them up, even though the abstraction allows it. And I don't see any need for new ways of using Sort ... just use OrderBy and ThenBy.Slipperwort
@SedatKapanoglu P.S. I only answered your question of how to know whether items are sorted type-safely. I don't actually recommend MaybeSortedList<T> ... instead, have List<T> and SortedList<T> (but not .NET's abominally misnamed SortedList<K,V> that is actually a IDictionary<K,V>) both of which implement IList<T>, and use one or the other.Slipperwort
@JimBalter Damn you're right. OrderBy is exactly what I was trying to get at. I was almost sure that I was inventing something new :) You're also correct on that a SortedList could work with in-place insertions without overhead. I didn't consider that either.Substantialize
K
20

Sorting and searching are two very common operations on lists. It would be unfriendly to limit a developer's options by not offering binary search on a regular list.

Library design requires compromises - the .NET designers chose to offer the binary search function on both arrays and lists in C# because they likely felt (as I do) that these are useful and common operations, and programmers who choose to use them understand their prerequisites (namely that the list is ordered) before calling them.

It's easy enough to sort a List<T> using one of the Sort() overloads. If you feel that you need an invariant that gaurantees sorting, you can always use SortedList<TKey,TValue> or SortedSet<T> instead.

Kirkpatrick answered 15/7, 2010 at 13:48 Comment(1)
I can confirm that 'List<T>.BinarySearch()' doesn't recognize a defined 'IComparable<T>' on the item's type. Even not if I explicitely give an instance of it as argument 2. Thanks for the tip with 'SortedSet<T>'. Used that for the first time, and never heard of it before. It worked most of the time when 'SortedSet<T>.Add()' was called. In some rare cases there seems also to be a bug and it throws an exception that it is missing 'IComparable<T>' on the item's type. Strange. So I gave him this too, just calling the 'Comparable<T>' class where I had a static singleton getter. That works now.Quick
T
27

I note in addition to the other correct answers that binary search is surprisingly hard to write correctly. There are lots of corner cases and some tricky integer arithmetic. Since binary search is obviously a common operation on sorted lists, the BCL team did the world a service by writing the binary search algorithm correctly once rather than encouraging customers to all write their own binary search algorithm; a significant number of those customer-authored algorithms would be wrong.

Tirrell answered 15/7, 2010 at 14:19 Comment(5)
Even great library designers get it wrong: googleresearch.blogspot.com/2006/06/…Ambidexterity
@Ron Warholic: The fact that a particular implementation is limited to lists of about two billion elements would more typically be described as a limitation rather than a bug. I would tend to think that even if one can have a monolithic data structure with more than two billion elements, that doesn't mean one should.Janettjanetta
Just mention that here's one example of what BinarySearch() can be used for: adding new elements into a sorted list such that it remains sorted. devlicio.us/blogs/marcin_hoppe/archive/2007/05/15/…Tetchy
But BinarySearch method is exposed already on Array class as a static method which I find makes more sense. Whereas an instance method on a collection that makes no guarantees sounds unwanted.Reimers
the BCL team did the world a service by writing the binary search algorithm correctly once rather than encouraging customers to all write their own binary search algorithm -- um, no. If you have something other than an array or a List ... say, a Collection<T> ... you're out of luck. Other "teams" do something sensible and provide an external algorithm that operates on any indexable sequence.Slipperwort
K
20

Sorting and searching are two very common operations on lists. It would be unfriendly to limit a developer's options by not offering binary search on a regular list.

Library design requires compromises - the .NET designers chose to offer the binary search function on both arrays and lists in C# because they likely felt (as I do) that these are useful and common operations, and programmers who choose to use them understand their prerequisites (namely that the list is ordered) before calling them.

It's easy enough to sort a List<T> using one of the Sort() overloads. If you feel that you need an invariant that gaurantees sorting, you can always use SortedList<TKey,TValue> or SortedSet<T> instead.

Kirkpatrick answered 15/7, 2010 at 13:48 Comment(1)
I can confirm that 'List<T>.BinarySearch()' doesn't recognize a defined 'IComparable<T>' on the item's type. Even not if I explicitely give an instance of it as argument 2. Thanks for the tip with 'SortedSet<T>'. Used that for the first time, and never heard of it before. It worked most of the time when 'SortedSet<T>.Add()' was called. In some rare cases there seems also to be a bug and it throws an exception that it is missing 'IComparable<T>' on the item's type. Strange. So I gave him this too, just calling the 'Comparable<T>' class where I had a static singleton getter. That works now.Quick
H
9

BinarySearch only makes sense on a List<T> that is sorted, just like IList<T>.Add only makes sense for an IList<T> with IsReadOnly = false. It's messy, but it's just something to deal with: sometimes functionality X depends on criterion Y. The fact that Y isn't always true doesn't make X useless.

Now, in my opinion, it's frustrating that .NET doesn't have general Sort and BinarySearch methods for any IList<T> implementation (e.g., as extension methods). If it did, we could easily sort and search for items within any non-read-only collection providing random access.

Then again, you can always write your own (or copy someone else's).

Herisau answered 15/7, 2010 at 14:8 Comment(0)
P
6

Others have pointed out that BinarySearch is quite useful on a sorted List<T>. It doesn't really belong on List<T>, though, as anyone with C++ STL experience would immediately recognize.

With recent C# language developments, it makes more sense to define the notion of a sorted list (e.g., ISortedList<T> : IList<T>) and define BinarySearch (et. al.) as extension methods of that interface. This is a cleaner, more orthogonal type of design.

I've started doing just that as part of the Nito.Linq library. I expect the first stable release to be in a couple of months.

Pinkston answered 15/7, 2010 at 14:4 Comment(3)
I'm not sure what you're getting at when comparing it to the C++ STL. std::list<> is a linked-list, but List<> is actually implemented as an array and is closer to std::vector<>.Photochemistry
The point is that the C++ STL supported orthogonality. vector did not have a method named binary_search; rather, binary_search could be applied to any random-access iterator, such as those provided by vector. Apply the same concept to the C# BCL: List<T> should not provide BinarySearch; neither should IList<T>. It should be an extension method for any IList<T> (thus achieving orthogonality of the BinarySearch algorithm over any random-access iterator/container). (cont).Pinkston
In my answer (and library), I go a step further than this and define an ISortedList<T>, which is a random-access iterator/container that is known (at compile time) to be sorted. It is natural then to define BinarySearch as an extension method on ISortedList<T>.Pinkston
S
3

yes but List has Sort() method as well so you can call it before BinarySearch.

Shaitan answered 15/7, 2010 at 13:46 Comment(0)
I
3

Searching and sorting are algorithmic primitives. It's helpful for the standard library to have fast reliable implementations. Otherwise, developers waste time reinventing the wheel.

However, in the case of the .NET Framework, it's unfortunate that the specific choices of algorithms happens to make them less useful than they might be. In some cases, their behaviour is not defined:

  1. List<T>.BinarySearch If the List contains more than one element with the same value, the method returns only one of the occurrences, and it might return any one of the occurrences, not necessarily the first one.

  2. List<T> This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.

That's a shame, because there are deterministic algorithms that are just as fast, and these would be much more useful as building blocks. It's noteworthy that the binary search algorithms in Python, Ruby and Go all find the first matching element.

Innoxious answered 19/2, 2014 at 15:34 Comment(5)
Finally someone said it. I'm completely with you on the first point. I believe the BinarySearch method on List<T> doesn't make any sense considering it not just doesn't enforce an order but also doesn't prevent duplicates. Every other answer overlooks this aspect. A random index is not very helpful. Instead a BinarySearch method would have been appropriate on sorted sets. But consider two collections in .NET which is sorted as well is a set - SortedSet<> and SortedList<,> - and guess what, both of them doesn't have a BinarySearch method. Bizarre decisions.Reimers
On point 2, I'm in two minds. At times I think it was ok, for the reason: List<T> only guarantees the insertion order to be preserved. Once you sort it, you are willing to compromise positions, in which case it's ok to do whatever they wanted. And yet the other times I wished MS had went for a stable sort, because it is more consistent with the philosophy of preserving the order. Many a times I wanted this feature. In the end I believe while unstable sort is not wrong, stable sort would have been more correct and useful. Stable sort is often desired; hardly the other way around.Reimers
Point 1: This is normal for binary searches. If you need the first one, you can locate it with a short backwards linear search. Point 2: unstable sorts are faster, but C# offers both ... List.Sort<T> does unstable introsort in place, and Enumerable.OrderBy<T> provides a stable copying mergesort.Slipperwort
SortedSet<> and SortedList<,> ...both of them doesn't have a BinarySearch method -- nonsense; SortedList.IndexOfKey does a binary search. All of the operations on keys in these classes do binary search. IndexOf doesn't make semantic sense for SortedSet ... sets aren't lists.Slipperwort
Point 1: This is normal for binary searches. If you need the first one, you can locate it with a short backwards linear search -- ok, that was dumb and I take it back. There should be both BinarySearch and StableBinarySearch (or perhaps call them FastUnstableBinarySearch and BinarySearch; the latter is not "just as fast"; e.g., if you have a large list all of equal items, the former is O(1) and the latter is O(logn)) methods. And they should be static methods that work on any IList, not .NET's braindamaged instance methods.Slipperwort
M
2

I agree it's completely dumb to Call BinarySearch on an unsorted list, but it's perfect if you know your large list is sorted.

I've used it when checking if items from a stream exist in a (more or less) static list of 100,000 items or more.

Binary Searching the list is ORDERS of magnitude faster than doing a list.Find, which is many orders of magnitude faster than a database look up.

I makes sense, and I'm glad it there (not that it would be rocket science to implement it if it wasn't).

Marymarya answered 15/7, 2010 at 13:56 Comment(0)
H
1

Perhaps another point is that an array could be equally as unsorted. So in theory, having a BinarySearch on an array could be invalid too.

However, as with all features of a higher level language, they need to be applied by someone with reason and understanding of the data, or they will fail. Sure, some cross-checks could be applied, and we could have a flag that said "IsSorted" and it would fail on binary search otherwise, but .....

Honkytonk answered 18/8, 2017 at 14:2 Comment(0)
A
0

Some pseudo code:

if List is sorted
   use the BinarySearch method
else if List is not sorted and you think sorting it is "waste of CPU time"
   use a different algorithm that is more suitable and efficient
Aggregation answered 26/9, 2017 at 21:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.