Where can I get a "useful" C++ binary search algorithm?
Asked Answered
V

9

122

I need a binary search algorithm that is compatible with the C++ STL containers, something like std::binary_search in the standard library's <algorithm> header, but I need it to return the iterator that points at the result, not a simple boolean telling me if the element exists.

(On a side note, what the hell was the standard committee thinking when they defined the API for binary_search?!)

My main concern here is that I need the speed of a binary search, so although I can find the data with other algorithms, as mentioned below, I want to take advantage of the fact that my data is sorted to get the benefits of a binary search, not a linear search.

so far lower_bound and upper_bound fail if the datum is missing:

//lousy pseudo code
vector(1,2,3,4,6,7,8,9,0) //notice no 5
iter = lower_bound_or_upper_bound(start,end,5)
iter != 5 && iter !=end //not returning end as usual, instead it'll return 4 or 6

Note: I'm also fine using an algorithm that doesn't belong to the std namespace as long as its compatible with containers. Like, say, boost::binary_search.

Vish answered 15/1, 2009 at 10:34 Comment(9)
Regarding the edit: that's why std::equal_range is the solution. Otherwise, you'll have to test for equality (or equivalence to be more)Ashmore
You have to test for equality after using (lower/upper)_bound (see answer below).Evalynevan
lower_bound and upper_bound documentation state that the range must be sorted, and because of this they can be implemented as binary search.Sash
@vividos, hurray! you found just the piece of documentation I needed to know about! Thanks!Vish
Robert, the lower/upper_bound/equal_range algorithms do not work with unsorted ranges. You're just lucky to see them working with the elements sample you took.Ashmore
@Luc Hermitte, yes you are totally right, thanks again!Vish
my first look usually goes to sgi.com/tech/stl . It documents the STL before C++ included it into Standard C++ Library, but most of the documentation continues to be valid.Sash
I also get often confused by std::lower_bound, but I believe that is the way to go for binary search, see here. You just need to check for end and then for equality with your needle, instead of just checking for the end. It is only a small constant cost and lower_bound should be as fast as binary_search, unless your data contains a lot of repetition, in which case binary_search may find the needle slightly faster if it is present (if it is not, then it needs to find lower/upper bound anyway).Hysterectomize
Me: STL, please search my sorted container for this value. STL: Yes, its there. Me: sigh, okay where is it? STL: I can tell you where there's a value that's equal or greater. Me: sigh so I have to check myself if it's really the found result, or something greater?Desireah
E
106

There is no such functions, but you can write a simple one using std::lower_bound, std::upper_bound or std::equal_range.

A simple implementation could be

template<class Iter, class T>
Iter binary_find(Iter begin, Iter end, T val)
{
    // Finds the lower bound in at most log(last - first) + 1 comparisons
    Iter i = std::lower_bound(begin, end, val);

    if (i != end && !(val < *i))
        return i; // found
    else
        return end; // not found
}

Another solution would be to use a std::set, which guarantees the ordering of the elements and provides a method iterator find(T key) that returns an iterator to the given item. However, your requirements might not be compatible with the use of a set (for example if you need to store the same element multiple times).

Evalynevan answered 15/1, 2009 at 10:45 Comment(14)
yes this works, and I have a similar implementation right now, however it's a "naive" implementation, in the sense that it's not making use of the situation's context, in this case sorted data.Vish
I don't really understand your comment, since lower_bound can only be used on sorted data. Complexity is lower than using find (see edit).Evalynevan
sorry I forgot they had to be sorted, so your implementation is correct, and it is taking advantage of the sorted data!Vish
To complement Luc's answer, check Matt Austern's classic article Why You Shouldn't Use set, and What You Should Use Instead (C++ Report 12:4, April 2000) to understand why binary search with sorted vectors is usually preferable to std::set, which is a tree-based associative container.Effeminate
Don't use *i == val! Rather use !(val < *i). The reason is that lower_bound uses <, not == (i.e. T is not even required to be equality-comparable). (See Scott Meyers' Effective STL for an explanation of the difference between equality and equivalence.)Malena
@gx if i want a comparison object, it would be !comp(val,*i), right?Athenaathenaeum
In case the val you are searching for is located at the index end, although the function returns the correct value, the comment "not found" becomes incorrect.Sergeant
@CanKavaklıoğlu There is no element located at end. Ranges in the C++ standard library are represented with half-open intervals: the end iterator "points" after the last element. As such, it can be returned by algorithms to indicate that no value was found.Evalynevan
I would make the last param T& val or const T& val. Is in line with the std call, and it saves a copy of course.Complement
@Complement You could even make the last param a boost::call_traits<T>::param_type to avoid the reference for small built-in types.Evalynevan
@LucTouraille Will not it return the location which is not existing if not found. If we have 5 elements, return end will give 5th location, which is not existingAtbara
@AaghazHussain This is how the standard library algorithms denotes the fact that no element was found, by returning an iterator to the end of the range. To know if the element was found by the binary_find function, you must compare the result to end: auto it = binary_find(v.begin(), v.end(), value); if (it != v.end()) { // value found, accessible at *it }Evalynevan
Just a comment, by using this function, you will need to have one more compare after lower_bound. If compare is expensive, probably better idea is to do your own binary search?Ejector
This should so be in the standard library!Jukebox
A
10

You should have a look at std::equal_range. It will return a pair of iterators to the range of all results.

Ashmore answered 15/1, 2009 at 10:39 Comment(2)
According to cplusplus.com/reference/algorithm/equal_range the cost of std::equal_range is approximately twice as high as std::lower_bound. It appears that it wraps a call to std::lower_bound and a call to std::upper_bound. If you know your data doesn't have duplicates then that is overkill and std::lower_bound (as demonstrated in the top answer) is the best choice.Mccray
@BruceDawson: cplusplus.com only gives a reference implementation to specify the behavior; for an actual implementation you can check your favorite standard library. For example, in llvm.org/svn/llvm-project/libcxx/trunk/include/algorithm we can see that the calls to lower_bound and upper_bound are made on disjoint intervals (after some manual binary search). That being said, it is likely to be more expensive, especially on ranges with multiple values matching.Baun
L
6

There is a set of them:

http://www.sgi.com/tech/stl/table_of_contents.html

Search for:

On a separate note:

They were probably thinking that searching containers could term up more than one result. But on the odd occasion where you just need to test for existence an optimized version would also be nice.

Lascivious answered 15/1, 2009 at 10:41 Comment(4)
binary_search doesn't return an iterator as I mentioned earlier, that's why I'm looking for an alternative.Vish
Yes, I know. But it fits in the set of binary search algorithms. So its nice for others to know about.Lascivious
binary_search is just, like so many other things in the STL, named wrong. I hate that. Testing for existence is not the same as searching for something.Norwood
These binary search functions are not useful in case where you want to know the index of the element you are looking for. I have to write my own recursive function for this task. I hope this, template<class T> int bindary_search(const T& item), should be added to the next version of C++.Burkholder
E
3

If std::lower_bound is too low-level for your liking, you might want to check boost::container::flat_multiset. It is a drop-in replacement for std::multiset implemented as a sorted vector using binary search.

Effeminate answered 3/11, 2012 at 12:58 Comment(1)
Good link; and also good link in the link: lafstern.org/matt/col1.pdf, which describes how lookups implemented with a sorted vector, rather than set (though both are log(N)), have significantly better constants of proportionality and are ~twice as fast (the disadvantage being a larger INSERTION time).Rumba
R
3

The shortest implementation, wondering why it's not included in the standard library:

template<class ForwardIt, class T, class Compare=std::less<>>
ForwardIt binary_find(ForwardIt first, ForwardIt last, const T& value, Compare comp={})
{
    // Note: BOTH type T and the type after ForwardIt is dereferenced 
    // must be implicitly convertible to BOTH Type1 and Type2, used in Compare. 
    // This is stricter than lower_bound requirement (see above)

    first = std::lower_bound(first, last, value, comp);
    return first != last && !comp(value, *first) ? first : last;
}

From https://en.cppreference.com/w/cpp/algorithm/lower_bound

Recension answered 25/9, 2018 at 12:26 Comment(1)
I can think of two reasons this is not in the standard library: They think it is easy to implement, but the major reason is probably that it may require a reversed version of operator()() if value is not interchangeable with *first.Athenaathenaeum
R
2
int BinarySearch(vector<int> array,int var)
{ 
    //array should be sorted in ascending order in this case  
    int start=0;
    int end=array.size()-1;
    while(start<=end){
        int mid=(start+end)/2;
        if(array[mid]==var){
            return mid;
        }
        else if(var<array[mid]){
            end=mid-1;
        }
        else{
            start=mid+1;
        }
    }
    return 0;
}

Example: Consider an array, A=[1,2,3,4,5,6,7,8,9] Suppose you want to search the index of 3 Initially, start=0 and end=9-1=8 Now, since start<=end; mid=4; (array[mid] which is 5) !=3 Now, 3 lies to the left of mid as its smaller than 5. Therefore, we only search the left part of the array Hence, now start=0 and end=3; mid=2.Since array[mid]==3, hence we got the number we were searching for. Hence, we return its index which is equal to mid.

Rie answered 12/2, 2018 at 19:50 Comment(3)
It's good to have code, but you could improve the answer by providing a brief explanation of how it works for people who are new to the language.Ido
Somebody incorrectly flagged your post as low-quality. A code-only answer is not low-quality. Does it attempt to answer the question? If not, flag as 'not an answer' or recommend deletion (if in the review queue). b) Is it technically incorrect? Downvote or comment.Clary
Not sure about how this compares in terms of actual runtime against the std:lower_bound solution, but personally I much prefer this nice and readable solution. It is a clean and simple implementation of binary search returning an index. No idea why this is so poorly rated!! And yes, it's not a template and yes, it doesn't return an iterator - but shall we have a poll how many people do binary search on anything other than a vector of ints?Chiliad
N
1

Check this function, qBinaryFind:

RandomAccessIterator qBinaryFind ( RandomAccessIterator begin, RandomAccessIterator end, const T & value )

Performs a binary search of the range [begin, end) and returns the position of an occurrence of value. If there are no occurrences of value, returns end.

The items in the range [begin, end) must be sorted in ascending order; see qSort().

If there are many occurrences of the same value, any one of them could be returned. Use qLowerBound() or qUpperBound() if you need finer control.

Example:

QVector<int> vect;
 vect << 3 << 3 << 6 << 6 << 6 << 8;

 QVector<int>::iterator i =
         qBinaryFind(vect.begin(), vect.end(), 6);
 // i == vect.begin() + 2 (or 3 or 4)

The function is included in the <QtAlgorithms> header which is a part of the Qt library.

Nottinghamshire answered 15/1, 2009 at 11:2 Comment(1)
Unfortunately this algorithm isn't compatible with STL containers.Brumby
H
0

std::lower_bound() :)

Haggar answered 15/1, 2009 at 10:43 Comment(1)
OP: "so far lower_bound and upper_bound fail, because..."Iverson
R
0

A solution returning the position inside the range could be like this, using only operations on iterators (it should work even if iterator does not arithmetic):

template <class InputIterator, typename T>
size_t BinarySearchPos(InputIterator first, InputIterator last, const T& val)
{       
    const InputIterator beginIt = first;
    InputIterator element = first;
    size_t p = 0;
    size_t shift = 0;
    while((first <= last)) 
    {
        p = std::distance(beginIt, first);
        size_t u = std::distance(beginIt, last);
        size_t m = p + (u-p)/2;  // overflow safe (p+u)/2
        std::advance(element, m - shift);
        shift = m;
        if(*element == val) 
            return m; // value found at position  m
        if(val > *element)
            first = element++;
        else
            last  = element--;

    }
    // if you are here the value is not present in the list, 
    // however if there are the value should be at position u
    // (here p==u)
    return p;

}
Regalia answered 15/3, 2017 at 21:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.