Finding multiple entries with binary search
Asked Answered
D

15

33

I use standard binary search to quickly return a single object in a sorted list (with respect to a sortable property).

Now I need to modify the search so that ALL matching list entries are returned. How should I best do this?

Doloresdolorimetry answered 27/8, 2012 at 15:18 Comment(2)
What language? the 'standard' binary search might be different or have some convenient overloads.Boom
@ColinD: I currently use my own implementation in Java. It's about a dozen rows.Doloresdolorimetry
G
27

Well, as the list is sorted, all the entries you are interested in are contiguous. This means you need to find the first item equal to the found item, looking backwards from the index which was produced by the binary search. And the same about last item.

You can simply go backwards from the found index, but this way the solution may be as slow as O(n) if there are a lot of items equal to the found one. So you should better use exponential search: double your jumps as you find more equal items. This way your whole search is still O(log n).

Grimes answered 27/8, 2012 at 15:28 Comment(3)
why go backward and not forward too?Sodamide
@Muhammad: of course, you are right: the same applies to finding the upper boundGrimes
@Grimes The second section is either wrong or missing. What will you do when your edge is at the (n-2)th element? Your last jump of (n/2) will fail (assume n is 2^x) . What will you do now? If you scan it's O(n). If you apply the same technique it will fail again at the (n/4) jump. And so on... Can you prove it's O(log n)?Farlay
J
27

First let's recall the naive binary search code snippet:

int bin_search(int arr[], int key, int low, int high)
{
    if (low > high)
        return -1;

    int mid = low + ((high - low) >> 1);

    if (arr[mid] == key) return mid;
    if (arr[mid] > key)
        return bin_search(arr, key, low, mid - 1);
    else
        return bin_search(arr, key, mid + 1, high);
}

Quoted from Prof.Skiena: Suppose we delete the equality test if (s[middle] == key) return(middle); from the implementation above and return the index low instead of −1 on each unsuccessful search. All searches will now be unsuccessful, since there is no equality test. The search will proceed to the right half whenever the key is compared to an identical array element, eventually terminating at the right boundary. Repeating the search after reversing the direction of the binary comparison will lead us to the left boundary. Each search takes O(lgn) time, so we can count the occurrences in logarithmic time regardless of the size of the block.

So, we need two rounds of binary_search to find the lower_bound (find the first number no less than the KEY) and the upper_bound (find the first number bigger than the KEY).

int lower_bound(int arr[], int key, int low, int high)
{
    if (low > high)
        //return -1;
        return low;

    int mid = low + ((high - low) >> 1);
    //if (arr[mid] == key) return mid;

    //Attention here, we go left for lower_bound when meeting equal values
    if (arr[mid] >= key) 
        return lower_bound(arr, key, low, mid - 1);
    else
        return lower_bound(arr, key, mid + 1, high);
}

int upper_bound(int arr[], int key, int low, int high)
{
    if (low > high)
        //return -1;
        return low;

    int mid = low + ((high - low) >> 1);
    //if (arr[mid] == key) return mid;

    //Attention here, we go right for upper_bound when meeting equal values
    if (arr[mid] > key) 
        return upper_bound(arr, key, low, mid - 1);
    else
        return upper_bound(arr, key, mid + 1, high);
}

Hope it's helpful :)

Johanson answered 22/9, 2014 at 3:6 Comment(4)
got a StackOverflowError when running this code with array size of 10,000.Aronarondel
I think each "high - 1" should be "mid - 1", and each "low + 1" should be "mid + 1".Aronarondel
Works great!, tho I tried this with an array that does not contain an elements that your're looking for (Ex: trying to search for a key 2 in an array [1,1,3,4,5]. It'll return lower 2 and upper 5. So I added an extra check after if (low > high), if the element at low is in range of the array (>0 <size) and if it matches to the key.Brunner
what is the time complexity of (upperbound and lower bound ) algorithm ?Gwendagwendolen
A
8

If I'm following your question, you have a list of objects which, for the purpose of comparison, look like {1,2,2,3,4,5,5,5,6,7,8,8,9}. A normal search for 5 will hit one of objects that compare as 5, but you want to get them all, is that right?

In that case, I'd suggest a standard binary search which, upon landing on a matching element, starts looking left until it stops matching, and then right (from the first match) again until it stops matching.

Be careful that whatever data structure you are using is not overwriting elements that compare to the same!

Alternatively, consider using a structure that stores elements that compare to the same as a bucket in that position.

Ancier answered 27/8, 2012 at 15:24 Comment(2)
Thanks, it was something like this I figured would be fruitful. But the code gets quite ugly. I was hoping someone had a short recursive solution or something like that... :)Doloresdolorimetry
It doesn't need to be ugly. You should have one function that does the binary search and returns the index of the value, then a linear search, taking an initial index to return the rest, possibly recursively calling itself to search left and right.Sita
J
3

I would do two binary searches, one looking for the first element comparing >= the value (in C++ terms, lower_bound) and then one searching for the first element comparing > the value (in C++ terms, upper_bound). The elements from lower_bound to just before upper bound are what you are looking for (in terms of java.util.SortedSet, subset(key, key)).

So you need two different slight modifications to the standard binary search: you still probe and use the comparison at the probe to narrow down the area in which the value you are looking for must lie, but now e.g. for lower_bound if you hit equality, all you know is that the element you are looking for (the first equal value) is somewhere between the first element of the range so far and the value you have just probed - you can't return immediately.

Joris answered 27/8, 2012 at 15:32 Comment(1)
This is a right way of doing it, the functions lower bound/higher bound are functions that are implemented as such in many libraries.Crotchet
C
3

Once you found a match with the bsearch, just recursive bsearch both side until no more match

pseudo code :

    range search (type *array) {
      int index = bsearch(array, 0, array.length-1);

      // left
      int upperBound = index -1;
      int i = upperBound;
      do {
         upperBound = i;
         i = bsearch(array, 0, upperBound);
      } while (i != -1)

      // right
      int lowerBound = index + 1;
      int i = lowerBound;
      do {
         lowerBound = i;
         i = bsearch(array, lowerBound, array.length);
      } while (i != -1)

      return range(lowerBound, UpperBound);
}

No corner cases are covered though. I think this will keep ur complexity to (O(logN)).

Carotenoid answered 27/8, 2012 at 16:3 Comment(0)
C
2

This depends on which implementation of the binary search you use:

  • In Java and .NET, the binary search will give you an arbitrary element; you must search both ways to get the range that you are looking for.
  • In C++ you can use equal_range method to produce the result that you want in a single call.

To speed up searches in Java and .NET for cases when the equal range is too long for iterating linearly, you can look for a predecessor element and for the successor element, and take values in the middle of the range that you find, exclusive of the ends.

Should this be too slow because of a second binary search, consider writing your own search that looks for both ends at the same time. This may be a bit tedious, but it should run faster.

Can answered 27/8, 2012 at 15:26 Comment(2)
Thanks, I'm using my own implementation. Normally, any language supplied implementations return only a single element.Doloresdolorimetry
@Doloresdolorimetry Not in C++, it returns the entire range. You may want to take a look at their implementation and see how they do it, it's pretty clever, and it shouldn't be too hard to translate equal_range to your language of choice.Can
I
2

I'd start by finding the index of a single element given the sortable property (using "normal" binary search) and then start looking to both left-and-right of the element in the list, adding all elements found to meet the search criterion, stopping at one end when an element doesn't meet the criterion or there are no more elements to traverse, and stopping altogether when both the left-and-right ends meet the stop conditions mentioned before.

Inveigle answered 27/8, 2012 at 15:26 Comment(3)
Thanks. Seems to be the consensus. Do you know if this is established best practice?Doloresdolorimetry
@Doloresdolorimetry Well for one thing, it's the easiest-to-implement solution, reusing an existing algorithm with minimum modifications. That's a huge advantage over inventing and testing a new variation of the algorithm, and it also has a negligible cost in performanceCounsel
If you want to preserve an existing binary search you could create two extra arrays giving, for each element, the number of equal values to its left and right. Using these as part of a composite key, you could locate (key, left(0)) and (key, right(0)) - the first and last elements holding value key. Probably only worthwhile if you want a single value and a count, as I guess if you must read all values the cost of moving left and right to find them is comparatively small.Joris
P
2

This code in Java is counting occurences of target value in a sorted array in O(logN) time in one pass. It's easy to modify it to return list of found indexes, just pass in ArrayList.

Idea is to recursively refine e and b bounds until they become lower and upper boundary for contiguous block having target values;

static int countMatching(int[] arr, int b, int e, int target){
    int m = (b+e)/2;
    
    if(e-b<2){
        int count = 0;
        if(arr[b] == target){
            count++;
        }
        if(arr[e] == target && b!=e){
            count++;
        }
        return count;
    }
    else if(arr[m] > target){
        return countMatching(arr,b,m-1, target);
    }
    else if(arr[m] < target){
        return countMatching(arr, m+1, e, target);
    }
    else {
        return countMatching(arr, b, m-1, target) + 1 
            + countMatching(arr, m+1, e, target);
    }
}
Pimp answered 15/12, 2020 at 20:20 Comment(0)
B
1

does your binary search return the element, or the index the element is at? Can you get the index?

Since the list is sorted, all matching elements should appear adjacent. If you can get the index of the item returned in the standard search, you just need to search in both directions from that index until you find non-matches.

Boom answered 27/8, 2012 at 15:24 Comment(1)
The use of index is central to the algorithm, so I've got it. Thanks.Doloresdolorimetry
O
1

Here is the solution by Deril Raju (in the answer above), ported to swift:

func bin_search(_ A: inout [Int], first: Int, last: Int, key: Int, searchLow: Bool) -> Int {
    var result = -1
    var low = first
    var high = last

    while low <= high {
        let mid = (low + high) / 2
        if A[mid] < key {
            low = mid + 1
        } else if A[mid] > key {
            high = mid - 1
        } else {
            result = mid
            if searchLow {
                high = mid - 1 // go on searching towards left (lower indices)
            } else {
                low = mid + 1 // go on searching towards right (higher indices)
            }
        }
    }
    return result
}

func bin_search_range(_ A: inout [Int], first: Int, last: Int, key: Int) -> (Int, Int) {
    let low = bin_search(&A, first: first, last: last, key: key, searchLow: true)
    let high = bin_search(&A, first: first, last: last, key: key, searchLow: false)
    return (low, high)
}


func test() {
    var A = [1, 2, 3, 3, 3, 4, 4, 4, 4, 5]

    assert(bin_search(&A, first: 0, last: A.count - 1, key: 3, searchLow: true) == 2)
    assert(bin_search(&A, first: 0, last: A.count - 1, key: 3, searchLow: false) == 4)
    assert(bin_search_range(&A, first: 0, last: A.count - 1, key: 3) == (2, 4))

    assert(bin_search(&A, first: 0, last: A.count - 1, key: 4, searchLow: true) == 5)
    assert(bin_search(&A, first: 0, last: A.count - 1, key: 4, searchLow: false) == 8)
    assert(bin_search_range(&A, first: 0, last: A.count - 1, key: 4) == (5, 8))

    assert(bin_search_range(&A, first: 0, last: A.count - 1, key: 5) == (9, 9))
    assert(bin_search_range(&A, first: 0, last: A.count - 1, key: 0) == (-1, -1))
}

test()
Occupy answered 6/8, 2022 at 16:54 Comment(0)
S
0

Try this. It works amazingly.

working example, Click here

   var arr = [1, 1, 2, 3, "a", "a", "a", "b", "c"]; // It should be sorted array.
   // if it arr contain more than one keys than it will return an array indexes. 

   binarySearch(arr, "a", false);

   function binarySearch(array, key, caseInsensitive) {
       var keyArr = [];
       var len = array.length;
       var ub = (len - 1);
       var p = 0;
       var mid = 0;
       var lb = p;

       key = caseInsensitive && key && typeof key == "string" ? key.toLowerCase() : key;

       function isCaseInsensitive(caseInsensitive, element) {
           return caseInsensitive && element && typeof element == "string" ? element.toLowerCase() : element;
       }
       while (lb <= ub) {
           mid = parseInt(lb + (ub - lb) / 2, 10);

           if (key === isCaseInsensitive(caseInsensitive, array[mid])) {
               keyArr.push(mid);
               if (keyArr.length > len) {
                   return keyArr;
               } else if (key == isCaseInsensitive(caseInsensitive, array[mid + 1])) {
                   for (var i = 1; i < len; i++) {
                       if (key != isCaseInsensitive(caseInsensitive, array[mid + i])) {
                           break;
                       } else {
                           keyArr.push(mid + i);

                       }
                   }
               }
               if (keyArr.length > len) {
                   return keyArr;
               } else if (key == isCaseInsensitive(caseInsensitive, array[mid - 1])) {
                   for (var i = 1; i < len; i++) {

                       if (key != isCaseInsensitive(caseInsensitive, array[mid - i])) {
                           break;
                       } else {
                           keyArr.push(mid - i);
                       }
                   }
               }
               return keyArr;

           } else if (key > isCaseInsensitive(caseInsensitive, array[mid])) {
               lb = mid + 1;
           } else {
               ub = mid - 1;
           }
       }

       return -1;
   }
Systematic answered 27/3, 2017 at 22:47 Comment(0)
M
0

Very efficient algorithm for this was found recently.
The algorithm has logarithmic time complexity considering both variables (size of input and amount of searched keys). However searched keys has to be sorted as well.

#define MIDDLE(left, right) ((left) + (((right) - (left)) >> 1))

int bs (const int *arr, int left, int right, int key, bool *found)
{
    int middle = MIDDLE(left, right);

    while (left <= right)
    {
        if (key < arr[middle])
            right = middle - 1;
        else if (key == arr[middle]) {
            *found = true;
            return middle;
        }
        else
            left = middle + 1;
        middle = MIDDLE(left, right);
    }

    *found = false;
    /* left points to the position of first bigger element */
    return left;
}

static void _mkbs (const int *arr, int arr_l, int arr_r,
                   const int *keys, int keys_l, int keys_r, int *results)
{
    /* end condition */
    if (keys_r - keys_l < 0)
        return;

    int keys_middle = MIDDLE(keys_l, keys_r);

    /* throw away half of keys, if the key on keys_middle is out */
    if (keys[keys_middle] < arr[arr_l]) {
        _mkbs(arr, arr_l, arr_r, keys, keys_middle + 1, keys_r, results);
        return;
    }
    if (keys[keys_middle] > arr[arr_r]) {
        _mkbs(arr, arr_l, arr_r, keys, keys_l, keys_middle - 1, results);
        return;
    }

    bool found;
    int pos = bs(arr, arr_l, arr_r, keys[keys_middle], &found);

    if (found)
        results[keys_middle] = pos;

    _mkbs(arr, arr_l, pos - 1, keys, keys_l, keys_middle - 1, results);
    _mkbs(arr, (found) ? pos + 1 : pos, arr_r, keys, keys_middle + 1, keys_r, results);
}

void mkbs (const int *arr, int N, const int *keys, int M, int *results)
{   _mkbs(arr, 0, N - 1, keys, 0, M - 1, results);   }

Here is the implementation in C and a draft paper intended for publication: https://github.com/juliusmilan/multi_value_binary_search

Can you please share a use case?

Mane answered 24/11, 2018 at 9:40 Comment(0)
E
0

You can use the below code for your problem. The main aim here is first to find the lower bound of the key and then to find the upper bound of the same. Later we get the difference of the indices and we get our answer. Rather than having two different functions, we can use a flag which can be used to find the upper bound and lower bound in the same function.

#include <iostream>
#include <bits/stdc++.h>
using namespace std;

int bin_search(int a[], int low, int high, int key, bool flag){
long long int mid,result=-1;
while(low<=high){
    mid = (low+high)/2;
    if(a[mid]<key)
        low = mid + 1;
    else if(a[mid]>key)
        high = mid - 1;
    else{
        result = mid;
        if(flag)
            high=mid-1;//Go on searching towards left (lower indices)
        else
            low=mid+1;//Go on searching towards right (higher indices)
    }
}
return result;
}

int main() {

int n,k,ctr,lowind,highind;
cin>>n>>k;
//k being the required number to find for
int a[n];
for(i=0;i<n;i++){
    cin>>a[i];
}
    sort(a,a+n);
    lowind = bin_search(a,0,n-1,k,true);
    if(lowind==-1)
        ctr=0;
    else{
        highind = bin_search(a,0,n-1,k,false);
        ctr= highind - lowind +1;   
}
cout<<ctr<<endl;
return 0;
}
Evetta answered 30/1, 2019 at 5:44 Comment(0)
E
0

You can do two searches: one for index before the range and one for index after. Because the before and after can repeat itself - use float as "unique" key"

    static int[] findFromTo(int[] arr, int key) {
    float beforeKey = (float) ((float) key - 0.2);
    float afterKey = (float) ((float) key + 0.2);
    int left = 0;
    int right = arr.length - 1;
    for (; left <= right;) {
        int mid = left + (right - left) / 2;
        float cur = (float) arr[mid];
        if (beforeKey < cur)
            right = mid - 1;
        else
            left = mid + 1;
    }
    leftAfter = 0;
    right = arr.length - 1;
    for (; leftAfter <= right;) {
        int mid = left + (right - leftAfter) / 2;
        float cur = (float) arr[mid];
        if (afterKey < cur)
            right = mid - 1;
        else
            left = mid + 1;
    }
    return new int[] { left, leftAfter };
}
Elo answered 23/6, 2020 at 15:28 Comment(0)
H
0

You can use recursion to solve the problem. first, call binary_search on the list, if the matching element is found then the left side of the list is called, and the right side of the list is reached. The matching element index is stored in the vector and returns the vector.

vector<int> binary_search(int arr[], int n, int key) {
    vector<int> ans;
    int st = 0;
    int last = n - 1;
    helper(arr, st, last, key, ans);

    return ans;
}

Recursive function:

void helper(int arr[], int st, int last, int key, vector<int> &ans) {
    if (st > last) {
        return;
    }

    while (st <= last) {
        int mid = (st + last) / 2;
        if (arr[mid] == key) {
            ans.push_back(mid);
            helper(arr, st, mid - 1, key, ans);    // left side call
            helper(arr, mid + 1, last, key, ans);  // right side call
            return;
        } else if (arr[mid] > key) {
            last = mid - 1;
        } else {
            st = mid + 1;
        }
    }
}
Holdall answered 13/1, 2023 at 10:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.