Extension of Binary search algo to find the first and last index of the key value to be searched in an array
Asked Answered
H

7

3

The problem is to extend the binary search algorithm to find all occurrences of a target value in a sorted array in the most efficient way. Concretely speaking, the input of the algorithm is (1) a sorted array of integers, where some numbers may appear more than once, and (2) a target integer to be searched. The output of the algorithm should be a pair of index values, indicating the first and last occurrence of the integer in the array, if it does occur. The source code could be in c#, c, c++.

Also, what is the max and min number of comparisons that we might need to find the indexes?

Huskamp answered 8/2, 2010 at 0:13 Comment(6)
If this is homework, tag it as such and show what you've got so far.Pathos
do you already understand the binary search or do you need an explanation of that too?There
yeah I understand the binary search algo. This is not an homework..its a part of class discussion.Huskamp
... doesn't that make it homework?Villanelle
okay treat it as a homework!!Huskamp
Please show what you have tried so far and where exactly you are stuck.Sinistrocular
C
3

If you are a little clever you can define two different binary search functions. One will return the index of the first appearance of the searched for value and the other will return the last appearance of the searched for value. From your knowledge of binary search, you should be able to determine the maximum and minimum number of comparisons.

Using two binary searches should be the fastest method on average in my opinion. For instance, if you use just one binary search to find the first item and search linearly afterwards the worst case would be if the entire function is the same value. For an array of length 10000, this would give 10013 comparisons in the worst case while using two binary searches would give 28 comparisons in the worst case for the same array. Of course, using the same size of array, the best case for the binary/linear search method would be 14 comparisons while the best case for two binary searches method is 26 comparisons.

** Update

Okay, here is a binary search to find the first appearance of an element in an array. I'll give you a recursive function (you can of course make it iterative and optimize this in other ways). This searches for the int val in the array a of ints. Also, I haven't been careful about finding the midpoint (if the array is really large there could be problems).

int bs1(int a[], int val, int left, int right)
{
    if(right == left) return left;
    int mid = (right+left)/2;

    if(val > a[mid]) return bs1(a, val, mid+1, right);
    else return bs1(a, val, left, mid);
}

However, you should check after you are returned an index that it actually refers to the correct value because if val is not in the array, the returned index will to correspond to the next element larger than val.

A few minor changes to this will make a function that finds the last element. The keys to doing this are using the comparators correctly and remembering that integer division always truncates.

Clavate answered 9/2, 2010 at 6:19 Comment(4)
Can you please elaborate how to use two different binary functions in finding the first and last appearance, I mean how do we compute that whether it is the first or last value.Huskamp
I updated my answer to give you one of the binary search function. Now you find one that finds the last appearance of the value in the array.Clavate
thank a lot for all your help!! I am assuming that the max no. of comparisons if I take 2 binary search for finding the min and max index value are 2 times log 2 to the base n and the best case is log 2 to the base n......correct me if I am wrong in either of the case.where n is no. of elements.Huskamp
umm.. that's not quite right. The best case is 2*floor(log base 2 of N) and the worst case is 2*ceil(log base 2 of N). The floor function rounds down; the ceil function rounds up. Take for instance an array of length 3. If I use the binary search function above, and the if(val > a[mid]) is true, then we have found the index with only 1 comparison, but if it isn't true, then it will take 2 comparisons. This corresponds with the worst and best cases above (though without the 2 in front because we are only using one binary search).Clavate
B
6

For C++, you could look up std::equal_range() and its complexity requirements. As long as you're interested in the basic algorithm, the same general rules should apply regardless of the language use for the implementation.

Blythebm answered 8/2, 2010 at 4:33 Comment(2)
+1: Everything that needs to be said, and all that should be said.Bajaj
Specifically, looking at the implementation of lower_bound and upper_bound would give great insight into how to do this correctly.Murine
C
3

If you are a little clever you can define two different binary search functions. One will return the index of the first appearance of the searched for value and the other will return the last appearance of the searched for value. From your knowledge of binary search, you should be able to determine the maximum and minimum number of comparisons.

Using two binary searches should be the fastest method on average in my opinion. For instance, if you use just one binary search to find the first item and search linearly afterwards the worst case would be if the entire function is the same value. For an array of length 10000, this would give 10013 comparisons in the worst case while using two binary searches would give 28 comparisons in the worst case for the same array. Of course, using the same size of array, the best case for the binary/linear search method would be 14 comparisons while the best case for two binary searches method is 26 comparisons.

** Update

Okay, here is a binary search to find the first appearance of an element in an array. I'll give you a recursive function (you can of course make it iterative and optimize this in other ways). This searches for the int val in the array a of ints. Also, I haven't been careful about finding the midpoint (if the array is really large there could be problems).

int bs1(int a[], int val, int left, int right)
{
    if(right == left) return left;
    int mid = (right+left)/2;

    if(val > a[mid]) return bs1(a, val, mid+1, right);
    else return bs1(a, val, left, mid);
}

However, you should check after you are returned an index that it actually refers to the correct value because if val is not in the array, the returned index will to correspond to the next element larger than val.

A few minor changes to this will make a function that finds the last element. The keys to doing this are using the comparators correctly and remembering that integer division always truncates.

Clavate answered 9/2, 2010 at 6:19 Comment(4)
Can you please elaborate how to use two different binary functions in finding the first and last appearance, I mean how do we compute that whether it is the first or last value.Huskamp
I updated my answer to give you one of the binary search function. Now you find one that finds the last appearance of the value in the array.Clavate
thank a lot for all your help!! I am assuming that the max no. of comparisons if I take 2 binary search for finding the min and max index value are 2 times log 2 to the base n and the best case is log 2 to the base n......correct me if I am wrong in either of the case.where n is no. of elements.Huskamp
umm.. that's not quite right. The best case is 2*floor(log base 2 of N) and the worst case is 2*ceil(log base 2 of N). The floor function rounds down; the ceil function rounds up. Take for instance an array of length 3. If I use the binary search function above, and the if(val > a[mid]) is true, then we have found the index with only 1 comparison, but if it isn't true, then it will take 2 comparisons. This corresponds with the worst and best cases above (though without the 2 in front because we are only using one binary search).Clavate
H
1

This is fairly easy to do without writing your own binary search algorithm, by repeatedly calling a standard algorithm.

// some curly-bracket language:

// int BinarySearch(sortedList, searchIndex, searchLength, valueToFind)
// returns the zero-based index of the item in the list, or a negative value
// if the item is not found

int inner = BinarySearch(list, 0, listSize, value);
if(inner < 0){
    // handle case where value is not found in list
}

int bottom = inner, top = inner;
while(true){
    int i = BinarySearch(list, 0, bottom, value);
    if(i < 0)
        break;
    bottom = i;
}
while(true){
    int i = BinarySearch(list, top + 1, listSize - top - 1, value);
    if(i < 0)
        break;
    top = i;
}

// bottom and top now hold the bounds of all instances of value in list

This is pretty close to the same efficiency you'd get with a custom algorithm, except that you have more function call overhead.

As for the number of comparisons, I'd have to think a little harder to be sure, but I think it's just 2*log2N, where N is the number of items in the list.


Edit

Bah! It's not 2*log2N, because unlike what you could do with a custom algorithm, it doesn't incrementally exclude portions of the list. It appears1 that the maximum comparison count is (log2N - 0.5) * log2N. This is still only 885 comparisons for a list with 230 elements (390 comparisons for 220 N, and 95 for 210 N), but we can do better than that.

// int Compare(a, b)
// returns 0 if a and b are equal,
//         a negative value if a < b, or
//         a positive value if a > b

int start = 0, end = listSize, inner;

while(true){
    if(end == start){
        // handle case where value is not found in list
    }
    inner = (start + end) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        break;
    if(cmp < 0)
        start = inner + 1;
    else end = inner;
}

int top = inner, bottom = inner;

while(true){
    if(start >= bottom)
        break;
    inner = (start + bottom) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        bottom = inner;
    else start = inner + 1;
}

while(true){
    if(end - 1 <= top)
        break;
    inner = (top + 1 + end) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        top = inner;
    else end = inner;
}

This will do at most 2*log2N comparisons. 230 items will require at most 60 comparisons, 220 items will require at most 40 comparisons, etc.


1 I determined this experimentally. I'm not quite smart enough to figure it out mathematically.

Hildredhildreth answered 11/2, 2010 at 2:56 Comment(0)
S
1

You can find the discussion on this in Bentley Programming Pearls and Knuth's Vol.3 : Sorting and Searching.

Here is one implementation in C++ : http://the-algo-blog.blogspot.com/2011/06/binary-search-to-find-last-and-first.html

Saltire answered 26/6, 2011 at 12:45 Comment(0)
P
0

I imagine that the normal algorithm would have something like this in it:

if(value == test) return;
if(value < test) min = i;
if(value > test) max = i;

Once you have used this to find one of the values, perform two more slightly moded binary searches using the min and max you currently have to find the tips.

To find the top most replace the above with:

if(value <= test) min = i;
if(value > test) max = i;

for the bottom most replace with:

if(value >= test) max = i;
if(value < test) min = i;

Note there is no early return using this method, you just keep going until min and max are like one or something apart, I suppose you could add one with another check

if(value == test and arr[i-1] != test) return;

etc.

Poulin answered 8/2, 2010 at 7:28 Comment(1)
thanks for the suggestion.....I am using the following approach: BinarySearch(A[0..N-1], value, low, high) { if (high < low) return -1 // not found mid = low + ((high - low) / 2) if (A[mid] > value) return BinarySearch(A, value, low, mid-1) else if (A[mid] < value) return BinarySearch(A, value, mid+1, high) else return mid // found. Now lets say the index of key value is found using this approach, then how do we find the index of first and last value of the same index??Huskamp
F
0

There's no clean answer to the most efficient part of the question. That would depend on how many entries with the same value is to be expected. If it's a few the a linear search in both directtions of the array after finding one element will be you're fastest option but if you're expecting a lot of entries with the same value you could do kind of a binary search to find the start end indices.

Disclaimer: Not tested; it's meant to show the idea and not to be used directly as production code

int org = binarySearch(array,value) //do the binary search and find on element
int min = org-delta; //delta is some constant based on how many elemts are to be expected
int max = org;
min = min < 0 ? 0 : min;
int search= min;
bool latestWasHit = false;
while(search > 0)
{
  if(search+1 == max)
     return max;
  if(array[search] != value)
  {
     min = search;
     search = search + (max-search)/2
  }
  else
  {
     max = search;
     search = (search-min)/2;
  } 
}

and then the reverse for the upper bound. However it will require quite a lot of elements before this is faster than a simple linear search.

Forebear answered 11/2, 2010 at 17:14 Comment(0)
U
0

I have created two binary search methods for returning first and last occurrences respectively.

public static void main(String[] args) {
    int a[] ={1,2,2,2,2,2,5,5,6,8,9,10};

    System.out.println(5+" first = "+first(a, 5, 0, a.length-1));
    System.out.println(5+" last = "+right(a, 5, 0, a.length-1));

    System.out.println(1+" first = "+first(a, 1, 0, a.length-1));
    System.out.println(1+" last = "+right(a, 1, 0, a.length-1));

    System.out.println(2+" first = "+first(a, 2, 0, a.length-1));
    System.out.println(2+" last = "+right(a, 2, 0, a.length-1));

    System.out.println(10+" first = "+first(a, 10, 0, a.length-1));
    System.out.println(10+" last = "+right(a, 10, 0, a.length-1));

    System.out.println(8+" first = "+first(a, 8, 0, a.length-1));
    System.out.println(8+" last = "+right(a, 8, 0, a.length-1));

    System.out.println(11+" first = "+first(a, 11, 0, a.length-1));
    System.out.println(11+" last = "+right(a, 11, 0, a.length-1));


}

private static int first(int [] a, int x, int l, int h){
    if(l>h){
        return -1;
    }
    int mid = (h-l)/2+l;
    if(a[mid] == x && (mid==0 || a[mid-1] != x) ){
        return mid;
    }else if(a[mid] == x){
        return first(a, x, l, mid-1);
    }else if(a[mid]>x){
        return first(a, x, l, mid-1);
    }else{
        return first(a, x, mid+1, h);
    }
}


private static int right(int [] a, int x, int l, int h){
    if(l>h){
        return -1;
    }
    int mid = (h-l)/2+l;
    if(a[mid] == x && (mid==a.length-1 || a[mid+1] != x) ){
        return mid;
    }else if(a[mid] == x){
        return right(a, x, mid+1, h);
    }else if(a[mid]>x){
        return right(a, x, l, mid-1);
    }else{
        return right(a, x, mid+1, h);
    }
}

Output:
    1 first = 0
    1 last = 0
    2 first = 1
    2 last = 5
    10 first = 11
    10 last = 11
    8 first = 9
    8 last = 9
    11 first = -1
    11 last = -1
Upon answered 7/12, 2017 at 6:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.