First occurrence in a binary search
Asked Answered
P

16

30

I'm tinkering with some code and I realized something I never knew. A normal binary search will return a random index in a data set for a key that occurs more than once. How can I modify this code below to return the first occurrence? Is this something people do?

//ripped from the JDK
public static int binarySearchValue(InvertedContainer.InvertedIndex[] a, long key) {
    return bSearchVal(a, 0, a.length, key);
}

private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
                                 int toIndex, long key) {
    int low = fromIndex;
    int high = toIndex - 1;

    while (low <= high) {
        int mid = (low + high) >>> 1;
        long midVal = a[mid].val;

        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return (low); // key not found. return insertion point
}
Proselytize answered 13/7, 2011 at 8:47 Comment(2)
Cool - I get to rep-whore my own question and answer... #4948662 explains a form of the binary search that can find the first item > or >=, or the last item < or <=.Tulle
Hah, thanks! I'll take a look. Every once in a while I notice something like this and I think 'you know nothing'.Proselytize
B
21

Having found a matching value, you basically need to walk up the collection until you find an entry which doesn't match.

You could potentially make it faster by fetching the index of a key immediately lower than the one you were looking for, then do a binary chop between the two - but I'd probably go for the simpler version, which is likely to be "efficient enough" unless you've got a really large number of equal entries.

Bruin answered 13/7, 2011 at 8:49 Comment(5)
Hi Jon - so I need a while loop in the else clause? I understand the idea, but it seems like it would look kludgish, no?Proselytize
BTW, I'm a big fan. Thanks for your response so far.Proselytize
@Amir: That's certainly one way of doing it. Another is to leave the method as it is (preferably with a name change :) but offer another method which will find the first one, by calling the original method and then performing the while loop (if there was no match). Note that in APIs I've used, the insertion point is returned as a negative value (using ~) to indicate a failure to find the value, but also indicating the insertion point at the same time.Bruin
This solution has O(N) time complexity as there can be up to N elements with the value you are searching for.Odontograph
@JuanMartinez: Hence the "efficient enough unless you've got a really large number of equal entries" bit at the end of the answer.Bruin
C
62

An addition to Jon Skeets post:

The potential faster implementation is actually not hard to implement and adds only 2 lines of code, here is how I'd do it:

    if (midVal < key)
        low = mid + 1;
    else if (midVal > key)
        high = mid - 1;
    else if (low != mid) //Equal but range is not fully scanned
        high = mid; //Set upper bound to current number and rescan
    else //Equal and full range is scanned
        return mid;
Concessionaire answered 13/7, 2011 at 9:7 Comment(6)
How would you change this to work for a last index of instead of a first index of?Proselytize
Simply invert the high and low (didnt test if that works): .... else if (high != mid) low = mid; else return mid;Concessionaire
first item: mid = (unsigned int) floor((low + high) / 2.0); last item: mid = (unsigned int) ceil((low + high) / 2.0);Banister
@Concessionaire i need a little help of yours... i want to know whether this code will work faster than the code in this link hackerearth.com/notes/searching-code-monk please see the code for first occurrenceDeflation
@VaibhavBajpai where / how does that fit in?Zarzuela
#14415908Zarzuela
B
21

Having found a matching value, you basically need to walk up the collection until you find an entry which doesn't match.

You could potentially make it faster by fetching the index of a key immediately lower than the one you were looking for, then do a binary chop between the two - but I'd probably go for the simpler version, which is likely to be "efficient enough" unless you've got a really large number of equal entries.

Bruin answered 13/7, 2011 at 8:49 Comment(5)
Hi Jon - so I need a while loop in the else clause? I understand the idea, but it seems like it would look kludgish, no?Proselytize
BTW, I'm a big fan. Thanks for your response so far.Proselytize
@Amir: That's certainly one way of doing it. Another is to leave the method as it is (preferably with a name change :) but offer another method which will find the first one, by calling the original method and then performing the while loop (if there was no match). Note that in APIs I've used, the insertion point is returned as a negative value (using ~) to indicate a failure to find the value, but also indicating the insertion point at the same time.Bruin
This solution has O(N) time complexity as there can be up to N elements with the value you are searching for.Odontograph
@JuanMartinez: Hence the "efficient enough unless you've got a really large number of equal entries" bit at the end of the answer.Bruin
Z
6

If your data is all integral, then this hack can help. It uses a float array to store values.

float array[];    //contains all integral values
int searchValue;

int firstIndex = -(binarySearch(array, (float)searchValue - 0.5F) + 1);

Basically what it does is find the insertion index of a value in between your search value and the integer before it. Since all values are integral, it finds the first occurrence of the search value.

Also this runs is log(n) time.

Example:

import java.util.Arrays;

public class BinarySearch {
    // considering array elements are integers
    float ar[] = new float[] { 1, 2, 3, 3, 4, 4, 5, 9, 9, 12, 12 };

    public void returnFirstOccurrence(int key) {
        int firstIndex = -(Arrays.binarySearch(ar, key - 0.5F) + 1);
        if (ar[firstIndex] != key)
            System.out.println("Key doesn't exist");
        else
            System.out.println("First index of key is " + firstIndex);
    }

    public static void main(String Args[]) throws Exception {
        new BinarySearch().returnFirstOccurrence(9);
    }

}

OUTPUT: 7

p.s: I've used this trick on several coding contests and it nicely worked everytime.

Zama answered 10/10, 2013 at 10:26 Comment(6)
Could you explain? How does this get the first index of occurrence?Fadge
Binary search implementation in java Collections either returns the index of the number OR if the number is not present it returns the index of position where the number can be inserted. see link. Also I've edited to include an example.Zama
got it. It's not just hacky but extremely hacky :) Not only that it works only for integers, but still you would want a float[] to hold them all. In case the client originally has an int[], then creating a new float[] might have some cost. You might want to make the two conditions written in bold :)Fadge
As I said, I only use it in contests. I'm a student and don't have any industrial experience but I agree it would be very inappropriate and utterly confusing to use it in a production code.Zama
no to me, using this kind of solution in production is better than using some ad hoc algo on an int[] ... there is an O(n) conversion cost if you do this once but if you create the array as float[] from the beginning and maintain it during lifetime of your program and run the search many times, there will be no significant performance diff.Bernettabernette
There are two issues with your example code. #1 If firstIndex == ar. length (because key > ar[ar.length-1]) this throws an exception. #2 ar[firstIndex] != key is risky as you shouldn't be comparing floats with == or !=Chalone
J
3

You could implement "lower bound" algorithm instead of binary search. This algorithm is used e.g. in C++/STL and its transcript into Java is straightforward. The algorithmic complexity of lower bound is also O(log n) as the binary search. This is better than to use binary search first and than linearly search for the first matching element - this would have worst case behaviour O(n).

Juryman answered 13/7, 2011 at 9:3 Comment(3)
Although this is called a lower bound in the C++ library, it's still a binary search - at least according to my old copy of Algorithms and Data Structures (Modula 2 Edition) by Niklaus Wirth. Maybe it's a matter of opinion, though, where the boundary is between "different algorithm" and "variant of the same algorithm".Tulle
"Binary search" as implemented by many (most?) libraries (e.g. C, C++/STL, Java) does not specify which index is returned when multiple keys are present. This was also the problem in the posed question. "Lower bound" specifies exactly the result.Juryman
Agreed, but good names for library functions aren't necessarily the same as in the algorithms textbook, especially when the library may have several variants. BTW, I didn't mean to claim you're wrong about anything, and I upvoted your answer.Tulle
O
3

You can an adapt your existing search algorithm just by having a sharper definition of matching. You can tell that the highlighted 5 in the sequence 1,3,5,5,5,9 is the first one because the number before it (3) is smaller than 5. So if mid lands on array element equal to the the key you only treat it as a match if a[mid-1] is less than key, other equal array elements are treated like greater than elements. Now you algorithm becomes (after including Jon Skeet's suggestion to return negatives for insertion points):

public static int binarySearch(int[] a, int key) {
    int low=0,high=a.length-1;
    while (low<=high) {
        int mid=(low+high) >>> 1;
        int midVal=a[mid];
        if (midVal < key) 
            low=mid+1;
        else if (mid>0 && a[mid-1]>=key) //we already know midval>=key here
            high=mid-1;
        else if (midVal==key) //found the 1st key 
             return mid;
        else
            return ~mid;      //found insertion point
    }
    return ~(a.length);       //insertion point after everything
}

It uses more comparisons but went faster than Stev314's version in my benchmarks probably because of cache effects.

Onomastic answered 9/2, 2012 at 0:33 Comment(0)
T
1

The following algorithm binary-searches for the first item with a key greater-than-or-equal-to your search key...

while (upperbound > lowerbound)
{
  testpos = lowerbound + ((upperbound-lowerbound) / 2);

  if (item[testpos] >= goal)
  {
    //  new best-so-far
    upperbound = testpos;
  }
  else
  {
    lowerbound = testpos + 1;
  }
}

This isn't written for Java, which I don't know that well, so it may need a minor tweak. Note that the bounds are half-open (lowerbound is inclusive and upperbound is exclusive) and that this is important for correctness.

This can be adapted to other similar searches - e.g. finding the last key <= the search value.

This is slightly modified from my earlier question-and-answer here.

Tulle answered 13/7, 2011 at 9:4 Comment(1)
while it's trivial to extend, just want to point out that OP wants the first occurrence "equal to", not "greater than or equal to".Fadge
T
1

here is the solution, i found for getting the lower index of key having multiple occurrences in a sorted array using binary search.

int lowerBound(int[] array,int fromIndex, int toIndex, int key)
{
    int low = fromIndex-1, high = toIndex;
    while (low+1 != high)
    {
        int mid = (low+high)>>>1;
        if (array[mid]< key) low=mid;
        else high=mid;
    }
    int p = high;
    if ( p >= toIndex || array[p] != key )
        p=-1;//no key found
    return p;
}

we have to change a little in this code to work for upper bound, using binary search, so here is the working copy of code.

 int upperBound(int[] array,int fromIndex, int toIndex, int key)
{
    int low = fromIndex-1, high = toIndex;
    while (low+1 != high)
    {
        int mid = (low+high)>>>1;
        if (array[mid]> key) high=mid;
        else low=mid;
    }
    int p = low;
    if ( p >= toIndex || array[p] != key )
        p=-1;//no key found
    return p;
}
Trantham answered 19/1, 2013 at 19:38 Comment(0)
Q
1

This should do the trick

private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
                             int toIndex, long key) {
int low = fromIndex;
int high = toIndex - 1;
int result = low;
while (low <= high) {
    int mid = (low + high) >>> 1;
    long midVal = a[mid].val;

    if (midVal < key)
        low = mid + 1;
    else if (midVal > key)
        high = mid - 1;
    else
    {
        result = mid;
        high = mid -1; 
    }
}
return result; 

}

Quaquaversal answered 29/6, 2014 at 18:45 Comment(0)
S
1

One approach is to hold an invariant throughout the whole binary search. In your particular case, the invariant would be:

  • array[low] < key
  • key <= array[high]

Then you can minimize the gap between low and high using binary search. When low + 1 == high, high would be the answer. Example code in C++:

// check invariant on initial values.
if (array[low] >= key) return low;
if (array[high] < key) return high+1;
// low + 1 < high ensures high is at least low + 2, thus
// mid will always be different from low or high. It will
// stop when low + 1 == high.
while (low + 1 < high) {
  int mid = low + (high - low) / 2;
  if (array[mid] < key) {
    low = mid;   // invariant: array[low] < key
  } else {
    high = mid;  // invariant: array[high] >= key
  }
}
return high;

Key difference between this and your example code is to update low and high to only mid rather than mid+1 or mid-1, because we have checked the value of array[mid], we can guarantee the invariant still holds when updating the boundaries. You need to check the invariant on initial values before starting to search too.

Soulless answered 17/2, 2018 at 22:15 Comment(0)
E
1

In this thread, you can find a full example of the binary search (recursive version), and two other versions (based on the original one) which allow you to get the first index and last index of a given key.

For your convenience I added the relevant Junit tests.

Emend answered 15/5, 2019 at 14:27 Comment(0)
T
0

Here's a variation of the solution in scala. Used pattern-matching and recursion instead of the while loop to get the first occurrence.

def binarySearch(arr:Array[Int],key:Int):Int = {
     def binSearchHelper(lo:Int,hi:Int,mid:Int):Int = {
        if(lo > hi) -1 else {
            if(arr(mid) == key) mid else if(arr(mid) > key){
                binSearchHelper(lo,mid-1,lo + (((mid-1) - lo)/2))
            }else{
                binSearchHelper(mid+1,hi,(mid+1) + ((hi - (mid+1))/2))
            }
        }
     }
    binSearchHelper(0,arr.size-1,(arr.size-1)/2)
}

def findFirstOccurrence(arr:Array[Int],key:Int):Int = {
    val startIdx = binarySearch(arr,key)
    startIdx match {
        case 0 => 0
        case -1 => -1
        case _ if startIdx > 0 => {
            if(arr(startIdx - 1) < key) startIdx else {
                    findFirstOccurrence(arr.slice(0,startIdx),key)
            }
        }
    }
}
Truth answered 4/5, 2014 at 23:40 Comment(0)
P
0

For the last occurrence of the element:

static int elementExists(int input[], int element){
    int lo=0;
    int high = input.length-1;
    while(lo<high){
        int mid = (lo + high )/2;
        if(element >input[mid] ){
            lo = mid+1;
        }
        else if(element < input[mid]){
            high= mid-1;
        }
        else if (high != input.length-1) //Change for the Occurrence check
            lo = mid;
        else {
            return mid;
        }
    }
    return -1;
}

For the first occurrence:

else if (lo != mid){
        high = mid;
}

Puga answered 29/11, 2015 at 22:38 Comment(0)
S
0

I think a simpler approach is storing the latest mid index where xs[mid] == key into a result variable and then keep running the binary search.

Here's the swift code:

func first<T: Comparable>(xs: [T], key: T) -> Int {
    var lo = xs.startIndex
    var hi = xs.endIndex - 1
    var res = -1
    while lo <= hi {
        let mid = lo + (hi - lo) >> 1
        if xs[mid] == key { hi = mid - 1; res = mid }
        else if xs[mid] < key { lo = mid + 1}
        else if xs[mid] > key { hi = mid - 1 }
    }

    return res
}

Also, this requires a really small change (just one line) if you were to find the last index of a key.

func last<T: Comparable>(xs: [T], key: T) -> Int {
    var lo = xs.startIndex
    var hi = xs.endIndex - 1
    var res = -1
    while lo <= hi {
        let mid = lo + (hi - lo) >> 1
        if xs[mid] == key { lo = mid + 1;  res = mid }
        else if xs[mid] < key { lo = mid + 1}
        else if xs[mid] > key { hi = mid - 1 }
    }

    return res
}
Snaggletooth answered 4/6, 2018 at 22:19 Comment(0)
G
0

Try this javascript recursive solution. It's optimal in a sense that it's O(log(N))

function solve(A, e) {
  function solve (A, start, end, e, bestUntilNow) {
    if (start > end) {
      if (A[start] === e)
        return start
      return bestUntilNow
    } else {
      const mid = start + Math.floor((end - start) / 2)
      if (A[mid] === e) {
        return solve(A, start, mid - 1, e, mid)
      } else if (e < A[mid]) {
        return solve(A, start, mid - 1, e, bestUntilNow)
      } else {
        return solve(A, mid + 1, end, e, bestUntilNow)
      }
    }
  }
  return solve(A, 0, A.length, e, -1)
}
Goglet answered 23/11, 2018 at 13:48 Comment(0)
K
0

Given a sorted array with possibly duplicated elements like [1,1,2,2,2,2,3], the following code with time complexity O(logn) finds the indexes of the first and last occurrences of an element in the given array. This approach is based on a recursive JS Binary Search implementation by comparing with the immediately lower or immediately higher index/element after matching initially the element (as it is also suggested below).

// Find the first occurence of the value using binary search
function binarySearchFirstOccurence(arr, left, right, value) {
  let middle = Math.floor((left+right)/2);
  if (left > right) {
    return -1;
  } else if (arr[middle] === value && (arr[middle-1] < value || middle === 0)) {
    return middle;
  } else if (arr[middle] < value) {
    return binarySearchFirstOccurence(arr, middle + 1, right, value);
  } else {
    // Going lower
    return binarySearchFirstOccurence(arr, left, middle - 1, value);
  }
}

// Find the last occurence of the value using binary search
function binarySearchLastOccurence(arr, left, right, value) {
  let middle = Math.floor((left+right)/2);
  if (left > right) {
    return -1;
  } else if (arr[middle] === value && (arr[middle+1] > value || middle === arr.length - 1)) {
    return middle;
  } else if (arr[middle] > value) {
    return binarySearchLastOccurence(arr, left, middle - 1, value);
  } else {
    // Going higher
    return binarySearchLastOccurence(arr, middle + 1, right, value);
  }
}

function sortedFrequency(arr, value) {
  let left = 0;
  let right = arr.length -1;
  let first = binarySearchFirstOccurence(arr, left, right, value);
  let last = binarySearchLastOccurence(arr, left, right, value);
  if (first === -1 && last === -1) {
    return -1;
  } else if (first === -1 && last !== -1) {
    return 1;
  } else {
    return last-first+1;
  }
}

let arr = [1,1,2,2,2,2,3];
console.log(sortedFrequency(arr, 3));

Best George

Kremer answered 2/5, 2020 at 17:58 Comment(0)
C
0

The solution is doing some changes in the binary search so that time complexity remains equal to O(log2n). Keep in mind that for binary search the array must be in sorted form. Following are the steps

  • Calculate the mid index
  • If the element to be searched is greater than element present at the mid index then low becomes mid + 1
  • If the element to be searched is smaller than element present at the mid index then high will become mid - 1.
  • Now lets make some changes. Suppose if two elements are same in an array so our aim is to return the first occurrence of that element. Case would be if mid index == 0 or arr[mid - 1] != arr[mid], then mid index is returned. This means that same element is not repeated and if arr[mid - 1] == arr[mid], then one will make high = mid - 1. The reason being one need to return the previous element as it is the first occurrence.

Code

def first_occurence_binary_search(arr, search_element):

n = len(arr)
low = 0
high = n - 1

while low <= high:

    mid = (low + high) // 2
    if arr[mid] > search_element:
        high = mid - 1

    elif arr[mid] < search_element:
        low = mid + 1

    else:
        if mid == 0 or arr[mid - 1] != arr[mid]:
            return mid
        else:
            high = mid - 1

return -1


array = [1, 3, 3, 3]
searching = 3
print(first_occurence_binary_search(arr=array,
                                    search_element=searching))
Cameron answered 20/6, 2022 at 13:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.