How to find the insertion point in an array using binary search?
Asked Answered
S

3

6

The basic idea of binary search in an array is simple, but it might return an "approximate" index if the search fails to find the exact item. (we might sometimes get back an index for which the value is larger or smaller than the searched value).

For looking for the exact insertion point, it seems that after we got the approximate location, we might need to "scan" to left or right for the exact insertion location, so that, say, in Ruby, we can do arr.insert(exact_index, value)

I have the following solution, but the handling for the part when begin_index >= end_index is a bit messy. I wonder if a more elegant solution can be used?

(this solution doesn't care to scan for multiple matches if an exact match is found, so the index returned for an exact match may point to any index that correspond to the value... but I think if they are all integers, we can always search for a - 1 after we know an exact match is found, to find the left boundary, or search for a + 1 for the right boundary.)

My solution:

DEBUGGING = true

def binary_search_helper(arr, a, begin_index, end_index)
  middle_index = (begin_index + end_index) / 2
  puts "a = #{a}, arr[middle_index] = #{arr[middle_index]}, " +
           "begin_index = #{begin_index}, end_index = #{end_index}, " +
           "middle_index = #{middle_index}" if DEBUGGING
  if arr[middle_index] == a
    return middle_index
  elsif begin_index >= end_index
    index = [begin_index, end_index].min
    return index if a < arr[index] && index >= 0  #careful because -1 means end of array
    index = [begin_index, end_index].max
    return index if a < arr[index] && index >= 0
    return index + 1
  elsif a > arr[middle_index]
    return binary_search_helper(arr, a, middle_index + 1, end_index)
  else
    return binary_search_helper(arr, a, begin_index, middle_index - 1)
  end
end

# for [1,3,5,7,9], searching for 6 will return index for 7 for insertion
# if exact match is found, then return that index
def binary_search(arr, a)
  puts "\nSearching for #{a} in #{arr}" if DEBUGGING
  return 0 if arr.empty?
  result = binary_search_helper(arr, a, 0, arr.length - 1)
  puts "the result is #{result}, the index for value #{arr[result].inspect}" if DEBUGGING
  return result
end


arr = [1,3,5,7,9]
b = 6
arr.insert(binary_search(arr, b), b)
p arr

arr = [1,3,5,7,9,11]
b = 6
arr.insert(binary_search(arr, b), b)
p arr

arr = [1,3,5,7,9]
b = 60
arr.insert(binary_search(arr, b), b)
p arr

arr = [1,3,5,7,9,11]
b = 60
arr.insert(binary_search(arr, b), b)
p arr

arr = [1,3,5,7,9]
b = -60
arr.insert(binary_search(arr, b), b)
p arr

arr = [1,3,5,7,9,11]
b = -60
arr.insert(binary_search(arr, b), b)
p arr

arr = [1]
b = -60
arr.insert(binary_search(arr, b), b)
p arr

arr = [1]
b = 60
arr.insert(binary_search(arr, b), b)
p arr

arr = []
b = 60
arr.insert(binary_search(arr, b), b)
p arr

and result:

Searching for 6 in [1, 3, 5, 7, 9]
a = 6, arr[middle_index] = 5, begin_index = 0, end_index = 4, middle_index = 2
a = 6, arr[middle_index] = 7, begin_index = 3, end_index = 4, middle_index = 3
a = 6, arr[middle_index] = 5, begin_index = 3, end_index = 2, middle_index = 2
the result is 3, the index for value 7
[1, 3, 5, 6, 7, 9]

Searching for 6 in [1, 3, 5, 7, 9, 11]
a = 6, arr[middle_index] = 5, begin_index = 0, end_index = 5, middle_index = 2
a = 6, arr[middle_index] = 9, begin_index = 3, end_index = 5, middle_index = 4
a = 6, arr[middle_index] = 7, begin_index = 3, end_index = 3, middle_index = 3
the result is 3, the index for value 7
[1, 3, 5, 6, 7, 9, 11]

Searching for 60 in [1, 3, 5, 7, 9]
a = 60, arr[middle_index] = 5, begin_index = 0, end_index = 4, middle_index = 2
a = 60, arr[middle_index] = 7, begin_index = 3, end_index = 4, middle_index = 3
a = 60, arr[middle_index] = 9, begin_index = 4, end_index = 4, middle_index = 4
the result is 5, the index for value nil
[1, 3, 5, 7, 9, 60]

Searching for 60 in [1, 3, 5, 7, 9, 11]
a = 60, arr[middle_index] = 5, begin_index = 0, end_index = 5, middle_index = 2
a = 60, arr[middle_index] = 9, begin_index = 3, end_index = 5, middle_index = 4
a = 60, arr[middle_index] = 11, begin_index = 5, end_index = 5, middle_index = 5
the result is 6, the index for value nil
[1, 3, 5, 7, 9, 11, 60]

Searching for -60 in [1, 3, 5, 7, 9]
a = -60, arr[middle_index] = 5, begin_index = 0, end_index = 4, middle_index = 2
a = -60, arr[middle_index] = 1, begin_index = 0, end_index = 1, middle_index = 0
a = -60, arr[middle_index] = 9, begin_index = 0, end_index = -1, middle_index = -1
the result is 0, the index for value 1
[-60, 1, 3, 5, 7, 9]

Searching for -60 in [1, 3, 5, 7, 9, 11]
a = -60, arr[middle_index] = 5, begin_index = 0, end_index = 5, middle_index = 2
a = -60, arr[middle_index] = 1, begin_index = 0, end_index = 1, middle_index = 0
a = -60, arr[middle_index] = 11, begin_index = 0, end_index = -1, middle_index = -1
the result is 0, the index for value 1
[-60, 1, 3, 5, 7, 9, 11]

Searching for -60 in [1]
a = -60, arr[middle_index] = 1, begin_index = 0, end_index = 0, middle_index = 0
the result is 0, the index for value 1
[-60, 1]

Searching for 60 in [1]
a = 60, arr[middle_index] = 1, begin_index = 0, end_index = 0, middle_index = 0
the result is 1, the index for value nil
[1, 60]

Searching for 60 in []
[60]
Scooter answered 9/11, 2012 at 10:43 Comment(1)
note: #8672972Gunther
E
17

This is the code from Java's java.util.Arrays.binarySearch as included in Oracles Java:

    /**
     * Searches the specified array of ints for the specified value using the
     * binary search algorithm.  The array must be sorted (as
     * by the {@link #sort(int[])} method) prior to making this call.  If it
     * is not sorted, the results are undefined.  If the array contains
     * multiple elements with the specified value, there is no guarantee which
     * one will be found.
     *
     * @param a the array to be searched
     * @param key the value to be searched for
     * @return index of the search key, if it is contained in the array;
     *         otherwise, <tt>(-(<i>insertion point</i>) - 1)</tt>.  The
     *         <i>insertion point</i> is defined as the point at which the
     *         key would be inserted into the array: the index of the first
     *         element greater than the key, or <tt>a.length</tt> if all
     *         elements in the array are less than the specified key.  Note
     *         that this guarantees that the return value will be &gt;= 0 if
     *         and only if the key is found.
     */
    public static int binarySearch(int[] a, int key) {
        return binarySearch0(a, 0, a.length, key);
    }

    // Like public version, but without range checks.
    private static int binarySearch0(int[] a, int fromIndex, int toIndex,
                                     int key) {
        int low = fromIndex;
        int high = toIndex - 1;

        while (low <= high) {
            int mid = (low + high) >>> 1;
            int midVal = a[mid];

            if (midVal < key)
                low = mid + 1;
            else if (midVal > key)
                high = mid - 1;
            else
                return mid; // key found
        }
        return -(low + 1);  // key not found.
    }

The algorithm has proven to be appropriate and I like the fact, that you instantly know from the result whether it is an exact match or a hint on the insertion point.

This is how I would translate this into ruby:

# Inserts the specified value into the specified array using the binary
# search algorithm. The array must be sorted prior to making this call.
# If it is not sorted, the results are undefined.  If the array contains
# multiple elements with the specified value, there is no guarantee
# which one will be found.
#
# @param [Array] array the ordered array into which value should be inserted
# @param [Object] value the value to insert
# @param [Fixnum|Bignum] from_index ordered sub-array starts at
# @param [Fixnum|Bignum] to_index ordered sub-array ends the field before
# @return [Array] the resulting array
def self.insert(array, value, from_index=0,  to_index=array.length)
  array.insert insertion_point(array, value, from_index, to_index), value
end

# Searches the specified array for an insertion point ot the specified value
# using the binary search algorithm.  The array must be sorted prior to making
# this call. If it is not sorted, the results are undefined.  If the array
# contains multiple elements with the specified value, there is no guarantee
# which one will be found.
#
# @param [Array] array the ordered array into which value should be inserted
# @param [Object] value the value to insert
# @param [Fixnum|Bignum] from_index ordered sub-array starts at
# @param [Fixnum|Bignum] to_index ordered sub-array ends the field before
# @return [Fixnum|Bignum] the position where value should be inserted
def self.insertion_point(array, value, from_index=0,  to_index=array.length)
  raise(ArgumentError, 'Invalid Range') if from_index < 0 || from_index > array.length || from_index > to_index || to_index > array.length
  binary_search = _binary_search(array, value, from_index, to_index)
  if binary_search < 0
    -(binary_search + 1)
  else
    binary_search
  end
end

# Searches the specified array for the specified value using the binary
# search algorithm.  The array must be sorted prior to making this call.
# If it is not sorted, the results are undefined.  If the array contains
# multiple elements with the specified value, there is no guarantee which
# one will be found.
#
# @param [Array] array the ordered array in which the value should be searched
# @param [Object] value the value to search for
# @param [Fixnum|Bignum] from_index ordered sub-array starts at
# @param [Fixnum|Bignum] to_index ordered sub-array ends the field before
# @return [Fixnum|Bignum] if > 0 position of value, otherwise -(insertion_point + 1)
def self.binary_search(array, value, from_index=0,  to_index=array.length)
  raise(ArgumentError, 'Invalid Range') if from_index < 0 || from_index > array.length || from_index > to_index || to_index > array.length
  _binary_search(array, value, from_index, to_index)
end

private
# Like binary_search, but without range checks.
#
# @param [Array] array the ordered array in which the value should be searched
# @param [Object] value the value to search for
# @param [Fixnum|Bignum] from_index ordered sub-array starts at
# @param [Fixnum|Bignum] to_index ordered sub-array ends the field before
# @return [Fixnum|Bignum] if > 0 position of value, otherwise -(insertion_point + 1)
def self._binary_search(array, value, from_index, to_index)
  low = from_index
  high = to_index - 1

  while low <= high do
    mid = (low + high) / 2
    mid_val = array[mid]

    if mid_val < value
      low = mid + 1
    elsif mid_val > value
      high = mid - 1
    else
      return mid # value found
    end
  end
  -(low + 1) # value not found.
end

Code returns the same values as OP provided for his test data.

Empire answered 9/11, 2012 at 10:58 Comment(7)
It is the exact insertion point. You just need to do -result -1 to get it back positive as stated in the javadoc for the return.Empire
This code has a subtle bug: the (low + high) expression could overflow resulting in a negative value for mid. Better would be to use int mid = low + (high - low) / 2;. Also better to explicitly use the divide by 2 and let the compiler perform the optimization.Tanh
@BobReynolds I just wrote a small programm as test and the overflow is correctly used by >>> so nothing gets lost. It is (Integer.MAX_VALUE + (Integer.MAX_VALUE / 2)) >>> 1) == (Integer.MAX_VALUE / 2) + (Integer.MAX_VALUE -(Integer.MAX_VALUE / 2)) / 2) as >>> effectivley treats the left operand as unsigned, thereby reusing the overflow. docs.oracle.com/javase/tutorial/java/nutsandbolts/op3.html While Your solution certainly is easier to prove right, the Java-guy impementing this certainly knew bit-operatins well.Empire
+1 due to the instructive comments and answer. BTW, I wish C's stdlib bsearch would behave the same, regretfully it returns NULL and not the so called insertion point, but the answer provides the required hackGriffon
@Empire My ignorance of Java is showing here. Per the document you noted, the >>> operator explicitly shifts a zero into the leftmost bit so my previous comment that it is a bug is just plain wrong. The result of using >> is sign dependent and would be an error. Thanks for the education.Tanh
that's a bit interesting... the code above still has the overflow bug ai.googleblog.com/2006/06/…Scooter
The Java-code uses one of the proposed alternatives (three >). Ruby probably should be adjusted. Will (hopefully) have a look at this tomorrow. Feel free to edit.Empire
S
1

Update 2020

Actually, the insertion problem of binary search has been well researched. There is a left insertion point and right insertion point. Code can be found on Wikipedia and Rosetta Code. For example, to find the left insertion point, the code is:

  BinarySearch_Left(A[0..N-1], value) {
      low = 0
      high = N - 1
      while (low <= high) {
          // invariants: value > A[i] for all i < low
                         value <= A[i] for all i > high
          mid = (low + high) / 2
          if (A[mid] >= value)
              high = mid - 1
          else
              low = mid + 1
      }
      return low
  }

One note is about the overflow bug, so mid really should be found as low + floor((high - low) / 2).

Earlier answer:

Actually, instead of checking for begin_index >= end_index, it can be better handled using begin_index > end_index, and the solution is much cleaner:

def binary_search_helper(arr, a, begin_index, end_index)    
  if begin_index > end_index
    return begin_index
  else
    middle_index = (begin_index + end_index) / 2
    if arr[middle_index] == a
      return middle_index
    elsif a > arr[middle_index]
      return binary_search_helper(arr, a, middle_index + 1, end_index)
    else
      return binary_search_helper(arr, a, begin_index, middle_index - 1)
    end
  end
end

# for [1,3,5,7,9], searching for 6 will return index for 7 for insertion
# if exact match is found, then return that index
def binary_search(arr, a)
  return binary_search_helper(arr, a, 0, arr.length - 1)
end

And using iteration instead of recursion may be faster and have less worry for stack overflow.

Scooter answered 9/11, 2012 at 12:8 Comment(1)
looks like a translation of my answer into ruby with just the difference, that you give the insertion point as positive number like the index of an actual find. Should work great, if you always need to insert the element.Empire
E
0

sample for left insertion:

def binary_search(arr, target):
    left = 0
    right = len(arr) - 1
    while left <= right:
        mid = (left + right) >> 1
        if arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    
    return left

sample for right insertion:

def binary_search(arr, target):
    left = 0
    right = len(arr) - 1
    while left <= right:
        mid = (left + right) >> 1
        if arr[mid] <= target:
            left = mid + 1
        else:
            right = mid - 1

return left # add - 1 for right most index of target

sample for is present:

def binary_search(arr, target):
    left = 0
    right = len(arr) - 1
    while left <= right:
        mid = (left + right) >> 1
        if n < target:
            left = mid + 1
        elif n > target:
            right = mid - 1
        else:
            return True # or return mid for index


return False # or return -1 for not found

sample test case: arr = [1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 6, 7, 8, 9, 10]

result = binary_search(arr, 5)

Enounce answered 22/12, 2022 at 11:8 Comment(1)
>> 1 is equivalent to division by 2, (left + right) // 2 also messes up the layoutEnounce

© 2022 - 2024 — McMap. All rights reserved.