Get index of closest value with binary search
Asked Answered
F

5

8

I want to do a binary search in python:

def binarySearch(data, val):

Where data is a sorted array and value is the value being searched for. If the value is found, I want to return the index (such that data[index] = val). If the value is not found, I want to return the index of the item that is closest to that value.

Here is what I've got:

def binarySearch(data, val):
    high = len(data)-1
    low = 0
    while True:
        index = (high + low) / 2
        if data[index] == val:
            return index
        if data[index] < val:
            low = index
        if data[index] > val:
            high = index
False answered 15/5, 2014 at 14:58 Comment(3)
Use the Python's bisect module: "The module is called bisect because it uses a basic bisection algorithm to do its work. The source code may be most useful as a working example of the algorithm (the boundary conditions are already right!)."Jugendstil
Indeed this has been solved in the standard library, and source is available there.Hypophysis
Retracted close vote and down-vote after the code was added.Ere
R
5

Something like this should work. It returns an array with two indexes. If val is found, both values in the return array are the same. Otherwise, it returns the indexes of the two items closest to val.

def binarySearch(data, val):
    highIndex = len(data)-1
    lowIndex = 0
    while highIndex > lowIndex:
            index = (highIndex + lowIndex) / 2
            sub = data[index]
            if data[lowIndex] == val:
                    return [lowIndex, lowIndex]
            elif sub == val:
                    return [index, index]
            elif data[highIndex] == val:
                    return [highIndex, highIndex]
            elif sub > val:
                    if highIndex == index:
                            return sorted([highIndex, lowIndex])
                    highIndex = index
            else:
                    if lowIndex == index:
                            return sorted([highIndex, lowIndex])
                    lowIndex = index
    return sorted([highIndex, lowIndex])
Roundfaced answered 15/5, 2014 at 15:1 Comment(2)
Assumes the list has an even number of values in it. If it is has 7 items data[index] is data[3.5] which is a TypeError.Armandarmanda
@Armandarmanda I was using Python 2.x where / will return a truncated integer. If you are using a language where / will not return a truncated integer then you'll have to add a line of code to truncate it.Roundfaced
C
14

Here is the code that will return the index if the value is found, otherwise the index of the item that is closest to that value, hope it helps.

def binarySearch(data, val):
    lo, hi = 0, len(data) - 1
    best_ind = lo
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if data[mid] < val:
            lo = mid + 1
        elif data[mid] > val:
            hi = mid - 1
        else:
            best_ind = mid
            break
        # check if data[mid] is closer to val than data[best_ind] 
        if abs(data[mid] - val) < abs(data[best_ind] - val):
            best_ind = mid
    return best_ind

def main():
    data = [1, 2, 3, 4, 5, 6, 7]
    val = 6.1
    ind = binarySearch(data, val)
    print 'data[%d]=%d' % (ind, data[ind])

if __name__ == '__main__':
    main()
Clere answered 6/12, 2014 at 23:51 Comment(2)
This solution only works for a list of unique values. In order to adjust it to a list of non-unique values one should change: if abs(data[mid] - val) < abs(data[best_ind] - val): on if abs(data[mid] - val) <= abs(data[best_ind] - val):Daphie
you should mid = lo + (hi - lo) // 2 Other than that excellent solutionBundy
R
5

Something like this should work. It returns an array with two indexes. If val is found, both values in the return array are the same. Otherwise, it returns the indexes of the two items closest to val.

def binarySearch(data, val):
    highIndex = len(data)-1
    lowIndex = 0
    while highIndex > lowIndex:
            index = (highIndex + lowIndex) / 2
            sub = data[index]
            if data[lowIndex] == val:
                    return [lowIndex, lowIndex]
            elif sub == val:
                    return [index, index]
            elif data[highIndex] == val:
                    return [highIndex, highIndex]
            elif sub > val:
                    if highIndex == index:
                            return sorted([highIndex, lowIndex])
                    highIndex = index
            else:
                    if lowIndex == index:
                            return sorted([highIndex, lowIndex])
                    lowIndex = index
    return sorted([highIndex, lowIndex])
Roundfaced answered 15/5, 2014 at 15:1 Comment(2)
Assumes the list has an even number of values in it. If it is has 7 items data[index] is data[3.5] which is a TypeError.Armandarmanda
@Armandarmanda I was using Python 2.x where / will return a truncated integer. If you are using a language where / will not return a truncated integer then you'll have to add a line of code to truncate it.Roundfaced
A
5

I know this is an old question, but it's high on Google's results and I had the same issue. There's a built-in to do this which uses binary search and allows you to feed in a reference array and a comparison array.

numpy.searchsorted(a, v, side='left', sorter=None)

a is the reference array (data in original question), v is the array to compare (val from the question). This returns an array of size v with int values for the index the nth element of v would need to be inserted into a to preserve the sort order in a' The side keyword determines whether you want the elements of v to be placed to the 'left' (before) or the 'right' (after) the appropriate value in a.

[documentation link as of July 2017] https://docs.scipy.org/doc/numpy/reference/generated/numpy.searchsorted.html#numpy.searchsorted

Actuality answered 29/6, 2017 at 15:36 Comment(0)
G
0

Here's a sample implementation of binary search. I won't do all the (home?)work for you, I am sure you can figure out how to store and return the index of the closest value yourself.

# BINARY SEARCH: O(log n), search space halfed each step
def biSearch(lst, find): # expects sorted lst 
    lowIndex = 0
    highIndex = len(lst) - 1
    midIndex = (lowIndex + highIndex)//2
    lastMid = None
    steps = 0
    while midIndex != lastMid:
        steps += 1
        if lst[midIndex] == find:
            return (midIndex, steps)
        if lst[midIndex] < find:
            lowIndex = midIndex + 1
        else:
            highIndex = midIndex - 1
        lastMid = midIndex    
        midIndex = (lowIndex + highIndex)//2
    return (-1, steps)
Greff answered 15/5, 2014 at 15:2 Comment(0)
A
0

Not the answer to this question. But I landed here trying to figure out how to get the two surrounding values for a given target item in a sorted list.

If anyone else is looking, this is what I came up with based on some of the other answers here.

import random


def get_nearest(items, target):
    print(f'looking for {target}')
    high_index = len(items) - 1
    low_index = 0

    if not items[low_index] <= target <= items[high_index]:
        raise ValueError(f'The target {target} is not in the range of'
                         f' provided items {items[low_index]}:{items[high_index]}')

    if target in items:
        return target, target

    while high_index > low_index:
        index = int((high_index + low_index) / 2)
        sub = items[index]

        if sub > target:
            if high_index == index:
                return tuple(sorted([items[high_index], items[low_index]]))
            high_index = index
        else:
            if low_index == index:
                return tuple(sorted([items[high_index], items[low_index]]))
            low_index = index
    return tuple(sorted([items[high_index], items[low_index]]))


if __name__ == '__main__':
    my_randoms = sorted(random.sample(range(10000000), 100000))
    x = 340000
    print(get_nearest(my_randoms, x))

    x = 0
    my_randoms = [x] + my_randoms
    print(get_nearest(my_randoms, x))

    x = 10000000
    my_randoms.append(x)
    print(get_nearest(my_randoms, x))

    idx = random.randint(0, 100000)
    x = my_randoms[idx]
    print(get_nearest(my_randoms, x))
Armandarmanda answered 7/4, 2018 at 22:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.