Binary search (bisection) in Python
Asked Answered
G

22

217

Is there a library function that performs binary search on a list/tuple and return the position of the item if found and 'False' (-1, None, etc.) if not?

I found the functions bisect_left/right in the bisect module, but they still return a position even if the item is not in the list. That's perfectly fine for their intended usage, but I just want to know if an item is in the list or not (don't want to insert anything).

I thought of using bisect_left and then checking if the item at that position is equal to what I'm searching, but that seems cumbersome (and I also need to do bounds checking if the number can be larger than the largest number in my list). If there is a nicer method I'd like to know about it.

Edit To clarify what I need this for: I'm aware that a dictionary would be very well suited for this, but I'm trying to keep the memory consumption as low as possible. My intended usage would be a sort of double-way look-up table. I have in the table a list of values and I need to be able to access the values based on their index. And also I want to be able to find the index of a particular value or None if the value is not in the list.

Using a dictionary for this would be the fastest way, but would (approximately) double the memory requirements.

I was asking this question thinking that I may have overlooked something in the Python libraries. It seems I'll have to write my own code, as Moe suggested.

Giorgia answered 17/10, 2008 at 14:23 Comment(5)
What is it you're trying to accomplish? If the values are unique, consider using a set and "if value in set: something".Soulier
For what it's worth, "-1" is considered true; "0" would be false.Weiss
I mentioned -1 because a function that returns the index of the searched item in the array can return 0 already so -1 is returned if the item is not found (similar to substring search).Giorgia
If you use numpy, np.searchsorted is useful. docs.scipy.org/doc/numpy/reference/generated/…Upright
In the question, you say "I thought of using bisect_left and then checking if the item at that position is equal to what I'm searching, but that seems cumbersome". So why did you select the answer that does exactly that (and bounds checking as well)?Hallowmas
R
279

bisect_left finds the first position p at which an element could be inserted in a given sorted range while maintaining the sorted order. That will be the position of x if x exists in the range. If p is the past-the-end position, x wasn't found. Otherwise, we can test to see if x is there to see if x was found.

from bisect import bisect_left

def binary_search(a, x, lo=0, hi=None):
    if hi is None: hi = len(a)
    pos = bisect_left(a, x, lo, hi)                  # find insertion position
    return pos if pos != hi and a[pos] == x else -1  # don't walk off the end
Ratiocinate answered 10/2, 2010 at 2:5 Comment(3)
@volcano So does binsearch in general.Cloudburst
What about descending order?Nutrilite
Note: you could use pos < hi, to support a=(),x=-1,lo=1,hi=0 case (result should be -1 for an empty range). Also, hi=min(hi, len(a)) to support a=(0,), x=0, lo=0, hi=2 (result should be 0, not IndexError), and similar for lo. Given that binary_search is tricky around edges, it is better be explicit e.g., raise ValueError for unsupported lo, hi values.Gerardogeratology
R
62

Why not look at the code for bisect_left/right and adapt it to suit your purpose.

like this:

def binary_search(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        midval = a[mid]
        if midval < x:
            lo = mid+1
        elif midval > x: 
            hi = mid
        else:
            return mid
    return -1
Rufous answered 17/10, 2008 at 14:36 Comment(11)
I originally +1'ed this, but now I've come to the conclusion this isn't a good thing. If this answer is followed, it'll cause a lot of code duplication, and as we all know, it's really simple to f*ck up binary search.Marius
shouldn't it be hi = mid - 1 in the elif?Godrich
@Paweł: they are two equivalent variants, depending on whether the upper bound is inclusive or exclusive. you can change hi = mid to hi = mid-1 and hi = len(a) to hi = len(a)-1 and while lo < hi: to while lo <= hi, and it would be equivalently correctGail
why not do something like: def binary_search(a, x, lo = 0, hi = None): i = bisect(a, x, lo, hi) return i if a[i] == x else -1 sorry for the formatting - not sure how to do this properly in the comment arreaFullrigged
You should use bisect.bisect_left() rather than this.Heth
Note that this is classic binary search, which will return the first found index. Not very useful in case of duplicates.Pabulum
@Marius - While you're right about how easy it is to mess up implementations of algorithms (even simple ones like this one), it's still very valuable to have a known, working, simple implementation at hand. If only for educational purposes.Perpendicular
This will loop forever rather than returning -1 when you search for a number that isn't in the list. There needs to be an if midval != x and lo + 1 == hi: return -1 before the existing if statement.Herzog
… @AStudentataUniversity which is why we should listen to at-abyxRatiocinate
@alastair: sometimes you can't use bisect while binary_search() works e.g. Get the highest possible gmtime for any architectureGerardogeratology
@AStudentataUniversity: could you provide an example? both implementation work the same for limited lo, hi values repl.it/@zed1/binary-searchGerardogeratology
R
41

This is a little off-topic (since Moe's answer seems complete to the OP's question), but it might be worth looking at the complexity for your whole procedure from end to end. If you're storing thing in a sorted lists (which is where a binary search would help), and then just checking for existence, you're incurring (worst-case, unless specified):

Sorted Lists

  • O( n log n) to initially create the list (if it's unsorted data. O(n), if it's sorted )
  • O( log n) lookups (this is the binary search part)
  • O( n ) insert / delete (might be O(1) or O(log n) average case, depending on your pattern)

Whereas with a set(), you're incurring

  • O(n) to create
  • O(1) lookup
  • O(1) insert / delete

The thing a sorted list really gets you are "next", "previous", and "ranges" (including inserting or deleting ranges), which are O(1) or O(|range|), given a starting index. If you aren't using those sorts of operations often, then storing as sets, and sorting for display might be a better deal overall. set() incurs very little additional overhead in python.

Renown answered 17/10, 2008 at 16:59 Comment(3)
There is one other thing a sorted list gets you. O(n) ordered traversal. With a set that's O(n log n) and you end up having to copy references to the data into a list.Rashid
True enough! Thank you for expanding on what I meant by range search. Fwiw, a full traversal is the same a range query between min,max, which is O(k) where k = n :)Renown
How about lists with duplicates?Prisilla
A
20

It might be worth mentioning that the bisect docs now provide searching examples: http://docs.python.org/library/bisect.html#searching-sorted-lists

(Raising ValueError instead of returning -1 or None is more pythonic – list.index() does it, for example. But of course you can adapt the examples to your needs.)

Alodium answered 23/4, 2011 at 8:36 Comment(1)
index(a, x) in the above link solves the (binary) search task. +1Damoiselle
D
11

Simplest is to use bisect and check one position back to see if the item is there:

def binary_search(a,x,lo=0,hi=-1):
    i = bisect(a,x,lo,hi)
    if i == 0:
        return -1
    elif a[i-1] == x:
        return i-1
    else:
        return -1
Dramaturge answered 9/2, 2009 at 22:41 Comment(1)
Nice, but the code barfs if you do not pass in the 'hi' value. I'd write it like this: "def binary_search(a,x,lo=0,hi=None): from bisect import bisect i = bisect(a,x,lo,hi or len(a)) return (i-1 if a[i-1] == x else -1) " and test it like this: " for i in range(1, 20): a = list(range(i)) for aa in a: j = binary_search(a, aa) if j != aa: print i, aa, j"Vociferance
C
9

This is right from the manual:

http://docs.python.org/2/library/bisect.html

8.5.1. Searching Sorted Lists

The above bisect() functions are useful for finding insertion points but can be tricky or awkward to use for common searching tasks. The following five functions show how to transform them into the standard lookups for sorted lists:

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

So with the slight modification your code should be:

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    return -1
Cirrate answered 29/12, 2013 at 17:20 Comment(0)
D
7

This one is based on a mathematical assertion that the floor of (low + high)/2 is always smaller than high where low is the lower limit and high is the upper limit.


def binsearch(t, key, low = 0, high = len(t) - 1):
    # bisecting the range
    while low < high:
        mid = (low + high)//2
        if t[mid] < key:
            low = mid + 1
        else:
            high = mid
    # at this point 'low' should point at the place
    # where the value of 'key' is possibly stored.
    return low if t[low] == key else -1
Deferral answered 21/5, 2015 at 23:2 Comment(1)
How can you use variable t in high=len(t)-1 ?Comb
C
6

I agree that @DaveAbrahams's answer using the bisect module is the correct approach. He did not mention one important detail in his answer.

From the docs bisect.bisect_left(a, x, lo=0, hi=len(a))

The bisection module does not require the search array to be precomputed ahead of time. You can just present the endpoints to the bisect.bisect_left instead of it using the defaults of 0 and len(a).

Even more important for my use, looking for a value X such that the error of a given function is minimized. To do that, I needed a way to have the bisect_left's algorithm call my computation instead. This is really simple.

Just provide an object that defines __getitem__ as a

For example, we could use the bisect algorithm to find a square root with arbitrary precision!

import bisect

class sqrt_array(object):
    def __init__(self, digits):
        self.precision = float(10**(digits))
    def __getitem__(self, key):
        return (key/self.precision)**2.0

sa = sqrt_array(4)

# "search" in the range of 0 to 10 with a "precision" of 0.0001
index = bisect.bisect_left(sa, 7, 0, 10*10**4)
print 7**0.5
print index/(10**4.0)
Conover answered 8/9, 2013 at 8:28 Comment(1)
This is not clean. Use scipy.optimize for this.Ingalls
R
4

If you just want to see if it's present, try turning the list into a dict:

# Generate a list
l = [n*n for n in range(1000)]

# Convert to dict - doesn't matter what you map values to
d = dict((x, 1) for x in l)

count = 0
for n in range(1000000):
    # Compare with "if n in l"
    if n in d:
        count += 1

On my machine, "if n in l" took 37 seconds, while "if n in d" took 0.4 seconds.

Ruperto answered 17/10, 2008 at 15:3 Comment(3)
That's not always a good option for a couple of reasons: 1) dicts/sets take up more memory. 2) if he doesn't have much in the list, a binary search may be faster. 3) converting the list to a dict is an O(n) operation while a binary search is O(log n).Soke
As an FYI, the "set" overhead in python compared to python lists, is very very low. And they are extremely fast for lookups. Where binary search really excels is for looking up ranges.Renown
Converting the list may be O(n) but sorting the data in the list, which you'd have to do before binary searching it, is worse. Where's the data coming from, you can probably insert it into a dictionary as you go. I agree that the memory may be an issue.Zito
P
4

Dave Abrahams' solution is good. Although I have would have done it minimalistic:

def binary_search(L, x):
    i = bisect.bisect_left(L, x)
    if i == len(L) or L[i] != x:
        return -1
    return i
Polygynist answered 7/9, 2013 at 22:8 Comment(0)
G
2

Check out the examples on Wikipedia http://en.wikipedia.org/wiki/Binary_search_algorithm

def binary_search(a, key, imin=0, imax=None):
    if imax is None:
        # if max amount not set, get the total
        imax = len(a) - 1

    while imin <= imax:
        # calculate the midpoint
        mid = (imin + imax)//2
        midval = a[mid]

        # determine which subarray to search
        if midval < key:
            # change min index to search upper subarray
            imin = mid + 1
        elif midval > key:
            # change max index to search lower subarray
            imax = mid - 1
        else:
            # return index number 
            return mid
    raise ValueError
Getz answered 14/5, 2012 at 6:26 Comment(1)
v clean and conciseWulf
S
2

While there's no explicit binary search algorithm in Python, there is a module - bisect - designed to find the insertion point for an element in a sorted list using a binary search. This can be "tricked" into performing a binary search. The biggest advantage of this is the same advantage most library code has - it's high-performing, well-tested and just works (binary searches in particular can be quite difficult to implement successfully - particularly if edge cases aren't carefully considered).

Basic Types

For basic types like Strings or ints it's pretty easy - all you need is the bisect module and a sorted list:

>>> import bisect
>>> names = ['bender', 'fry', 'leela', 'nibbler', 'zoidberg']
>>> bisect.bisect_left(names, 'fry')
1
>>> keyword = 'fry'
>>> x = bisect.bisect_left(names, keyword)
>>> names[x] == keyword
True
>>> keyword = 'arnie'
>>> x = bisect.bisect_left(names, keyword)
>>> names[x] == keyword
False

You can also use this to find duplicates:

...
>>> names = ['bender', 'fry', 'fry', 'fry', 'leela', 'nibbler', 'zoidberg']
>>> keyword = 'fry'
>>> leftIndex = bisect.bisect_left(names, keyword)
>>> rightIndex = bisect.bisect_right(names, keyword)
>>> names[leftIndex:rightIndex]
['fry', 'fry', 'fry']

Obviously you could just return the index rather than the value at that index if desired.

Objects

For custom types or objects, things are a little bit trickier: you have to make sure to implement rich comparison methods to get bisect to compare correctly.

>>> import bisect
>>> class Tag(object):  # a simple wrapper around strings
...     def __init__(self, tag):
...         self.tag = tag
...     def __lt__(self, other):
...         return self.tag < other.tag
...     def __gt__(self, other):
...         return self.tag > other.tag
...
>>> tags = [Tag('bender'), Tag('fry'), Tag('leela'), Tag('nibbler'), Tag('zoidbe
rg')]
>>> key = Tag('fry')
>>> leftIndex = bisect.bisect_left(tags, key)
>>> rightIndex = bisect.bisect_right(tags, key)
>>> print([tag.tag for tag in tags[leftIndex:rightIndex]])
['fry']

This should work in at least Python 2.7 -> 3.3

Snatch answered 15/11, 2013 at 18:3 Comment(0)
H
2

I thought of using bisect_left and then checking if the item at that position is equal to what I'm searching, but that seems cumbersome (and I also need to do bounds checking if the number can be larger than the largest number in my list). If there is a nicer method I'd like to know about it.

One way to avoid bounds checks or the checking the item for equality is to run both bisect_left() and bisect_right():

def find(data, target):
    start = bisect_left(data, target)
    end = bisect_right(data, target)
    return -1 if start == end else start
Hallowmas answered 28/5, 2022 at 5:24 Comment(1)
Maybe use start for the second search. (Btw odd that nobody seems to like slicing, i.e., checking target in data[start:start+1]).Cist
S
1

Using a dict wouldn't like double your memory usage unless the objects you're storing are really tiny, since the values are only pointers to the actual objects:

>>> a = 'foo'
>>> b = [a]
>>> c = [a]
>>> b[0] is c[0]
True

In that example, 'foo' is only stored once. Does that make a difference for you? And exactly how many items are we talking about anyway?

Soulier answered 17/10, 2008 at 21:1 Comment(2)
It's about numbers and lots of them :) I'd like to use an array almost as big as the computer memory. I know the base of my problem could be wrong, but I was curious about the lack of a binary search method.Giorgia
You can't have a key object small enough to qualify as "really tiny" here. An object would have a minimum cost of 3 words (type, refcount, payload), while a list adds 1 word, a set adds 1 word, and a dict adds 2 words. All three (list/set/dict) preallocate space in some fashion as well, which is another multiplier, but still not enough to matter.Microcline
M
1

This code works with integer lists in a recursive way. Looks for the simplest case scenario, which is: list length less than 2. It means the answer is already there and a test is performed to check for the correct answer. If not, a middle value is set and tested to be the correct, if not bisection is performed by calling again the function, but setting middle value as the upper or lower limit, by shifting it to the left or right

def binary_search(intList, intValue, lowValue, highValue):
    if(highValue - lowValue) < 2:
        return intList[lowValue] == intValue or intList[highValue] == intValue
    middleValue = lowValue + ((highValue - lowValue)/2)
    if intList[middleValue] == intValue:
        return True
    if intList[middleValue] > intValue:
        return binary_search(intList, intValue, lowValue, middleValue - 1)
   return binary_search(intList, intValue, middleValue + 1, highValue)
Migrate answered 30/4, 2012 at 21:25 Comment(0)
V
1
  • s is a list.
  • binary(s, 0, len(s) - 1, find) is the initial call.
  • Function returns an index of the queried item. If there is no such item it returns -1.

    def binary(s,p,q,find):
        if find==s[(p+q)/2]:
            return (p+q)/2
        elif p==q-1 or p==q:
            if find==s[q]:
                return q
            else:
                return -1
        elif find < s[(p+q)/2]:
            return binary(s,p,(p+q)/2,find)
        elif find > s[(p+q)/2]:
            return binary(s,(p+q)/2+1,q,find)
    
Vespid answered 8/1, 2015 at 15:1 Comment(0)
I
0
'''
Only used if set your position as global
'''
position #set global 

def bst(array,taget): # just pass the array and target
        global position
        low = 0
        high = len(array)
    while low <= high:
        mid = (lo+hi)//2
        if a[mid] == target:
            position = mid
            return -1
        elif a[mid] < target: 
            high = mid+1
        else:
            low = mid-1
    return -1

I guess this is much better and effective. please correct me :) . Thank you

Infielder answered 11/5, 2012 at 16:54 Comment(0)
B
0

I needed binary search in python and generic for Django models. In Django models, one model can have foreign key to another model and I wanted to perform some search on the retrieved models objects. I wrote following function you can use this.

def binary_search(values, key, lo=0, hi=None, length=None, cmp=None):
    """
    This is a binary search function which search for given key in values.
    This is very generic since values and key can be of different type.
    If they are of different type then caller must specify `cmp` function to
    perform a comparison between key and values' item.
    :param values:  List of items in which key has to be search
    :param key: search key
    :param lo: start index to begin search
    :param hi: end index where search will be performed
    :param length: length of values
    :param cmp: a comparator function which can be used to compare key and values
    :return: -1 if key is not found else index
    """
    assert type(values[0]) == type(key) or cmp, "can't be compared"
    assert not (hi and length), "`hi`, `length` both can't be specified at the same time"

    lo = lo
    if not lo:
        lo = 0
    if hi:
        hi = hi
    elif length:
        hi = length - 1
    else:
        hi = len(values) - 1

    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if not cmp:
            if values[mid] == key:
                return mid
            if values[mid] < key:
                lo = mid + 1
            else:
                hi = mid - 1
        else:
            val = cmp(values[mid], key)
            # 0 -> a == b
            # > 0 -> a > b
            # < 0 -> a < b
            if val == 0:
                return mid
            if val < 0:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1
Bette answered 23/2, 2017 at 13:18 Comment(0)
S
0
def binary_search_length_of_a_list(single_method_list):
    index = 0
    first = 0
    last = 1

    while True:
        mid = ((first + last) // 2)
        if not single_method_list.get(index):
            break
        index = mid + 1
        first = index
        last = index + 1
    return mid
Solomon answered 17/5, 2017 at 19:57 Comment(0)
R
0

Binary Search :

// List - values inside list
// searchItem - Item to search
// size - Size of list
// upperBound - higher index of list
// lowerBound - lower index of list
def binarySearch(list, searchItem, size, upperBound, lowerBound):
        print(list)
        print(upperBound)
        print(lowerBound)
        mid = ((upperBound + lowerBound)) // 2
        print(mid)
        if int(list[int(mid)]) == value:
               return "value exist"
        elif int(list[int(mid)]) < value:
             return searchItem(list, value, size, upperBound, mid + 1)
        elif int(list[int(mid)]) > value:
               return searchItem(list, value, size, mid - 1, lowerBound)

// To call above function use :

list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
searchItem = 1        
print(searchItem(list[0], item, len(list[0]) -1, len(list[0]) - 1, 0))
Rustice answered 26/6, 2017 at 13:43 Comment(0)
J
0

Many good solutions above but I haven't seen a simple (KISS keep it simple (cause I'm) stupid use of the Python built in/generic bisect function to do a binary search. With a bit of code around the bisect function, I think I have an example below where I have tested all cases for a small string array of names. Some of the above solutions allude to/say this, but hopefully the simple code below will help anyone confused like I was.

Python bisect is used to indicate where to insert an a new value/search item into a sorted list. The below code which uses bisect_left which will return the index of the hit if the search item in the list/array is found (Note bisect and bisect_right will return the index of the element after the hit or match as the insertion point) If not found, bisect_left will return an index to the next item in the sorted list which will not == the search value. The only other case is where the search item would go at the end of the list where the index returned would be beyond the end of the list/array, and which in the code below the early exit by Python with "and" logic handles. (first condition False Python does not check subsequent conditions)

#Code
from bisect import bisect_left
names=["Adam","Donny","Jalan","Zach","Zayed"]
search=""
lenNames = len(names)
while search !="none":
    search =input("Enter name to search for or 'none' to terminate program:")
    if search == "none":
        break
    i = bisect_left(names,search)
    print(i) # show index returned by Python bisect_left
    if i < (lenNames) and names[i] == search:
        print(names[i],"found") #return True - if function
    else:
        print(search,"not found") #return False – if function
##Exhaustive test cases:
##Enter name to search for or 'none' to terminate program:Zayed
##4
##Zayed found
##Enter name to search for or 'none' to terminate program:Zach
##3
##Zach found
##Enter name to search for or 'none' to terminate program:Jalan
##2
##Jalan found
##Enter name to search for or 'none' to terminate program:Donny
##1
##Donny found
##Enter name to search for or 'none' to terminate program:Adam
##0
##Adam found
##Enter name to search for or 'none' to terminate program:Abie
##0
##Abie not found
##Enter name to search for or 'none' to terminate program:Carla
##1
##Carla not found
##Enter name to search for or 'none' to terminate program:Ed
##2
##Ed not found
##Enter name to search for or 'none' to terminate program:Roger
##3
##Roger not found
##Enter name to search for or 'none' to terminate program:Zap
##4
##Zap not found
##Enter name to search for or 'none' to terminate program:Zyss
##5
##Zyss not found
Jeffiejeffrey answered 18/4, 2018 at 19:8 Comment(0)
O
0

Hello here is my python implementation without bisect. let me know if it can be improved.

def bisectLeft(a, t):
    lo = 0
    hi = len(a) - 1
    ans = None
    # print("------lower------")
    # print(a, t)
    while lo <= hi:
        mid = (lo + hi) // 2
        # print(a[lo:mid], [a[mid]], a[mid:hi])
        if a[mid] < t:
            lo = mid + 1
        elif a[mid] > t:
            hi = mid - 1
        elif a[mid] == t:
            if mid == 0: return 0
            if a[mid-1] != t: return mid
            hi = mid - 1
            
    return ans

def bisectRight(a, t):
    lo = 0
    hi = len(a) - 1
    ans = None
    # print("------upper------")
    # print(a, t)
    while lo <= hi:
        mid = (lo + hi) // 2
        # print(a[lo:mid], [a[mid]], a[mid:hi])
        if a[mid] == t:
            ans = mid
        if a[mid] <= t:
            lo = mid + 1
        else:
            hi = mid - 1
    return ans

Octaviooctavius answered 1/4, 2021 at 18:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.