Find where f(x) changes in a list, with bisection (in Python)

Asked 8/2, 2017 at 17:23 Answered 8/2, 2017 at 17:33

Solved python bisection git-bisect bisect

Reasoning: I'm trying to implement, in Python, something similar to git bisect, but with basically a list of directories.

I have a (long) list of version numbers like this: ['1.0', '1.14', '2.3', '3.1', '4']

I have a function works() which takes a version number, and returns a value.

[works(x) for x in my_list] would look like: ['foo', 'foo', 'foo', 'bar', 'bar'] ... but running works() is very expensive.

I would like to do some kind of bisect which will find the change boundary.

Economically answered 8/2, 2017 at 17:23 Comment(1)

Why is this question downvoted? – Aparri 8/2, 2017 at 19:57

You could simply use binary search:

def binary_f(f,list):
    frm = 0
    to = len(list)
    while frm < to:
        mid = (frm+to)>>1
        if f(list[mid]):
            to = mid
        else:
            frm = mid+1
    return frm

It will return the first index i for which bool(f(list[i])) is True.

Of course the function assumes that the the map of f on the list is of the form:

f(list) == [False,False,...,False,True,True,...,True]

If this is not the case, it will usually find a swap but which one is rather undefined.

Say f is simply "the version is 2 or higher" so lambda v:v >= '2', then it will return:

>>> binary_f(lambda v:v >= '2',['1.0', '1.14', '2.3', '3.1', '4'])
2

So index 2. In case the entire list would return with False objects, it will **return len(list). Since it "assumes" the element just outside the list will be evaluated to True:

>>> binary_f(lambda v:v >= '4.2',['1.0', '1.14', '2.3', '3.1', '4'])
5

Of course in your example f is works.

Experiments:

>>> binary_f(lambda v:v >= '2',['1.0', '1.14', '2.3', '3.1', '4'])
2
>>> binary_f(lambda v:v >= '0',['1.0', '1.14', '2.3', '3.1', '4'])
0
>>> binary_f(lambda v:v >= '1',['1.0', '1.14', '2.3', '3.1', '4'])
0
>>> binary_f(lambda v:v >= '1.13',['1.0', '1.14', '2.3', '3.1', '4'])
1
>>> binary_f(lambda v:v >= '2.4',['1.0', '1.14', '2.3', '3.1', '4'])
3
>>> binary_f(lambda v:v >= '3',['1.0', '1.14', '2.3', '3.1', '4'])
3
>>> binary_f(lambda v:v >= '3.2',['1.0', '1.14', '2.3', '3.1', '4'])
4
>>> binary_f(lambda v:v >= '4.2',['1.0', '1.14', '2.3', '3.1', '4'])
5

(I here of course did a very cheap version check, but it works of course for more sophisticated predicates).

Since this is binary search, it will run in O(log n) with n the number of items in the list whereas linear search can result in O(n) checks (which is usually more expensive).

EDIT: in case the list contains two values and you want to find the swap, you can simply first compute the value for index 0:

val0 = f(list[0])

and then provide binary_f:

binary_f(lambda v:works(v) != val0,list)

Or putting it into a nice function:

def binary_f_val(f,list):
    val0 = f(list[0])
    return binary_f(lambda x:f(x) != val0,list)

Aparri answered 8/2, 2017 at 17:28 Comment(14)

Isn't this overkill? list.index() does the same thing, does it not? – Frottage 8/2, 2017 at 17:42

it does but with O(n) speed, so with big lists it's much slower. – Marlowe 8/2, 2017 at 17:44

@rshield: If you read the question carefully, you see that works is an expensive operation. It could mean you run an entire testbanch against it. Now imagine there are hundreds of versions you have to check. In that case, this algorithm will perform 7-8 checks, whereas next will on average check 50 cases. If each testbench takes an hour. I would be glad to only need to wait 8 hours instead of half a week. – Aparri 8/2, 2017 at 17:44

@WillemVanOnsem I stand corrected. I see why binary is the best solution here. – Frottage 8/2, 2017 at 17:53

This is great, but I actually misstated the problem a little bit, I just realized. See update. – Economically 8/2, 2017 at 18:21

@dmd: that does not change much: you can simply use lambda x : works(x) == 'bar' as function. – Aparri 8/2, 2017 at 18:25

oh, I see now how I can do this - I should just feed works() neighboring pairs, thus reducing the problem to false/true. – Economically 8/2, 2017 at 18:25

@WillemVanOnsem but I don't actually know what foo and bar are. but my prev comment stands. – Economically 8/2, 2017 at 18:25

@dmd: yeah indeed. As long as you somehow can convert it into a predictate (and the predicate swaps once from False to True it works). – Aparri 8/2, 2017 at 18:25

@dmd: you can simply first inspect the borders of the list: look what val = works(list[0]) is and then feed it like lambda x:works(x) != val. – Aparri 8/2, 2017 at 18:26

I think there is a module that can do the binary search for you – Coady 8/2, 2017 at 18:43

use bisect to do that for you. But there's a little catch: a version list like ['1.0', '1.14', '2.3', '3.1', '4', "11.4"] appears sorted, but insertion fails because numerical sort isn't version sort. For instance 10.1 is inserted in second position... – Marlowe 8/2, 2017 at 19:54

I wonder if you could reuse the Python bisect module for this. You'd probably have to define a custom class with a __getitem__ that calls works() lazily. – Sign 17/2, 2017 at 11:45

@MariusGedminas: you can indeed construct something as a maplist that is a "virtual list" and lazily calls the mapping function. If I find some time after hours I will try to add it to the answer. – Aparri 17/2, 2017 at 11:46

So you basically want to implement binary search algorithm ... this is pretty straight forward, the rough draft of algorithm is below. I haven't tested it, but you should get the idea and take care of edge cases when your version list of length 1 or 2:

def whereWorks(versions, works):

   middle = len(versions)/2

   good = works(versions[middle])

   if middle < 2:
       return good ? 0 : 1

   if works(middle):
         return whereWorks(versions[0:middle])
   else
         return whereWorks(versions[middle:])+middle

Quiver answered 8/2, 2017 at 17:33 Comment(0)

-1

That is what next() is for.

result = next(x for x in my_list if works(x))

A faster way but a more complicated one would be:

alist = [0,0,0,0,0,0,1]

def check(my_list, tracking=0):

    def criterion(i):
        return bool(i)

    if len(my_list) == 1:
        if my_list[0] == 1:
            return tracking
        else:
            return tracking + 1

    start = len(my_list) // 2

    if criterion(my_list[start]):
        return check(my_list[:start], tracking=tracking)
    else:
        tracking += start + 1
        return check(my_list[start+1:], tracking=tracking)

print(check(alist))  # returns 6

Which is a bisection method. Cuts the list recursively in half, checks the element in the middle and moves the the slice on the left if it is 1 or on the right if it is a 0. The tracking tracks the index. I would love to have a timeit by someone if he\she has the time.

Hardwick answered 8/2, 2017 at 17:25 Comment(1)

@WillemVanOnsem it is still not binary but operates at log(n) – Hardwick 8/2, 2017 at 18:4

Recommended topics

Hot tags