How to use bisect.insort_left with a key?
B

6

53

The docs lack an example. How do you use bisect.insort_left() based on a key?

Trying to insert based on key.

bisect.insort_left(data, ('brown', 7))

puts the new item at data[0].

From docs...

bisect.insort_left(a, x, lo=0, hi=len(a))

    Insert x in a in sorted order. This is equivalent to a.insert(bisect.bisect_left(a, x, lo, hi), x) assuming that a is already sorted. Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.

Sample usage:

>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])
>>> keys = [r[1] for r in data]         # precomputed list of keys
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
('blue', 1)
>>> data[bisect_left(keys, 5)]
('red', 5)
>>> data[bisect_left(keys, 8)]
('yellow', 8)
>>>

I want to put ('brown', 7) after ('red', 5) in the sorted list in data using bisect.insort_left. Right now bisect.insort_left(data, ('brown', 7)) puts ('brown', 7) at data[0], because I am not using the keys to do the insert. The docs don't show how to do inserts using the keys.

Bitt answered 27/12, 2014 at 23:43 Comment(9)
What is your question?Aphrodite
Be careful with this it's an O(N) operation, check if you really need it first. Have you considered heapq or just calling list.sort before printing if list isn't sortedErnst
Yes, heapq does not work well for remove a node mid tree. so, this may be best for meBitt
@Bitt What's the general problem you are solving?Ernst
I want to put ('brown', 7) after ('red', 5) on sorted list in data using bisect.insort_left. Right now bisect.insort_left(data, ('brown', 7)) puts ('brown', 7) at data[0]...because I am not using the keys to do insert...doc dont show to do inserts using the keys.Bitt
The docs suggest using the SortedCollection recipe that has support for a key-function, which the bisect module doesn't support. The insert() method in the recipe's class looks like it might do what you want (given the scaffolding the class provides).Zelig
@martineau, I am using the word 'key' loosely because that is what the docs refer to column data[1] as.Bitt
@martineau, thanks for reference...Now really want to know how to do this....Bitt
You're using 'key' the way it's usually used wrt to sorting (and the way the docs and recipe are using it). For example with list.sort() and sorted() it's often given as key=lambda x: x[1] to allow you to specify what part of each item is the value to sort upon. Don't have the time right now, but I'll see if I can work up an example for you later...Zelig
Z
20

This does essentially the same thing as the SortedCollection recipe mentioned in the See also: section at the end of the bisect documentation, but unlike the insert() method in the recipe, the function shown supports a key-function.

A separate sorted keys list is maintained in parallel with the sorted data list to improve performance (this is faster than rebuilding the keys list before each insertion, although keeping it around and updating it isn't strictly required). The ActiveState recipe encapsulates this in a class, but in the code below they're just two independent lists being passed around (so it's easier for them to get out of sync than if both were held in an instance of the recipe's class).

from bisect import bisect_left

def insert(seq, keys, item, keyfunc=lambda v: v):
    """Insert an item into a sorted list using a separate corresponding
       sorted keys list and a keyfunc() to extract the key from each item.

    Based on insert() method in SortedCollection recipe:
    http://code.activestate.com/recipes/577197-sortedcollection/
    """
    k = keyfunc(item)  # Get key.
    i = bisect_left(keys, k)  # Determine where to insert item.
    keys.insert(i, k)  # Insert key of item to keys list.
    seq.insert(i, item)  # Insert the item itself in the corresponding place.

# Initialize the sorted data and keys lists.
data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
data.sort(key=lambda r: r[1]) # Sort data by key value
keys = [r[1] for r in data]   # Initialize keys list
print(data)  # -> [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]

insert(data, keys, ('brown', 7), keyfunc=lambda x: x[1])
print(data)  # -> [('black', 0), ('blue', 1), ('red', 5), ('brown', 7), ('yellow', 8)]
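
One way to reduce the risk of the two lists drifting out of sync is to hold both in a single object. What follows is only a minimal sketch of that idea, not the ActiveState recipe itself; the class name KeyedList and its methods are made up for illustration.

```python
from bisect import bisect_left


class KeyedList:
    """Hypothetical helper: items and their keys kept in two parallel sorted lists."""

    def __init__(self, items=(), keyfunc=lambda v: v):
        self._keyfunc = keyfunc
        self._items = sorted(items, key=keyfunc)
        self._keys = [keyfunc(item) for item in self._items]

    def insert(self, item):
        """Insert item to the left of any existing items with an equal key."""
        k = self._keyfunc(item)
        i = bisect_left(self._keys, k)  # O(log n) search on the keys
        self._keys.insert(i, k)         # O(n) list insertions
        self._items.insert(i, item)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


colors = KeyedList([('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)],
                   keyfunc=lambda r: r[1])
colors.insert(('brown', 7))
print(list(colors))
# -> [('black', 0), ('blue', 1), ('red', 5), ('brown', 7), ('yellow', 8)]
```

Because callers never touch the two lists directly, every insertion updates both of them together.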

Follow-on question:
    Can bisect.insort_left be used?

No, you can't simply use the bisect.insort_left() function to do this because it wasn't written in a way that supports a key-function—instead it just compares the whole item passed to it to insert, x, with one of the whole items in the array in its if a[mid] < x: statement. You can see what I mean by looking at the source for the bisect module in Lib/bisect.py.
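
A short sketch of what that whole-item comparison does with the sample data. Tuples compare element by element, so the color string is compared first. The sample list happens to be sorted by color as well as by number, so the call is valid, but the new item lands where its string belongs.

```python
from bisect import insort_left

# Sorted by the number in position 1 (and, by coincidence, by color too).
data = [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]

# No key-function here: each comparison is tuple < tuple, string first.
insort_left(data, ('brown', 7))
print(data)
# -> [('black', 0), ('blue', 1), ('brown', 7), ('red', 5), ('yellow', 8)]
```

'brown' sorts before 'red' alphabetically, so the 7 never gets a say and the list is no longer ordered by number. The comparison responsible for this lives in the module source.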

Here's the relevant excerpt:

def insort_left(a, x, lo=0, hi=None):
    """Insert item x in list a, and keep it sorted assuming a is sorted.

    If x is already in a, insert it to the left of the leftmost x.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """

    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if a[mid] < x: lo = mid+1
        else: hi = mid
    a.insert(lo, x)

You could modify the above to accept an optional key-function argument and use it:

def my_insort_left(a, x, lo=0, hi=None, keyfunc=lambda v: v):
    x_key = keyfunc(x)  # Get comparison value.
    . . .
        if keyfunc(a[mid]) < x_key: # Compare key values.
            lo = mid+1
    . . .

...and call it like this:

my_insort_left(data, ('brown', 7), keyfunc=lambda v: v[1])
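
Written out in full, the modified function might look like the sketch below. It simply combines the stdlib excerpt above with the two changed lines; the keyfunc parameter name follows the abbreviated version and is not part of the bisect module.

```python
def my_insort_left(a, x, lo=0, hi=None, keyfunc=lambda v: v):
    """Like bisect.insort_left(), but compares keyfunc(item) instead of whole items."""
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    x_key = keyfunc(x)  # Compute the new item's key once.
    while lo < hi:
        mid = (lo + hi) // 2
        if keyfunc(a[mid]) < x_key:  # Compare key values, not whole items.
            lo = mid + 1
        else:
            hi = mid
    a.insert(lo, x)


data = [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
my_insort_left(data, ('brown', 7), keyfunc=lambda v: v[1])
print(data)
# -> [('black', 0), ('blue', 1), ('red', 5), ('brown', 7), ('yellow', 8)]
```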

Actually, if you're going to write a custom function, you could trade generality you don't need for efficiency: drop the generic key-function argument and hardcode everything for the data format you have. This avoids the overhead of repeated key-function calls during insertion.

def my_insort_left(a, x, lo=0, hi=None):
    x_key = x[1]   # Key on second element of each item in sequence.
    . . .
        if a[mid][1] < x_key: lo = mid+1  # Compare second element to key.
    . . .

...called this way without passing keyfunc:

my_insort_left(data, ('brown', 7))
Zelig answered 28/12, 2014 at 1:45 Comment(6)
THANKS, can bisect.insort_left be used?Bitt
You could probably use it to insert the key of the new item into the keys list, but not the item itself into the data list (because it doesn't support a key-function and would use the whole item as the key, and since the item is a tuple it would sort by the string value in it first).Zelig
Since data is a list, and lists are mutable sequences, you can insert items into them at arbitrary indexes using data.insert(i, x). The code in the insert() function in my answer does this twice, once to insert the key value into the keys list, and again to also insert the whole item at the same relative position of the data list. Why are you so adamant about using bisect.insort_left() -- is using it a homework assignment or something?Zelig
its not homework!!! I just want to avoid sorting long lists containing tuples repeatedly, Hoping this method would work...Bitt
In that case you can use my original answer or make a trivial change to the bisect.insort_left() source shown in the follow-up I added -- it'd take literally the changing of a couple of lines plus the adding of an argument to the function def to make it do what you want. Note that while using either avoids repeatedly sorting the list, it's still expensive, O(n), to insert elements into sorted arrays.Zelig
It's perfectly possible to use bisect.insort_left. See the other answer.Cull
B
35

You could wrap your iterable in a class that implements __getitem__ and __len__ and takes the iterable and a key function as arguments. This lets you use a key with bisect_left.

To make this usable with insort_left, the class also has to implement an insert method. The problem is that insort_left will then try to insert your key argument into the list that holds the objects the key belongs to.

An example is clearer:

from bisect import bisect_left, insort_left


class KeyWrapper:
    def __init__(self, iterable, key):
        self.it = iterable
        self.key = key

    def __getitem__(self, i):
        return self.key(self.it[i])

    def __len__(self):
        return len(self.it)

    def insert(self, index, item):
        print('asked to insert %s at index %d' % (item, index))
        self.it.insert(index, {"time":item})

timetable = [{"time": "0150"}, {"time": "0250"}, {"time": "0350"}, {"time": "0450"}, {"time": "0550"}, {"time": "0650"}, {"time": "0750"}]

bslindex = bisect_left(KeyWrapper(timetable, key=lambda t: t["time"]), "0359")

insort_left(KeyWrapper(timetable, key=lambda t: t["time"]), "0359")  # returns None; timetable is modified in place

See how I had to make my insert method specific to the timetable dictionary? Otherwise insort_left would try to insert "0359" where it should insert {"time": "0359"}.

Ways around this include constructing a dummy object for the comparison, inheriting from KeyWrapper and overriding insert, or passing some sort of factory function to create the object. None of these is particularly desirable from an idiomatic Python point of view.

So the easiest way is to just use the KeyWrapper with bisect_left, which returns the insert index, and then do the insert yourself. You could easily wrap this in a dedicated function.

e.g.

bslindex = bisect_left(KeyWrapper(timetable, key=lambda t: t["time"]), "0359")
timetable.insert(bslindex, {"time":"0359"})

In this case ensure you don't implement insert, so you will be immediately aware if you accidentally pass a KeyWrapper to a mutating function like insort_left which probably wouldn't do the right thing.
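
The dedicated function mentioned above could be sketched roughly as follows. The name insort_left_by_key is made up for illustration; the KeyWrapper is the insert-free version, repeated so the snippet runs on its own.

```python
from bisect import bisect_left


class KeyWrapper:
    """Read-only view that exposes key(item) for each item of the wrapped list."""

    def __init__(self, iterable, key):
        self.it = iterable
        self.key = key

    def __getitem__(self, i):
        return self.key(self.it[i])

    def __len__(self):
        return len(self.it)


def insort_left_by_key(seq, item, key):
    """Hypothetical helper: insert item into seq, which is sorted by key(item)."""
    index = bisect_left(KeyWrapper(seq, key), key(item))  # search on keys only
    seq.insert(index, item)                               # insert the whole item
    return index


timetable = [{"time": "0150"}, {"time": "0250"}, {"time": "0350"}, {"time": "0450"}]
index = insort_left_by_key(timetable, {"time": "0359"}, key=lambda t: t["time"])
print(index, timetable[index])  # -> 3 {'time': '0359'}
```

The key function is called only for the new item and for the elements the binary search actually probes.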

To use your example data:

from bisect import bisect_left


class KeyWrapper:
    def __init__(self, iterable, key):
        self.it = iterable
        self.key = key

    def __getitem__(self, i):
        return self.key(self.it[i])

    def __len__(self):
        return len(self.it)

data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
data.sort(key=lambda c: c[1])

newcol = ('brown', 7)

bslindex = bisect_left(KeyWrapper(data, key=lambda c: c[1]), newcol[1])
data.insert(bslindex, newcol)

print(data)

Here is the class with proper typing:

from typing import TypeVar, Generic, Sequence, Callable


T = TypeVar('T')
V = TypeVar('V')


class KeyWrapper(Generic[T, V]):
    def __init__(self, iterable: Sequence[T], key: Callable[[T], V]):
        self.it = iterable
        self.key = key

    def __getitem__(self, i: int) -> V:
        return self.key(self.it[i])

    def __len__(self) -> int:
        return len(self.it)
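
A commenter suggests making the wrapper a Sequence. A rough sketch of that variation, assuming collections.abc.Sequence is the intended base class; the name KeyView is hypothetical. Subclassing adds mixin methods such as __contains__ and index(), which scan linearly and do not use the sort order.

```python
from bisect import bisect_left
from collections.abc import Sequence


class KeyView(Sequence):
    """Hypothetical read-only view presenting key(item) for each item in seq."""

    def __init__(self, seq, key):
        self._seq = seq
        self._key = key

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Slices return a plain list of keys.
            return [self._key(item) for item in self._seq[i]]
        return self._key(self._seq[i])

    def __len__(self):
        return len(self._seq)


data = [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
keys = KeyView(data, key=lambda c: c[1])

print(bisect_left(keys, 7))  # -> 3
print(5 in keys)             # -> True  (mixin method, linear scan)
print(keys.index(8))         # -> 3     (mixin method, linear scan)
```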

Bouillon answered 15/9, 2016 at 0:19 Comment(5)
This is excellent and didn't get the love it deserves. It's concise and much more efficient than any other alternative I've seen. If you know that data is already sorted in the correct order, there's no need to calculate the key for each element. The whole point of a binary search is to get O(log n) instead of O(n). What's the point if you have to calculate the key for each element first?Cull
Second the comment about how excellent this answer is. My use case is a big query result from django with everything sorted by certain fields. I want to break the result into chunks based on when the most significant of the fields change, and all I really need is the indices of when this happens. bisect + the KeyWrapper approach lets me do exactly this very efficiently.Polychromatic
This was fixed in Python 3.10. Added key param to bisect_left, etc.Opia
Interesting solution. You basically create a Sequence, may I suggest the class is a subclass of Sequence?Savage
Thanks I'll take a look at adding it.Bouillon
C
10

Add comparison methods to your class

Sometimes this is the least painful way, especially if you already have a class and just want to sort by a key from it:

#!/usr/bin/env python3

import bisect
import functools

@functools.total_ordering
class MyData:
    def __init__(self, color, number):
        self.color = color
        self.number = number
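    def __eq__(self, other):
        # total_ordering derives <=, > and >= from __lt__ and __eq__.
        return self.number == other.number
    def __lt__(self, other):
        return self.number < other.number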
    def __str__(self):
        return '{} {}'.format(self.color, self.number)

mydatas = [
    MyData('red', 5),
    MyData('blue', 1),
    MyData('yellow', 8),
    MyData('black', 0),
]
mydatas_sorted = []
for mydata in mydatas:
    bisect.insort(mydatas_sorted, mydata)
for mydata in mydatas_sorted:
    print(mydata)

Output:

black 0
blue 1
red 5
yellow 8
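
If the class is a plain data holder, a dataclass can generate the comparison methods instead. This is only a sketch of an alternative, assuming Python 3.7+ and a hypothetical class name ColorEntry; fields marked compare=False are ignored by both ordering and equality.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class ColorEntry:
    # Hypothetical class: only `number` takes part in comparisons.
    color: str = field(compare=False)
    number: int


entries = []
for color, number in [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]:
    bisect.insort(entries, ColorEntry(color, number))

print([e.color for e in entries])  # -> ['black', 'blue', 'red', 'yellow']
```

One consequence of this design is that two entries with the same number compare equal even when their colors differ.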

See also: "Enabling" comparison for classes

Tested in Python 3.5.2.

Upstream requests/patches

I get the feeling this is going to happen sooner or later ;-)

Contraption answered 5/3, 2019 at 16:29 Comment(1)
This makes a lot of sense and involves a lot less messing around. It seems like the way to go unless you can't or wont change the class.Bouillon
H
10

As of Python 3.10, all the binary search helpers in the bisect module now accept a key argument:

key specifies a key function of one argument that is used to extract a comparison key from each input element. The default value is None (compare the elements directly).

Therefore, you can pass the same function you used to sort the data:

>>> import bisect
>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])
>>> data
[('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
>>> bisect.insort_left(data, ('brown', 7), key=lambda r: r[1])
>>> data
[('black', 0), ('blue', 1), ('red', 5), ('brown', 7), ('yellow', 8)]
Hewet answered 20/10, 2021 at 13:57 Comment(0)
C
7

If your goal is to maintain a list sorted by key, performing usual operations like bisect insert, delete and update, I think sortedcontainers should suit your needs as well, and you'll avoid O(n) inserts.

Chifley answered 18/6, 2016 at 9:4 Comment(3)
Specific to this question: sortedcontainers.SortedList includes bisect_key* methodsMaice
@Maice link from comment is deadWinnifredwinning
Updated link, sortedcontainers.SortedKeyList includes bisect_key_left and right: grantjenks.com/docs/sortedcontainers/…Maice
S
3

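From Python version 3.10, the key argument has been added.

It will be something like:

import bisect
bisect.insort_left(data, ('brown', 7), key=lambda r: r[1])  # insert: pass the whole item
bisect.bisect_left(data, 7, key=lambda r: r[1])  # search: pass the key value itself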

Savage answered 5/5, 2022 at 12:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.