Traversing heapified list
I'm writing a Monte Carlo simulation, and as part of this task I generate samples uniformly distributed over the interval (0, 100).

generate = lambda: uniform(0,100)

The iterations stop when every pair of adjacent generated points satisfies the criterion.

check = lambda a,b: True if (b-a)<5 else False

I need a structure that stores all the generated points efficiently and lets me traverse them in ascending order, so that I can run check on every adjacent pair.

Python's heapq module provides a very efficient heap implementation, so I decided to use it.

But I ran into a problem: the module offers no traversal procedure. The only way I found to access the heap's values in ascending order is heapq.heappop, which removes the values from the heap.

I found a workaround: copy the heap object and iterate over the copy with heappop. But copying the whole structure in memory on every iteration doesn't seem very efficient.

Is there a more efficient way to do this?


Simplified code for illustration:

import heapq
from random import uniform
from itertools import tee, izip, count
from copy import copy


def pairwise(iterable): #get values from iterator in pairs
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)


check = lambda a,b: True if (b-a)<5 else False
generate = lambda: uniform(0,100)


def iterate_heap(heap):
    heap = copy(heap) #Here I have to copy the heap to be able to traverse
    try:
        while True:
            yield heapq.heappop(heap)
    except IndexError:
        return


def trial():
    items = []

    for i in count():
        item = generate()
        heapq.heappush(items, item)

        it = iterate_heap(items)
        it = pairwise(it)

        if i>0 and all(check(a,b) for a,b in it): #if i==0 then 'it' returns no values and 'all' returns True
            return i

print "The solution is reached. It took %d iterations." % trial()

The pairwise function is from the itertools recipe here.


Update: In this implementation with heappop, the complexity on each iteration is O(n*log(n)):

Copying the heap: O(n)

Adding a new value to the heap: O(log(n))

Traversing: n elements * O(log(n)) to pop each value from the heap -> O(n*log(n))

Result: O(n + log(n) + n*log(n)) = O(n*log(n))

But I expect the traversal to be O(n), in which case the resulting complexity would be O(n).

By the way, if we used just a sorted list, we would need to sort it after each append, so O(n*log(n)), but the traversal would be n*O(1) -> O(n). So the resulting complexity is still O(n*log(n)).

I have found a solution: use the bisect module. Finding the insertion point is O(log(n)). Inserting into the list is O(n) (because of the implementation, all the values after the insertion point have to be shifted). Traversing is O(n). So the resulting per-iteration complexity is O(n).
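A minimal sketch of this bisect-based approach (the names trial_bisect, gap, low, and high are my own, not from any module):

```python
import bisect
from random import uniform

def trial_bisect(gap=5, low=0, high=100):
    """Insert uniform samples into a sorted list until every
    adjacent pair of points is less than `gap` apart."""
    points = []
    iterations = 0
    while True:
        iterations += 1
        # O(log n) search + O(n) shift to keep the list sorted
        bisect.insort(points, uniform(low, high))
        # O(n) traversal: the list is already in ascending order
        if len(points) > 1 and all(b - a < gap
                                   for a, b in zip(points, points[1:])):
            return iterations
```

As in the question's code, the check only looks at adjacent sampled pairs, so the loop can terminate as soon as two or more points happen to be close together.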

Still, I wonder if there is a way to solve this task using heaps in Python.

Impolite answered 29/10, 2011 at 19:0 Comment(6)
Random note: True if (b-a)<5 else False is equivalent to just (b - a) < 5.Haematinic
Sounds like you want a en.wikipedia.org/wiki/B-treeHebephrenia
@JochenRitzel Yeah, you are right. I confuse 'heap' with 'binary search tree'. So, yes, there is no reason to use heap.Impolite
your calculations about bisect are wrong. Constructing a sorted list of size n with bisect takes O(log(n)) per insert, so it's O(n*log(n)) too. Multiply that with the overhead of moving parts of the list and it's even O(n*n*log(n))Hebephrenia
@JochenRitzel I generate a random value and then find a place of insertion using bisect with O(log(n)) complexity, then I insert it with O(n) complexity. Then I just go through the list from left to right (because the values are already sorted) and do some calculations with value pairs. All these complexities are added up (not multiplied) since these operations are not nested (they are consecutive).Impolite
You're right, it takes O(log(n)+n) to find and insert. Still, doing that n times brings you to O(n*n) complexity. A B-Tree does it in O(n*log(n)).Hebephrenia
I would use list.sort() on the heap. That leaves the heap condition intact and makes it possible to iterate over the underlying list directly.

FWIW, the Timsort algorithm used by list.sort will take advantage of the partial ordering that already exists in the heap.
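A sketch of this answer's approach: keep pushing onto the heap, sort the underlying list in place before each traversal, and iterate directly. A fully sorted list still satisfies the heap invariant, so heappush keeps working afterwards:

```python
import heapq
from random import uniform

heap = []
for _ in range(20):
    heapq.heappush(heap, uniform(0, 100))

heap.sort()  # in place; a sorted list is still a valid min-heap

# iterate over adjacent pairs directly, no popping required
pairs_ok = all(b - a < 5 for a, b in zip(heap, heap[1:]))

heapq.heappush(heap, uniform(0, 100))  # heap invariant still holds
```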

Desman answered 29/10, 2011 at 19:50 Comment(7)
Why use heaps at all, then? Why can't I just append a new value to the list and then sort in place? I updated the question. I'm wary of the complexity becoming O(n*log(n)); I wanted O(n).Impolite
What is the complexity of Timsort if I add the value to the end of the list and then sort it? If it's O(n), then it would turn out 'The simpler the better': just to append to list and then sort it in place. :o)Impolite
Heaps are for when you are doing period updates and want to access only the lowest values. Otherwise, if you need a full sort and access to all values then sort() is preferred. I provided an answer using heaps because that was the question you asked.Desman
Thanks for the answer. I confused 'binary search tree' with 'heap'. If I use list, I would need to sort it on each iteration. Will Timsort have the complexity of O(n) in case of sorted list with appended one non-sorted value, or it will still have O(n*log(n))?Impolite
I decided to use bisect. It has only one disadvantage that O(log n) search is dominated by the slow O(n) insertion step.Impolite
That is pretty significant disadvantage.Desman
If I stick with Timsort, nothing changes regarding complexity. If Timsort happens to use insertion sort, the complexity stays at O(n) (because of the insertion), and if it uses mergesort, the complexity becomes O(n*log(n)). So Timsort is no better than bisect insertion. A binary search tree would be better (adding a value to the tree is O(log(n)), and traversing the tree is still O(n), so the overall complexity would be O(n) + O(log(n)) = O(n)). And I haven't found a good implementation of this structure in Python.Impolite
From the Python docs:

These two make it possible to view the heap as a regular Python list without surprises: heap[0] is the smallest item, and heap.sort() maintains the heap invariant!

Is there a reason you can't just treat the heap as a list and iterate over it?

Comma answered 29/10, 2011 at 19:14 Comment(6)
Heap is not sorted. heap[0] is the smallest value indeed, but I need to go through the values in ascending order, so I would have to sort the heap after adding each value, which is O(n*log(n)). When the heap is not sorted, I can only use heappop to access the smallest value. So I expected O(n) complexity (O(n) for heap traversal and O(log(n)) for adding a value to the heap), not O(n*log(n)) (O(n) for list traversal (just iterating) and O(n*log(n)) for sorting after adding).Impolite
Hmm. In my implementation with heappop the complexity is O(n*log(n)): n elements to pop with O(log(n)) complexity of popping each element. I expected the complexity to be O(n). So, yes, I definitely need to come up with something without heappop.Impolite
@Impolite Another way to say it -- even after you heapify(some_list), type(some_list) is still <type 'list'>. So to go through the items in ascending order, just use for item in heap and for descending order use for item in reversed(heap).Haematinic
@Haematinic Let's take a list els = [1,2,3,4]. Let's add 2 to els: heapq.heappush(els, 2) -> els: [1, 2, 3, 4, 2]. So, els is not sorted, it just satisfies the heap invariant. This is the reason why pushing onto a heap is O(log(n)), not O(n*log(n)) as in sorting.Impolite
@Impolite I assumed you meant "maintained the heap invariant" when you said "sort". If you need them to be sorted, it doesn't sound like heapq is actually going to help?Haematinic
@Haematinic I need them to be sorted. But I decided not to use sorted list and use a heap instead. I expected it to give me access to the values in ascending order. But I just now understood that with using heappop the complexity is not O(n) but O(n*log(n)).Impolite
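The example from the comments above can be checked directly: pushing onto a heap preserves the heap invariant but leaves the list unsorted.

```python
import heapq

els = [1, 2, 3, 4]
heapq.heappush(els, 2)

print(els)                 # [1, 2, 3, 4, 2] -- a valid heap, not sorted
print(els[0])              # 1, the smallest item, as the invariant promises
print(els == sorted(els))  # False: in-order iteration needs a sort first
```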
I have made some efficiency calculations.

The best performance is achieved with using bisect module: 10000 insertions in the middle of the list clocked 0.037 sec on my computer (Python 2.7).

Using sortedlist from the blist module clocked 0.287 sec for the same number of insertions.

And using a traditional list with sort applied after each append clocked 2.796 sec. (Python now uses the Timsort algorithm, which is argued to be very efficient on nearly sorted lists; still, it turns out not to be as efficient as using bisect.)


The code I used to make these calculations:

import bisect
import timeit
import __main__
import blist

N = 10000 #Number of executions
L = 1000 #Length of initial list

def test_f_bisect(a):
    bisect.insort_right(a,500)


def test_f_list_sort(a):
    a.append(500)
    a.sort()


test_f_blist_init = '''
from __main__ import test_f_blist
import blist
a = blist.sortedlist(range({L}))
'''.format(L=L)
def test_f_blist(a):
    a.add(500)


names = dir(__main__)
for name in names:
    attr = getattr(__main__,name)
    if hasattr(attr,'__call__'):
        if name.startswith('test_f_'):
            init_name = name + '_init'
            if hasattr(__main__, init_name):
                init = getattr(__main__,init_name)
            else:
                init = 'from __main__ import {name}; a = list(range({L}))'.format(name=name, L=L)
            t = timeit.Timer(stmt='{name}(a)'.format(name=name),
                             setup=init)

            time = t.timeit(N)
            print('{name}: {time}'.format(name=name,time=time))
Impolite answered 26/11, 2011 at 13:34 Comment(0)
For the record, the right data structure in this case is a B-tree. There is an implementation:

 from blist import sortedlist

The runtime complexity is as low as it gets: O(n*log(n)) to construct the list, O(n) to iterate.

Hebephrenia answered 29/10, 2011 at 22:6 Comment(3)
Thanks! It's definitely better than just list and bisect: iterating is the same O(n), but blist adds an item in O(log(n)), while plain list+bisect has O(log(n)+n) -> O(n) complexity per insertion.Impolite
The module is a great work. It even gives the complexities in the docsImpolite
I have made some time comparisons. bisect turns out to be the most efficient. See my answer for the details.Impolite
I created an Iterator class that will perform a lazy in-order traversal of a min heap. It has the following advantages:

  1. Doesn't require a copy of the original heap
  2. Doesn't modify the original heap
  3. Lazy iteration is more efficient if stopping early

To keep track of the next items for iteration, I just use another heap, self.next_items.

import heapq

class HeapIter:

    def __init__(self, heap):
        self.original_heap = heap
        self.next_items = []
        if len(self.original_heap) > 0:
            self.next_items.append((self.original_heap[0], 0))

    def current_element(self):
        if len(self.next_items) == 0:
            return None
        return self.next_items[0][0]

    def next(self):
        if len(self.next_items) == 0:
            return None
        # Pop the smallest candidate, then make its two children in the
        # original heap candidates for the following calls.
        next_elem, next_index = heapq.heappop(self.next_items)
        child_1 = 2 * next_index + 1
        child_2 = child_1 + 1
        if child_1 < len(self.original_heap):
            heapq.heappush(self.next_items, (self.original_heap[child_1], child_1))
        if child_2 < len(self.original_heap):
            heapq.heappush(self.next_items, (self.original_heap[child_2], child_2))
        return next_elem
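The same lazy traversal can also be written as a generator; this is my own sketch of the equivalent idea (the name iter_heap is mine):

```python
import heapq

def iter_heap(heap):
    """Yield the items of a min-heap in ascending order without
    modifying it, using an auxiliary heap of (value, index) pairs."""
    if not heap:
        return
    next_items = [(heap[0], 0)]
    while next_items:
        value, i = heapq.heappop(next_items)
        # The children of index i in the implicit binary heap layout
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):
                heapq.heappush(next_items, (heap[child], child))
        yield value
```

Because every child in a min-heap is at least as large as its parent, the auxiliary heap's minimum is always the next value in ascending order, and the original heap is left untouched.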
Glide answered 16/4, 2021 at 5:48 Comment(0)
