How to remove a specific element in a heap without losing heap properties in Python?
Asked Answered
P

4

7

I'm trying to implement an algorithm to solve the skyline problem that involves removing specific elements from the middle of a max heap. The way I currently do it is maxheap.remove(index) but I have to follow up with a heapify(maxheap) otherwise the order is thrown off. I know in java you can use something like a treemap to do that. Is there anyway to do that in python more efficiently than calling two separate methods each of which takes O(n) time?

Psychometry answered 31/10, 2017 at 17:26 Comment(2)
What implementation of heap do you refer? If it contains remove operator (that denotes explicit access to heap elements), it might also contain change priority one.Relationship
@Relationship In Python the obvious implementation is heapq which is in the standard library.Illbred
I
14

Removing an arbitrary item from a heap is an O(log n) operation, provided you know where the item is in the heap. The algorithm is:

Move the last item in the heap to the position that contains the item to remove.
Decrement heap count.
If the item is smaller than its parent
    bubble it up the heap
else
    sift it down the heap

The primary problem is finding the item's position in the heap. As you've noted, doing so is an O(n) operation unless you maintain more information.

An efficient solution to this is to create a dictionary that contains the item key, and the value is the index of that item in the heap. You have to maintain the dictionary, however:

  1. When you insert an item into the heap, add a dictionary entry
  2. When you remove an item from the heap, remove the dictionary entry
  3. Whenever you change the position of an item in the heap, update that item's value in the dictionary.

With that dictionary in place, you have O(1) access to an item's position in the heap, and you can remove it in O(log n).

Impracticable answered 31/10, 2017 at 17:59 Comment(0)
I
0

I would have have each element be a data structure with a flag for whether to ignore it. When you heappop, you'll just pop again if it is an element that got flagged. This is very easy, obvious, and involves knowing nothing about how the heap works internally. For example you don't need to know where the element actually is in the heap to flag it.

The downside of this approach is that the flagged elements will tend to accumulate over time. Occasionally you can just filter them out then heapify.

If this solution is not sufficient for your needs, you should look for a btree implementation in Python of some sort. That will behave like the treemap that you are used to in Java.

Illbred answered 31/10, 2017 at 17:53 Comment(0)
C
0

Yes, there is a more efficient way - if you have its index or pointer (depending on implementation method).

Replace the number on the index/pointer you need to remove with its largest child and repeat the process recursively (replace the child with its largest child,etc...) until you get to a node that has no children, which you remove easily.

The complexity of this algorithm is O(log n).

http://algorithms.tutorialhorizon.com/binary-min-max-heap/

Clawson answered 31/10, 2017 at 18:42 Comment(0)
S
0

This is an old question, but it feels incomplete without code. I have implemented a happypath heap with the ability to remove an element based on its value.

Hope this will be useful to someone who looks here later.

from collections import defaultdict


class HeapSet:
    def __init__(self):
        self.heap = []
        self.positions = defaultdict(set)

    def _swap(self, i: int, j: int):
        obj = self.heap[i]
        swap_with_obj = self.heap[j]
        if obj == swap_with_obj:
            return

        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]

        self.positions[obj].add(j)
        self.positions[obj].remove(i)

        self.positions[swap_with_obj].add(i)
        self.positions[swap_with_obj].remove(j)

    def _sink(self, i: int):
        while i < self.size():
            swap_with = i
            if i * 2 + 1 < self.size() and self.heap[swap_with] > self.heap[i * 2 + 1]:
                swap_with = i * 2 + 1
            if i * 2 + 2 < self.size() and self.heap[swap_with] > self.heap[i * 2 + 2]:
                swap_with = i * 2 + 2
            if swap_with == i:
                break
            else:
                self._swap(i, swap_with)
                i = swap_with

    def _swim(self, i: int):
        while i > 0:
            swap_with = i
            if (i - 1) // 2 >= 0 and self.heap[swap_with] < self.heap[(i - 1) // 2]:
                swap_with = (i - 1) // 2
            if swap_with == i:
                break
            else:
                self._swap(i, swap_with)
                i = swap_with

    def add(self, obj):
        self.heap.append(obj)
        self.positions[obj].add(self.size() - 1)
        self._swim(self.size() - 1)

    def remove(self, obj):
        pos = next(iter(self.positions[obj]))
        self._swap(pos, self.size() - 1)
        self.positions[obj].remove(self.size() - 1)
        self.heap.pop()

        if pos != self.size():
            self._sink(pos)
            self._swim(pos)

    def get_top(self):
        if not self.heap:
            return None

        return self.heap[0]

    def size(self):
        return len(self.heap)
Stranger answered 29/10, 2023 at 20:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.