How to delete in a heap data structure?
Asked Answered
U

6

68

I understand how to delete the root node from a max heap but is the procedure for deleting a node from the middle to remove and replace the root repeatedly until the desired node is deleted?

  1. Is O(log n) the optimal complexity for this procedure?

  2. Does this affect the big O complexity since other nodes must be deleted in order to delete a specific node?

Urus answered 2/1, 2012 at 20:38 Comment(5)
why would you want to delete a node in the middle in a max heap?Monegasque
@BrokenGlass: One very real use of such a thing is a heap representation of a priority queue of scheduled jobs, and somebody cancels one of the jobs.Propulsion
@BrokenGlass: I recently implemented the LPA* pathfinding algorithm, a replanning algorithm based on A*. It required the ability to remove from the middle of the priority-queue.Sheltonshelty
In general, you want to post a new question rather than add new questions to a four year old post. But to answer your questions: 1) In a standard binary heap, O(log n) is the optimal complexity. 2) Deleting from the middle of the heap will never be more expensive than deleting the root node, and that operation is already proven to be O(log n). O(log n) is the worst case complexity for deleting a node anywhere in the heap. Note, however, that what I pointed out in my original answer still remains true: it takes O(n) to find the node to delete.Propulsion
mathcs.emory.edu/~cheung/Courses/171/Syllabus/9-BinTree/…Felecia
P
106

Actually, you can remove an item from the middle of a heap without trouble.

The idea is to take the last item in the heap and, starting from the current position (i.e. the position that held the item you deleted), sift it up if the new item is greater than the parent of the old item. If it's not greater than the parent, then sift it down.

That's the procedure for a max heap. For a min heap, of course, you'd reverse the greater and less cases.

Finding an item in a heap is an O(n) operation, but if you already know where it is in the heap, removing it is O(log n).

I published a heap-based priority queue for DevSource a few years back. The full source is at http://www.mischel.com/pubs/priqueue.zip

Update

Several have asked if it's possible to move up after moving the last node in the heap to replace the deleted node. Consider this heap:

        1
    6       2
  7   8   3

If you delete the node with value 7, the value 3 replaces it:

        1
    6       2
  3   8

You now have to move it up to make a valid heap:

        1
    3       2
  6   8

The key here is that if the item you're replacing is in a different subtree than the last item in the heap, it's possible that the replacement node will be smaller than the parent of the replaced node.

Propulsion answered 2/1, 2012 at 23:23 Comment(8)
I'm not quite sure I understand you. Are you saying that we remove the arbitrary element and replace its position with the last element of the heap tree, and either sift up or down? I doubt that this what you're saying because 1) It cannot sift up ever 2) The tree might not be complete any more... Can you elaborate? Thank you.Cuneiform
@RoronoaZoro: That's exactly what I'm saying. And it does indeed work. Take a look at my sample code, or look at any heap implementation that allows removal.Propulsion
@RoronoaZoro: It can actually sift up because it might not be part of the same tree. Picture the root -- at the very beginning the tree is split into two independent parts. You might be deleting a node from the left subtree, but if you take the last item from the last level of the heap that might be an item from the right subtree. There isn't ordering between the subtrees, only relative to their parent.Wean
is shift up possible for a value for the last element after swapping? also indeces are mostly less used for heaps as they changes frequently while pushing and poping. what would be strategy for delete by value?Simons
@SazzadHissainKhan If you want to delete by value, you have to maintain an dictionary or hash map, keyed by value, that contains the index of the item in the heap. And, yes, it is possible for a value to move up after swapping. See my update.Propulsion
Your link's expired.Kast
@ptr_user7813604 Yes, the DevSource article is gone. The code link is still good, though.Propulsion
@JosephGarvin the issue is that in some cases it has to sift up and in others it has to sift down. Your answer already shows when sift up is necessary. Here's an example of when it needs to sift down: excalidraw.com/…Catlin
A
25

The problem with removing an arbitrary element from a heap is that you cannot find it.

In a heap, looking for an arbitrary element is O(n), thus removing an element [if given by value] is O(n) as well.

If it is important for you to remove arbitrary elements form the data structure, a heap is probably not the best choice, you should consider full sorted data structurs instead such as balanced BST or a skip list.

If your element is given by reference, it is however possible to remove it in O(logn) by simply 'replacing' it with the last leaf [remember a heap is implemented as a complete binary tree, so there is a last leaf, and you know exactly where it is], remove these element, and re-heapify the relevant sub heap.

Aboulia answered 2/1, 2012 at 21:28 Comment(8)
Note that you can find it if you're willing to roll your own heap implementation. You just need it to save a mapping from object to index, or to intrusively store the index on the object, and update it every time it swaps elements. Updating isn't expensive since it's only ever on a swap, where you just swap the indexes; you don't have to do any scanning.Wean
@Joseph Garvin I was looking for such an answer. Had an interview problem "find k most frequent elements in a stream". Your approach of custom heap implementation fits better.Nominative
@JosephGarvin Could you please elaborate on "save a mapping from object to index"? How would this index look like?Caucasian
@Caucasian one way is to have the object store its index into the heap as a field. When you move objects in the heap with push/pop, you also make sure to change the field to reflect the new index. The other way would be a hash table from Object* to index, and have your heap store Object* rather than Object directly. Then when you move objects in the heap you also look them up in the hash table and change the index value stored there.Wean
@JosephGarvin If I understand your first solution correctly, the nodes of the head contain the objects that you want to heap, as well as an index. So instead of searching for a particular object through the heap, you are searching for a particular index. Why is this better?Caucasian
@Caucasian A heap is usually just a vector/array with the items arranged a particular way. The challenge with removing an element is knowing where it is in the heap. If you already have access to the object, and just need to clean it out of the heap, then an index stored on the object tells you the index into the vector/array so you know exactly where it is in the heap, you don't search. You still have to do the lg(n) work to remove it from there though. This only works if whenever you move things stored in the heap you also change the index stored on them.Wean
@JosephGarvin A max heap of the numbers 100, 150, 200, 250 is [250, 150, 200, 100]. The best way I can think of to create an index that we don't have to search would to create an array index with index[250] = 0, index[150] = 1], index[200] = 2, index[100] = 3. But this is very wasteful, so it's not how it's done?Caucasian
@Caucasian Instead of storing integers directly in the heap, you store a struct that contains the integer and the index together. Or you can store the index separately but then you do it with a hash table, not a plain array. If you haven't learned hash tables yet, they solve the wastefulness problem you are talking about.Wean
M
3

If you have a max heap, you could implement this by assigning a value larger than any other (eg something like int.MaxValue or inf in whichever language you are using) possible to the item to be deleted, then re-heapify and it will be the new root. Then perform a regular removal of the root node.

This will cause another re-heapify, but I can't see an obvious way to avoid doing it twice. This suggests that perhaps a heap isn't appropriate for your use-case, if you need to pull nodes from the middle of it often.

(for a min heap, you can obviously use int.MinValue or -inf or whatever)

Mossback answered 2/1, 2012 at 20:52 Comment(1)
See my answer for a straight-forward way that avoids doing the work twice. It gets kind of obvious if you think about similarities to the increase function in a max-heap or the intermediate steps in the build-heap function - when looking at generalizing max-extraction to any-node-extraction.Heraclitus
H
2

Removing an element from a known heap array position has O(log n) complexity (which is optimal for a heap). Thus, this operation has the same complexity as extracting (i.e. removing) the root element.

The basic steps for removing the i-th element (where 0<=i<n) from heap A (with n elements) are:

  1. swap element A[i] with element A[n-1]
  2. set n=n-1
  3. possibly fix the heap such that the heap-property is satisfied for all elements

Which is pretty similar to how the extraction of the root element works.

Remember that the heap-property is defined in a max-heap as:

A[parent(i)] >= A[i], for 0 < i < n

Whereas in a min-heap it's:

A[parent(i)] <= A[i], for 0 < i < n

In the following we assume a max-heap to simplify the description. But everything works analogously with a min-heap.

After the swap we have to distinguish 3 cases:

  1. new key in A[i] equals the old key - nothing changes, done
  2. new key in A[i] is greater than the old key. Nothing changes for the sub-trees l and r of i. If previously A[parent(i)] >= A[j] was true then now A[parent(i)]+c >= A[j] must be true as well (for j in (l, r) and c>=0). But the ancestors of element i might need fixing. This fix-up procedure is basically the same as when increasing A[i].
  3. new key in A[i] is smaller than the old key. Nothing changes for the ancestors of element i, because if the previous value already satisfied the heap property, a smaller value values does it as well. But the sub-trees might now need fixing, i.e. in the same way as when extracting the maximum element (i.e. the root).

An example implementation:

void heap_remove(A, i, &n)
{
    assert(i < n);
    assert(is_heap(A, i));
    
    --n;
    if (i == n)
      return;
    
    bool is_gt = A[n] > A[i];

    A[i] = A[n]; 
    
    if (is_gt)
        heapify_up(A, i);
    else
        heapify(A, i, n);
}

Where heapifiy_up() basically is the textbook increase() function - modulo writing the key:

void heapify_up(A, i)
{
    while (i > 0) {
        j = parent(i);
        if (A[i] > A[j]) {
            swap(A, i, j);
            i = j;
        } else {
            break;
        }
    }
}

And heapify() is the text-book sift-down function:

void heapify(A, i, n)
{
    for (;;) {
        l = left(i);
        r = right(i);

        maxi = i;
        if (l < n && A[l] > A[i])
            maxi = l;
        if (r < n && A[r] > A[i])
            maxi = r;
        if (maxi == i)
            break;
        swap(A, i, maxi);
        i = maxi;
    }
}

Since the heap is an (almost) complete binary tree, its height is in O(log n). Both heapify functions have to visit all tree levels, in the worst case, thus the removal by index is in O(log n).

Note that finding the element with a certain key in a heap is in O(n). Thus, removal by key value is in O(n) because of the find complexity, in general.

So how can we keep track of the array position of an element we've inserted? After all, further inserts/removals might move it around.

We can keep track by also storing a pointer to an element record next to the key, on the heap, for each element. The element record then contains a field with the current position - which thus has to be maintained by modified heap-insert and heap-swap functions. If we retain the pointer to the element record after insert, we can get the element's current position in the heap in constant time. Thus, in that way, we can also implement element removal in O(log n).

Heraclitus answered 22/9, 2020 at 18:31 Comment(0)
C
0

What you want to achieve is not a typical heap operation and it seems to me that once you introduce "delete middle element" as a method some other binary tree(for instance red-black or AVL tree) is a better choice. You have a red-black tree implemented in some languages(for instance map and set in c++).

Otherwise the way to do middle element deletion is as proposed in rejj's answer: assign a big value(for max heap) or small value(for min heap) to the element, sift it up until it is root and then delete it.

This approach still keeps the O(log(n)) complexity for middle element deletion, but the one you propose doesn't. It will have complexity O(n*log(n)) and therefor is not very good. Hope that helps.

Chronopher answered 2/1, 2012 at 21:27 Comment(0)
S
0

Adding a such heap Python example.

Hope this will be useful to someone who looks here later.

from collections import defaultdict


class HeapSet:
    def __init__(self):
        self.heap = []
        self.positions = defaultdict(set)

    def _swap(self, i: int, j: int):
        obj = self.heap[i]
        swap_with_obj = self.heap[j]
        if obj == swap_with_obj:
            return

        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]

        self.positions[obj].add(j)
        self.positions[obj].remove(i)

        self.positions[swap_with_obj].add(i)
        self.positions[swap_with_obj].remove(j)

    def _sink(self, i: int):
        while i < self.size():
            swap_with = i
            if i * 2 + 1 < self.size() and self.heap[swap_with] > self.heap[i * 2 + 1]:
                swap_with = i * 2 + 1
            if i * 2 + 2 < self.size() and self.heap[swap_with] > self.heap[i * 2 + 2]:
                swap_with = i * 2 + 2
            if swap_with == i:
                break
            else:
                self._swap(i, swap_with)
                i = swap_with

    def _swim(self, i: int):
        while i > 0:
            swap_with = i
            if (i - 1) // 2 >= 0 and self.heap[swap_with] < self.heap[(i - 1) // 2]:
                swap_with = (i - 1) // 2
            if swap_with == i:
                break
            else:
                self._swap(i, swap_with)
                i = swap_with

    def add(self, obj):
        self.heap.append(obj)
        self.positions[obj].add(self.size() - 1)
        self._swim(self.size() - 1)

    def remove(self, obj):
        pos = next(iter(self.positions[obj]))
        self._swap(pos, self.size() - 1)
        self.positions[obj].remove(self.size() - 1)
        self.heap.pop()

        if pos != self.size():
            self._sink(pos)
            self._swim(pos)

    def get_top(self):
        if not self.heap:
            return None

        return self.heap[0]

    def size(self):
        return len(self.heap)
Santossantosdumont answered 29/10, 2023 at 20:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.