Remove an element from the middle of an std::heap
Asked Answered
V

6

20

I'm using a priority queue as a scheduler with one extra requirement. I need to be able to cancel scheduled items. This equates to removing an item from the middle of the priority queue.

I can't use std::priority_queue as access to any element other than top is protected.

I'm trying to use the algorithm's heap functions. But I'm still missing the piece I need. When I remove an element I from the middle of the heap I want it to rebuild itself efficiently. C++ provides these heap functions:

  • std::make_heap O(3n)
  • std::push_heap O(lg(n))
  • std::pop_heap O(2 lg(n))

I want a new function like std::repair_heap with a big-O < 3n. I'd provide it with location of the hole where the canceled item used to reside and it would properly adjust the heap.

It seems to be a huge oversight to not to provide a std::repair_heap function. Am I missing something obvious?

Is there library that provides an stl-compliant std::repair_heap?

Is there a better data structure for modeling a scheduler?

NOTE:
I'm not using an std::map for a few reasons.

  • A heap has constant memory overhead.
  • A heap has awesome cache locality.
Vehicle answered 19/1, 2011 at 17:17 Comment(8)
Please correct me if I'm wrong, but I think you have to call std::make_heap for this, since you'll have to move elements around anyways.Palpitant
I can just use std::make_heap. But it feels like there should be a faster alternative. I suspect repair_heap could be written to be O(lg(n)), like push and pop. My reasoning is repair_heap is just popping from the middle of the heap instead of the head.Vehicle
I wrote something like repair_heap for a similar scheduling problem (not in C++) and it can be done. I later hit the same problem as you with std::priority_queue and ultimately I decided that removal was so infrequent compared to insertion (possibly not true for you) that I just copied the heap while removing the element. My goal in that case was to create as little new/untested code as possible.Stonehenge
This is really nitpicky, but while it's technically correct to say O(3n) or O(2 lg n), this is usually not done because it misses the point of big-O. Big-O categorizes relative growth rates without considering constants. Instead of writing O(3n), just write O(n). Similarly, don't write O(2 lg n), write O(lg n). Now, if you really do want to say that your code runs with at most 2 lg n comparisons, then that's perfectly fine - just say that directly and don't use big-O notation.Eaves
@templatetypedef, OP is actually just adopting cplusplus.com's way of explaining the time complexity of their algorithms in this instance, so we shouldn't fault OP for it.Stephniestepladder
@Stephniestepladder Looking at the page you linked, it says "Up to linear in three times the distance between first and last." That's not the same as saying O(3n); the first of these is a very specific bound that's not specified asymptotically, while the second is technically correct but misleading.Eaves
I don't consider @Eaves s correction of the use of Big-O to be in the least bit nit-picky. In fact, if you pay attention to it you might notice your overall problem could be O(n) in any case..... How would you find the element you want to remove in the first place? (Without special effort it would be O(n)) What is the cost of deleting? (Usually O(1)) Even with O(log(n)) repair: Find O(n), Delete O(1), Repair O(log(n)) is still O(n) This is the same as Find, Delete, Rebuild O(n).Almondeyed
Posting on old question, but still: Here is how to do it, in Java, but still useful: mathcs.emory.edu/~cheung/Courses/171/Syllabus/9-BinTree/…Joscelin
E
7

Unfortunately, the standard is missing this (fairly important) function. With g++, you can use the non-standard function std::__adjust_heap to do this, but there's no easy portable way of doing it -- and __adjust_heap is slightly different in different versions of g++, so you can't even do it portably over g++ versions.

Ecuador answered 19/1, 2011 at 18:17 Comment(4)
std::__adjust_heap looks like the right thing but there isn't any documentation. In particular I don't know what the __value parameter is for.Vehicle
#229283 suggests that a function starting with "__" like __adjust_heap is for implementation only.Stability
@gregg: __adjust_heap is for implementation only. That makes it a bit risky to use as the function is undocumented, may change signature, or go away entirely. Worse case scenario is if the __adjust_heap changes semantics is a way that doesn't change its signature. eg it stops adjusting the heap and now randomizes it. So I'll have to be very careful while using it and write some awesome unit tests.Vehicle
I don't think the function is that important. Finding an item in the middle of the heap (so it can be deleted) is already O(n). So if this is something that needs to be done a lot: a "repair/adjust heap" function is going to be far less useful than a more appropriate algorithm/data structure.Almondeyed
A
13

I guess you know which element in the heap container (index n) you want to delete.

  1. Set the value v[n] = BIG; the value BIG is really bigger than any other values in the heap.
  2. Call std::push_heap( v.begin(), v.begin()+n+1 );
  3. Call std::pop_heap( v.begin(), v.end() );
  4. Call v.pop_back();
  5. Done

Operation is O(ln(n))

RE: request for proof

First, a qualifier: This method assumes something about the algorithm used by std push_heap.
Specifically, it assumes that std push_heap( v.begin(), v.begin()+n+1 ) will only alter the range [0, n]
for those elements which are ascendants of n, i.e., indices in the following set:

A(n)={n,(n-1)/2,((n-1)/2-1)/2....0}.  

Here is a typical spec for std push_heap:

http://www.cplusplus.com/reference/algorithm/push_heap/ "Given a heap range [first,last-1), this function extends the range considered a heap to [first,last) by placing the value in (last-1) into its corresponding location in it."

Does it guarantee to use the "normal heap algorithm" that you read about in textbooks? You tell me.

Anyway, here is the code which you can run and see, empirically, that it works. I am using VC 2005.

#include <algorithm>
#include <vector>
#include <iostream>

bool is_heap_valid(const std::vector<int> &vin)
{
    std::vector<int> v = vin;
    std::make_heap(v.begin(), v.end());
    return std::equal(vin.begin(), vin.end(), v.begin());
}


int _tmain(int argc, _TCHAR* argv[])
{
    srand(0);
    std::vector<int> v;
    for (int i=0; i<100; i++)
    {
        v.push_back( rand() % 0x7fff );
    }
    std::make_heap(v.begin(), v.end());

    bool bfail = false;
    while( v.size() >= 2)
    {
        int n = v.size()/2;
        v[n] = 0x7fffffff;
        std::push_heap(v.begin(), v.begin()+n+1);
        std::pop_heap(v.begin(), v.end());
        v.resize(v.size()-1);
        if (!is_heap_valid(v))
        {
            std::cout << "heap is not valid" << std::endl;
            bfail = true;
            break;
        }
    }
    if (!bfail)
        std::cout << "success" << std::endl;

    return 0;

}

But I have another problem, which is how to know the index "n" which needs to be deleted. I cannot see how to keep track of that (know the place in the heap) while using std push_heap and std pop_heap. I think you need to write your own heap code and write the index in the heap to an object every time the object is moved in the heap. Sigh.

Anthraquinone answered 9/8, 2011 at 3:1 Comment(3)
I'm not convinced this will work. Convince me and I'll change the accepted answer to this one.Vehicle
I think you actually should be able to use std::push_heap and std::pop_heap if you overload std::swap for the type you're keeping in the heap. Make a member on your object for storing the index into the heap, and before inserting set it to the current size of the heap. This will be the right value at the beginning because the first heap insert step is to make the new element be the last one on the last level, so its index is the size of the heap before the insert. Then every time the heap functions call std::swap, have your override swap the index stored on each element as well.Lieabed
Unfortunately empirical evidence is insufficient. In fact, even a complete mathematical proof for a particular implementation is no good because another compiler or different version of the same compiler might use a different algorithm. The problem is: the only thing that makes a heap "a heap" is the properties maintained by the tree structure (it doesn't even have to be a binary heap). A heap implementation transforms a random access iterator into a virtual tree. So when you rearrange the tree over begin..begin()+n+1 are heap properties still valid over begin..end?Almondeyed
E
7

Unfortunately, the standard is missing this (fairly important) function. With g++, you can use the non-standard function std::__adjust_heap to do this, but there's no easy portable way of doing it -- and __adjust_heap is slightly different in different versions of g++, so you can't even do it portably over g++ versions.

Ecuador answered 19/1, 2011 at 18:17 Comment(4)
std::__adjust_heap looks like the right thing but there isn't any documentation. In particular I don't know what the __value parameter is for.Vehicle
#229283 suggests that a function starting with "__" like __adjust_heap is for implementation only.Stability
@gregg: __adjust_heap is for implementation only. That makes it a bit risky to use as the function is undocumented, may change signature, or go away entirely. Worse case scenario is if the __adjust_heap changes semantics is a way that doesn't change its signature. eg it stops adjusting the heap and now randomizes it. So I'll have to be very careful while using it and write some awesome unit tests.Vehicle
I don't think the function is that important. Finding an item in the middle of the heap (so it can be deleted) is already O(n). So if this is something that needs to be done a lot: a "repair/adjust heap" function is going to be far less useful than a more appropriate algorithm/data structure.Almondeyed
S
3

How does your repair_heap() work? Here's my guess:

If your heap is defined by some iterator range, say (heapBegin, heapEnd). The element you want to remove is the root of some subtree of the heap, which is defined by some subrange (subHeapBegin, subHeapEnd). Use std::pop_heap(subHeapBegin, subHeapEnd), then if subHeapEnd != heapEnd, swap the values at *(subHeapEnd-1) and *(heapEnd-1), which should put your deleted item at the end of the heap container. Now you have to percolate the element at *(subHeapEnd-1) up in your subheap. If I haven't missed something, which is possible, then all that remains is to chop the end element off of the heap container.

Before going to the trouble of trying to code that correctly (I've skipped some details like calculating subHeapBegin and subHeapEnd), I'd run some tests to determine if make_heap() really slows you down. Big-O is useful, but it's not the same thing as actual execution time.

Stability answered 19/1, 2011 at 18:30 Comment(0)
S
-1

It seems to me that removing from the middle of a heap might mean the entire heap has to be rebuilt: The reason there's no repair_heap is because it would have to do the same (big-oh) work as make_heap.

Are you able to do something like put std::pair<bool, Item> in the heap and just invalidate items instead of removing them? Then when they finally get to the top just ignore the item and move along.

Selfreliant answered 19/1, 2011 at 17:32 Comment(4)
Ignoring scheduled items was my original strategy. However, one user of the scheduler overwhelmed it with canceled items. Less than a 100 actively schedule items but 1000s of canceled items. To solve this I'm trying to add true cancellation.Vehicle
@deft_code, how about keeping a count of cancelled items and only rebuild the heap when the count reaches a threshold?Offcolor
removing an element from the middle of a heap is never more expensive that removing the top element (O(lg n)), so this should be possible, its just missing from the STLEcuador
@ChrisDodd While the act of removing an element from the middle should be O(log(n)), the catch is you cannot remove something without first finding the element to remove. Since the properties of a heap make no guarantees about which subtree will contain a particular element that is not the highest value, finding an arbitrary item is effectively O(n). Therefore find+ delete is also O(n). Conclusion: Mark B is right about the overall operation still being O(n), but for the wrong reason. And 'repair heap' is probably not that useful; a different approach/structure may be better.Almondeyed
T
-1

You can try ‘std::multiset’ which is implemented as the heap structure and support ‘std::erase’ operation, so you could ‘std::find’ the element then erase it.

Tacita answered 7/4, 2020 at 4:22 Comment(1)
This answer is incorrect because std::multiset is a self-balancing binary tree, not a binary heap (the inorder traversal of the tree is sorted, as opposed to the tree having the heap invariant)Baleful
B
-2

Here's a bit of delphi code i used to remove items from a heap. I don't know this C++ of which you speak and don't have a repair function, but hey..

first the pop, so you get an idea of how the thing works:

function THeap.Pop: HeapItem;
begin
  if fNextIndex > 1 then begin
    Dec(fNextIndex);
    Result:= fBuckets[1];   //no zero element
    fBuckets[1] := fBuckets[fNextIndex];
    fBuckets[fNextIndex] := nil;
    FixHeapDown;            //this has a param defaulting to 
    end
  else
    Result:= nil;
end;

now to contrast, the deletion:

procedure THeap.Delete(Item: HeapItem);
var
  i:integer;
begin
  for i:=1 to pred(fNextIndex) do
    if Item=fBuckets[i] then begin
      dec(fNextIndex);
      fBuckets[i] := fBuckets[fNextIndex];
      fBuckets[fNextIndex] := nil;
      FixHeapDown(i);
      break;
      end;
end;

its of course a no-no to even think about doing what we're doing here, but hey, costs do change sometimes and jobs do get canceled.

enjoy. i hope this helps.

Benitabenites answered 27/1, 2011 at 19:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.