Can max/min heap trees contain duplicate values?

Asked 21/3, 2014 at 21:47 Answered 13/7, 2023 at 14:41

Solved java binary-tree heapsort min-heap max-heap

I'm wondering if a max or min heap tree is allowed to have duplicate values? I've been unsuccessful in trying to find information regarding this with online resources alone.

Tradescantia answered 21/3, 2014 at 21:47 Comment(1)

I expect that would depend on the implementation. Do you have have one in mind? – Disassemble 21/3, 2014 at 21:50

Yes, they can. You can read about this in 'Introduction to Algorithms' (by Charles E. Leiserson, Clifford Stein, Thomas H. Cormen, and Ronald Rivest). According to the definition of binary heaps in Wikipedia:

All nodes are either [greater than or equal to](max heaps) or [less than or equal to](min heaps) each of its children, according to a comparison predicate defined for the heap.

Adriell answered 21/3, 2014 at 21:55 Comment(3)

Thanks! On a side note, Binary Search trees cannot have duplicate values, correct? – Tradescantia 21/3, 2014 at 21:57

They can have as well. You just have to implement your traversal algorithm properly. On BSTs, wikipedia says that they cannot have duplicate values but the book that I just mentioned allows for it - just modifies the traversal algorithms a bit. Bottom line - BSTs can have them too. – Adriell 21/3, 2014 at 22:0

I'm 99% sure most basic trees can, it depends moreso on your search algorithm – Extraditable 21/3, 2014 at 22:1

Yes they can have duplicates. From wikipedia definition of Heap:

Either the keys of parent nodes are always greater than or equal to those of the children and the highest key is in the root node (this kind of heap is called max heap) or the keys of parent nodes are less than or equal to those of the children and the lowest key is in the root node (min heap)

So if they have children nodes that are equal means that they can have duplicated.

Pix answered 21/3, 2014 at 21:56 Comment(0)

Yes, but I would say no. For efficiency they shouldn't have different nodes with duplicate values or it loses it's purpose a bit (you would have to search child nodes and such). However, you could design each node to contain a variable that declares how many copies of that value you have in your data.

Again this is my opinion. If this is a bad way of doing it I would love if someone could explain why. I might just have to do some efficiency testing. If you are storing simple data types like ints then I would see it being less efficient but for larger object nodes that have ids it's been nice, it seems.

Extraditable answered 21/3, 2014 at 21:56 Comment(3)

I'd like to see proof of this. Not because of "Oh yeah!? PROOVE IT!" But more like... It sounds legitimate and common-sensical, so you should write a full-length article with mathmatical proofs. It would probably get a lot of traffic and attention, and would be a beneficial bit of information. Focus on the point of "It loses its purpose" – Cantlon 6/1, 2016 at 21:27

I'm stupid late to the party, but I think the idea is that the advantage of heaps is that you get O(log(n)) insertions and pops. If you only use those operations it's all good regardless of whether you add duplicates, but if you want to add, say, an (expected) O(log(n)) search, you can't use a hash set anymore (at least not in a straightforward manner--you could track copies as OP suggests) because you have duplicates. – Sarnoff 24/5, 2022 at 23:38

Granted, I would argue that since the insert/pop operations are the "point" of a heap IMHO, duplicates are entirely fine. But it depends on the application, of course. – Sarnoff 24/5, 2022 at 23:38

My short answer is that heaps can have duplicate keys but no duplicate elements. What does that mean?

In heaps, we have two different entities: (i) the element (which Cormen calls an object); and (ii) the key. Each element has a key associated with it. The element is an application object stored in the heap, and the key is a comparable value used in heap organization.

In Cormen's book, in the section about Heaps, he uses integers as keys in his examples. In this case, the elements are implicit, so we can have duplicate values because these values are keys.

On the other hand, in the section about Priority Queues, he explains:

"When you use a heap to implement a priority queue within a given application, elements of the priority queue correspond to objects in the application. Each object contains a key".

One example of this is the Prim algorithm. In this algorithm, the heap (priority queue) stores vertices as elements, and each vertex has a key value (the weight of a light edge connecting this vertex to the tree). In other words, the vertices are elements, and the keys are numbers associated with these vertices. Different vertices can have the same key value.

Ok! But why can't we have duplicate elements? Cormen explains:

"If the priority queue is implemented by a heap, you need to determine which application object corresponds to a given heap element, and vice versa. Because the heap elements are stored in an array, you need a way to map application objects to and from array indices. One way to map between application objects and heap elements uses handles... As an alternative to incorporating handles in application objects, you can store within the priority queue a mapping from application objects to array indices in the heap… One option for the mapping is a hash table."

In other words, given an element, the heap must find its heap index in constant time. We can do this using a hash table. How can we have the same element in two different places on the heap if the hash function will generate the same index for it!? The answer is that we cannot. Using handles, we have the same limitation.

Covetous answered 13/7, 2023 at 14:41 Comment(0)

Recommended topics

Hot tags