Why use a flat list in heapsort?
Asked Answered
W

2

0

In heapsort, the data is stored in something called a "heap". Almost all the implementations I've seen use a flat list for the data structure.

Can someone explain to me why this is?

Why not use nested arrays or an instance of a binary Tree? Isn't explicit better than implicit?

Is it because of implementation difficulties like traversing the structure, or something else?

Warden answered 30/3, 2012 at 6:58 Comment(2)
@NickBarnes You too. It'd be really nice as I've just been automatically 'banned' for asking too many (to be specific, only 3) "low-quality" questions.Warden
@Will b'coz? (Wasting space to circumvent SOs minimum char limit)Warden
S
3

If you want to see how heapsort can be implemented in Python then look no further than the standard library module heapq. Python has both C and Python implementations of heapsort and the heapq module defines the Python ones then overwrites them (if available) with the C ones. That means you can read and understand the Python implementation but get the benefit of the C version if you actually use it.

A quick example of using the module is given at the end:

heap = []
data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
for item in data:
    heappush(heap, item)
sort = []
while heap:
    sort.append(heappop(heap))
print sort

A heap is represented by a partially sorted list which has the constraint that for every element at index n in the list, the relationship holds that heap[n] <= heap[n*2+1] and heap[n] <= heap[n*2+2] (ignoring elements that don't exist). This is a simple way to collapse a binary tree down to a simple list for ease of storage.

heappush() puts a new element into the list maintaining that invariant, heappop() removes the smallest element. heapify(somelist) reorders the list in-place to satisfy the invariant.

heapsort is very useful when you want to only sort part of the list (give me the smallest k items), or where you want to process the smallest items while continually receiving new items that go into the list. A good example of the latter would be an operating system task scheduler where you keep a heap of runnable threads in priority order and can quickly pop the highest priority runnable thread off the heap whenever you need to schedule a thread to run.

Edit: There are several reasons why a list/array is more appropriate for heap storage than an explicit tree structure. The most obvious ones are that the explicit tree has a greater memory overhead (either involving pointers within each object or a separate object to be allocated for each object in the heap) and is also slower as any time an object moves within the heap you have to update multiple pointers to children and possibly parents.

Slightly less obvious is that you need to be able to easily get at the last element which is easy in a list but would mean you also need to store and update sibling pointers on each element. The reason why you need to be able to get at the last element easily is that to add an element you make it the last element and then reorder it with respect to its parent and sibling (an O(log n) operation) or to remove the smallest you simply put the current last element in its place and reorder downwards. If you don't have O(1) access to the final element of the tree then both of these operations take a bad performance hit.

Scourge answered 30/3, 2012 at 7:45 Comment(2)
Why don't we just use nested arrays as that makes the data-structure explicit instead of implicitWarden
@YatharthROCK I've edited my answer to give you some reasons why the implicit structure works better.Scourge
F
3

You should take a look at it you will find a lot of important thing about the sorting algorithms: sorting

On the other hand heap is a tree-based data structure with special properties. In case of maximum heap the greatest element is in the root node and if B is a child node of A, then key(A) ≥ key(B).
Heaps are crucial in several efficient graph algorithms such as Dijkstra's algorithm, and in the sorting algorithm heapsort.
You should also check out this about heap sort.
[EDIT] Python-related implementation of heap sort can be found here.

Frangipane answered 30/3, 2012 at 7:2 Comment(6)
Maybe you could add a note that heaps are really a tree-based structure, but often implemented using an array, which sometimes confuses a bit.Frink
So can I just use a Tree class I've built to implement heapsort?Warden
How can you represent a tree w/ an array? A nested array perhaps?Warden
@dexametason: It can either be maximum element on top or minimum element i.e. a max or a min heap.Araliaceous
@YatharthROCK Everything in memory is written using a single long byte array. Heaps, objects, strings, lists, maps, activities, algorithms...Constantine
@YatharthROCK Binary Heaps are usually represented with complete binary trees (that is, there are no holes in the tree if you "read" it line by line from left to right, top to bottom - but there are other terms for this kind of tree). Such a tree can quite comfortably be embedded into a continuous array.Frink
S
3

If you want to see how heapsort can be implemented in Python then look no further than the standard library module heapq. Python has both C and Python implementations of heapsort and the heapq module defines the Python ones then overwrites them (if available) with the C ones. That means you can read and understand the Python implementation but get the benefit of the C version if you actually use it.

A quick example of using the module is given at the end:

heap = []
data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
for item in data:
    heappush(heap, item)
sort = []
while heap:
    sort.append(heappop(heap))
print sort

A heap is represented by a partially sorted list which has the constraint that for every element at index n in the list, the relationship holds that heap[n] <= heap[n*2+1] and heap[n] <= heap[n*2+2] (ignoring elements that don't exist). This is a simple way to collapse a binary tree down to a simple list for ease of storage.

heappush() puts a new element into the list maintaining that invariant, heappop() removes the smallest element. heapify(somelist) reorders the list in-place to satisfy the invariant.

heapsort is very useful when you want to only sort part of the list (give me the smallest k items), or where you want to process the smallest items while continually receiving new items that go into the list. A good example of the latter would be an operating system task scheduler where you keep a heap of runnable threads in priority order and can quickly pop the highest priority runnable thread off the heap whenever you need to schedule a thread to run.

Edit: There are several reasons why a list/array is more appropriate for heap storage than an explicit tree structure. The most obvious ones are that the explicit tree has a greater memory overhead (either involving pointers within each object or a separate object to be allocated for each object in the heap) and is also slower as any time an object moves within the heap you have to update multiple pointers to children and possibly parents.

Slightly less obvious is that you need to be able to easily get at the last element which is easy in a list but would mean you also need to store and update sibling pointers on each element. The reason why you need to be able to get at the last element easily is that to add an element you make it the last element and then reorder it with respect to its parent and sibling (an O(log n) operation) or to remove the smallest you simply put the current last element in its place and reorder downwards. If you don't have O(1) access to the final element of the tree then both of these operations take a bad performance hit.

Scourge answered 30/3, 2012 at 7:45 Comment(2)
Why don't we just use nested arrays as that makes the data-structure explicit instead of implicitWarden
@YatharthROCK I've edited my answer to give you some reasons why the implicit structure works better.Scourge

© 2022 - 2024 — McMap. All rights reserved.