Your best option would be to use heap sort, which is both reasonably time efficient (with an O(n log n) time complexity) and space efficient (with a guaranteed O(1) space complexity).
While Python has a built-in module heapq
that implements a binary heap, it only exports one of the two functions necessary to perform an in-place heap sort, heapify
, which transforms a list into a min heap; the other necessary function, _siftup
, a function that bubbles up the smaller child of a given starting position (and so on with that child's children, etc) until hitting a leaf, is left unexported.
Without _siftup
, one can only perform a heap sort by popping smallest values from a heap into a new list as documented, which costs O(n) in space complexity:
A heapsort can be implemented by pushing all values onto a heap and
then popping off the smallest values one at a time:
>>> def heapsort(iterable):
... h = []
... for value in iterable:
... heappush(h, value)
... return [heappop(h) for i in range(len(h))]
...
>>> heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
What's more, heapify
transforms a list into a min heap, which is not ideal for in-place sorting because we want to swap the larger sorted items towards the end, not the other way around. Quote from heapq
's documentation:
Our pop method returns the smallest item, not the largest (called a
“min heap” in textbooks; a “max heap” is more common in texts because
of its suitability for in-place sorting).
Luckily, for the needs of heapq.merge
in reverse mode, heapq
actually also implements a max heap with other unexported functions, including _heapify_max
, a max heap version of heapify
, and _siftup_max
, a max heap version of _siftup
.
However, the heapq._siftup_max
function doesn't take an end position as an argument, which is needed to limit the size of the heap to preserve the already sorted items towards the end of the list. So to work around the lack of an end position argument while maintaining an O(1) space complexity, we can pass to it a sliced memoryview
of an array.array
instead because you mentioned in a comment that what you have are "typically integers that can always fit in memory", which can easily be loaded as an array
of type 'q'
(64-bit signed integers).
But then, the heapq
module would try to import from its C implementation, _heapq
, if available, and _heapq._heapify_max
would specifically validate the argument as a list
and deny an array
. The Python implementation of heapq._heapify_max
doesn't have this limitation, so to import it we would need to first import _heapq
ourselves and delete _heapify_max
from the _heapq
module object so that heapq
's _heapify_max
would not be overwritten when we import heapq
:
import sys
import _heapq
del sys.modules['_heapq']._heapify_max
from heapq import _heapify_max, _siftup_max
So here's how you can perform heap sort with heapq
's built-in functions, by first heapifying the array with _heapify_max
, and then iteratively swapping the largest number at the root with the leaf at the end and using _siftup_max
to sift it up to where all of its children are smaller:
def heapsort(arr):
_heapify_max(arr)
view = memoryview(arr)
for size in reversed(range(1, len(arr))):
arr[0], arr[size] = arr[size], arr[0]
_siftup_max(view[:size], 0)
or to perform heap sort with a min heap, you would have to reverse the result in the end:
import sys
import _heapq
del sys.modules['_heapq'].heapify
from heapq import heapify, _siftup
from array import array
def heapsort(arr):
view = memoryview(arr)
heapify(view)
for size in reversed(range(1, len(arr))):
view[0], view[size] = view[size], view[0]
_siftup(view[:size], 0)
arr.reverse()
so that:
arr = array('q', [1, 5, 0, 7, 3, 6, 9, 8, 2, 4])
heapsort(arr)
print(arr)
outputs:
array('q', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Demo here with max heap and min heap
Alternatively, as @KellyBundy pointed out in the comments, we can work around heapq._siftup
's lack of an end position argument by using a proxy object that limits the size of the heap with an artificially set size
attribute when sliced and reports that attribute as the length of the heap when its __len__
method is called.
Besides allowing any list as an input, a bonus benefit with this approach compared to memoryview
is that we don't need to import _heapq
first to delete heapify
because we can now pass to it the actual list:
from heapq import heapify, _siftup
class heapsort:
def __init__(self, lst):
self.list = lst
heapify(lst)
for self.size in reversed(range(1, len(lst))):
lst[0], lst[self.size] = lst[self.size], lst[0]
_siftup(self, 0)
lst.reverse()
def __len__(self):
return self.size
def __getitem__(self, index):
return self.list[index]
def __setitem__(self, index, value):
self.list[index] = value
Demo here with proxy object
Similarly, we can use a proxy object to make the heappop
-based heap sort work in-place by making the proxy object merely return the item at the end of the heap instead of actually removing it when popped.
In this case, however, the C implementation of heappop
would not only validate the proxy object as a list
object (which can be remedied by making the proxy object inherit from list
) but also access its length directly as a C attribute instead of calling our overridden __len__
method, so we do have to delete the C implementation to invoke the Python version instead.
This approach has the benefit of sticking to the publicly available API and is therefore the least prone to be affected by a change in heapq
's implementation:
import sys
import _heapq
del sys.modules['_heapq'].heappop
from heapq import heapify, heappop
class heapsort:
def __init__(self, lst):
heapify(lst)
self.list = lst
for self.size in range(len(lst), 0, -1):
lst[self.size - 1] = heappop(self)
lst.reverse()
def __len__(self):
return self.size
def __getitem__(self, index):
return self.list[index]
def __setitem__(self, index, value):
self.list[index] = value
def pop(self):
return self.list[self.size - 1]
Demo here with heappop
Finally, note that you can always fall back to implementing heap sort from scratch in the highly unlikely event that heapq
completely changes its implementation to the point that none of the approaches above would work (again, highly unlikely especially for the heappop
-based solution):
def heapsort(lst):
for child in range(1, length := len(lst)):
while lst[child] > lst[parent := int((child - 1) / 2)]:
lst[child], lst[child := parent] = lst[parent], lst[child]
for size in range(length - 1, 0, -1):
lst[child := 0], lst[size] = lst[size], lst[0]
while (parent := child) < size:
if (child := 2 * parent + 1) < size - 1 and lst[child] < lst[child + 1]:
child += 1
if child < size and lst[parent] < lst[child]:
lst[parent], lst[child] = lst[child], lst[parent]
sorted
. – Plott'quicksort'
having 0 work space while being faster on average than'heapsort'
. Wouldn't that make it a better choice or is the table incorrect in some way? – HereofO(1)
extra space, but most tables lie ;-) about quicksort, which generally requiresO(log n)
extra space (either implicitly due to recursion, or explicitly to maintain a stack of slice bounds remaining to be sorted). In practice that's essentially "0" to almost everyone, but can't guess about the OP here. – Luminal