Quicksort with Python
In real life, we should always use the builtin sort provided by Python. However, understanding the quicksort algorithm is instructive.
My goal here is to break down the subject such that it is easily understood and replicable by the reader without having to return to reference materials.
The quicksort algorithm is essentially the following:
- Select a pivot data point.
- Move all data points less than (below) the pivot to a position below the pivot - move those greater than or equal to (above) the pivot to a position above it.
- Apply the algorithm to the areas above and below the pivot
If the data are randomly distributed, selecting the first data point as the pivot is equivalent to a random selection.
Readable example:
First, let's look at a readable example that uses comments and variable names to point to intermediate values:
def quicksort(xs):
"""Given indexable and slicable iterable, return a sorted list"""
if xs: # if given list (or tuple) with one ordered item or more:
pivot = xs[0]
# below will be less than:
below = [i for i in xs[1:] if i < pivot]
# above will be greater than or equal to:
above = [i for i in xs[1:] if i >= pivot]
return quicksort(below) + [pivot] + quicksort(above)
else:
return xs # empty list
To restate the algorithm and code demonstrated here - we move values above the pivot to the right, and values below the pivot to the left, and then pass those partitions to same function to be further sorted.
Golfed:
This can be golfed to 88 characters:
q=lambda x:x and q([i for i in x[1:]if i<=x[0]])+[x[0]]+q([i for i in x[1:]if i>x[0]])
To see how we get there, first take our readable example, remove comments and docstrings, and find the pivot in-place:
def quicksort(xs):
if xs:
below = [i for i in xs[1:] if i < xs[0]]
above = [i for i in xs[1:] if i >= xs[0]]
return quicksort(below) + [xs[0]] + quicksort(above)
else:
return xs
Now find below and above, in-place:
def quicksort(xs):
if xs:
return (quicksort([i for i in xs[1:] if i < xs[0]] )
+ [xs[0]]
+ quicksort([i for i in xs[1:] if i >= xs[0]]))
else:
return xs
Now, knowing that and
returns the prior element if false, else if it is true, it evaluates and returns the following element, we have:
def quicksort(xs):
return xs and (quicksort([i for i in xs[1:] if i < xs[0]] )
+ [xs[0]]
+ quicksort([i for i in xs[1:] if i >= xs[0]]))
Since lambdas return a single epression, and we have simplified to a single expression (even though it is getting more unreadable) we can now use a lambda:
quicksort = lambda xs: (quicksort([i for i in xs[1:] if i < xs[0]] )
+ [xs[0]]
+ quicksort([i for i in xs[1:] if i >= xs[0]]))
And to reduce to our example, shorten the function and variable names to one letter, and eliminate the whitespace that isn't required.
q=lambda x:x and q([i for i in x[1:]if i<=x[0]])+[x[0]]+q([i for i in x[1:]if i>x[0]])
Note that this lambda, like most code golfing, is rather bad style.
In-place Quicksort, using the Hoare Partitioning scheme
The prior implementation creates a lot of unnecessary extra lists. If we can do this in-place, we'll avoid wasting space.
The below implementation uses the Hoare partitioning scheme, which you can read more about on wikipedia (but we have apparently removed up to 4 redundant calculations per partition()
call by using while-loop semantics instead of do-while and moving the narrowing steps to the end of the outer while loop.).
def quicksort(a_list):
"""Hoare partition scheme, see https://en.wikipedia.org/wiki/Quicksort"""
def _quicksort(a_list, low, high):
# must run partition on sections with 2 elements or more
if low < high:
p = partition(a_list, low, high)
_quicksort(a_list, low, p)
_quicksort(a_list, p+1, high)
def partition(a_list, low, high):
pivot = a_list[low]
while True:
while a_list[low] < pivot:
low += 1
while a_list[high] > pivot:
high -= 1
if low >= high:
return high
a_list[low], a_list[high] = a_list[high], a_list[low]
low += 1
high -= 1
_quicksort(a_list, 0, len(a_list)-1)
return a_list
Not sure if I tested it thoroughly enough:
def main():
assert quicksort([1]) == [1]
assert quicksort([1,2]) == [1,2]
assert quicksort([1,2,3]) == [1,2,3]
assert quicksort([1,2,3,4]) == [1,2,3,4]
assert quicksort([2,1,3,4]) == [1,2,3,4]
assert quicksort([1,3,2,4]) == [1,2,3,4]
assert quicksort([1,2,4,3]) == [1,2,3,4]
assert quicksort([2,1,1,1]) == [1,1,1,2]
assert quicksort([1,2,1,1]) == [1,1,1,2]
assert quicksort([1,1,2,1]) == [1,1,1,2]
assert quicksort([1,1,1,2]) == [1,1,1,2]
Conclusion
This algorithm is frequently taught in computer science courses and asked for on job interviews. It helps us think about recursion and divide-and-conquer.
Quicksort is not very practical in Python since our builtin timsort algorithm is quite efficient, and we have recursion limits. We would expect to sort lists in-place with list.sort
or create new sorted lists with sorted
- both of which take a key
and reverse
argument.
my_list = list1 + list2 + ...
. Or unpack lists to new listmy_list = [*list1, *list2]
– Chelsae