Benefits of Quichesort
Asked Answered
A

1

8

I created this program for an assignment in which we were required to create an implementation of Quichesort. This is a hybrid sorting algorithm that uses Quicksort until it reaches a certain recursion depth (log2(N), where N is the length of the list), then switches to Heapsort, to avoid exceeding the maximum recursion depth.

While testing my implementation, I discovered that although it generally performed better than regular Quicksort, Heapsort consistently outperformed both. Can anyone explain why Heapsort performs better, and under what circumstances Quichesort would be better than both Quicksort and Heapsort?

Note that for some reason, the assignment referred to the algorithm as "Quipsort".

Edit: Apparently, "Quichesort" is actually identical to Introsort.

I also noticed that a logic error in my medianOf3() function was causing it to return the wrong value for certain inputs. Here is an improved version of the function:

def medianOf3(lst):
    """
    From a lst of unordered data, find and return the the median value from
    the first, middle and last values.
    """

    first, last = lst[0], lst[-1]
    if len(lst) <= 2:
        return min(first, last)
    middle = lst[(len(lst) - 1) // 2]
    return sorted((first, middle, last))[1]

Would this explain the algorithm's relatively poor performance?

Code for Quichesort:

import heapSort             # heapSort
import math                 # log2 (for quicksort depth limit)

def medianOf3(lst):
    """
    From a lst of unordered data, find and return the the median value from
    the first, middle and last values.
    """

    first, last = lst[0], lst[-1]
    if len(lst) <= 2:
        return min(first, last)
    median = lst[len(lst) // 2]
    return max(min(first, median), min(median, last))

def partition(pivot, lst):
   """
   partition: pivot (element in lst) * List(lst) -> 
        tuple(List(less), List(same, List(more))).  
   Where:
        List(Less) has values less than the pivot
        List(same) has pivot value/s, and
        List(more) has values greater than the pivot

   e.g. partition(5, [11,4,7,2,5,9,3]) == [4,2,3], [5], [11,7,9]
   """

   less, same, more = [], [], []
   for val in lst:
      if val < pivot:
         less.append(val)
      elif val > pivot:
         more.append(val)
      else:
         same.append(val)
   return less, same, more

def quipSortRec(lst, limit):
    """
    A non in-place, depth limited quickSort, using median-of-3 pivot.
    Once the limit drops to 0, it uses heapSort instead.
    """

    if lst == []:
        return []

    if limit == 0:
        return heapSort.heapSort(lst)

    limit -= 1
    pivot = medianOf3(lst)
    less, same, more = partition(pivot, lst)
    return quipSortRec(less, limit) + same + quipSortRec(more, limit)

def quipSort(lst):
    """
    The main routine called to do the sort.  It should call the
    recursive routine with the correct values in order to perform
    the sort
    """

    depthLim = int(math.log2(len(lst)))
    return quipSortRec(lst, depthLim)

Code for Heapsort:

import heapq    # mkHeap (for adding/removing from heap)

def heapSort(lst):
    """
    heapSort(List(Orderable)) -> List(Ordered)
        performs a heapsort on 'lst' returning a new sorted list
    Postcondition: the argument lst is not modified
    """

    heap = list(lst)
    heapq.heapify(heap)
    result = []
    while len(heap) > 0:
        result.append(heapq.heappop(heap))
    return result
Anthropomorphic answered 21/12, 2014 at 18:34 Comment(10)
Quichesort sounds delicious.Scroll
oh my, my python skills are flooding back to me after 2 years of not even touching python!Cabana
Maybe Using heapsort and quicksort together will help? At least, there's a chance that you're both going to Rochester Institute of Technology since a Google search for 'quipsort' leads to three different URLs at the rit.edu domain.Frontality
Searching for 'quichesort' suggests Median of 3 quicksort implementation which switches to heapsort after a recursion depth limit is met and Sorting Python heaps and lists, too.Frontality
@JonathanLeffler Yep, I go to RIT. I also saw those questions, and I'm pretty sure they're based on the same assignment. However, they don't address my question. What I'm trying to figure out is whether this algorithm is actually useful.Anthropomorphic
It's a complicated question, in some respects. You should look at some of the references in Quicksort: Choosing the pivot. You should also lookup Introsort. I'm not sure how your testing is set up, but there are all sorts of different ways to cause sorts problems, with initial distributions that are already sorted order, or reverse sorted order, or in a V shape or inverted-V (organ pipe), or all the same, or random. What works well for some may be bad for others.Frontality
@JonathanLeffler Huh. I just looked up Introsort, and it seems that Quichesort is just another word for the same algorithm.Anthropomorphic
Well, when you say "useful," what do you mean? Generally, the simple algorithms you learn in school are not actually used in the real world; we usually favor library implementations, which tend to be aggressively optimized. Timsort, for instance.Livvi
If the MedianOf3 pivot selection was faulty, it could account for the slowness. However, since another valid pivot selection algorithm is the random choice of an element in range, it should not break the sort (the data should still be sorted) — it can just lead to sub-optimal sorting performance.Frontality
It's not objective to compare multiple algorithms by such practical performance, since the specificality of the input data, and especially you implemented some of them while calling optimized third-party library for others. We usually compare algorithms in a mathematical way.Groove
R
3

The basic facts are as follows:

  • Heapsort has worst-case O(n log(n)) performance but tends to be slow in practice.
  • Quicksort has O(n log(n)) performance on average, but O(n^2) in the worst case but is fast in practice.
  • Introsort is intended to harness the fast-in-practice performance of quicksort, while still guaranteeing the worst-case O(n log(n)) behavior of heapsort.

One question to ask is, why is quicksort faster "in practice" than heapsort? This is a tough one to answer, but most answers point to how quicksort has better spatial locality, leading to fewer cache misses. However, I'm not sure how applicable this is to Python, as it is running in an interpreter and has a lot more junk going on under the hood than other languages (e.g. C) that could interfere with cache performance.

As to why your particular introsort implementation is slower than Python's heapsort - again, this is difficult to determine. First of all, note that the heapq module is written in Python, so it's on a relatively even footing with your implementation. It may be that creating and concatenating many smaller lists is costly, so you could try rewriting your quicksort to act in-place and see if that helps. You could also try tweaking various aspects of the implementation to see how that affects performance, or run the code through a profiler and see if there are any hot spots. But in the end I think it's unlikely you'll find a definite answer. It may just boil down to which operations are particularly fast or slow in the Python interpreter.

Restriction answered 30/12, 2014 at 4:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.