Quicksort - Hoare's partitioning with duplicate values
Asked Answered
F

6

12

I have implemented the classic Hoare's partitioning algorithm for Quicksort. It works with any list of unique numbers [3, 5, 231, 43]. The only problem is when I have a list with duplicates [1, 57, 1, 34]. If I get duplicate values I enter an infinite loop.

private void quicksort(int[]a, int lo, int hi) {
    if (lo < hi) {
        int q = hoare_partition(a, lo, hi);
        quicksort(a, lo, q - 1);
        quicksort(a, q + 1, hi);
    }
}

private int hoare_partition(int[] a, int lo, int hi) {

    int pivot = a[hi];
    int i = lo;
    int j = hi;

    while (true) {

        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;

        if (i < j) swap(a, i, j);
        else return j;
    }
}

Is it possible that Hoare's implementation is incorrect and is unable to cope with duplicates?

I know this has been asked many times (and I tried them all) but I am having difficulties figuring the solution for this implementation.

Friedman answered 1/11, 2016 at 20:58 Comment(3)
while (a[j] >= pivot) shall help.Penney
I tried it but im getting a java.lang.StackOverflowError when it's while (a[j] >= pivot).Friedman
getting [StackOverflowError] (remember where you're discussing this?) Recurse into the non-biggest partition, iterate for the other/non-smaller ones.Taffeta
C
12
algorithm partition(A, lo, hi) is
    pivot := A[lo]
    i := lo – 1
    j := hi + 1
    loop forever
        do
            i := i + 1
        while A[i] < pivot

        do
            j := j – 1
        while A[j] > pivot

        if i >= j then
            return j

        swap A[i] with A[j]

The pseudocode above is taken from Wikipedia. Let's compare it with your code.

The problem is that you have to move the indices after the swap. The pseudocode uses do-while loop instead of while loop to move the indices after the swap. Also pay attention to the initial values of i and j.

algorithm quicksort(A, lo, hi) is
    if lo < hi then
        p := partition(A, lo, hi)
        quicksort(A, lo, p)
        quicksort(A, p + 1, hi)

For the recursive step, you might need to take care of the edge cases (i.e., the indices). It should work if you change quicksort(a, lo, q-1) to quicksort(a, lo, q).

A complete working version I just wrote:

import java.util.Arrays;

public class Test {

    public static void main(String[] args) throws Exception {
        int[] input = {1, 57, 1, 34};
        quicksort(input, 0, input.length - 1);
        System.out.println(Arrays.toString(input));
    }

    private static void quicksort(int[]a, int lo, int hi) {
        if (lo < hi) {
            int q = hoare_partition(a, lo, hi);
            quicksort(a, lo, q);
            quicksort(a, q + 1, hi);
        }
    }

    private static int hoare_partition(int[] a, int lo, int hi) {

        int pivot = a[lo];
        int i = lo - 1;
        int j = hi + 1;

        while (true) {
            do {
                i++;
            }
            while (a[i] < pivot);

            do {
                j--;
            }
            while (a[j] > pivot);

            if (i >= j) {
                return j;
            }
            swap(a, i, j);

        }
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

}

If you prefer the while loop instead of do-while:

private static int hoare_partition(int[] a, int lo, int hi) {

            int pivot = a[lo];
            int i = lo;
            int j = hi;

            while (true) {

                while (a[i] < pivot) i++;

                while (a[j] > pivot) j--;

                if (i >= j) {
                    return j;
                }
                swap(a, i, j);
                i++;
                j--;

            }
        }
Collect answered 1/11, 2016 at 23:16 Comment(2)
Thank you very much for your answer it was extremely helpful. I was missing the part where you move the indices after a swap which do...while solves. Would you suggest having the normal while and do i++; j--; right after you use swap(a,i,j); instead of do...while?Friedman
@ValentinK. Absolutely. It's just a matter of style. I updated the answer in case you'd like to stick to the while loop :)Collect
J
2

Here is a C++ example that implements Hoare scheme plus a median of 3 check, a duplicate of pivot check to exclude middle values equal to pivot (note that values equal to pivot can end up anywhere, not just the middle, so this doesn't help much), only using recursion on the smaller part, looping back for the larger part (this prevents stack overflow). Worst case time complexity is still O(n^2), but it takes fairly specific data patterns to produce this (median of 3 would have to consistently keep picking near smallest or near largest values).

void QuickSort(uint32_t a[], size_t lo, size_t hi) {
    while(lo < hi){
        size_t i = lo, j = (lo+hi)/2, k = hi;
        uint32_t p;
        if (a[k] < a[i])            // median of 3
            std::swap(a[k], a[i]);
        if (a[j] < a[i])
            std::swap(a[j], a[i]);
        if (a[k] < a[j])
            std::swap(a[k], a[j]);
        p = a[j];
        i--;                        // Hoare partition
        k++;
        while (1) {
            while (a[++i] < p);
            while (a[--k] > p);
            if (i >= k)
                break;
            std::swap(a[i], a[k]);
        }
        i = k++;
        while(i > lo && a[i] == p)  // exclude middle values == pivot
            i--;
        while(k < hi && a[k] == p)
            k++;
        // recurse on smaller part, loop on larger part
        if((i - lo) <= (hi - k)){
            QuickSort(a, lo, i);
            lo = k;
        } else {
            QuickSort(a, k, hi);
            hi = i;
        }
    }
}
Janeyjangle answered 2/11, 2016 at 6:9 Comment(0)
C
2

An address to @Terry Li's answer above as my reputation is not sufficient to comment. First of all, thank you very much for the detailed post. However, I believe there are some issues with the alternative function with while loop you provided. It doesn't return the index of the original pivot and it is inaccurate per se. (Please try with hoare_partition([6,7,8,9,1,2,3,4,5], 0, 8)) The problem is caused by incrementing i and decrementing j at the same time, causing you to lose track of the pivot. Hence in the proposed fix below, I have inserted a small condition to ensure whichever index storing the pivot is not changed. Please correct me if I'm wrong.

private static int hoare_partition(int[] a, int lo, int hi) {

            int pivot = a[lo];
            int i = lo;
            int j = hi;

            while (true) {

                while (a[i] < pivot) i++;

                while (a[j] > pivot) j--;

                if (i >= j) {
                    return i;
                }
                swap(a, i, j);
                if (a[j] == pivot)
                    i++;
                elif (a[i] == pivot)
                    j--;

            }
        }
Carhop answered 27/9, 2020 at 15:39 Comment(2)
I ran his code on your input and it gives the correct output. Can you please explain what was the problem here? Also, in the Hoare partition, the index that is returned is not always the pivot.Torpedoman
You are correct about @Terry Li's code. This code has an issue too, for input [100, 3, 100] this code produces [100, 3, 100] which is incorrect.Wonderstricken
C
1

I know this post is old, but I believed I've figured out a solution. The problem arises when hoare_partition immediately finds that a[i] at i = lo needs to be swapped out of its position, but it never finds a suitable a[j] to swap with. We know this has occured if q == lo. This issue arises when the pivot is the smallest value in the array and is a duplicate. To address this, we check if q == lo in quicksort, and if so, we manually swap a[lo] with a[pivot_index] to make sure that a[lo] is moved out of its position and into a valid position. We only need to make one recursive call, since the left partition will be of size 1.

I also made slight modifications to the the conditions in the inner while loops. You need to check if i < j in the inner loops, in addition to the outer loop. The second inner while now checks if a[j] is >= pivot, rather than strictly >. The last recursive call uses q instead of q+1 as the new lo. I took pivot_index as a parameter. Lastly, I returned j instead of i.

private void quicksort(int[] a, int lo, int hi) {
  if (lo < hi) {
    // int pivot_index = (rand() % (hi - lo + 1)) + lo; // randomized pivot_index is better. should be in range [lo, hi] inclusive 
    int pivot_index = hi; 
    int q = hoare_partition(a, lo, hi, pivot_index);

    if (q == lo) {
      swap(a, lo, pivot_index);
      quicksort(a, lo + 1, hi);

    } else {
      quicksort(a, lo, q - 1);
      quicksort(a, q, hi);

     }
  }
}

private int hoare_partition(int[] a, int lo, int hi, int pivot_index) {
  int pivot = a[pivot_index];
  int i = lo;
  int j = hi;

  while (true) {
    
    while (i < j && a[i] < pivot) i++;
    while (i < j && a[j] >= pivot) j--;
    
    if (i < j) swap(a, i, j);
    else return j;
  }
}
Collete answered 16/2, 2022 at 22:20 Comment(0)
E
0

Here is one more example with while loop

private static int partition(int[] a, int start, int end){
   int i = start-1, j = end+1;

   while(true){
      while(a[++i] < a[start]);
      while(a[start] < a[--j]);

      if (i >= j) return j;
      swap(a, i, j);
   }
}
Epicardium answered 25/1, 2017 at 4:48 Comment(1)
How does this answer Is it possible that Hoare's implementation is incorrect and is unable to cope with duplicates?Taffeta
O
0

Is it possible that Hoare's implementation is incorrect and is unable to cope with duplicates?

That's correct, Hoare's algorithm breaks when the pivot appears multiple times in the array.

One fix is to use a three-way partitioning algorithm. Instead of returning a single index, it returns two indices: Where the pivot section begins, and where it ends.

Obnubilate answered 29/12, 2021 at 3:30 Comment(1)
As worded, this answer is misleading instead of useful. Algorithms are valid solutions by definition. Implementations may be broken. As originally stated, quicksort with Hoare partitioning may use a huge number of nested calls. Take an input of, say 999 identical values to see what happens with such an input.Taffeta

© 2022 - 2024 — McMap. All rights reserved.