Parallel Sort Algorithm
Asked Answered
C

5

26

I'm looking for a simple implementation of a parallelized (multi-threaded) sort algorithm in C# that can operate on List<T> or Arrays, and possibly using Parallel Extensions but that part isn't strictly necessary.

Edit: Frank Krueger provides a good answer, however I wish to convert that example to one that doesn't use LINQ. Also note that Parallel.Do() seems to have been superceded by Parallel.Invoke().

Thanks.

Chrysarobin answered 13/12, 2009 at 19:29 Comment(1)
If the sorting should be stable (equal elements keep their original order), or if the comparisons are expensive (less comparisons), the mergesort algorithm is a good alternative to quicksort.Esau
H
46

From "The Darkside" in his article Parallel Extensions to the .Net Framework we have this parallel extensions version of quicksort:

(Edit: Since the link is now dead, interested readers may find an archive of it at the Wayback Machine)

private void QuicksortSequential<T>(T[] arr, int left, int right) 
where T : IComparable<T>
{
    if (right > left)
    {
        int pivot = Partition(arr, left, right);
        QuicksortSequential(arr, left, pivot - 1);
        QuicksortSequential(arr, pivot + 1, right);
    }
}

private void QuicksortParallelOptimised<T>(T[] arr, int left, int right) 
where T : IComparable<T>
{
    const int SEQUENTIAL_THRESHOLD = 2048;
    if (right > left)
    {
        if (right - left < SEQUENTIAL_THRESHOLD)
        {

            QuicksortSequential(arr, left, right);
        }
        else
        {
            int pivot = Partition(arr, left, right);
            Parallel.Do(
                () => QuicksortParallelOptimised(arr, left, pivot - 1),
                () => QuicksortParallelOptimised(arr, pivot + 1, right));
        }
    }
}

Notice that he reverts to a sequential sort once the number of items is less than 2048.

Humoral answered 13/12, 2009 at 19:36 Comment(7)
Thank you Richard - I thought I was going to have to answer a VB question or something to get 10,000 :-) It also feels pretty lame breaking the barrier with someone else's code, but I'll take it.Humoral
I'm not sure on the semantics of Parallel.Do, but I would expect this would perform badly for large arrays due to a new thread possibly being created for every <2048 block of data. Lots of threads is very bad. If the runtime limits the number of threads maybe it's ok.Hypothermia
Make that 10,010. Thanks, this complete and quick answer is actually really useful. You can rest assured that you got to 10K on a 'proper' question ;)Chrysarobin
@Hypothermia your concerns are completely warranted but I selected this implementation because the author actually bothered to time his results and this one did perform better than the sequential version. Also remember that the parallel extensions don't just create threads willy-nilly, threads are pooled and it's smart enough not to create more threads than your CPU can handle.Humoral
ps. Parallel.Do is now as Parallel.Invoke, and given big enough arrays, you can expect to gain massive gains(50%!).Galvanic
Server not found. Source link does not exist now :(Limulus
For those looking to read the link, you can use this instead: web.archive.org/web/20120201062954/http://www.darkside.co.za/…Wahl
H
8

Update I now achieve better than 1.7x speedup on a dual core machine.

I thought I would try writing a parallel sorter that worked in .NET 2.0 (I think, check me on this) and that doesn't use anything other than the ThreadPool.

Here are the results of sorting a 2,000,000 element array:

Time Parallel    Time Sequential
-------------    ---------------
2854 ms          5052 ms
2846 ms          4947 ms
2794 ms          4940 ms
...              ...
2815 ms          4894 ms
2981 ms          4991 ms
2832 ms          5053 ms

Avg: 2818 ms     Avg: 4969 ms
Std: 66 ms       Std: 65 ms
Spd: 1.76x

I got a 1.76x speedup - pretty close to the optimal 2x I was hoping for - in this environment:

  1. 2,000,000 random Model objects
  2. Sorting objects by a comparison delegate that compares two DateTime properties.
  3. Mono JIT compiler version 2.4.2.3
  4. Max OS X 10.5.8 on 2.4 GHz Intel Core 2 Duo

This time I used Ben Watson's QuickSort in C#. I changed his QuickSort inner loop from:

QuickSortSequential:
    QuickSortSequential (beg, l - 1);
    QuickSortSequential (l + 1, end);

to:

QuickSortParallel:
    ManualResetEvent fin2 = new ManualResetEvent (false);
    ThreadPool.QueueUserWorkItem (delegate {
        QuickSortParallel (l + 1, end);
        fin2.Set ();
    });
    QuickSortParallel (beg, l - 1);
    fin2.WaitOne (1000000);
    fin2.Close ();

(Actually, in the code I do a little load balancing that does seem to help.)

I've found that running this parallel version only pays off when there are more than 25,000 items in an array (though, a minimum of 50,000 seems to let my processor breath more).

I've made as many improvements as I can think of on my little dual core machine. I would love to try some ideas on 8-way monster. Also, this work was done on a little 13" MacBook running Mono. I'm curious how others fare on a normal .NET 2.0 install.

The source code in all its ugly glory is availble here: http://www.praeclarum.org/so/psort.cs.txt. I can clean it up if there's any interest.

Humoral answered 13/12, 2009 at 23:31 Comment(2)
Im interested, but the article link and the source code link above are broken.Hulahula
You should merge your answers,Kirima
C
7

For the record here is a version without lamda expressions that will compile in C#2 and .Net 2 + Parallel Extensions. This should also work with Mono with its own implementation of Parallel Extensions (from Google Summer of code 2008):

/// <summary>
/// Parallel quicksort algorithm.
/// </summary>
public class ParallelSort
{
    #region Public Static Methods

    /// <summary>
    /// Sequential quicksort.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="arr"></param>
    public static void QuicksortSequential<T>(T [] arr) where T : IComparable<T>
    {
        QuicksortSequential(arr, 0, arr.Length - 1);
    }

    /// <summary>
    /// Parallel quicksort
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="arr"></param>
    public static void QuicksortParallel<T>(T[] arr) where T : IComparable<T>
    {
        QuicksortParallel(arr, 0, arr.Length - 1);
    }

    #endregion

    #region Private Static Methods

    private static void QuicksortSequential<T>(T[] arr, int left, int right) 
        where T : IComparable<T>
    {
        if (right > left)
        {
            int pivot = Partition(arr, left, right);
            QuicksortSequential(arr, left, pivot - 1);
            QuicksortSequential(arr, pivot + 1, right);
        }
    }

    private static void QuicksortParallel<T>(T[] arr, int left, int right) 
        where T : IComparable<T>
    {
        const int SEQUENTIAL_THRESHOLD = 2048;
        if (right > left)
        {
            if (right - left < SEQUENTIAL_THRESHOLD)
            {
                QuicksortSequential(arr, left, right);
            }
            else
            {
                int pivot = Partition(arr, left, right);
                Parallel.Invoke(new Action[] { delegate {QuicksortParallel(arr, left, pivot - 1);},
                                               delegate {QuicksortParallel(arr, pivot + 1, right);}
                });
            }
        }
    }

    private static void Swap<T>(T[] arr, int i, int j)
    {
        T tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    private static int Partition<T>(T[] arr, int low, int high) 
        where T : IComparable<T>
    {
        // Simple partitioning implementation
        int pivotPos = (high + low) / 2;
        T pivot = arr[pivotPos];
        Swap(arr, low, pivotPos);

        int left = low;
        for (int i = low + 1; i <= high; i++)
        {
            if (arr[i].CompareTo(pivot) < 0)
            {
                left++;
                Swap(arr, i, left);
            }
        }

        Swap(arr, low, left);
        return left;
    }

    #endregion
}
Chrysarobin answered 13/12, 2009 at 22:33 Comment(1)
Unfortunly, this partition is very slow when data is alreasy sorted. For example, when called on array of zeroes. int[] arr = new int[1024*1024*128]; ParallelSort.QuicksortParallel(arr); and then partition will be [1,2,3,... array.Length]. It seems to be not valid.Conrado
O
6

A merge sort based on the size of the processor cache, with the blocks being divided between the processors comes to mind.

The merge stage could be done by splitting the merge into n bits with each processor starting to merge the blocks from the correct offset into the blocks.

I have not tried writing this.

(As the relative speed of CPU cache and main ram, is not that far off the relative speed of RAM and Tape as the time the merge sort was discovered, maybe we should start using merge sorts again)

Oilcloth answered 23/2, 2010 at 15:32 Comment(0)
H
1

Divide the list you need sorted into equal sized sublists, depending on how many processors you have, and then whenever two parts are done, merge them together to a new part, until there is only one left and all threads completed. Very simple you should be able to implement it yourself, and sorting the lists within each thread can be done using any existing sort algorithm (like in the library).

Hypothermia answered 13/12, 2009 at 19:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.