Is there a C# equivalent to C++ std::partial_sort?
Asked Answered
P

3

13

I'm trying to implement a paging algorithm for a dataset sortable via many criteria. Unfortunately, while some of those criteria can be implemented at the database level, some must be done at the app level (we have to integrate with another data source). We have a paging (actually infinite scroll) requirement and are looking for a way to minimize the pain of sorting the entire dataset at the app level with every paging call.

What is the best way to do a partial sort, only sorting the part of the list that absolutely needs to be sorted? Is there an equivalent to C++'s std::partial_sort function available in the .NET libraries? How should I go about solving this problem?

EDIT: Here's an example of what I'm going for:

Let's say I need to get elements 21-40 of a 1000 element set, according to some sorting criteria. In order to speed up the sort, and since I have to go through the whole dataset every time anyway (this is a web service over HTTP, which is stateless), I don't need the whole dataset ordered. I only need elements 21-40 to be correctly ordered. It is sufficient to create 3 partitions: Elements 1-20, unsorted (but all less than element 21); elements 21-40, sorted; and elements 41-1000, unsorted (but all greater than element 40).

Parkland answered 13/3, 2013 at 20:5 Comment(13)
possible duplicate #2541102Giff
Not at all-- that's a selection question, and this is a partial sort question. By all means, though, feel free to provide an answer on how this problem can be solved in terms of that problem, if that's possible.Parkland
When paging, if something was at the end of the list and by sorting it belonged at the beginning how would a partial sort work? Wouldn't any sort need to touch every element?Humankind
Was about to post the Array.Sort answer too, but as a more general querstion: If you use this to facilitate paging, wouldn't a partially sorted output be kind of confusing to the end user? I.e. how would a "new" record on page 2 behave that should go to page 1, according to order?Tincture
Can you clarify this a bit more? Are you saying if you have data like this: 9,8,7,6,5,4,3,2,1 and you say Top 3: you really want to get 1,2,3. If the rest of the array is sorted, you don't care. Is that...sort of what you're after?Solomon
@aquinas: That's a special case of what I'm after. What I really want is slightly more general: In your example, I want to be able to say "4th through 6th" and get something like: 3, 2, 1 (unsorted, but all less than proper 4th element); 4, 5, 6 (sorted and in the same place they would be for a sorted list); 8, 7, 9 (unsorted, but all greater than proper 6th element). As you can see, the Top 3 is a special case of that (and same with the Bottom 3, for that matter).Parkland
It's like a divide a conquer sort just stop dividing and conquering early once the range you are after is satisfied?Humankind
You could put it that way, sure.Parkland
why not take the values you have declared in the var someCollection = new[]{5,2,7,9,0,6,8} create an empty int[] sortCollection = {} then from there create a var sortList = someCollection.ToList(); then sort the sortList.Sort() then do sortCollection = sortList.ToArray(); now the sortCollection will have all your integer values sorted..Anzac
@DJKRAZE That's a full sort, which I'm explicitly avoiding.Parkland
I misunderstood Platinum if needed I will delete my answer unless someone else may find it useful in the event that they need to do a full sort..Anzac
why would you not store your data pre-sorted in the first place? you might look into a NoSQL solution to better optimize indexed data storage.Giff
FlavorScape: The data is partitioned into a SQL database (which we control) and a remote database accessible via REST interface, and we have no choice but to integrate them in the app layer.Parkland
S
5

OK. Here's what I would try based on what you said in reply to my comment.

I want to be able to say "4th through 6th" and get something like: 3, 2, 1 (unsorted, but all less than proper 4th element); 4, 5, 6 (sorted and in the same place they would be for a sorted list); 8, 7, 9 (unsorted, but all greater than proper 6th element).

Lets add 10 to our list to make it easier: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1.

So, what you could do is use the quick select algorithm to find the the ith and kth elements. In your case above i is 4 and k is 6. That will of course return the values 4 and 6. That's going to take two passes through your list. So, so far the runtime is O(2n) = O(n). The next part is easy, of course. We have lower and upper bounds on the data we care about. All we need to do is make another pass through our list looking for any element that is between our upper and lower bounds. If we find such an element we throw it into a new List. Finally, we then sort our List which contains only the ith through kth elements that we care about.

So, I believe the total runtime ends up being O(N) + O((k-i)lg(k-i))

static void Main(string[] args) {
    //create an array of 10 million items that are randomly ordered
    var list = Enumerable.Range(1, 10000000).OrderBy(x => Guid.NewGuid()).ToList();

    var sw = Stopwatch.StartNew();
    var slowOrder = list.OrderBy(x => x).Skip(10).Take(10).ToList();
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    //Took ~8 seconds on my machine

    sw.Restart();
    var smallVal = Quickselect(list, 11);
    var largeVal = Quickselect(list, 20);
    var elements = list.Where(el => el >= smallVal && el <= largeVal).OrderBy(el => el);
    Console.WriteLine(sw.ElapsedMilliseconds);
    //Took ~1 second on my machine
}

public static T Quickselect<T>(IList<T> list , int k) where T : IComparable {
    Random rand = new Random();
    int r = rand.Next(0, list.Count);
    T pivot = list[r];
    List<T> smaller = new List<T>();
    List<T> larger = new List<T>();
    foreach (T element in list) {
        var comparison = element.CompareTo(pivot);
        if (comparison == -1) {
            smaller.Add(element);
        }
        else if (comparison == 1) {
            larger.Add(element);
        }
    }

    if (k <= smaller.Count) {
        return Quickselect(smaller, k);
    }
    else if (k > list.Count - larger.Count) {
        return Quickselect(larger, k - (list.Count - larger.Count));
    }
    else {
        return pivot;
    }
}
Solomon answered 13/3, 2013 at 21:19 Comment(4)
Thanks-- this does look promising. I'll give it a try!Parkland
Doesn't this approach assume that values aren't repeating? Suppose every value in the list is 4(smallVal) or 5(largeVal). Also aren't you assuming that the largeVal you are after is predictable? Maybe this is a weakness with std::partial_sort as well (this stuff is WAY over my head at the moment).Humankind
@Solomon Unfortunately, the link is now dead :-(Parkland
@PlatinumAzure, I love that you came back to this answer 4 years after you asked the question :) Fixed link.Solomon
R
-1

Here is my attempt to make an extension method for this (I always sort from the beginning of the list, so the function does not take the start index as an argument ; I leave this as an exercise for the reader :-) )

public static class ListExtensions
{
    public static void PartialSort<T>(this List<T> list, int count) where T : IComparable
    {
        ArgumentNullException.ThrowIfNull(list);
        if (count > list.Count)
            throw new ArgumentOutOfRangeException(nameof(count), "Count must be less than or equal to the list count.");

        for (int i = 0; i < count; i++)
        {
            T min = list[i];
            int minIndex = i;
            for (int j = i + 1; j < list.Count; j++)
            {
                if (min.CompareTo(list[j]) == 1)
                {
                    min = list[j];
                    minIndex = j;
                }
            }

            T tmp = list[i];
            list[i] = min;
            list[minIndex] = tmp;
        }
    }
}

Not sure it's very effective, though. I don't have time to make a benchmark. May be someone smarter than me can may be make an improvement from this base.

Ruyle answered 10/5 at 16:29 Comment(0)
A
-2

You can use List<T>.Sort(int, int, IComparer<T>):

inputList.Sort(startIndex, count, Comparer<T>.Default);
Aneurysm answered 13/3, 2013 at 20:11 Comment(1)
That implies that the elements I want are all located contiguously, something that cannot be guaranteed on an unordered dataset. Once the partitioning I have described above is completed, this method (or the Array.Sort answer provided by Dai) will prove useful.Parkland

© 2022 - 2024 — McMap. All rights reserved.