Simply maintain first, second, third.
/* swap has to be a macro to work on plain variables */
#define swap(a, b) do { int tmp = (a); (a) = (b); (b) = tmp; } while (0)

int first, second, third;
int i;

first = array[0];
second = array[1];
third = array[2];
/* scratch sort for three elements, largest in first */
if (first < second)
    swap(first, second);
if (first < third)
    swap(first, third);
if (second < third)
    swap(second, third);
/* now go through, bubbling up if we have a hit */
for (i = 3; i < N; i++)
{
    if (third < array[i])
    {
        third = array[i];
        if (second < third)
        {
            swap(second, third);
            if (first < second)
                swap(first, second);
        }
    }
}
I wouldn't try to scale this up to k = 4; three is about the limit for hardcoding it. As k gets larger you need to move to a more formal method, like the one sketched below.
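As a minimal sketch of that more formal version (the name topk and the buffer best are my own, not from the answer): keep the k largest values seen so far in a small buffer sorted in descending order, and drop each new hit into place. The worst case is O(Nk), which is fine while k stays small.

/* keep the k largest values seen so far in best[0..k), sorted
   descending; O(N*k) worst case. Assumes N >= k >= 1. */
void topk(const int *array, int N, int k, int *best)
{
    int i, j;

    for (i = 0; i < N; i++)
    {
        int filled = i < k ? i : k;

        /* skip values that can't make the cut once the buffer is full */
        if (filled == k && array[i] <= best[k - 1])
            continue;
        /* shift smaller entries down and insert the new value in place */
        j = filled < k ? filled : k - 1;
        while (j > 0 && best[j - 1] < array[i])
        {
            best[j] = best[j - 1];
            j--;
        }
        best[j] = array[i];
    }
}

Called as topk(array, N, 5, best), it leaves the five largest values in best[0..5), largest first.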
This doesn't answer the question you actually asked, which was how to partially sort, but it seems to be what you want.
If you wish to partially sort, you can use quicksort and simply return early when the pivot goes above the bound you are interested in. So say our first pivot divides seven elements into five and two: ignore the two, and only actually do the sub-sorts of the five. But whilst it will be faster than a full quicksort, it won't be a game changer. If you can get a conservative upper bound on the k'th item (e.g. it's always going to be at most 25% of the way between the minimum and the mean) you can quickly eliminate most of the data; if you get it wrong, it just costs another pass or two.
Using the quicksort method:
void sortfirstk_r(int *array, int N, int k)
{
    int pivot = 0;
    int i = 1;
    int j = N - 1;

    if (N <= 1 || k <= 0)
        return;
    /* partition: values greater than the pivot go to the back */
    while (i <= j)
    {
        if (array[pivot] < array[i])
        {
            swap(array[i], array[j]);   /* keep j-- outside the swap macro */
            j--;
        }
        else
            i++;
    }
    /* put the pivot in its final place so the recursion always shrinks */
    swap(array[0], array[i - 1]);
    sortfirstk_r(array, i - 1, k < i - 1 ? k : i - 1);
    if (i < k)
        sortfirstk_r(array + i, N - i, k - i);
}
(Untested, and there might be bugs in the slightly tricky partition logic.)
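A quick usage sketch (the driver and data are mine, assuming the sortfirstk_r and swap macro above):

#include <stdio.h>

int main(void)
{
    int data[] = { 9, 4, 7, 1, 8, 2, 6, 3, 5 };
    int N = sizeof data / sizeof data[0];
    int i;

    sortfirstk_r(data, N, 3);
    /* the first three slots now hold the three smallest values, in order */
    for (i = 0; i < 3; i++)
        printf("%d ", data[i]);   /* prints: 1 2 3 */
    printf("\n");
    return 0;
}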
However, we've naively used the first element as the pivot. If we're sorting a large data set with a normal distribution and we want the top 1%, the z-score is 2.326. Take a bit more to allow for sampling error, and make a first pass with a pivot set at, say, 2.3 standard deviations above the mean. That splits the distribution into two sets: the top 1% plus a bit, and the rest. We don't need to process the rest any further, and just sort the top group.
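A minimal sketch of that pre-pass (the name prefilter_top and the two-loop mean/deviation calculation are mine, and it assumes the roughly normal distribution described above):

#include <math.h>

/* Single threshold pass: move every value above mean + z standard
   deviations to the front and return how many there are; only
   array[0..count) then needs sorting. */
int prefilter_top(double *array, int N, double z)
{
    double mean = 0.0, var = 0.0, threshold;
    int i, count = 0;

    for (i = 0; i < N; i++)
        mean += array[i];
    mean /= N;
    for (i = 0; i < N; i++)
        var += (array[i] - mean) * (array[i] - mean);
    threshold = mean + z * sqrt(var / N);

    for (i = 0; i < N; i++)
    {
        if (array[i] > threshold)
        {
            double tmp = array[count];
            array[count++] = array[i];
            array[i] = tmp;
        }
    }
    return count;
}

For the top 1% you'd call prefilter_top(data, N, 2.3) and sort just the count values it returns; if count comes back smaller than the k you wanted, the bound was wrong and it costs another pass with a lower z.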
Comments:

qsort? – Whipcord

k most extreme values in an array is O(kn), and assuming k < n it is indeed faster. – Wire

O(n) step, and popping off k values from it costs O(log n) per item. So you can make the cost O(max(n, k log n)) (which means if k == n, you just paid roughly the expected O(n log n) cost of a full sort, where doing k linear scans would scale to O(n**2)). I assume stuff like std::partial_sort uses an algorithm of this sort to avoid poor scaling when k is an appreciable percentage of n. – Beep

k, then sort only the values on one side of the resulting partition, which achieves O(max(n, k log k)) performance, is probably what partial_sort really uses. Oops. – Beep
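For reference, the heap strategy the comments describe can be sketched like this (my code, not from the thread; it reuses the swap macro from above and reorders the input array): heapify is O(n), then each of the k pops costs O(log n), for O(max(n, k log n)) overall.

/* sift the value at index i down through a max-heap of size n */
static void sift_down(int *heap, int n, int i)
{
    for (;;)
    {
        int largest = i;
        int left = 2 * i + 1, right = 2 * i + 2;

        if (left < n && heap[largest] < heap[left])
            largest = left;
        if (right < n && heap[largest] < heap[right])
            largest = right;
        if (largest == i)
            return;
        swap(heap[i], heap[largest]);
        i = largest;
    }
}

/* write the k largest values of array[0..N) into out, largest first.
   Assumes N >= k >= 1. */
void topk_heap(int *array, int N, int k, int *out)
{
    int n = N, i;

    for (i = N / 2 - 1; i >= 0; i--)   /* O(N) heapify */
        sift_down(array, N, i);
    for (i = 0; i < k; i++)
    {
        out[i] = array[0];        /* current maximum is at the root */
        array[0] = array[--n];    /* move the last leaf up... */
        sift_down(array, n, 0);   /* ...and restore the heap */
    }
}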