Algorithm to find k smallest numbers in array of n items
Asked Answered
P

14

37

I'm trying to write an algorithm which can print the k smallest numbers in an n-size-array in O(n) time, but I cannot reduce the time complexity to n. How can I do this?

Pentastich answered 21/3, 2011 at 16:28 Comment(3)
I think some clarification is in order. Are you looking for the K smallest numbers in an array of N numbers?Booklet
nop that's all explanations written in exercise...... i think i have to show all k small numbers in array.... :(Pentastich
@Pentastich A similar question was asked here: gateoverflow.in/27194/tifr2014-b-9Sisk
G
58

I've done this in an interview before, and one of the most elegant/efficient ways to do this is

O(n log k). 
with space: O(k) (thanks, @Nzbuu)

Basically you're going to use a max-heap of size limited to k. For each item in the array, check to see if it's smaller than the max (only O(1)). If it is, remove the max and put that in the heap (O(log k)). If its bigger, go to the next item.

Of course, the heap doesn't yield a sorted list of k items, but that can be done in O(k log k) which is easy.

Similarly, you can do the same for finding the largest k items, in which case you would use a min-heap.

Gentlemanly answered 6/10, 2011 at 14:30 Comment(10)
+1. This is what I would do. It's also easy to implement and only requires O(k) space.Interval
@Gentlemanly how do you start the loop ? What is in the root of the tree before the first iteration and how many child nodes does each node have in this case ...I know it can have atmost two nodes but what about in this case ?Atheling
@Atheling The heap is initialized as empty and then you want to iterate over the first k items in your array to fill it up. Then proceed as I described, and your code will maintain the heap's size at a constant k. A heap is a standard tree data structure which usually has two children per node (see: en.wikipedia.org/wiki/Heap_(data_structure))Gentlemanly
Blum et al.'s selection algorithm takes O(n) time and O(1) space in the worst case. Even if you want to report the items sorted, you can do it in O(n + k log(k)) time/O(1) space.Magnesia
@YvesDaoust Theory not withstanding, the theoretically right algorithm is usually smaller in practice in this case. Why? Well first if the list is random and k << n then only O(k log(n)) times is a number smaller than the largest of the k smallest so far, which makes this effectively O(n) with a better constant. Plus this approach is an online algorithm, which leads to huge improvements in processing speed when random access is expensive. Such as when your data is in a log file on disk.Leakey
@btilly: arguments that empirically mix asymptotic bounds and "practical" considerations are inadmissible.Magnesia
@YvesDaoust Inadmissible in what context? Theory's job is to inform us about what to do in practice. However it is important for practitioners to also understand how and why theory can mislead us. For example I once switched one monthly job from hashing to sorting. In theory hashing was O(n) vs O(n log(n)) with a better constant for hashing. In practice processing time dropped from 5 days to 1 hour.Leakey
In this case the theoretically right selection algorithm will be faster if k and n are comparable in size. The theoretically wrong heap algorithm will be faster if k is much smaller than n, and doubly so if your data is stored on disk. A practitioner should therefore know both solutions and know which to use when if optimization is important.Leakey
I came up with same solution in an Interview. I gave nlog(k) as time complexity, but the interviewer meticulously calculated time complexity as O(k + (n-k)Logk). I told that it equates to O(nlog(k)) he wasn't convinced.Sayer
@Sayer hmm. I guess technically I'd count it as k*logk [setup] + (n-k)logk [iteration] + k [teardown] which simplifies to O(nlogk + k), the worst case being a reverse sorted list. Sorry if you didn't get the job!Gentlemanly
I
27

you will need to find the k'th smallest element using 'selection algorithm', which is O(n), and then iterate the array again and return each element which is smaller/equals it.
selection algorithm: http://en.wikipedia.org/wiki/Selection_algorithm
you will have to pay attention if you have repeats: you will need to make sure you are not returning more then k elements (it is possible if for instance you have 1,2,...,k,k,k,...)

EDIT :
the full algorithm, and returning a list, as requested: let the array be A

 1. find the k'th element in A using 'selection algorithm', let it be 'z'
 2. initialize an empty list 'L'
 3. initialize counter<-0
 4. for each element in A: 
 4.1. if element < z: 
   4.1.1. counter<-counter + 1 ; L.add(element)
 5. for each element in A:
 5.1. if element == z AND count < k:
   5.1.1. counter<-counter + 1 ; L.add(element)
 6. return L

note here that a 3rd iteration is required if your list might have duplicates. if it can't - it is needless, just change the condition in 4.1 to <=.
also note: L.add is inserting an element to a linked list and thus is O(1).

Illmannered answered 21/3, 2011 at 16:35 Comment(2)
A small optimization to your 5th step: 5. while ( counter <= k ): 5.1 L.add(z); 5.2 counter <- counter + 1;. This will ensure that you don't go over the entire array again since you know the only element that will be added to the list L in step 5 is z and only (k - counter) iterations are required instead of N.Horgan
@srikfreak: I decided not to make this optimization since it's clearly HW question, so I wanted to keep it simple and easy to understand, by avoiding adding conditions. In the case of implementing this algorithm to a real software - of course you are right, and this optimization must be done.Illmannered
B
5

Assuming you're trying to show the K smallest numbers, you can use Hoare's Select algorithm to find the kth smallest number. That partitions the array into the smaller numbers, the kth number, and the larger numbers.

Booklet answered 21/3, 2011 at 16:36 Comment(1)
+1, although beware that Hoare's quickselect algorithm isn't O(n), it has bad worst cases. The fixed version is called the "median-of-medians" method, and was not by Hoare.Eu
I
2

This can be done in expected linear time(O(n)). First find the kth smallest element of the array (using pivot partition method for finding kth order statistic) and then simply iterate through the loop to check which elements are less than the kth smallest element. Note that this works correctly only for distinct element.

Here is the code in c:

    /*find the k smallest elements of an array in O(n) time. Using the Kth order 
statistic-random pivoting algorithm to find the kth smallest element and then looping 
through the array to find the elements smaller than kth smallest element.Assuming 
distinct elements*/


    #include <stdio.h>
    #include <math.h>
    #include <time.h>
    #define SIZE 10
    #define swap(X,Y) {int temp=X; X=Y; Y=temp;}


    int partition(int array[], int start, int end)
    {
        if(start==end)
            return start;
        if(start>end)
            return -1;
        int pos=end+1,j;
        for(j=start+1;j<=end;j++)
        {       
            if(array[j]<=array[start] && pos!=end+1)
            {
                swap(array[j],array[pos]);
                pos++;
            }
            else if(pos==end+1 && array[j]>array[start])
                pos=j;
        }
        pos--;
        swap(array[start], array[pos]);
        return pos;
    }

    int order_statistic(int array[], int start, int end, int k)
    {
        if(start>end || (end-start+1)<k)
            return -1;                   //return -1 
        int pivot=rand()%(end-start+1)+start, position, p;
        swap(array[pivot], array[start]);
        position=partition(array, start, end);
        p=position;
        position=position-start+1;                  //size of left partition
        if(k==position)
            return array[p];
        else if(k<position)
            return order_statistic(array, start,p-1,k);
        else
            return order_statistic(array,p+1,end,k-position);
    }


    void main()
    {
        srand((unsigned int)time(NULL));
        int i, array[SIZE],k;
        printf("Printing the array...\n");
        for(i=0;i<SIZE;i++)
            array[i]=abs(rand()%100), printf("%d ",array[i]);
        printf("\n\nk=");
        scanf("%d",&k);
        int k_small=order_statistic(array,0,SIZE-1,k);
        printf("\n\n");
        if(k_small==-1)
        {
            printf("Not possible\n");
            return ;
        }
        printf("\nk smallest elements...\n");
        for(i=0;i<SIZE;i++)
        {
            if(array[i]<=k_small)
                printf("%d ",array[i]);
        }
    }
Individuation answered 21/8, 2013 at 11:39 Comment(0)
D
2

It is possible to find the k smallest of n elements in O(n) time (by which I mean true O(n) time, not O(n + some function of k)). Refer to the Wikipedia article "Selection algorithm", especially the subsections on "unordered partial sorting" and "median selection as pivot strategy", and also to the article "Median of medians" for the essential piece that makes this O(n).

Daimon answered 16/5, 2014 at 17:49 Comment(0)
P
1

As mentioned, there are two ways to accomplish such task:

1) You can sort the whole array of n elements with quicksort, heapsort or any O (n log n) sorting algorithm you want, and then pick the m smallest values in your array. This method will work in O(n log n).

2) You can use selection algorithm to fink m smallest elements in your array. It will take O(n) time to find the kth smallest value, since you will iterate this algorithm m times, the overall time will be m x O(n) = O(n) .

Practitioner answered 20/2, 2015 at 17:48 Comment(0)
M
1

Best possible solution to the problem is as follows.Use Quick sort to find pivots and discard the part where this kth element doesn't lie, and recursively find the next pivot. (It's kth Max finder , You need to change the if else condition to make it kth Min Finder) .Here is the JavaScript code-

  // Complexity is O(n log(n))
  var source = [9, 2, 7, 11, 1, 3, 14, 22];

  var kthMax = function(minInd, MaxInd, kth) {
      // pivotInd stores the pivot position 
      // for current iteration
      var temp, pivotInd = minInd;
      if (minInd >= MaxInd) {
        return source[pivotInd];
      }

      for (var i = minInd; i < MaxInd; i++) {
        //If an element is greater than chosen pivot (i.e. last element)
        //Swap it with pivotPointer element. then increase ponter
        if (source[i] > source[MaxInd]) {
          temp = source[i];
          source[i] = source[pivotInd];
          source[pivotInd] = temp;
          pivotInd++;
        }
      }
      // we have found position for pivot elem. 
      // swap it to that position place .
      temp = source[pivotInd];
      source[pivotInd] = source[MaxInd];
      source[MaxInd] = temp;

      // Only try to sort the part in which kth index lies.
      if (kth > pivotInd) {
        return kthMax(pivotInd + 1, MaxInd, kth);
      } else if (kth < pivotInd) {
        return kthMax(minInd, pivotInd - 1, kth);
      } else {
        return source[pivotInd];
      }

    }
    // last argument is kth-1 , so if give 2 it will give you,
    // 3rd max which is 11

  console.log(kthMax(0, source.length - 1, 2));
Milliliter answered 18/10, 2015 at 7:45 Comment(2)
Worse case QuickSort is O(n*n)Veradi
Yes , but , best case in O(n log n) , and that's best you can do to solve this problem . If you go for merge or Heap , it will take O(n) extra memory , so quick sort is pretty acceptable for average n log n sorting solutions.Milliliter
V
1

I know not exactly what you are looking for but pretty simple O(n * k) time and O(k) space. This is the biggest K so would need to flop it around.

For the brute for min of k (result) can substitute a heap

private int[] FindKBiggestNumbersM(int[] testArray, int k)
{
    int[] result = new int[k];
    int indexMin = 0;
    result[indexMin] = testArray[0];
    int min = result[indexMin];

    for (int i = 1; i < testArray.Length; i++)
    {
        if(i < k)
        {
            result[i] = testArray[i];
            if (result[i] < min)
            {
                min = result[i];
                indexMin = i;
            }
        }
        else if (testArray[i] > min)
        {
            result[indexMin] = testArray[i];
            min = result[indexMin];
            for (int r = 0; r < k; r++)
            {
                if (result[r] < min)
                {
                    min = result[r];
                    indexMin = r;
                }
            }
        }
    }
    return result;
}
Veradi answered 27/1, 2017 at 16:46 Comment(0)
D
1

Another Technique- Use QuickSelect Algorithm and the result would be all the elements to the left of the returned result. The average time complexity would be O(n) and in worst case it would be O(n^2). The space complexity would be O(1).

Differential answered 21/4, 2017 at 18:11 Comment(0)
F
1

This can be done in O(n) time using O(n) space I believe. As was mentioned you can use Hoares algorithm, or a variation of quickselect.

Basically you run Quicksort on the array, but run only on the side of the partition needed to ensure that there are K or K-1 larger elements than the pivot (you can either include lr exclude the pivot). If the list does not need to be sorted, then, you can just print the remaineder of the array from the pivot. Since quicksort can be done in place, this takes O(n) space, and since you are halfing the portion of the array (on average) that you check each time is takes O(2n) == O(n) time

Finegrain answered 12/4, 2019 at 12:8 Comment(0)
B
0

Just sort the array with Merge Sort and then print the first k number, it will take n*log2(n) in the worst case.

Brainstorming answered 21/3, 2011 at 16:32 Comment(3)
@Illmannered gr: you are right, I was just correcting that while you commented, thanks for commenting anyway.Brainstorming
i tried but the exercise on the box suggests to write an algorithm in O(n).......Pentastich
I don't think that's possible.Lust
S
0

How about using a Heap to store the values. This cost is n when you go through each value in the array.

Then go through the Heap to get the smallest k values.

Runtime is O(n) + O(k) = O(n)

Of course, memory space is now O(n + n)

Siphon answered 24/3, 2011 at 23:31 Comment(2)
note that deletion from a heap is O(n), so this solution will result in O(n+klog(n)) which might be bigger then O(n) for large kIllmannered
For this problem, as defined, k is fixed, so O(k log(n)) is o(n).Interval
O
0

This is a slight variation in the recursion's base condition, in the selection algorithm ,to return pointer to the dynamic array containing all the first k smallest elements in random order , it is O(n).

void swap(int *a, int *b){
int temp = *a;
*a = *b;
*b = temp;
}
int partition(int *A, int left, int right){
int pivot = A[right], i = left, x;

for (x = left; x < right; x++){
    if (A[x] < pivot){
        swap(&A[i], &A[x]);
        i++;
    }
}

swap(&A[i], &A[right]);
return i;
}


 int* quickselect(int *A, int left, int right, int k){

//p is position of pivot in the partitioned array
int p = partition(A, left, right);

//k equals pivot got lucky
if (p == k-1){
    int*temp = malloc((k)*sizeof(int));
    for(int i=left;i<=k-1;++i){
        temp[i]=A[i];
    }
    return temp;
}
//k less than pivot
else if (k - 1 < p){
    return quickselect(A, left, p - 1, k);
}
//k greater than pivot
else{
    return quickselect(A, p + 1, right, k);
}

}

Otter answered 1/9, 2019 at 17:16 Comment(0)
D
0

This is close to O(n) when the value of k is small relative to n, and this works when the array has duplicated elements:

a=[1,1,13,8,10,5,17,1,2,12,3,15,7,16,14,20,18,19,4,9,6,11]
k=5
n=a.length

outmax=NaN
out=[]

for(i=0;i<n;i++){
  if(i<k||a[i]<outmax){
    insertat=k-1
    for(j=0;j<k-1;j++)if(a[i]<out[j]||j==i){insertat=j;break}
    for(j=k-1;j>insertat;j--)out[j]=out[j-1]
    out[insertat]=a[i]
    outmax=out[k-1]
  }
}

console.log(out)
Darky answered 8/8, 2022 at 14:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.