Java, Finding Kth largest value from the array [duplicate]
Asked Answered
E

5

19

I had an interview with Facebook and they asked me this question.

Suppose you have an unordered array with N distinct values

$input = [3,6,2,8,9,4,5]

Implement a function that finds the Kth largest value.

EG: If K = 0, return 9. If K = 1, return 8.

What I did was this method.

private static int getMax(Integer[] input, int k)
{
    List<Integer> list = Arrays.asList(input);
    Set<Integer> set = new TreeSet<Integer>(list);

    list = new ArrayList<Integer>(set);
    int value = (list.size() - 1) - k;

    return list.get(value);
}

I just tested and the method works fine based on the question. However, interviewee said, in order to make your life complex! lets assume that your array contains millions of numbers then your listing becomes too slow. What you do in this case? As hint, he suggested to use min heap. Based on my knowledge each child value of heap should not be more than root value. So, in this case if we assume that 3 is root then 6 is its child and its value is grater than root's value. I'm probably wrong but what you think and what is its implementation based on min heap?

Emergency answered 1/9, 2015 at 5:19 Comment(7)
why didn't you ask for a code sample from the interviewer?Marilynnmarimba
In a min heap, every node is less than or equal to both its children. Hence the root would be 2 rather than 3. One possible layout would be the tree 2 -> [3,4], 3 -> [5,6], 4 -> [8,9].Rahal
Related - How to find the kth largest element in an unsorted array of length n in O(n)?Uncouth
Why convert to a TreeSet and back, rather than just calling Collections.sort?Retrorocket
@immibis oh, yup, I wasn't familiar with that :( I wanted to delete all duplicates from the list and sort it in ascending order. I'm not sure Collections.sort removes duplicates!Emergency
You could directly add it into a TreeSet it doesn't allow duplicates also maintains sorted order.Partridgeberry
Check this answer for complete explanation: #252281Sid
O
19

He has actually given you the whole answer. Not just a hint.

And your understanding is based on max heap. Not min heap. And it's workings are self-explanatory.

In a min heap, the root has the minimum (less than it's children) value.

So, what you need is, iterate over the array and populate K elements in min heap. Once, it's done, the heap automatically contains the lowest at the root.

Now, for each (next) element you read from the array, -> check if the value is greater than root of min heap. -> If yes, remove root from min heap, and add the value to it.

After you traverse your whole array, the root of min heap will automtically contain the kth largest element.

And all other elements (k-1 elements to be precise) in the heap will be larger than k.

Osmen answered 1/9, 2015 at 5:28 Comment(4)
Thanks man, I found where is my problem. Will try to find the right way for implementation.Emergency
@Osmen It would be great if you could explain a bit of the example given in the question. I want to learn it. ThanksPartridgeberry
@UmaKanth, do you want an example of the heap implementation (google has so many examples of it) or explain the other parts of the algorithm a bit more?Osmen
@UmaKanth I have provide the implementation of the problem. Do check in the answers.Craniometry
C
5

Here is the implementation of the Min Heap using PriorityQueue in java. Complexity: n * log k.

import java.util.PriorityQueue;

public class LargestK {

  private static Integer largestK(Integer array[], int k) {
    PriorityQueue<Integer> queue = new PriorityQueue<Integer>(k+1);
    int i = 0;
    while (i<=k) {
      queue.add(array[i]);
      i++;
    }
    for (; i<array.length; i++) {
      Integer value = queue.peek();
      if (array[i] > value) {
        queue.poll();
        queue.add(array[i]);
      }
    }
    return queue.peek();
  }

  public static void main(String[] args) {
    Integer array[] = new Integer[] {3,6,2,8,9,4,5};
    System.out.println(largestK(array, 3));
  }
}

Output: 5

The code loop over the array which is O(n). Size of the PriorityQueue (Min Heap) is k, so any operation would be log k. In the worst case scenario, in which all the number are sorted ASC, complexity is n*log k, because for each element you need to remove top of the heap and insert new element.

Craniometry answered 1/9, 2015 at 5:46 Comment(2)
@njzk2 The complexity is n*log k, not n*log n. The code loop over the array which is O(n). Size of the PriorityQueue (Min Heap) is k, so any operation would be log k. In the worst case scenario, in which all the number are sorted ASC, complexity is n*log k. because for each element you need to remove top of the heap and insert new element.Craniometry
right, I didn't consider that the queue.poll(); made sure the queue was always of size k.Isidraisidro
S
3

Edit: Check this answer for O(n) solution.

You can probably make use of PriorityQueue as well to solve this problem:

public int findKthLargest(int[] nums, int k) {
        int p = 0;
        int numElements = nums.length;
        // create priority queue where all the elements of nums will be stored
        PriorityQueue<Integer> pq = new PriorityQueue<Integer>();

        // place all the elements of the array to this priority queue
        for (int n : nums){
            pq.add(n);
        }

        // extract the kth largest element
        while (numElements-k+1 > 0){
            p = pq.poll();
            k++;
        }

        return p;
    }

From the Java doc:

Implementation note: this implementation provides O(log(n)) time for the enqueing and dequeing methods (offer, poll, remove() and add); linear time for the remove(Object) and contains(Object) methods; and constant time for the retrieval methods (peek, element, and size).

The for loop runs n times and the complexity of the above algorithm is O(nlogn).

Sid answered 1/9, 2015 at 5:24 Comment(12)
Thanks @akhil, but I think it still has O(n) issue if you assume num array contains a million items. your For and While loop are O(n) each. Or all my knowledge is messed up :(Emergency
Also, in this your space complexity is unnecessarily high because your copying the whole array to the queue.Osmen
One insight I would be looking for: since I'm looking for the K'th-largest value, if I already have K elements in my heap/priority queue, and a number A comes in that's smaller than the smallest of those -- then that A will definitely not be one of the largest K, and thus should not even be put into the data structure. In other words, the data structure should never contain more than K elements. That way, even if you have tens of millions of values -- billions, even! -- and they're streaming in (because they don't all fit in memory), you can still solve the problem if K is small enough.Cumine
And the time comp is actually O(nlog(n)), it's really difficult (if not impossible) to achieve O(n) in this problem.Osmen
Your while loop is O(n) and each poll() inside it is O(log(n)). So the total comp is O(nlog(n))Osmen
@Osmen no, because the while is k, not n. the add loop, however, is n*logn, because add is logn too, as indicated in the documentation quoted.Isidraisidro
@Isidraisidro Codebender means the for loop, not the while loop.Retrorocket
@immibis It seems quite clear to me that the while loop was meant, as evidenced by the reference to the polls which are indeed in the while loopIsidraisidro
@Isidraisidro Well nonetheless... The for loop is O(n) and each add() inside it is O(log(n)). So the total comp is O(nlog(n)).Retrorocket
@njzk2, his while loop has n-k-1 iterations if I am not wrong which is still O(n) if I am not wrong again. But what you said is true, his while loop actually needs only k iterations.Osmen
@Codebender, sorry, i didn't realize the queue was polled from the wrong end, for some reason. n operations indeed, but that can be fixed. the for loop, however, is still n*logn, because add is lognIsidraisidro
@immibis: exactly, the add loop (...) is n*lognIsidraisidro
M
0

Heap based solution is perfect if the number of elements in array/stream is unknown. But, what if they are finite but still you want an optimized solution in linear time.

We can use Quick Select, discussed here.

Array = [3,6,2,8,9,4,5]

Let's chose the pivot as first element:

pivot = 3 (at 0th index),

Now partition the array in such a way that all elements less than or equal are on left side and numbers greater than 3 on right side. Like it's done in Quick Sort (discussed on my blog).

So after first pass - [2,3,6,8,9,4,5]

pivot index is 1 (i.e it's the second lowest element). Now apply the same process again.

chose, 6 now, the value at index after previous pivot - [2,3,4,5,6,8,9]

So now 6 is at the proper place.

Keep checking if you have found the appropriate number (kth largest or kth lowest in each iteration). If it's found you are done else continue.

Maculate answered 1/9, 2015 at 6:20 Comment(0)
I
0

One approach for constant values of k is to use a partial insertion sort.

(This assumes distinct values, but can easily be altered to work with duplicates as well)

last_min = -inf
output = []
for i in (0..k)
    min = +inf
    for value in input_array
        if value < min and value > last_min
            min = value
    output[i] = min
print output[k-1]

(That's pseudo code, but should be easy enough to implement in Java).

The overall complexity is O(n*k), which means it works pretty well if and only if k is constant or known to be less that log(n).

On the plus side, it is a really simple solution. On the minus side, it is not as efficient as the heap solution

Isidraisidro answered 1/9, 2015 at 13:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.