How to find the kth smallest element in the union of two sorted arrays?

Asked 5/1, 2011 at 18:43 Answered 25/5, 2020 at 5:8

Solved arrays algorithm binary-search divide-and-conquer

116

This is a homework question; binary search has already been introduced:

Given two arrays of N and M elements respectively, each sorted in ascending order, with elements not necessarily unique:
What is a time efficient algorithm to find the kth smallest element in the union of both arrays?

They say it takes O(logN + logM), where N and M are the arrays' lengths.

Let's name the arrays a and b. Obviously we can ignore all a[i] and b[i] where i > k.
First let's compare a[k/2] and b[k/2]. Suppose b[k/2] > a[k/2]. Then we can also discard all b[i] where i > k/2.

That leaves all a[i] where i < k and all b[i] where i < k/2 as candidates for the answer.

What is the next step?

Viewable answered 5/1, 2011 at 18:43 Comment(6)

Is O(logN + logM) only referring to the time it takes to find the kth element? Can preprocessing be done to the union beforehand? – Rotl 5/1, 2011 at 19:16

@David. No preprocessing is expected. – Viewable 5/1, 2011 at 19:29

Are duplicates allowed in the arrays? – Rotl 5/1, 2011 at 19:58

possible duplicate of nth smallest number among two databases of size n each using divide and conquer – Chester 5/1, 2011 at 23:57

@David Yes, duplicates are allowed. – Viewable 6/1, 2011 at 7:10

Ok what if N and/or M is less than k/2? – Lionize 27/7, 2012 at 12:4

You've got it, just keep going! And be careful with the indexes...

To simplify a bit I'll assume that N and M are > k, so the complexity here is O(log k), which is within O(log N + log M) because k < N and k < M.

Pseudo-code:

i = k/2
j = k - i
step = k/4
while step > 0
    if a[i-1] > b[j-1]
        i -= step
        j += step
    else
        i += step
        j -= step
    step /= 2

if a[i-1] > b[j-1]
    return a[i-1]
else
    return b[j-1]

For the proof you can use the loop invariant i + j = k, but I won't do all your homework :)
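A minimal Python sketch of the same invariant (i + j = k), written as an ordinary binary search on i instead of the halving-step loop. It is not the answerer's code: the name `kth_smallest`, the 1-based `k`, and the clamping of i to the range [max(0, k - M), min(k, N)] are assumptions added here so that the cases where N or M is smaller than k, or where one array lies entirely below the other, do not index out of bounds.

```python
def kth_smallest(a, b, k):
    """Return the k-th smallest (1-based) element of two sorted lists."""
    n, m = len(a), len(b)
    if not 1 <= k <= n + m:
        raise ValueError("k must be in 1..len(a)+len(b)")

    # i elements are taken from a and j = k - i from b, so i + j = k always.
    lo = max(0, k - m)  # b can supply at most m elements
    hi = min(k, n)      # a can supply at most n elements, and at most k
    while lo < hi:
        i = (lo + hi) // 2
        j = k - i
        # Inside the loop i < hi, so a[i] exists and j >= 1.
        if a[i] < b[j - 1]:
            lo = i + 1  # a[i] belongs among the k smallest: take more from a
        else:
            hi = i      # taking i elements from a is already enough
    i, j = lo, k - lo

    # The k-th smallest is the larger of the last elements taken from each side.
    if i == 0:
        return b[j - 1]
    if j == 0:
        return a[i - 1]
    return max(a[i - 1], b[j - 1])


print(kth_smallest([1, 3, 5], [2, 4, 6], 4))  # 4
```

The search range has at most min(N, k) + 1 candidates, so the loop runs a logarithmic number of times. Duplicates need no special handling because any valid split with i + j = k gives the same k-th value.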

Reprehension answered 5/1, 2011 at 19:21 Comment(13)

When initializing j, did you mean j = k-i ? – Rotl 6/1, 2011 at 16:1

Do you have a correctness proof for this? I want to believe that this works but honestly I don't see why this should give you the right answer. Can you provide a more detailed explanation or a link to a proof? – Axial 16/1, 2011 at 9:59

This is not a real proof, but the idea behind the algorithm is that we maintain i + j = k, and find such i and j so that a[i-1] < b[j-1] < a[i] (or the other way round). Now since there are i elements in 'a' smaller than b[j-1], and j-1 elements in 'b' smaller than b[j-1], b[j-1] is the i + j-1 + 1 = kth smallest element. To find such i,j the algorithm does a dichotomic search on the arrays. Makes sense? – Lamblike 16/1, 2011 at 10:56

How come O(log k) is O(log n + log m) ? – Masonry 19/2, 2012 at 13:36

Is this the same as follows: Trim the size of A and B to k elements each, then find the median of A[1..k] and B[1..k] ? Thus, kth smallest element in A and B will be the median of A[1..k] and B[1..k]. – Cloudland 6/5, 2012 at 5:29

This doesn't work if all of the values in array 1 come before the values in array 2. – Barbour 24/9, 2012 at 0:34

Why did you use k/4 as a step at first? – Lepido 8/9, 2013 at 9:24

This is plain wrong. If you use the arrays {3, 12, 13, 14, 21, 29, 35, 36, 38, 40, 41} and {-5, -3, 1, 5, 7, 9, 10, 11, 13, 14, 19} then the algorithm returns 7 as the result of the 5th smallest element, while the correct answer is 5. Also the algorithm returns 14 as the 10th smallest element while the correct answer is 12. If you ask for the 14th smallest element then it just simply throws out of bound exception. – Embitter 17/3, 2016 at 18:39

If anybody is interested in the correct answer, look at @Fei's explanation below. – Embitter 17/3, 2016 at 19:10

@CaptainFogetti there must be something wrong in your implementation, I get the correct results with exactly the algo above as you can see here (Python) – Lamblike 17/3, 2016 at 22:28

@Jules Yup, that's right. Sorry about that. I was writing j = k - 1 instead of j = k - i. My bad. – Embitter 17/3, 2016 at 23:43

As @JohnKurlak mentioned it doesn't work for values where whole a is smaller than b see repl.it/HMYf/0 – Chook 17/4, 2017 at 16:29

The correct complexity is log(k) = log (m+n). log (m+n) is not equal to log(m) + log (n) – Emoryemote 26/7, 2017 at 23:41

I hope I am not answering your homework, as it has been over a year since this question was asked. Here is a tail-recursive solution that takes O(log(len(a)+len(b))) time.

Assumption: The inputs are correct, i.e., k is a 0-based index in the range [0, len(a)+len(b)).

Base cases:

If the length of one of the arrays is 0, the answer is the kth element of the other array.

Reduction steps:

If the mid index of a plus the mid index of b is less than k:
- If the mid element of a is greater than the mid element of b, we can ignore the first half of b (including its mid element) and adjust k.
- Otherwise, ignore the first half of a (including its mid element) and adjust k.
Otherwise (k is at most the sum of the mid indices of a and b):
- If the mid element of a is greater than the mid element of b, we can safely ignore the second half of a.
- Otherwise, we can ignore the second half of b.

Code:

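def kthlargest(arr1, arr2, k):
    # Despite the name, this returns the kth smallest element with 0-based k,
    # i.e. sorted(arr1 + arr2)[k].
    if len(arr1) == 0:
        return arr2[k]
    elif len(arr2) == 0:
        return arr1[k]

    mida1 = len(arr1) // 2  # integer division
    mida2 = len(arr2) // 2
    if mida1 + mida2 < k:
        # The answer lies after the smaller middle element: drop that element
        # and everything before it (hence the +1), and shift k to match.
        if arr1[mida1] > arr2[mida2]:
            return kthlargest(arr1, arr2[mida2+1:], k - mida2 - 1)
        else:
            return kthlargest(arr1[mida1+1:], arr2, k - mida1 - 1)
    else:
        # The answer lies before the larger middle element: drop that element
        # and everything after it; k is unchanged.
        if arr1[mida1] > arr2[mida2]:
            return kthlargest(arr1[:mida1], arr2, k)
        else:
            return kthlargest(arr1, arr2[:mida2], k)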

Please note that my solution creates new copies of the smaller arrays on every call; this can easily be avoided by passing only start and end indices into the original arrays.
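As a rough sketch of that index-based variant, the recursion can be turned into a loop over two windows, `arr1[lo1:hi1]` and `arr2[lo2:hi2]`, so that nothing is copied. The name `kth_smallest_0based`, the window variables, and the explicit range check are assumptions for illustration; the reduction rules are meant to mirror the code above one for one.

```python
def kth_smallest_0based(arr1, arr2, k):
    """Return sorted(arr1 + arr2)[k] (0-based k) without slicing the inputs."""
    lo1, hi1 = 0, len(arr1)  # active window of arr1 is arr1[lo1:hi1]
    lo2, hi2 = 0, len(arr2)  # active window of arr2 is arr2[lo2:hi2]
    if not 0 <= k < hi1 + hi2:
        raise IndexError("k out of range")

    while True:
        # Base cases: one window is empty, so index into the other one.
        if lo1 == hi1:
            return arr2[lo2 + k]
        if lo2 == hi2:
            return arr1[lo1 + k]

        mid1 = (hi1 - lo1) // 2  # middle offsets relative to each window
        mid2 = (hi2 - lo2) // 2
        if mid1 + mid2 < k:
            # Drop the first half (middle included) of the window whose
            # middle element is smaller, and shift k by the number dropped.
            if arr1[lo1 + mid1] > arr2[lo2 + mid2]:
                lo2 += mid2 + 1
                k -= mid2 + 1
            else:
                lo1 += mid1 + 1
                k -= mid1 + 1
        else:
            # Drop the second half (middle included) of the window whose
            # middle element is larger; k stays the same.
            if arr1[lo1 + mid1] > arr2[lo2 + mid2]:
                hi1 = lo1 + mid1
            else:
                hi2 = lo2 + mid2


print(kth_smallest_0based([0, 2, 9, 10, 12, 13, 14, 16], [8], 4))  # 10
```

Each iteration roughly halves one of the two windows, so the loop runs about log(len(arr1)) + log(len(arr2)) times and uses constant extra space.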

Vaulting answered 20/1, 2012 at 0:2 Comment(12)

Why do you call it kthlargest()? It returns the (k+1)-th smallest element, e.g., 1 is the second smallest element in 0,1,2,3, i.e., your function returns sorted(a+b)[k]. – Freberg 27/7, 2012 at 1:11

I've converted your code to C++. It seems to work – Freberg 27/7, 2012 at 3:54

Won't it be kth smallest instead of kth largest? – Spinach 21/2, 2013 at 14:34

could you please explain why is it important to compare sum of mid indexes of a and b with k? – Lepido 3/11, 2013 at 11:34

In the reduction steps, it is important to get rid of a number of elements in one of the arrays proportional to its length in order to make the run-time logarithmic. (Here we are getting rid of half). In order to do that, we need to select one array whose one of the halves we can safely ignore. How do we do that? By confidently eliminating the half we know for sure is not going to have the kth element. – Vaulting 4/11, 2013 at 18:51

Comparing k with the sum of half-lengths of the arrays gives us information about which half of one of the arrays can be eliminated. If k is larger than sum of half-lengths, we know that first half of one of the arrays can be eliminated. Opposite if k is smaller. Note that we can't eliminate one half from each array at once. For deciding which half of which array to eliminate, we take advantage of the fact that both arrays are sorted, so if k is larger than sum of half-lengths, we can eliminate first half of the array whose middle element is the smaller of the two middle elements. Vice versa. – Vaulting 4/11, 2013 at 19:8

This won't work for a = (0,2,9,10,12,13,14,16) and b = (8) and k = 4 – Synsepalous 9/12, 2013 at 22:29

@JacksonTale, it works for me. I get 10 as the answer, which is expected with 0 based indexing. – Vaulting 19/12, 2013 at 8:36

@Vaulting could you please explain how the run time is log(len(a)+len(b)) ? – Castleberry 21/8, 2016 at 6:55

@PrashantBhanarkar It looks like it should be log(len(a))+log(len(b)), not log(len(a)+len(b)) as lambdapilgrim describes. The algorithm cuts A or B in half each iteration, until A or B is of length 0. So, in the worst case, the algorithm will cut A to length 1 (log(len(a)) recursions) and B to length 0 (log(len(b)) recursions). This would be log(len(a))+log(len(b)). – Humph 3/2, 2018 at 3:35

@AdityaJoshee it returns 40 which is correct if you take indexes starting from 0. And this function actually returns the kth smallest element – Peipeiffer 17/10, 2018 at 5:49

In the python code why is it "arr2[mida2+1:]" rather than just "arr2[mida2:]". That seems to remove 1 extra element from the half we're trying to keep – Alyssaalyssum 7/6, 2019 at 3:5
