Correctness of greedy algorithm
Asked Answered
G

2

13

In non-decreasing sequence of (positive) integers two elements can be removed when enter image description here. How many pairs can be removed at most from this sequence?

So I have thought of the following solution:

  • I take given sequence and divide into two parts (first and second).
  • Assign to each of them iterator - it_first := 0 and it_second := 0, respectively. count := 0
  • when it_second != second.length
    • if 2 * first[it_first] <= second[it_second]
      • count++, it_first++, it_second++
    • else
      • it_second++
  • count is the answer

Example:

count := 0

[1,5,8,10,12,13,15,24] --> first := [1,5,8,10], second := [12,13,15,24]

  1. 2 * 1 ?< 12 --> true, count++, it_first++ and it_second++

  2. 2 * 5 ?< 13 --> true, count++, it_first++ and it_second++

  3. 2 * 8 ?< 15 --> false, it_second++

  4. 8 ?<24 --> true, count ++it_second reach the last element - END.

  5. count == 3

Linear complexity (the worst case when there are no such elements to be removed. n/2 elements compare with n/2 elements). So my missing part is 'correctness' of algorithm - I've read about greedy algorithms proof - but mostly with trees and I cannot find analogy. Any help would be appreciated. Thanks!

EDIT: By correctness I mean: * It works * It cannot be done faster(in logn or constant)

I would like to put some graphics but due to reputation points < 10 - I can't. (I've meant one latex at the beginning ;))

Grampositive answered 11/3, 2015 at 21:25 Comment(9)
Your approach will not cover all the cases. Imagine [1, 2, 4, 9]. [4, 9] could also be such pair. Or as in your example, [12, 24] could also be such pair.Arabela
It doesn't matter. We ask about how many at most pairs can be removed.Grampositive
Oh ok, I missed that.Arabela
Are you supposed to write a function to compute this given an input sequence, or are you asking theoretically, with a sequence of n non-decreasing numbers, what the maximum number of pairs that can be removed?Feign
Just to be precise, if the original list has an odd number of entries, would the extra entry go into the first or second sequence?Acevedo
Can you expand a little more on "'correctness' of algorithm"? Are you referring to the solution the Greedy Algorithm gives?Riancho
(the second one) I'm asking how to prove algorithm's correctness. I mean: it cannot be done better (in constant time or logn) and always finds the good solution Sorry if I didn't explain it clearly.Grampositive
@DragosRizescu - I'm referring to the solution that my greedy algorithm gives.Grampositive
My first attempt would be proof by contradiction.Mohamed
S
3
  1. Correctness:

    • Let's assume that the maximum number of pairs that can be removed is k. Claim: there is an optimal solution where the first elements of all pairs are k smallest elements of the array. Proof: I will show that it is possible to transform any solution into the one that contains the first k elements as the first elements of all pairs.

      1. Let's assume that we have two pairs (a, b), (c, d) such that a <= b <= c <= d, 2 * a <= b and 2 * c <= d. In this case, pairs (a, c) and (b, d) are valid, too. And now we have a <= c <= b <= d. Thus, we can always transform out pairs in such a way that the first element from any pair is not greater than the second element of any pair.

      2. When we have this property, we can simply substitute the smallest element among all first all elements of all pairs with the smallest element in the array, the second smallest among all first elements - with the second smallest element in the array and so on without invalidating any pair.

    • Now we know that there is an optimal solution that contains k smallest elements. It is clear that we cannot make the answer worse by taking the smallest unused element(making it bigger can only reduce the answer for the next elements) which fits each of them. Thus, this solution is correct.

    • A note about the case when the length of the array is odd: it doesn't matter where the middle element goes: to the first or to the second half. In the first half it is useless(there are not enough elements in the second half). If we put it to the second half, it is useless two(let's assume that we took it. It means that there is "free space" somewhere in the second half. Thus, we can shift some elements by one and get rid of it).

  2. Optimality in terms of time complexity: the time complexity of this solution is O(n). We cannot find the answer without reading the entire input in the worst case and reading is already O(n) time. Thus, this algorithm is optimal.

Spiky answered 11/3, 2015 at 22:32 Comment(6)
It's very nice. Hope you enjoyed that one. Thank you!Grampositive
Well, I guess I can stop typing my response :)Zoophobia
Actually, we can make a stronger claim: if there exists a valid answer of length k, the answer taking first k values and pairing them with last k values is valid. However, using the stronger claim does not seem any easier than running the proposed algorithm.Flipflop
@Flipflop That was the approach I was intending. I do think it makes the proof a little clearer, actually. But it isn't clear that it makes the algorithm any simpler.Zoophobia
@JeremyWest Same here :)Flipflop
(The importance of 2. being that the algorithm producing the correct result does terminate.)Mohamed
C
1

Presuming your method. Indices are 0-based.

Denote in general:

  • end_1 = floor(N/2) boundary (inclusive) of first part.

Denote while iterating:

  • i index in first part, j index in second part,
  • optimal solution until this point sol(i,j) (using algorithm from front),
  • pairs that remain to be paired-up optimally behind (i,j) point i.e. from (i+1,j+1) onward rem(i,j) (can be calculated using algorithm from back),
  • final optimal solution can be expressed as the function of any point as sol(i,j) + rem(i,j).

Observation #1: when doing algorithm from front all points in [0, i] range are used, some points from [end_1+1, j] range are not used (we skip a(j) not large engough). When doing algorithm from back some [i+1, end_1] points are not used, and all [j+1, N] points are used (we skip a(i) not small enough).

Observation #2: rem(i,j) >= rem(i,j+1), because rem(i,j) = rem(i,j+1) + M, where M can be 0 or 1 depending on whether we can pair up a(j) with some unused element from [i+1, end_1] range.

Argument (by contradiction): let's assume 2*a(i) <= a(j) and that not pairing up a(i) and a(j) gives at least as good final solution. By the algorithm we would next try to pair up a(i) and a(j+1). Since:

  • rem(i,j) >= rem(i,j+1) (see above),
  • sol(i,j+1) = sol(i,j) (since we didn't pair up a(i) and a(j))

we get that sol(i,j) + rem(i,j) >= sol(i,j+1) + rem(i,j+1) which contradicts the assumption.

Celandine answered 11/3, 2015 at 22:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.