python - prefix sum algorithm

Asked 31/10, 2016 at 3:56 Answered 25/9, 2019 at 12:33

I am trying to grasp the idea behind the prefix sum concept by looking at the example presented in the Prefix Sum Lesson by Codility here (the mushroom picker problem).

My understanding is that the whole concept rests on a simple property: to find the sum of all elements between two positions (pos_left, pos_right) of an array A, a second array P is used in which all elements are consecutively summed, and the searched sum is calculated as
value(P(pos_right + 1)) - value(P(pos_left)).

A 1 2 3 4 5  6
P 0 1 3 6 10 15 21
sum of all elements between A[2] and A[4] = 3 + 4 + 5 = 12
or using the prefix sums: P[4+1] - P[2] = 15 - 3 = 12
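
The same arithmetic can be checked with a few lines of Python 3. This is only a minimal sketch: the helper names `build_prefix` and `range_sum` are made up for illustration, and `itertools.accumulate` is used for the running sum instead of an explicit loop.

```python
from itertools import accumulate

def build_prefix(a):
    # P[0] = 0 and P[i] = a[0] + ... + a[i - 1], so len(P) == len(a) + 1
    return [0] + list(accumulate(a))

def range_sum(p, left, right):
    # Sum of a[left..right], both ends inclusive (0-based indices)
    return p[right + 1] - p[left]

A = [1, 2, 3, 4, 5, 6]
P = build_prefix(A)
print(P)                    # [0, 1, 3, 6, 10, 15, 21]
print(range_sum(P, 2, 4))   # 12
print(sum(A[2:5]))          # 12, the same range summed directly
```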

The problem
There is a street with mushrooms at every spot, represented by a non-empty vector. Given the initial position of a picker and the number of moves it may make, the maximum number of mushrooms that can be collected is sought.

Looking at the example, I don't understand the logic behind the construction of the loops. Can anybody clarify the mechanics of this algorithm?

Secondly, I found the position indexing in this example very confusing and cumbersome. Is it common practice to "shift" the vector of prefix sums by putting a zero at the beginning? (The fact that counting elements in vectors starts from 0 by default in Python already causes some confusion.)

The solution

def prefix_sums(A):
    n = len(A)
    P = [0] * (n + 1)
    for k in xrange(1, n + 1):
        P[k] = P[k - 1] + A[k - 1]
    return P


def count_total(P, x, y):
    return P[y + 1] - P[x]

# A mushroom picker is at spot number k on the road and should perform m moves
def mushrooms(A, k, m):
    n = len(A)
    result = 0
    pref = prefix_sums(A)
    for p in xrange(min(m, k) + 1):   # going left
        left_pos = k - p
        right_pos = min(n - 1, max(k, k + m - 2 * p))
        result = max(result, count_total(pref, left_pos, right_pos))
    for p in xrange(min(m + 1, n - k)):
        right_pos = k + p
        left_pos = max(0, min(k, k - (m - 2 * p)))
        result = max(result, count_total(pref, left_pos, right_pos))
    return result

I have run an example with a small array A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], choosing the position k = 5 and the range m = 3. I don't understand the logic by which the two loops create the ranges to check.

I get the following parameters for the loops

(p=, left_pos=, right_pos=)   
loop 1  (0,5,8), (1,4,6),(2,3,5),(3,2,5)
loop 2  (0,2,5), (1,4,6), (2,5,7), (3,5,8)

The ranges vary. Why?

version for debugging

def mushrooms2(A, k, m):
    n = len(A)
    result = 0
    pref = prefix_sums(A)
    l1 =min(m, k) + 1
    print 'loop p in xrange(min(m, k) + 1): %d' % l1
    for p in xrange(min(m, k) + 1):
        print 'p %d' % p
        print 'A= %r' % A
        print 'pref= %r' % pref
        left_pos = k - p
        right_pos = min(n - 1, max(k, k + m - 2 * p))
        result = max(result, count_total(pref, left_pos, right_pos))
        print 'left_pos = k - p= %d' % left_pos
        print 'right_pos= min(n-1,max(k,k+m-2*p))= %d' % right_pos
        print 'max'
        print '(result %d' % result
        print 'count_total(pref, left_pos, right_pos)) %r, %r, %r, %r' % (pref,left_pos, right_pos,count_total(pref, left_pos, right_pos))
        print 'result= %d' % result
        print 'next p'
    l2=min(m + 1, n - k)
    print   'loop xrange(min(m + 1, n - k)): %d' % l2
    for p in xrange(min(m + 1, n - k)):
        print 'p %d' % p
        print 'A= %r' % A
        print 'pref= %r' % pref
        right_pos = k + p
        left_pos = max(0, min(k, k - (m - 2 * p)))
        result = max(result, count_total(pref, left_pos, right_pos))
        print 'right_pos = k + p= %d' % right_pos
        print 'left_pos = max(0, min(k, k - (m - 2 * p)))= %d' % left_pos
        print 'max'
        print '(result %d' % result
        print 'count_total(pref, left_pos, right_pos)) %r, %r, %r, %r' % (pref,left_pos, right_pos,count_total(pref, left_pos, right_pos))
        print 'result= %d' % result
        print 'next p'
    print 'result %d' % result
    return result

Merline answered 31/10, 2016 at 3:56 Comment(1)

Python Indexing/slices are zero based. Depending on what you are trying to accomplish, using calculated loop variables for slices or indices can be productive. – Beggs 31/10, 2016 at 5:2

You are not alone in considering the loop construction to be counter-intuitive, as I had to spend a few minutes on it as well. Here's what I figured out.

The solution in the link you provided explains that the optimal strategy is to walk along the road changing direction at most once. That way, the picker covers a contiguous range with a left and a right endpoint, which is what left_pos and right_pos represent.

As to the particulars of the loops: instead of thinking about them in terms of the loop variable (i.e. p), it is easier to look at what changes over the course of the loop and how p is used. Otherwise, the min and max expressions look rather peculiar at first.

For instance, in the first loop, instead of working out what the range of p represents, look at how left_pos is affected by the values p takes. After a bit of thought, one notices that left_pos runs through exactly the possible left endpoints.

Specifically, when p == 0, the left endpoint is the starting index (i.e. k), and when p is min(m, k), it is either 0 (if k < m) or k - m. In the former case, that is as far as the left endpoint can go, as going further would leave the valid range of spots on the road. In the latter case, the number of moves prohibits any solution with a left_pos smaller than k - m, since it is impossible to go from k to those indices in m moves.

The assignment to right_pos in the first loop can be explained similarly. The min expression includes n - 1, the rightmost legal index, which keeps the right endpoint within the allowed limits. The inner max expression features k, since the starting point is the least possible value for right_pos. It also contains the expression k + m - 2 * p, which represents the following process:

Go to left for p moves.
Change direction, and go to right for p moves to reach the starting point.
Go to right with the remaining (m - 2p) moves.
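
The three steps above can also be traced move by move and compared with the closed form. This is just a sketch in Python 3: `walk_left_then_right` is a hypothetical helper, not part of the Codility solution, and it ignores the array borders for simplicity (the solution handles them with `min(n - 1, ...)`).

```python
def walk_left_then_right(k, m, p):
    # Start at k, take p moves to the left, then spend the rest going right.
    # Assumes p <= m; array borders are ignored in this sketch.
    pos = k
    leftmost = rightmost = k
    for _ in range(p):
        pos -= 1
        leftmost = min(leftmost, pos)
    for _ in range(m - p):
        pos += 1
        rightmost = max(rightmost, pos)
    return leftmost, rightmost

k, m = 5, 3
for p in range(m + 1):
    # Closed form used in the first loop of the solution (without clamping)
    formula = (k - p, max(k, k + m - 2 * p))
    print(p, walk_left_then_right(k, m, p), formula)
```

When 2 * p exceeds m, the picker never gets back to k, which is why the formula needs max(k, ...): the spot k itself has already been visited.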

The second loop is just the reflection of this first loop, and you may explain it simply by adapting my explanation of the first loop.
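
To make the mirror symmetry visible, the two loops can be reduced to generators that yield only (p, left_pos, right_pos). This is a sketch in Python 3 (so `range` instead of `xrange`); the names `left_first` and `right_first` are invented here, and the expressions are copied from the solution in the question.

```python
def left_first(n, k, m):
    # First loop: p moves to the left, the remaining moves to the right
    for p in range(min(m, k) + 1):
        yield p, k - p, min(n - 1, max(k, k + m - 2 * p))

def right_first(n, k, m):
    # Second loop: p moves to the right, the remaining moves to the left
    for p in range(min(m + 1, n - k)):
        yield p, max(0, min(k, k - (m - 2 * p))), k + p

# Parameters from the question: 20 spots, start at k = 5, m = 3 moves
print(list(left_first(20, 5, 3)))
# [(0, 5, 8), (1, 4, 6), (2, 3, 5), (3, 2, 5)]
print(list(right_first(20, 5, 3)))
# [(0, 2, 5), (1, 4, 6), (2, 5, 7), (3, 5, 8)]
```

These are the same tuples listed in the question: each loop fixes one endpoint through p and derives the other from the moves that remain.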

As to your second question, I do not think it is common practice to shift the indices for prefix sum arrays. I typically use this method in online programming contests and my implementation of the prefix sum array you use in Python would be as follows.

def prefix_sums(A):
    n = len(A)
    P = [0] * n
    P[0] = A[0]
    for k in xrange(1, n):
        P[k] = P[k - 1] + A[k]
    return P

def count_total(P, x, y):
    return (P[y] - P[x - 1] if x > 0 else P[y])

The fundamental idea behind the implementation above is that, at P[x], we have the sum A[0] + A[1] + ... + A[x].
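
For comparison, here is a small Python 3 sketch that checks the unshifted convention against the shifted one from the question. The helper names are assumptions made for this illustration, and `itertools.accumulate` stands in for the explicit loop.

```python
from itertools import accumulate

def inclusive_prefix(a):
    # P[x] = a[0] + ... + a[x], same length as a
    return list(accumulate(a))

def count_inclusive(p, x, y):
    # Needs a special case because p[x - 1] does not exist for x == 0
    return p[y] - (p[x - 1] if x > 0 else 0)

def count_shifted(p, x, y):
    # The leading zero of the shifted array removes the special case
    return p[y + 1] - p[x]

A = [2, 3, 7, 5, 1, 3, 9]
inc = inclusive_prefix(A)
shifted = [0] + inc

for x in range(len(A)):
    for y in range(x, len(A)):
        assert count_inclusive(inc, x, y) == count_shifted(shifted, x, y) == sum(A[x:y + 1])
print("both conventions agree")
```

The trade-off is one extra array element in the shifted version against one extra branch in the unshifted version.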

Hexapody answered 31/10, 2016 at 10:57 Comment(8)

@ ilim your version of prefix_sums(A) returns an error, I guess it is because there are only n elements in A, but the loop runs up to n+1, so there is missing argument for the A[n+1] – Merline 31/10, 2016 at 20:53

You're correct. Apologies for not observing carefully enough. – Hexapody 31/10, 2016 at 21:16

@Merline did you try it yet? – Hexapody 2/11, 2016 at 14:9

@Hexapody Wonderful explanation. Thanks :) – Zenda 3/5, 2017 at 7:8

@ParitoshGupta Glad to help. – Hexapody 3/5, 2017 at 7:44

Should I use p[y] - (p[x - 1] if x > 0 else 0) in the count, so that if I want to get the count between the 0th position and the 4th it would still work instead of overrunning the bounds? I think that's why people extend it with a leading [0], so the x-1 still works. – Spindling 23/4, 2019 at 2:55

@AndrewBacker Seems like I have overlooked that edge case. Fixed it. – Hexapody 8/5, 2019 at 22:34

@ilim: Great explanation! Thank you! – Beefsteak 6/6, 2019 at 19:58

After reading the topic, it was still hard to understand the idea until I implemented the naive solution (which comes first in the Codility document).
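
For reference, a brute-force check can look like the sketch below (Python 3). It is not necessarily the naive solution from the Codility document: `mushrooms_naive` is a hypothetical name, and it simply tries every interval around k and tests whether m moves suffice, assuming at most one change of direction.

```python
def mushrooms_naive(a, k, m):
    # Try every interval [left, right] that contains k and check whether
    # it can be covered in at most m moves.
    n = len(a)
    best = 0
    for left in range(k + 1):
        for right in range(k, n):
            # Walk to the nearer end first, come back, then walk to the far end
            moves = (right - left) + min(k - left, right - k)
            if moves <= m:
                best = max(best, sum(a[left:right + 1]))
    return best

print(mushrooms_naive([2, 3, 7, 5, 1, 3, 9], 4, 6))  # 25
```

It is much slower than the prefix-sum version, but handy for cross-checking results on small inputs.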

The hard-to-understand solution #2 simply imitates moving left and right, and all these weird-looking calculations only serve to get the left and right limits of the area (as if you were really moving inside it). So each iteration means one full cycle of using all 6 steps.

If you move to the left and then to the right (p = 0...M), with k = 4 and M = 6 on an array of 7 elements, you have

0 steps left, 6 steps right (really 0 and 2 steps, because of the array border), so the left border of the area is at index 4 and the right border is at index 6
1 step left, 5 steps right (really 1 and 3), so the left border is at index 3 and the right border is at index 6
2 steps left, 4 steps right (really 2 and 4)... continue the calculations
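
The remaining rows can be generated rather than worked out by hand. The Python 3 sketch below mirrors the first loop of the PHP code that follows, where the right steps are counted from the left border; the function name `left_then_right_borders` is an assumption made for this illustration.

```python
def left_then_right_borders(n_items, k, m):
    # p steps left, then m - p steps right counted from the left border,
    # both clamped to the array (same arithmetic as the first PHP loop).
    last = n_items - 1
    for p in range(m):
        real_left = min(k, p)
        left_border = k - real_left
        real_right = min(last - left_border, m - p)
        yield p, real_left, real_right, left_border, left_border + real_right

# Example from this answer: 7 spots, start at k = 4, m = 6 moves
for row in left_then_right_borders(7, 4, 6):
    print("p=%d: really %d left and %d right, borders [%d, %d]" % row)
```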

Here is my PHP version with oversimplified code and additional variables for easier understanding

function prefix_sums(array $a)
{
    $n = count($a);
    $p = array_fill(0, $n + 1, 0);
    for ($i = 1; $i <= $n; $i++) {
        $p[$i] = $p[$i - 1] + $a[$i - 1];
    }
    return $p;
}

function count_total($p, $x, $y)
{
    return $p[$y + 1] - $p[$x];
}

function mushrooms(array $a, int $k, int $m)
{
    $n = count($a) - 1;
    $max = 0;
    $sums = prefix_sums($a);
    //start  moving to the left and then the right
    for ($p = 0; $p < $m; $p++) {
        $stepsLeft = $p;
        $realStepsLeft = min($k, $stepsLeft);
        $leftBorder = $k - $realStepsLeft;

        $stepsRight = $m - $stepsLeft;
        $realStepsRight = min($n - $leftBorder, $stepsRight);
        $rightBorder = $leftBorder + $realStepsRight;

        $max = max($max, count_total($sums, $leftBorder, $rightBorder));
    }
    //moving to the right and then the left
    for ($p = 0; $p < $m; $p++) {
        $stepsRight = $p;
        $realStepsRight = min($p, $n - $k);
        $rightBorder = $k + $realStepsRight;

        $stepsLeft = $m - $stepsRight;
        $realStepsLeft = min(($k + $realStepsRight), $stepsLeft);
        $leftBorder = $rightBorder - $realStepsLeft;

        $max = max($max, count_total($sums, $leftBorder, $rightBorder));
    }
    return $max;
}

assert_options(ASSERT_EXCEPTION, 1);
assert(mushrooms([2, 3, 7, 5, 1, 3, 9], 4, 6) == 25);

echo 'Success';

Slovenly answered 25/9, 2019 at 12:33 Comment(0)
