Compute the cumulative sum of a list until a zero appears
H

7

25

I have a (long) list in which zeros and ones appear at random:

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

I want to get list_b, where:

  • each element is the cumulative sum of list_a since the last 0
  • wherever a 0 appears in list_a, the 0 itself is kept

    list_b = [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
    

I can implement this as follows:

list_b = []
for i, x in enumerate(list_a):
    if x == 0:
        list_b.append(x)
    else:
        sum_value = 0
        for j in list_a[i::-1]:
            if j != 0:
                sum_value += j
            else:
                break
        list_b.append(sum_value)
print(list_b)

but the actual list is very long.

So, I want to make the code faster, even at the cost of readability.

I changed the code to this:

from itertools import takewhile
list_c = [sum(takewhile(lambda x: x != 0, list_a[i::-1])) for i, d in enumerate(list_a)]
print(list_c)

But it is still not fast enough. How can I do it in a more efficient way?

Heiney answered 15/2, 2018 at 10:31 Comment(6)
are there only 1s and 0s in the original list?Literati
There can't be a sequential solution that is better than linear in the length of the list. However, you can try to parallelize your algorithm by starting at different positions in the list, searching for the first non-zero entry right-adjacent to a zero and proceeding from there. Then each thread performs its work until it reaches the starting point of the thread of the neighboring block. Bonus: you may modify the list concurrently because the threads will operate on different blocks.Slipon
@Ev. Kounis Yes. My actual data looks like [a, a, a, b, b, b, c, c .....]; I turn it into list_a by writing 0 where the value changes and 1 where it does not.Heiney
@HiroyukiTaniichi If your actual data really looks like that, then try: s.groupby(s.ne(s.shift()).cumsum()).cumcount().Crossstitch
@Slipon That involves reading the list all the way through at least one more time than needed. You could index in at a fraction of the length, search for the next 0 and start from there, preventing the extra read-through. However, this is Python, so you would need to slice it, multi-process the operation, and then combine the list again (Python threads are time-sliced rather than truly parallel, so this may take longer than just processing it linearly).Turgite
@Turgite "You could index in a fraction and search for the next 0 and start from there" -> that's quite what I suggested by saying "find the next non-zero right-adjacent to a zero". Maybe I expressed this unnecessarily complicated while attempting to skip additional zeros, which is indeed unnecessary.Slipon
C
41

You're overthinking this.

Option 1
You can just iterate over the indices and update accordingly (computing the cumulative sum), based on whether the current value is 0 or not.

data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

for i in range(1, len(data)):
    if data[i]:  
        data[i] += data[i - 1] 

That is, if the current element is non-zero, update the element at the current index to the sum of its current value and the value at the previous index.

print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

Note that this updates your list in place. You can create a copy in advance if you don't want that - new_data = data.copy() and iterate over new_data in the same manner.
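
A minimal sketch of that copy-based variant (same logic as above, just applied to new_data so the original list stays intact):

data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

new_data = data.copy()                  # keep the original untouched
for i in range(1, len(new_data)):
    if new_data[i]:                     # non-zero: extend the running sum
        new_data[i] += new_data[i - 1]

print(new_data)  # [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
print(data)      # original list is unchanged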


Option 2
You can use the pandas API if you need performance. Find groups based on the placement of 0s, and use groupby + cumsum to compute group-wise cumulative sums, similar to above:

import pandas as pd

s = pd.Series(data)    
data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()

print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
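
For intuition, this is what the intermediate group keys look like for the example data (an illustration of the mechanics only; it isn't needed for the solution itself). s.eq(0).cumsum() increases by one at every 0, so each 0 starts a new group and the cumulative sum restarts there:

import pandas as pd

data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
s = pd.Series(data)

print(s.eq(0).cumsum().tolist())  # group labels used by groupby
# [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 3]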

Performance

First, the setup -

data = data * 100000
s = pd.Series(data)

Next,

%%timeit
new_data = data.copy()
for i in range(1, len(data)):
    if new_data[i]:  
        new_data[i] += new_data[i - 1]

328 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And, timing the copy separately,

%timeit data.copy()
8.49 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So, the copy doesn't really take much time. Finally,

%timeit s.groupby(s.eq(0).cumsum()).cumsum().tolist()
122 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The pandas approach is conceptually linear (just like the other approaches) but faster by a constant factor because of the implementation of the library.

Crossstitch answered 15/2, 2018 at 10:34 Comment(14)
I think you should have that as another list, without rewriting the original list. It might be needed further.Torose
@big_bang Yup, I've mentioned that OP can create a copy: new_data = data.copy() and iterate over new_data :)Crossstitch
Thank you very much! I was under the misconception that a list comprehension is always faster than a normal loop.Heiney
@HiroyukiTaniichi In general, it can be, when done right. However, I'm not sure your problem can be successfully translated into a list comprehension without sacrificing performance. I've added timings to my answer.Crossstitch
Thank you for adding a more efficient way! The pandas cumsum() method is the best answer for me. Actually, I want to count runs of the same value, so I need to study more.Heiney
@HiroyukiTaniichi You're welcome. Pandas is a powerful API, but it will take some getting used to. Don't be afraid to ask questions if you have any problem. Good luck!Crossstitch
@cᴏʟᴅsᴘᴇᴇᴅ Hmm, the new text makes me wonder even more. How could it be sublinear? Do you mean without .tolist() and with some output-sensitive complexity analysis?Steric
@StefanPochmann I thought it was clear now. It's better than linear because the operations are parallelised.Crossstitch
@cᴏʟᴅsᴘᴇᴇᴅ Well, I don't know much about pandas, so I wasn't thinking about parallelization. And even with that, it sounds dubious. Anyway, can you tell more tightly what the complexity is, not just "sublinear"? I googled +pandas +sublinear +parallel and found nothing. Well, google showed five results, but none of them actually contain "sublinear". The first result even sounds like pandas doesn't do parallel, but it's 1.5 years old.Steric
@StefanPochmann I can understand your unwillingness to take my word for it, but pandas is built on numpy, and numpy is parallelised, for performance. Think SIMD: Single-Instruction-Multiple-Data. MSeifert's answer is based on the same principle: write a loop and then cythonize it, meaning you compile it to SIMD. Another thing to think about is the matrix multiplication. Conceptually, it is cubic, but because of SIMD and caching, it is effectively quadratic in practice. If you're asking for a source, well, I am the source. I've worked with this stuff before, so I'm familiar with how it works.Crossstitch
Hmm, I now timed it with sizes from thousands to millions, even just the s.groupby(s.eq(0).cumsum()).cumsum() part, and doubling the input size pretty much doubled the time. So looks linear to me, not sublinear.Steric
@StefanPochmann Looks like the groupby is linear, so I'm wrong in this case. There are a lot of operations in pandas that are vectorised and whose complexity does not grow linearly, so I am not wrong in general. Anyway, thanks for the nitpicks, I've edited my answer again.Crossstitch
@cᴏʟᴅsᴘᴇᴇᴅ "MSeifert's answer is based on the same principle: write a loop and then cythonize it, meaning you compile it to SIMD."... compile it to SIMD? What do you mean? Does Cython actually implicitly make use of SIMD intrinsics?Robertson
@Robertson I'm not sure! But I'm aware that it compiles the code in such a way that the loop is vectorised. Whether it uses SIMD or not is something I'm not aware of (my comment may be misleading in that respect, apologies).Crossstitch
E
17

If you want a compact native Python solution that is probably the most memory efficient, although not the fastest (see the comments), you could draw extensively from itertools:

>>> from itertools import groupby, accumulate, chain
>>> list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a, bool)))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

The steps here are: group the list into sublists based on presence of 0 (which is falsy), take the cumulative sum of the values within each sublist, flatten the sublists.
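
If it helps to see those steps separately, here is a small illustration of the intermediate results (for inspection only; the one-liner above does the same thing lazily):

>>> from itertools import groupby, accumulate
>>> list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
>>> groups = [list(g) for _, g in groupby(list_a, bool)]
>>> groups
[[1, 1, 1], [0], [1, 1], [0], [1], [0], [1, 1, 1]]
>>> [list(accumulate(g)) for g in groups]
[[1, 2, 3], [0], [1, 2], [0], [1], [0], [1, 2, 3]]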

As Stefan Pochmann comments, if your list is binary in content (consisting of only 1s and 0s), then you don't need to pass a key to groupby() at all; it will fall back on the identity function. This is ~30% faster than using bool for this case:

>>> list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a)))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
Episcopal answered 15/2, 2018 at 10:43 Comment(8)
:hmm.jpg: if OP is looking for performance, this may not be the right solution to use.Crossstitch
@cᴏʟᴅsᴘᴇᴇᴅ Well I upvoted yours for the Pandas approach, but did you benchmark mine? Occurred to me it might be faster with groupby(list_a, int)?Episcopal
Yup. It clocked in at 681 ms ± 4.26 ms per loop with my setup, I made sure to verify this before commenting :)Crossstitch
Regardless of whether you pass bool or int, it is the same.Crossstitch
@cᴏʟᴅsᴘᴇᴇᴅ Thanks, how disappointing, I curse the standard library ;)Episcopal
Your answer is still nice though, you already have my vote :)Crossstitch
Seems to be about 10% shorter and faster as a list comprehension: [x for _, g in groupby(list_a, bool) for x in accumulate(g)]Sanitation
Why not use groupby(list_a), without a key?Steric
M
10

Personally I would prefer a simple generator like this:

def gen(lst):
    cumulative = 0
    for item in lst:
        if item:
            cumulative += item
        else:
            cumulative = 0
        yield cumulative

Nothing magic (when you know how yield works), easy to read and should be rather fast.

If you need more performance you could even wrap this as a Cython extension type (I'm using IPython here). You lose the "easy to understand" part and it requires a "heavy dependency":

%load_ext cython

%%cython

cdef class Cumulative(object):
    cdef object it
    cdef object cumulative
    def __init__(self, it):
        self.it = iter(it)
        self.cumulative = 0

    def __iter__(self):
        return self

    def __next__(self):
        cdef object nxt = next(self.it)
        if nxt:
            self.cumulative += nxt
        else:
            self.cumulative = 0
        return self.cumulative

Both need to be consumed, for example using list to give the desired output:

>>> list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
>>> list(gen(list_a))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
>>> list(Cumulative(list_a))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

However, since you asked about speed, I wanted to share the results from my timings:

import random

import numpy as np
import pandas as pd
from itertools import takewhile, groupby, accumulate, chain

def MSeifert(lst):
    return list(MSeifert_inner(lst))

def MSeifert_inner(lst):
    cumulative = 0
    for item in lst:
        if item:
            cumulative += item
        else:
            cumulative = 0
        yield cumulative

def MSeifert2(lst):
    return list(Cumulative(lst))

def original1(list_a):
    list_b = []
    for i, x in enumerate(list_a):
        if x == 0:
            list_b.append(x)
        else:
            sum_value = 0
            for j in list_a[i::-1]:
                if j != 0:
                    sum_value += j
                else:
                    break
            list_b.append(sum_value)
    return list_b

def original2(list_a):
    return [sum(takewhile(lambda x: x != 0, list_a[i::-1])) for i, d in enumerate(list_a)]

def Coldspeed1(data):
    data = data.copy()
    for i in range(1, len(data)):
        if data[i]:  
            data[i] += data[i - 1] 
    return data

def Coldspeed2(data):
    s = pd.Series(data)    
    return s.groupby(s.eq(0).cumsum()).cumsum().tolist()

def Chris_Rands(list_a):
    return list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a, bool)))

def EvKounis(list_a):
    cum_sum = 0
    list_b = []
    for item in list_a:
        if not item:            # if our item is 0
            cum_sum = 0         # the cumulative sum is reset (set back to 0)
        else:
            cum_sum += item     # otherwise it sums further
        list_b.append(cum_sum)  # and no matter what it gets appended to the result
    return list_b

def schumich(list_a):
    list_b = []
    s = 0
    for a in list_a:
        s = a+s if a !=0 else 0
        list_b.append(s)
    return list_b

def jbch(seq):
    return list(jbch_inner(seq))

def jbch_inner(seq):
    s = 0
    for n in seq:
        s = 0 if n == 0 else s + n
        yield s


# Timing setup
timings = {MSeifert: [], 
           MSeifert2: [],
           original1: [], 
           original2: [],
           Coldspeed1: [],
           Coldspeed2: [],
           Chris_Rands: [],
           EvKounis: [],
           schumich: [],
           jbch: []}
sizes = [2**i for i in range(1, 20, 2)]

# Timing
for size in sizes:
    print(size)
    func_input = [int(random.random() < 0.75) for _ in range(size)]
    for func in timings:
        if size > 10000 and (func is original1 or func is original2):
            continue
        res = %timeit -o func(func_input)   # if you use IPython, otherwise use the "timeit" module
        timings[func].append(res)


%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1)
ax = plt.subplot(111)

baseline = MSeifert2 # choose one function as baseline
for func in timings:
    ax.plot(sizes[:len(timings[func])], 
            [time.best / ref.best for time, ref in zip(timings[func], timings[baseline])], 
            label=func.__name__)
ax.set_ylim(0.8, 1e4)
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time relative to {}'.format(baseline.__name__))
ax.grid(which='both')
ax.legend()
plt.tight_layout()

In case you're interested in the exact results I put them in this gist.

[Figure: log-log benchmark plot of all solutions, timed relative to the Cython answer]

It's a log-log plot, with times relative to the Cython answer. In short: the lower the faster, and the range between two major ticks represents one order of magnitude.

So all solutions tend to be within one order of magnitude of each other (at least when the list is big), except for the two approaches you started with. Strangely, the pandas solution is quite slow compared to the pure Python approaches. However, the Cython solution beats all of the other approaches by a factor of roughly 2.

Menander answered 15/2, 2018 at 21:29 Comment(7)
The thing is, I didn't time the conversion of the list to a Series, and assumed that was a given. The conversion constitutes three quarters of the time right there. Great answer by the way.Crossstitch
@cᴏʟᴅsᴘᴇᴇᴅ Thanks for the elaboration. I suspected it might be the conversion but haven't investigated that. Especially the pd.Series conversion might be particularly slow (probably like np.array this already does multiple passes over the list).Menander
I would use cumulative = nxt instead of cumulative = 0, since that is what you'd do in the generalized case of the problem (i.e: compute cumulative sum until you find the value x and then restart). I doubt the change in performance would be significant.Oslo
@GiacomoAlzetta That would work too; however, given the if item check, the else branch is only entered when the item is zero, so it doesn't matter which one you use. And generalizing it even more would negatively affect the performance, which was of primary concern for the OP :)Menander
I don't think that changing if item with if item == pivot will have that much of an impact. If performance is that important I think the OP should drop python altogether and write in assembly.Oslo
@GiacomoAlzetta Have you timed it? Also the OP isn't asking for a generalized way, he asks for a specific and fast way - that's why I didn't bother generalizing it. :)Menander
Great, if all benchmarks were this good, SO would be a better place! Also a reminder that I need to learn Cython!Episcopal
L
7

You are playing with the indices too much in the code you posted when you do not really have to. You can just keep track of a cumulative sum and reset it to 0 every time you meet a 0.

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

cum_sum = 0
list_b = []
for item in list_a:
    if not item:            # if our item is 0
        cum_sum = 0         # the cumulative sum is reset (set back to 0)
    else:
        cum_sum += item     # otherwise it sums further
    list_b.append(cum_sum)  # and no matter what it gets appended to the result
print(list_b)  # -> [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
Literati answered 15/2, 2018 at 10:39 Comment(4)
@cᴏʟᴅsᴘᴇᴇᴅ Thanks for timing it!Literati
You can speed it up by avoiding the repeated lookup of the append method, using append = list_b.append.Kimmel
@Leonhard really? no-one does this? you think that will significantly improve lookup speed? looks like name space pollution to meEpiscopal
I tried it with the timeit module; it was around 15% faster in this case, where the loop does only a branch, an addition and an append. Not that this is probably relevant, except to beat another answer in this thread.Kimmel
G
4

It doesn't have to be as complicated as in the question; a very simple approach could look like this.

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
list_b = []
s = 0
for a in list_a:
    s = a+s if a !=0 else 0
    list_b.append(s)

print(list_b)
Grogram answered 15/2, 2018 at 10:48 Comment(0)
S
2

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can update a variable within a list comprehension:

# items = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
total = 0
[total := (total + x if x else x) for x in items]
# [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]

This:

  • initializes a variable total to 0, which represents the running sum
  • for each item, this both:
    • either increments total by the current item (total := total + x) via an assignment expression, or resets it to 0 if the item is 0
    • and at the same time maps x to the new value of total (see the self-contained sketch below)
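
As a self-contained sketch (the helper name reset_cumsum is just for illustration), the same idea can be wrapped in a function, which also keeps total from leaking into the surrounding scope:

def reset_cumsum(items):
    # cumulative sum that restarts after every 0 (requires Python 3.8+)
    total = 0
    return [total := (total + x if x else x) for x in items]

print(reset_cumsum([1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]))
# [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]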
Saimon answered 27/4, 2019 at 19:17 Comment(1)
Or [total := x and total + x for x in items]. Would also be good to mention how fast it is (I think in MSeifert's benchmark, it would be second place, only beaten by the Cython solution).Checkoff
T
1

I would use a generator if you want performance (and it's simple too).

def weird_cumulative_sum(seq):
    s = 0
    for n in seq:
        s = 0 if n == 0 else s + n
        yield s

list_b = list(weird_cumulative_sum(list_a))

I don't think you'll do better than that; in any case, you'll have to iterate over list_a at least once.

Note that I called list() on the result to get a list as in your code, but if the code using list_b only iterates over it once (with a for loop or similar), there is no point in converting the result to a list; just pass it the generator.
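
For example, a one-pass consumer could use the generator directly; the loop body below is just a placeholder for whatever per-element processing you actually need:

for value in weird_cumulative_sum(list_a):
    print(value)  # process each running-sum value without building a list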

Tenantry answered 15/2, 2018 at 16:29 Comment(0)
