Counting consecutive 1's in NumPy array - McMap

About

Counting consecutive 1's in NumPy array

Asked 9/2, 2017 at 5:37 Answered 9/2, 2017 at 6:22

Solved python numpy cumsum

C

2

6

[1, 1, 1, 0, 0, 0, 1, 1, 0, 0]

I have a NumPy array consisting of 0's and 1's like above. How can I add all consecutive 1's like below? Any time I encounter a 0, I reset.

[1, 2, 3, 0, 0, 0, 1, 2, 0, 0]

I can do this using a for loop, but is there a vectorized solution using NumPy?

Coccyx answered 9/2, 2017 at 5:37 Comment(0)

C

9

Here's a vectorized approach -

def island_cumsum_vectorized(a):
    a_ext = np.concatenate(( [0], a, [0] ))
    idx = np.flatnonzero(a_ext[1:] != a_ext[:-1])
    a_ext[1:][idx[1::2]] = idx[::2] - idx[1::2]
    return a_ext.cumsum()[1:-1]

Sample run -

In [91]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [92]: island_cumsum_vectorized(a)
Out[92]: array([1, 2, 3, 0, 0, 0, 1, 2, 0, 0])

In [93]: a = np.array([0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1])

In [94]: island_cumsum_vectorized(a)
Out[94]: array([0, 1, 2, 3, 4, 0, 0, 0, 1, 2, 0, 0, 1])

Runtime test

For the timings , I would use OP's sample input array and repeat/tile it and hopefully this should be a less opportunistic benchmark -

Small case :

In [16]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [17]: a = np.tile(a,10)  # Repeat OP's data 10 times

# @Paul Panzer's solution
In [18]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
10000 loops, best of 3: 73.4 µs per loop

In [19]: %timeit island_cumsum_vectorized(a)
100000 loops, best of 3: 8.65 µs per loop

Bigger case :

In [20]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [21]: a = np.tile(a,1000)  # Repeat OP's data 1000 times

# @Paul Panzer's solution
In [22]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
100 loops, best of 3: 6.52 ms per loop

In [23]: %timeit island_cumsum_vectorized(a)
10000 loops, best of 3: 49.7 µs per loop

Nah, I want really huge case :

In [24]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [25]: a = np.tile(a,100000)  # Repeat OP's data 100000 times

# @Paul Panzer's solution
In [26]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
1 loops, best of 3: 725 ms per loop

In [27]: %timeit island_cumsum_vectorized(a)
100 loops, best of 3: 7.28 ms per loop

Carrageen answered 9/2, 2017 at 6:22 Comment(20)

benchmark these: a1 = np.repeat(np.arange(1000)%2, np.random.randint(1, 1000, (1000,))); a2 = np.repeat(np.arange(100)%2, np.random.randint(1, 10000, (100,))); a3 = np.repeat(np.random.random((100,)) < 0.2, np.random.randint(1, 10000, (100,))) ;-) – Lockwood 9/2, 2017 at 6:55

@PaulPanzer I decided to run all of these benchmarks myself on your suggested inputs. Divakar's implementation is still faster :P... by about 2x. – Tucker 9/2, 2017 at 7:4

@rayreng Also on the last one? But I admit his is better; I'm just objecting to opportunistic benchmarks ;-) – Lockwood 9/2, 2017 at 7:7

@PaulPanzer :D. I've already upvoted yours as it's very nice and in one line. The last one I just ran... it's just beaten by a slight margin. – Tucker 9/2, 2017 at 7:8

@Tucker Strange, on my laptop (100 repeats) Divakar:0.47050797403790057;0.26334572909399867;0.3771821779664606 Paul Panzer:0.6545195570215583;0.4508163968566805;0.17192376195453107 – Lockwood 9/2, 2017 at 7:17

@PaulPanzer Added few less opportunistic benchmarks. – Carrageen 9/2, 2017 at 8:15

I could start a big argument on how to scale a toy example, but who cares. You can have this one. Besides, I've already conceded yours is better. :-) – Lockwood 9/2, 2017 at 8:23

@PaulPanzer You are looping, mine has some setup overhead. The only way a loopy one would be better would be if we are iterating less, i.e. either we are working with a very small array or we have less number of 1s islands. I tried your a3 and that's ~2x slower with the loopy one. – Carrageen 9/2, 2017 at 8:54

@Carrageen that's interesting on my laptop the 'loopy one' is twice as fast, on your rig it's the other way round and for rayreng they are apparently almost tied. (I mean with a3). Any idea? Are you testing directly from the shell? I made a small file. Could the byte compilation make a difference? – Lockwood 9/2, 2017 at 9:16

As I said earlier, if you feed very less number of 1s islands, a loopy one would be comparable or even faster. Use your a3 and then count the number of islands with len([c for c in np.split(a, 1 + np.where(np.diff(a))[0])])//2 and then compare this number with the length of array with a.size. You would see its a very very small number of islands. For the sample, OP had the no. of island as 2 for an array length of 10. Don't want to make this a research project, as my point about when to use a vectorized solution stays the same :) – Carrageen 9/2, 2017 at 9:43

@Carrageen yes, I understand that which is why I designed the example in the first place :-)) What puzzles me are the differences between your and rayreng's and my benchmarking on the same type of sample. The ratio apparently varies by a factor of 4... – Lockwood 9/2, 2017 at 11:36

@PaulPanzer Use np.random.random((10,)) < 0.2. you would have more erratic results across different runs. Use np.random.random((100,)) < 0.2 for less erratic runs. Use np.random.random((1000,)) < 0.2 and so on for lesser erratic ones, if you see the pattern there. – Carrageen 9/2, 2017 at 11:42

@Carrageen Yes, but I can run it 10 times and I do see some variation, but nowhere near enough to explain the differences. – Lockwood 9/2, 2017 at 11:51

@PaulPanzer Maybe you need to run it more number of times. Also, you need to wake up rayreng and ask him to do so too and finally ask me to do the same. Maybe we can sit together and discuss this together with politics sometime :) – Carrageen 9/2, 2017 at 11:57

@Carrageen I give up :-) – Lockwood 9/2, 2017 at 12:3

@PaulPanzer Offer a bounty and we might have an answer on that! One day a bounty might solve politics issues too, hopefully :) – Carrageen 9/2, 2017 at 12:5

@Carrageen Cursed be the devil who invented this rep system! So addictive. I'm spending way too much time here. – Lockwood 9/2, 2017 at 12:14

@PaulPanzer haha yeahh! Tell me about it! – Carrageen 9/2, 2017 at 12:18

Would you have any idea how to do this for a 2d-array, counting on axis 1? – Mullock 2/12, 2017 at 10:43

@Mullock Would need considerable modifications to accommodate for 2D arrays I think. Post a new question? – Carrageen 2/12, 2017 at 10:50

L

2

If a list comprehension is acceptable

np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])

Lockwood answered 9/2, 2017 at 5:57 Comment(0)

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.