Cumulative summation of a numpy array by index

H

5

12

Assume you have an array of values that will need to be summed together

d = [1,1,1,1,1]

and a second array specifying which elements need to be summed together

i = [0,0,1,2,2]

The result will be stored in a new array of size max(i)+1. So for example i=[0,0,0,0,0] would be equivalent to summing all the elements of d and storing the result at position 0 of a new array of size 1.

I tried to implement this using

c = zeros(max(i)+1)
c[i] += d

However, the += operation adds each element only once, thus giving the unexpected result of

[1,1,1]

instead of

[2,1,2]

How would one correctly implement this kind of summation?

Holograph answered 31/8, 2010 at 4:32 Comment(2)

This would be a lot clearer if the values of d were unique. For instance, if d = [0,1,2,3,4] Im guessing for i = [0,0,0,0,0]` you want c = [10], while for i = [0,0,1,2,2] you want c = [1,2,7]? – Pardo 31/8, 2010 at 8:20

In that case, juxstapose's solution, with the change I suggest in the comments, should do the trick. – Pardo 31/8, 2010 at 19:53

L

2

This solution should be more efficient for large arrays (it iterates over the possible index values instead of the individual entries of i):

import numpy as np

i = np.array([0,0,1,2,2])
d = np.array([0,1,2,3,4])

i_max = i.max()
c = np.empty(i_max+1)
for j in range(i_max+1):
    c[j] = d[i==j].sum()

print c
[1. 2. 7.]

Lajuanalake answered 2/9, 2010 at 15:42 Comment(0)

S

14

If I understand the question correctly, there is a fast function for this (as long as the data array is 1d)

>>> i = np.array([0,0,1,2,2])
>>> d = np.array([0,1,2,3,4])
>>> np.bincount(i, weights=d)
array([ 1.,  2.,  7.])

np.bincount returns an array for all integers range(max(i)), even if some counts are zero

Sendai answered 11/9, 2010 at 1:0 Comment(1)

that is the best solution for the case described here. For a general sum of labeled array, you can use scipy.ndimage.sum. This modules also have other useful functions such as maximum, minimum, mean, variance, ... – Overliberal 5/3, 2013 at 15:49

M

3

Juh_'s comment is the most efficient solution. Here's working code:

import numpy as np
import scipy.ndimage as ni

i = np.array([0,0,1,2,2])
d = np.array([0,1,2,3,4])

n_indices = i.max() + 1
print ni.sum(d, i, np.arange(n_indices))

Mantoman answered 17/6, 2014 at 10:36 Comment(0)

K

2

def zeros(ilen):
 r = []
 for i in range(0,ilen):
     r.append(0)

i_list = [0,0,1,2,2]
d = [1,1,1,1,1]
result = zeros(max(i_list)+1)

for index in i_list:
  result[index]+=d[index]

print result

Krystynakshatriya answered 31/8, 2010 at 4:53 Comment(1)

Close, but I think the OP wants for didx,ridx in enumerate(i_list): result[ridx] += d[didx]. Also, since the tags include [numpy], you might use numpy.zeros. – Pardo 31/8, 2010 at 8:18

L

2

This solution should be more efficient for large arrays (it iterates over the possible index values instead of the individual entries of i):

import numpy as np

i = np.array([0,0,1,2,2])
d = np.array([0,1,2,3,4])

i_max = i.max()
c = np.empty(i_max+1)
for j in range(i_max+1):
    c[j] = d[i==j].sum()

print c
[1. 2. 7.]

Lajuanalake answered 2/9, 2010 at 15:42 Comment(0)

T

0

In the general case when you want to sum submatrices by labels you can use the following code

import numpy as np
from scipy.sparse import coo_matrix

def labeled_sum1(x, labels):
     P = coo_matrix((np.ones(x.shape[0]), (labels, np.arange(len(labels)))))
     res = P.dot(x.reshape((x.shape[0], np.prod(x.shape[1:]))))
     return res.reshape((res.shape[0],) + x.shape[1:])

def labeled_sum2(x, labels):
     res = np.empty((np.max(labels) + 1,) + x.shape[1:], x.dtype)
     for i in np.ndindex(x.shape[1:]):
         res[(...,)+i] = np.bincount(labels, x[(...,)+i])
     return res

The first method use the sparse matrix multiplication. The second one is the generalization of user333700's answer. Both methods have comparable speed:

x = np.random.randn(100000, 10, 10)
labels = np.random.randint(0, 1000, 100000)
%time res1 = labeled_sum1(x, labels)
%time res2 = labeled_sum2(x, labels)
np.all(res1 == res2)

Output:

Wall time: 73.2 ms
Wall time: 68.9 ms
True

Tymbal answered 2/6, 2015 at 10:40 Comment(0)

Recommended topics

Hot tags