Suppose I have a scipy.sparse.csr_matrix representing the values below:
[[0 0 1 2 0 3 0 4]
[1 0 0 2 0 3 4 0]]
I want to calculate the cumulative sum of non-zero values in-place, which would change the array to:
[[0 0 1 3 0 6 0 10]
[1 0 0 3 0 6 10 0]]
The actual values are not 1, 2, 3, ...
The number of non-zero values in each row is unlikely to be the same.
How can I do this fast?
Current program:
import scipy.sparse
import numpy as np
# sparse data
a = scipy.sparse.csr_matrix(
    [[0,0,1,2,0,3,0,4],
     [1,0,0,2,0,3,4,0]],
    dtype=int)
# method
indptr = a.indptr
data = a.data
for i in range(a.shape[0]):
    st = indptr[i]
    en = indptr[i + 1]
    np.cumsum(data[st:en], out=data[st:en])
# print result
print(a.todense())
Result:
[[ 0 0 1 3 0 6 0 10]
[ 1 0 0 3 0 6 10 0]]
There are more eyes on SO than on CR (Code Review). Speed questions on working code are answered all the time on SO, especially when the packages involved are somewhat specialized. – Subminiature
What you show looks good. Applying cumsum row by row is really the only way to go. And your use of out is clever. There is an as_strided-based indptr iterator that might improve speed a bit. – Subminiature