How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?

Asked 14/7, 2010 at 15:31 Answered 21/12, 2010 at 20:1

Suppose I have a 2d sparse array. In my real usecase both the number of rows and columns are much bigger (say 20000 and 50000) hence it cannot fit in memory when a dense representation is used:

>>> import numpy as np
>>> import scipy.sparse as ssp

>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])

Now suppose I have a dense 1d array with all non-zeros components with size 3 (or 50000 in my real life case):

>>> d = np.ones(3) * 3
>>> d
array([ 3.,  3.,  3.])

I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy are of the np.matrix: the '*' operator is overloaded to have it behave like a matrix-multiply instead of the elementwise-multiply:

>>> a * d
array([ 0., -3.,  0.,  0.,  6.])

One solution would be to make 'a' switch to the array semantics for the '*' operator, that would give the expected result:

>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])

But I cannot do that since the call to toarray() would materialize the dense version of 'a' which does not fit in memory (and the result will be dense too):

>>> ssp.issparse(a.toarray())
False

Any idea how to build this while keeping only sparse datastructures and without having to do a unefficient python loop on the columns of 'a'?

Bruise answered 14/7, 2010 at 15:31 Comment(4)

If d is a sparse matrix of the same size as a you can use a.multiply(d). Perhaps you can make a d that's N rows long and loop over N rows of a at a time? – Phosgenite 14/7, 2010 at 19:43

But d is dense and cannot be broadcasted explicitly in memory to satisfy the multiply shape requirements. Looping over a batch is an option but I find this a bit hackish. I would have thought there was a vanilla vectorized / scipy way to do this without a python loop. – Bruise 14/7, 2010 at 22:15

I guess the problem is you want the representation of a (sparse) matrix but the mulitply operation of an array. I think you're going to have to roll your own unfortunately. – Phosgenite 15/7, 2010 at 19:33

Actually there is a.multply(d) that should do exactly that but it does not do the broadcasting as usual. Maybe it's a bug. – Bruise 17/7, 2010 at 10:16

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently.

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
 with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])

Hope that helps!

Koontz answered 28/7, 2010 at 21:27 Comment(2)

The great thing about this is that it also works when X is an ndarray or a dense matrix. +1. – Patrizio 8/9, 2012 at 16:40

This could be further simplified using scipy.sparse.diags(d, 0) rather than lil_matrix – Wiatt 21/1, 2016 at 23:5

I think A.multiply(B) should work in scipy sparse. The method multiply does "point-wise" multiplication, not matrix multiplication.

HTH

Akim answered 21/12, 2010 at 20:1 Comment(1)

@K3---rnc the result is dense only if B is dense. If you convert B to any of the sparse formats, it will do the trick. E.g. A.multiply(csc_matrix(B)) – Bulimia 23/1, 2017 at 18:27

Well, here's a simple code that will do what you want. I don't know if it is as efficient as you would like, so take it or leave it:

import scipy.sparse as ssp
def pointmult(a,b):
    x = a.copy()
    for i in xrange(a.shape[0]):
        if x.data[i]:
            for j in xrange(len(x.data[i])):
                x.data[i] *= b[x.rows[i]]
    return x

It only works with lil matrices so you'll have to make some changes if you want it to work with other formats.

Conduce answered 18/7, 2010 at 22:34 Comment(1)

thanks I would have liked to avoid for loops in python however. But maybe there is no way out with the current scipy.sparse classes for this use case. – Bruise 19/7, 2010 at 17:49

Recommended topics

Hot tags