easy sampling of vectors from a sparse matrix, and creating a new matrix from the sample (python)

This question has two parts (maybe one solution?):

Sample vectors from a sparse matrix: Is there an easy way to sample vectors from a sparse matrix? When I try to sample rows using random.sample I get a TypeError: sparse matrix length is ambiguous.

from random import sample
import numpy as np
from scipy.sparse import lil_matrix
K = 2
m = [[1,2],[0,4],[5,0],[0,8]]
sample(m,K)    #works OK
mm = np.array(m)
sample(list(mm),K)    #works OK (list() turns the array into a sequence of rows)
sm = lil_matrix(m)
sample(sm,K)   #throws exception TypeError: sparse matrix length is ambiguous.

My current solution is to sample from the row indices of the matrix and then use getrow(), something like:

indxSampls = sample(range(sm.shape[0]), K)
sampledRows = []
for i in indxSampls:
    sampledRows.append(sm.getrow(i))

Any other efficient/elegant ideas? The dense matrix size is 1000x30000 and could be larger.

Constructing a sparse matrix from a list of sparse vectors: Now imagine I have the list of sampled vectors sampledRows. How can I convert it to a sparse matrix without densifying it, converting it to a list of lists, and then converting that to a lil_matrix?

Siren answered 24/3, 2012 at 21:48 Comment(0)

Try

sm[np.random.sample(sm.shape[0], K, replace=False), :]

This returns a LIL-format matrix containing just K of the rows (in the order determined by the sampled indices). I'm not sure it's super fast, but it can't really be worse than accessing the matrix row by row as you're currently doing, and it probably preallocates the result.
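
A minimal sketch of the same idea on Python 3, assuming the indices are drawn with random.sample over range(...) and reusing the small matrix from the question. It also stacks the rows returned by getrow() with scipy.sparse.vstack, which is one way to build a sparse matrix from a list of sparse row vectors without densifying them. The names idx, sampled, rows and stacked are illustrative only.

```python
from random import sample

from scipy.sparse import lil_matrix, vstack

K = 2
m = [[1, 2], [0, 4], [5, 0], [0, 8]]
sm = lil_matrix(m)

# Draw K distinct row indices, then index the sparse matrix with them.
idx = sample(range(sm.shape[0]), K)
sampled = sm[idx, :]  # stays sparse, shape (K, n_cols)

# Second part of the question: stack a list of sparse 1 x n_cols rows
# into one sparse matrix without going through a dense array.
rows = [sm.getrow(i) for i in idx]
stacked = vstack(rows, format="lil")

print(sampled.shape, stacked.shape)
```

Both routes should give the same rows in the same order, so the indexing form is usually enough unless the rows already exist as separate sparse vectors.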

Sharrisharron answered 24/3, 2012 at 21:51 Comment(4)
It doesn't really work, as it returns a list of lists of varying length rather than sparse vectors, e.g. sm.data[sample(xrange(sm.shape[0]), 2)] returns array([[1, 2], [8]], dtype=object) - Siren
@Siren Whoops, you're right: I was testing on a sample where the rows all had entries. I've changed the answer to something similar that actually gets you out a sparse matrix in one step. - Sharrisharron
+ I was not familiar with xrange(), which appears to be very useful :) - Siren
TypeError: random_sample() takes at most 1 positional argument (2 given). Perhaps this worked in the past, but in modern versions of numpy np.random.sample is an alias for numpy.random.random_sample, which takes only one argument, size, and returns an array of random numbers. - Neckline

The accepted answer to this question is outdated and no longer works. With newer versions of numpy, you should use np.random.choice in place of np.random.sample, e.g.:

sm[np.random.choice(sm.shape[0], K, replace=False), :]

as opposed to:

sm[np.random.sample(sm.shape[0], K, replace=False), :]
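
For completeness, a small runnable sketch of the np.random.choice approach, written here with numpy's Generator API (np.random.default_rng), which offers the same choice call. The seed and the conversion to CSR are assumptions made for the example: the seed only makes it repeatable, and CSR is used because it is generally the faster format for row slicing.

```python
import numpy as np
from scipy.sparse import csr_matrix

K = 2
m = [[1, 2], [0, 4], [5, 0], [0, 8]]
sm = csr_matrix(m)  # assumption: CSR instead of LIL for cheaper row slicing

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is repeatable

# K distinct row indices in 0 .. n_rows - 1
idx = rng.choice(sm.shape[0], size=K, replace=False)

# Row indexing with an index array keeps the result sparse.
sampled = sm[idx, :]

print(idx, sampled.shape)
```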
Fahland answered 4/1, 2023 at 17:1 Comment(0)