I've got a sparse Matrix in R that's apparently too big for me to run as.matrix() on (though it's not super-huge either). The as.matrix() call in question is inside the svd() function, so I'm wondering if anyone knows a different implementation of SVD that doesn't require first converting to a dense matrix.
The irlba package has a very fast SVD implementation for sparse matrices.
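A minimal sketch of what that looks like, assuming the irlba and Matrix packages are installed. The random test matrix and its dimensions are made up for illustration and stand in for the real data:

```r
library(Matrix)
library(irlba)

set.seed(1)
# Illustrative sparse test matrix (about 5% nonzero entries)
A = rsparsematrix(1000, 1500, density = 0.05)

# Five largest singular triplets, computed without converting A to dense
s = irlba(A, nv = 5)

s$d        # singular values, largest first
dim(s$u)   # 1000 x 5 left singular vectors
dim(s$v)   # 1500 x 5 right singular vectors
```

The nv argument sets how many singular triplets are computed, so only a thin slice of the decomposition is ever held in memory.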
irlba (third under the question) was late by one year when posted. Your answer duplicates the comment another year later... – Tailwind
irlba is the best R package for sparse SVD, so it seems like an appropriate answer. If the author of the comment would like to post an answer, I'd be happy to upvote it and delete mine. – Justifiable
You can do a very impressive bit of sparse SVD in R using random projection as described in http://arxiv.org/abs/0909.4061
Here is some sample code:
# computes first k singular values of A with corresponding singular vectors
incore_stoch_svd = function(A, k) {
  p = 10 # may need a larger value here
  n = dim(A)[1]
  m = dim(A)[2]

  # random projection of A
  Y = (A %*% matrix(rnorm((k+p) * m), ncol=k+p))
  # the left part of the decomposition works for A (approximately)
  Q = qr.Q(qr(Y))
  # taking that off gives us something small to decompose
  B = t(Q) %*% A
  # decomposing B gives us singular values and right vectors for A
  s = svd(B)
  U = Q %*% s$u
  # and then we can put it all together for a complete result
  return (list(u=U, v=s$v, d=s$d))
}
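The paper linked above also describes power (subspace) iterations, which can improve accuracy when the singular values decay slowly. The following is only a sketch of that idea, not part of the original answer: the function name stoch_svd_power, the default q = 2, and the truncation of the result to k components are all assumptions made for illustration.

```r
library(Matrix)

# Hypothetical variant of the function above with q power iterations
stoch_svd_power = function(A, k, p = 10, q = 2) {
  m = ncol(A)
  # random projection of A, as in the plain version
  Y = as.matrix(A %*% matrix(rnorm((k + p) * m), ncol = k + p))
  Q = qr.Q(qr(Y))
  # each pass multiplies by t(A) and then A, re-orthonormalizing for stability
  for (i in seq_len(q)) {
    Z = qr.Q(qr(as.matrix(t(A) %*% Q)))
    Q = qr.Q(qr(as.matrix(A %*% Z)))
  }
  # small (k+p) x m problem whose SVD approximates that of A
  B = as.matrix(t(Q) %*% A)
  s = svd(B)
  idx = seq_len(k)  # keep only the k leading components (an assumption)
  list(u = (Q %*% s$u)[, idx, drop = FALSE],
       v = s$v[, idx, drop = FALSE],
       d = s$d[idx])
}

# Usage on an illustrative random sparse matrix
set.seed(1)
A = rsparsematrix(500, 800, density = 0.02)
s = stoch_svd_power(A, k = 5)
s$d
```

Each extra pass costs two more multiplications by A, so q trades run time for accuracy. With a very flat spectrum the improvement per pass may be small.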
... p way up, in which case it doesn't save many resources. As a test, I made a random sparse 10000x12000 matrix with 1000 nonzero entries sampled as runif(1000), which should have largest singular values around 0.999 or 1. But this method shows the first few singular values as 0.8461391, 0.8423876, 0.8353727, 0.8321352, 0.8271768, 0.8203687. – Substituent
So here's what I ended up doing. It's relatively straightforward to write a routine that dumps a sparse matrix (class dgCMatrix) to a text file in SVDLIBC's "sparse text" format, then call the svd executable, and read the three resultant text files back into R.
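A rough sketch of the dump step, not the routine actually used here. It assumes the sparse text layout is a header line with rows, columns and total nonzeros, followed for each column by its nonzero count and then one "rowIndex value" line per entry with zero-based row indices; that layout should be checked against the SVDLIBC documentation. The function name write_svdlibc_sparse_text is hypothetical.

```r
library(Matrix)

# Hypothetical helper: write a dgCMatrix in the assumed SVDLIBC sparse text layout
write_svdlibc_sparse_text = function(A, path) {
  stopifnot(inherits(A, "dgCMatrix"))
  nc = ncol(A)
  nnz = length(A@x)
  counts = diff(A@p)                  # number of nonzeros in each column
  col_of = rep(seq_len(nc), counts)   # column index of each stored entry
  body = character(nc + nnz)
  # column j's count line follows (j - 1) earlier count lines and A@p[j] entries
  body[seq_len(nc) + A@p[seq_len(nc)]] = as.character(counts)
  # stored entry t of column j follows j count lines and (t - 1) earlier entries
  body[seq_len(nnz) + col_of] = sprintf("%d %.15g", A@i, A@x)
  writeLines(c(sprintf("%d %d %d", nrow(A), nc, nnz), body), path)
  invisible(path)
}

# Usage on a tiny illustrative matrix
A = sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 3), x = c(1.5, 2, -4), dims = c(3, 3))
out_file = tempfile(fileext = ".txt")
write_svdlibc_sparse_text(A, out_file)
```

The dgCMatrix slots already store zero-based row indices column by column, so the file can be assembled without looping over the columns.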
The catch is that it's pretty inefficient - it takes me about 10 seconds to read & write the files, but the actual SVD calculation takes only about 0.2 seconds or so. Still, this is of course way better than not being able to perform the calculation at all, so I'm happy. =)
rARPACK is the package you need. It works like a charm and is super fast because it parallelizes via C and C++.
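A short sketch of a truncated SVD with rARPACK, assuming the rARPACK and Matrix packages are installed; the test matrix is again made up for illustration:

```r
library(Matrix)
library(rARPACK)

set.seed(1)
# Illustrative sparse test matrix (about 5% nonzero entries)
A = rsparsematrix(1000, 1500, density = 0.05)

# k largest singular values and vectors, working directly on the sparse matrix
s = svds(A, k = 5)
s$d
```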