SVD for sparse matrix in R
Asked Answered
S

4

10

I've got a sparse Matrix in R that's apparently too big for me to run as.matrix() on (though it's not super-huge either). The as.matrix() call in question is inside the svd() function, so I'm wondering if anyone knows a different implementation of SVD that doesn't require first converting to a dense matrix.

Substituent answered 9/2, 2011 at 22:26 Comment(3)
I can't find anything for R. Plenty of stuff for C, Fortran, Python etc.Historian
Maybe I'll try out SVDLIBC. It builds as a C library, so if it works well I could in the future wrap it as a module (though my ambition probably won't hold up that long, if history is any guide...).Substituent
How about this cran.r-project.org/web/packages/irlba A fast and memory-efficient method for computing a few approximate singular values and singular vectors of large matrices.Perfervid
J
9

The irlba package has a very fast SVD implementation for sparse matrices.

Justifiable answered 30/4, 2013 at 20:9 Comment(2)
The comment mentioning package irlba (third under the question) was late by one year when posted. Your answer duplicates the comment another year later...Tailwind
@Tailwind I apologize, as I missed the comment. This page is the first result when one searches for "R svd sparse," and considering that irlba is the best R package for sparse svd, it seems like an appropriate answer. If the author of the comment would like to post an answer, I'd be happy to upvote it and delete mine.Justifiable
E
7

You can do a very impressive bit of sparse SVD in R using random projection as described in http://arxiv.org/abs/0909.4061

Here is some sample code:

# computes first k singular values of A with corresponding singular vectors
incore_stoch_svd = function(A, k) {
  p = 10              # may need a larger value here
  n = dim(A)[1]
  m = dim(A)[2]

  # random projection of A    
  Y = (A %*% matrix(rnorm((k+p) * m), ncol=k+p))
  # the left part of the decomposition works for A (approximately)
  Q = qr.Q(qr(Y))
  # taking that off gives us something small to decompose
  B = t(Q) %*% A

  # decomposing B gives us singular values and right vectors for A  
  s = svd(B)
  U = Q %*% s$u
  # and then we can put it all together for a complete result
  return (list(u=U, v=s$v, d=s$d))
}
Electromagnetism answered 7/5, 2011 at 20:48 Comment(3)
So I tried it out and it looks like it doesn't give very good results unless I jack p way up, in which case it doesn't save much resources. As a test, I made a random sparse 10000x12000 matrix with 1000 nonzero entries sampled as runif(1000), which should have eigenvalues around 0.999 or 1. But this method shows the first few eigenvalues as 0.8461391, 0.8423876, 0.8353727, 0.8321352, 0.8271768, 0.8203687.Substituent
Read the original paper. If your eigenvalues are all about the same value, then it won't save you much as it stands. In such cases, you need to do a few iterations of squaring your source matrix to get a better spread.Electromagnetism
Try it again on a test matrix with limited rank.Electromagnetism
S
5

So here's what I ended up doing. It's relatively straightforward to write a routine that dumps a sparse matrix (class dgCMatrix) to a text file in SVDLIBC's "sparse text" format, then call the svd executable, and read the three resultant text files back into R.

The catch is that it's pretty inefficient - it takes me about 10 seconds to read & write the files, but the actual SVD calculation takes only about 0.2 seconds or so. Still, this is of course way better than not being able to perform the calculation at all, so I'm happy. =)

Substituent answered 11/2, 2011 at 21:39 Comment(4)
Followup question posted at https://mcmap.net/q/1161735/-extract-long-from-r-object.Substituent
Giving yourself a checkmark without offering any example data or solution code seems contrary to the principles of SO.Gilli
I'm no longer at the company where I did this, so I don't have access to that code. I meant the checkmark to mean "this is the solution that I actually ended up using." It's pretty different from the other suggestions, and very straightforward to implement. To boot, it's really the "wrong" way to do it, because it requires writing text files representing numeric matrices. So I don't know if it's really worth copying as an example.Substituent
This is an old question now. Anyone going into the same area now would be well advised to follow Zach's recommendation below to use irlba, ignoring the checkmark on this answer.Pluviometer
F
3

rARPACK is the package you need. Works like a charm and is Superfast because it parallelizes via C and C++.

Friction answered 6/4, 2014 at 6:54 Comment(2)
In 2017, rArpack's v0.11.0 DESCRIPTION file says: Now rARPACK becomes a simple shell of the RSpectra package.Butterflies
Jumping in even more years later, I have to say rARPACK and RSpectra are amazing packages. irlba has a special place in my heart, but RSpectra is amazingly fastJustifiable

© 2022 - 2024 — McMap. All rights reserved.