SVD of a very large matrix in R
I have a 60,000 x 60,000 matrix in a txt file, and I need to compute its SVD. I use R, but I don't know whether R can handle a matrix of that size.

Saunder answered 27/6, 2013 at 20:41 Comment(8)
Welcome to Stack Overflow. You will probably benefit from looking over the guidelines on asking a good question here: stackoverflow.com/help/how-to-ask. In particular, please include a minimal, reproducible example, and explain what you have tried so far to solve the problem yourself. – Exert
Probably not. n <- 60e3; x <- matrix(0, ncol = n, nrow = n) throws an error. (In R-2.15.3) – Riggins
@Andrie: On my machine: Error: cannot allocate vector of size 13.4 Gb. So it is probably only a matter of having enough memory. At least on R 3.x, where we can have long vectors, I think. – Codeine
@Codeine That's a loaded "only". If you need 13.4 Gb just to create the matrix, I would think you'd need at least double that to do anything meaningful with it. Maybe triple. (Assuming there isn't a disk-based solution using ff or bigmemory or something.) – Musicale
@joran: :D Yes, yes! We have a 128 GB server that I admin. If it were just a one-time thing, I'd probably have just thrown enough memory at the problem. But I may have picked up bad habits. :) – Codeine
I think the combo bigmemory + irlba is the way to go (or scidb, but I find it very difficult to install on my Linux box). – Intermit
You might consider using something a bit more suited to this task, such as GraphChi. – Travis
Figure out how to do it with a 10x10 matrix. Then a 100x100 matrix. Then 1000x1000, then 10,000 x 10,000, then 60,000 x 60,000. If you can't get that far, tell us exactly what happened, and I won't downvote your question. – Crissycrist

I think it's possible to compute a (partial) SVD using the irlba package together with bigmemory and bigalgebra, without using a lot of memory.

First, let's create a 20,000 x 20,000 matrix and save it to a file:

require(bigmemory)
require(bigalgebra)
require(irlba)

con <- file("mat.txt", open = "a")
replicate(20, {
    x <- matrix(rnorm(1000 * 20000), nrow = 1000)
    write.table(x, file = con,
                row.names = FALSE, col.names = FALSE)
})
close(con)

file.info("mat.txt")$size
## [1] 7.264e+09   # about 7.3 GB

Then you can read this matrix back using bigmemory::read.big.matrix:

bigm <- read.big.matrix("mat.txt", sep = " ",
                        type = "double",
                        backingfile = "mat.bk",
                        backingpath = "/tmp",
                        descriptorfile = "mat.desc")

str(bigm)
## Formal class 'big.matrix' [package "bigmemory"] with 1 slots
##   ..@ address:<externalptr>

dim(bigm)
## [1] 20000 20000

bigm[1:3, 1:3]
##            [,1]     [,2]     [,3]
## [1,] -0.3623255 -0.58463 -0.23172
## [2,] -0.0011427  0.62771  0.73589
## [3,] -0.1440494 -0.59673 -1.66319

Now we can use the excellent irlba package, as explained in the package vignette.

The first step consists of defining a matrix multiplication operator that works with big.matrix objects; then we use the irlba::irlba function:

### vignette("irlba", package = "irlba") # for more info

matmul <- function(A, B, transpose = FALSE) {
    ## bigalgebra requires matrix/vector arguments
    if (is.null(dim(B))) B <- cbind(B)

    ## irlba passes transpose = TRUE when it needs products with t(A)
    if (transpose)
        return(cbind((t(B) %*% A)[]))

    cbind((A %*% B)[])
}
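As a quick sanity check, on ordinary in-memory matrices (where the trailing `[]` is a no-op) this operator should agree with plain %*%. A small standalone sketch, restating the operator so the snippet runs on its own:

```r
## Standalone copy of the operator above, checked against base %*%
matmul <- function(A, B, transpose = FALSE) {
    if (is.null(dim(B))) B <- cbind(B)  # coerce vectors to one-column matrices
    if (transpose)
        return(cbind((t(B) %*% A)[]))
    cbind((A %*% B)[])
}

A <- matrix(1:6, nrow = 2)   # 2 x 3
B <- matrix(7:12, nrow = 3)  # 3 x 2
stopifnot(all.equal(matmul(A, B), A %*% B))  # matches ordinary multiplication
```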

dim(bigm)

system.time(
S <- irlba(bigm, nu = 2, nv = 2, matmul = matmul)
)

##    user  system elapsed 
## 169.820   0.923 170.194


str(S)
## List of 5
##  $ d    : num [1:2] 283 283
##  $ u    : num [1:20000, 1:2] -0.00615 -0.00753 -0.00301 -0.00615 0.00734 ...
##  $ v    : num [1:20000, 1:2] 0.020086 0.012503 0.001065 -0.000607 -0.006009 ...
##  $ iter : num 10
##  $ mprod: num 310

I forgot to set the seed, so this isn't reproducible as-is, but I just wanted to show that it's possible to do this in R.
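On a matrix that does fit in memory, the same kind of call can be checked against base svd(). A seeded, small-scale sketch (guarded in case the irlba package is not installed; the matrix size and tolerance are arbitrary choices for illustration):

```r
## Small-scale, seeded check: the leading singular values from irlba
## should match those from base svd().
if (requireNamespace("irlba", quietly = TRUE)) {
    set.seed(1)
    A <- matrix(rnorm(200 * 100), nrow = 200)

    S  <- irlba::irlba(A, nu = 2, nv = 2)  # truncated SVD, 2 triplets
    S0 <- svd(A)                           # full SVD for reference

    stopifnot(all.equal(S$d, S0$d[1:2], tolerance = 1e-4))
}
```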

EDIT

If you are using a newer version of the irlba package, the above code throws an error because the matmul argument of the irlba function has been renamed to mult. Therefore, you should change this part of the code

S <- irlba(bigm, nu = 2, nv = 2, matmul = matmul)

to

S <- irlba(bigm, nu = 2, nv = 2, mult = matmul)

I want to thank @FrankD for pointing this out.

Intermit answered 27/6, 2013 at 22:24 Comment(5)
+1 for showing a method, looks great... BUT... what about 60e3 x 60e3? That's nine times the data. I wonder what the big-O is on this problem? Any ideas? :-) – Viguerie
@SimonO101 In theory this should work with larger data, but it will just take more time (how much, though?). I'll update as soon as possible, and then we'll have an idea of the big-O. – Intermit
Just to say that the matmul argument in irlba has been renamed to mult, which would cause the code above to throw an error. – Truism
FYI: the link in "as explained here" (illposed.net/irlb.html) is no longer working. – Ruckman
@JohnnyStrings Thanks, I think we can refer to the package vignette now. I will edit my answer. Thanks again. – Intermit

In R 3.x+ you can construct a matrix of that size: the upper limit on vector length is now 2^53 (or maybe 2^53 - 1), up from the 2^31 - 1 it was before, which is why Andrie's out-of-date installation was throwing an error. A numeric element takes 8 bytes, so figure roughly 10 bytes per element once overhead is included. At any rate:

> 2^53 < 10*60000^2
[1] FALSE  # so you are safe on that account.

It would also fit in 64GB (but not in 32GB):

> 64000000000 < 10*60000^2
[1] FALSE

Generally, to do any serious work you need at least three times the size of your largest object, so this seems pretty borderline even with the new expanded vectors/matrices.
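The same arithmetic, written out in base R (8 bytes per double; the ~10-byte figure above is a round-up for overhead):

```r
n <- 60000
n^2 * 8 / 2^30      # one dense copy: ~26.8 GiB
n^2 * 8 * 3 / 2^30  # with the 3x working-space rule of thumb: ~80 GiB
```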

Bartholemy answered 27/6, 2013 at 21:58 Comment(4)
With R version 3.0.0 (2013-04-03), platform x86_64-w64-mingw32/x64 (64-bit), I get Error: cannot allocate vector of size 26.8 Gb. Do I need to upgrade to 3.0.1+? – Microgram
+1 for ticking Andrie off for being out of date with his R installation. Ha! :-) – Viguerie
I guess that might not have been the only reason it was failing for Andrie. It probably would also have failed on my six-year-old Mac, since it is maxed out at 32 GB of RAM. – Bartholemy
So the only points I get are for disparaging comments? How lame is that? – Bartholemy
