Randomized SVD for LSA/LSI in a Windows environment

I am working on a project that uses latent semantic analysis (LSA). This requires singular value decomposition (SVD), sometimes on large data sets. Is there an implementation of randomized SVD (rSVD) available for a Windows/Visual Studio environment? I found a project called redsvd, but it appears to be supported on Linux only.

Crack answered 9/6, 2013 at 7:49 Comment(0)

ILNumerics might have it, but I could not tell whether it supports rSVD, and I have no personal experience with the library. Fortunately, it is available through NuGet.

http://ilnumerics.net

Here are the docs on their SVD implementation:

http://ilnumerics.net/apidoc/Index.html?topic=html/Overload_ILNumerics_ILMath_svd.htm

There is also NAG, but it's paid: http://www.nag.co.uk/numeric/numerical_libraries.asp

I also checked out redsvd, and I bet I could either port it to C# for you or at the very least get it to compile on Windows. If those don't meet your needs, let me know and I'll look into the complexity of the port.

UPDATE:

Well, I got home tonight and decided to give it a shot. Here's a really quick way to get redsvd working on Windows using Visual Studio 2010. I posted it on GitHub:

https://github.com/hoonto/redsvdwin

Open rsvd3.sln in Visual Studio, build it, and you'll get rsvd3.exe in the Debug directory.

Run that:

C:\Users\MLM\Documents\Visual Studio 2010\Projects\redsvdwin\Debug>rsvd3.exe
usage: redsvd --input=string --output=string [options] ...

redsvd supports the following format types (one line for each row)

[format=dense] (<value>+\n)+
[format=sparse] ((colum_id:value)+\n)+
Example:
>redsvd -i imat -o omat -r 10 -f dense
compuate SVD for a dense matrix in imat and output omat.U omat.V, and omat.S
with the 10 largest eigen values/vectors
>redsvd -i imat -o omat -r 3 -f sparse -m PCA
compuate PCA for a sparse matrix in imat and output omat.PC omat.SCORE
with the 3 largest principal components

options:
  -i, --input     input file (string)
  -o, --output    output file's prefix (string)
  -r, --rank      rank       (int [=10])
  -f, --format    format type (dense|sparse) See example.  (string [=dense])
  -m, --method    method (SVD|PCA|SymEigen) (string [=SVD])

And there it is. By the way, this builds redsvdMain.cpp. If you want the Incr file with a main in it instead, exclude redsvdMain.cpp and include redsvdMainIncr.cpp. Since both files define main, I excluded the Incr version and built the regular version.

Also, I included the Eigen3 headers in the GitHub repository and added them to the Additional Include Directories for the solution configuration, so you don't need to fiddle with that at all.

One last thing: to my knowledge there is no cxxabi.h for Visual Studio, so I did some cheating. You'll see where I made the changes because they are commented like so:

//MLM: commented next 3
//...
//...
//...
//MLM: added 1
...

and so forth. So if you need to make adjustments, you'll know where my changes are.
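
The two input formats listed in the usage text above can be produced with a few lines of script. This is only a minimal sketch: the helper names are made up for illustration, and two details are assumptions not stated in the usage text, namely that entries on a line are separated by a single space and that sparse column ids start at 0. Check both against the redsvd sources before relying on them.

```python
def to_dense_lines(matrix):
    """One text line per matrix row, every value written out.

    Assumption: values are separated by a single space.
    """
    return [" ".join(repr(float(v)) for v in row) for row in matrix]


def to_sparse_lines(matrix):
    """One text line per matrix row, non-zero entries only, as column_id:value.

    Assumptions: 0-based column ids, entries separated by a single space.
    A row with no non-zero entries becomes an empty line.
    """
    return [
        " ".join(f"{j}:{float(v)!r}" for j, v in enumerate(row) if v != 0)
        for row in matrix
    ]


def write_lines(path, lines):
    """Write the lines with plain '\\n' endings, one matrix row per line."""
    with open(path, "w", newline="\n") as fh:
        fh.write("\n".join(lines) + "\n")


# Tiny 3 x 3 example matrix (rows could be terms, columns documents).
term_doc = [
    [1, 0, 2],
    [0, 0, 3],
    [4, 5, 0],
]

dense_lines = to_dense_lines(term_doc)
sparse_lines = to_sparse_lines(term_doc)
```

Calling write_lines("imat", dense_lines) would then give a file that can be passed as `-i imat -o omat -r 2 -f dense`, and the sparse variant would go with `-f sparse`.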

Annul answered 17/6, 2013 at 18:7 Comment(2)
First, thanks for the effort! Second, an efficient SVD cannot be implemented with ILNumerics, since it has no economy QR decomposition... - Crack
My pleasure. If you have any problems with that, please feel free to reach out either on SO or via GitHub. Interesting about ILNumerics and the economy QR. My math is very rusty these days, but I wonder why ILNumerics would go to such great lengths and not do that? According to MATLAB it's just: "If m > n, only the first n columns of Q and the first n rows of R are computed. If m<=n, this is the same as [Q,R] = qr(A)." Maybe I don't see it, but that doesn't seem difficult for ILNumerics to implement. I must be overlooking something. Perhaps they have it on the radar for a future date. - Annul

qr in ILNumerics has an overload, ILMath.qr(A, outR, outE, economy), which performs that economy-size decomposition.
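
For readers unfamiliar with the term, here is a rough sketch of what an economy-size QR returns: for an m x n matrix with m > n, Q has only n columns and R is n x n. The function below is hypothetical and is not the ILNumerics API. It uses modified Gram-Schmidt because it is short; production libraries typically use Householder reflections instead. It also assumes the columns of the input are linearly independent.

```python
import math


def economy_qr(a):
    """Economy (thin) QR of an m x n matrix with m >= n, via modified Gram-Schmidt.

    Returns Q (m x n, orthonormal columns) and R (n x n, upper triangular)
    with A = Q R. Illustrative only; assumes full column rank.
    """
    m, n = len(a), len(a[0])
    # cols[j] holds the j-th column of A as a list of m floats.
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        # Remove the components along the already computed orthonormal columns.
        for k, qk in enumerate(q_cols):
            r[k][j] = sum(qk[i] * v[i] for i in range(m))
            v = [v[i] - r[k][j] * qk[i] for i in range(m)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        if r[j][j] == 0.0:
            raise ValueError("columns are linearly dependent")
        q_cols.append([x / r[j][j] for x in v])
    # Convert the list of columns back to row-major form.
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


# A tall 4 x 2 matrix: Q comes out 4 x 2 and R comes out 2 x 2.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
Q, R = economy_qr(A)
```

In a randomized SVD the matrix being orthonormalized is tall and thin, which is why the economy form matters there: a full QR would build an m x m Q that is never used.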

Wellintentioned answered 19/6, 2013 at 13:23 Comment(0)
