Storing a large but low-rank matrix efficiently

Asked 14/3, 2013 at 0:46 Answered 13/2, 2014 at 14:29

I know there are packages in R to store sparse matrices efficiently. Is there also a way to store a low-rank matrix efficiently? For example:

A <- matrix(rnorm(1e6), nrow=1e5, ncol=1e1)
B <- A %*% t(A)

Now, B is too large to store in memory, but it is low in rank. Is there any way to construct and store B in an efficient way, such that some basic read methods (rowSums, colSums, etc) are performed on the fly, in order to trade for cpu or memory?

Hypogenous answered 14/3, 2013 at 0:46 Comment(6)

Interesting question- what applications would it have? (Where do low rank matrices generally appear?) – Negativism 14/3, 2013 at 2:9

@DavidRobinson: Those matrices are used, for instance, as approximations of large dense matrices (too large to compute, or even to store), in some optimization algorithms. – Wines 14/3, 2013 at 9:33

If you are willing to approximate B, could you use a low-dimensional approximation, e.g. use a SVD and keep the first n dimensions of the SVD? Not sure this is quite what you want, but might be worth considering. – Coati 15/3, 2013 at 14:47

While it doesn't answer your question, the following seems somewhat relevant: mathoverflow.net/questions/92328/low-rank-matrix-factorization – Rampage 15/3, 2013 at 15:0

Yes, I agree with above comment. Factor it, and it will become sparse. – Eleanoraeleanore 19/3, 2013 at 17:27

This might be of help: stanford.edu/class/ee378b/papers/… – Gravitation 6/9, 2013 at 13:21

Your question is already the answer: To store such a low rank matrix efficiently, you make a data structure containing both factors. If matrix-vector multiplication is required, it can be done from right to left using matrix-vector products of the factors.

One application of this strategy and data structure can be found in implementations of limited-memory Broyden or BFGS quasi-Newton methods.

Oletta answered 13/2, 2014 at 14:29 Comment(0)

Here's another approach, although I miss the experience to know how efficient this would be for large matrices:

If the rank is low, it means the matrix contains many irrelevant lines, which are linear combinations of others. If the matrix represents a linear system of equations, one could design an algorithm, which successively removes those lines.

To check if a line is irrelevant, check if the rank of the matrix without that line is still the same. For computing the matrix rank, see this and that answer.

Toxic answered 28/3, 2013 at 10:10 Comment(2)

That basically sounds like a very expensive way of factoring :-) – Hypogenous 29/3, 2013 at 22:52

Sorry, but a terribly poor idea that works only for a simple case, with replicated rows. In fact, it is TRIVIAL to generate a matrix of rank 1, that has NO replicate rows or columns. Thus pick random row vectors U and V, then U'*V has rank 1. – Gravitation 6/9, 2013 at 14:23

Recommended topics

Hot tags