What is singular value decomposition (SVD)?

Asked 10/2, 2009 at 8:16 Answered 21/5, 2010 at 7:26

How does it actually reduce noise? Can you suggest some nice tutorials?

Skulk answered 10/2, 2009 at 8:16 Comment(3)

Rather off-topic question, if you want theory, then go to Wikipedia - they've got basic description and references. If you want help on particular programming subject then restate the question (i.e. how to use Lapack to obtain SVD of hermitian matrix, etc). – Liva 10/2, 2009 at 8:38

Please don't close. This is far more programming-related than some of the touchy-feely questions that have been on this site. – Intoxicative 10/2, 2009 at 16:10

After some thinking I have to agree, and remove the -1 :) – Liva 10/2, 2009 at 17:12

SVD can be understood from a geometric sense for square matrices as a transformation on a vector.

Consider a square n x n matrix M multiplying a vector v to produce an output vector w:

w = M*v

The singular value decomposition M is the product of three matrices M=U*S*V, so w=U*S*V*v. U and V are orthonormal matrices. From a geometric transformation point of view (acting upon a vector by multiplying it), they are combinations of rotations and reflections that do not change the length of the vector they are multiplying. S is a diagonal matrix which represents scaling or squashing with different scaling factors (the diagonal terms) along each of the n axes.

So the effect of left-multiplying a vector v by a matrix M is to rotate/reflect v by M's orthonormal factor V, then scale/squash the result by a diagonal factor S, then rotate/reflect the result by M's orthonormal factor U.

One reason SVD is desirable from a numerical standpoint is that multiplication by orthonormal matrices is an invertible and extremely stable operation (condition number is 1). SVD captures any ill-conditioned-ness in the diagonal scaling matrix S.

Intoxicative answered 10/2, 2009 at 14:36 Comment(1)

somebody just -1'd: care to explain why? – Intoxicative 29/10, 2010 at 12:28

One way to use SVD to reduce noise is to do the decomposition, set components that are near zero to be exactly zero, then re-compose.

Here's an online tutorial on SVD.

You might want to take a look at Numerical Recipes.

Illuse answered 10/2, 2009 at 12:16 Comment(1)

this is also the basic for LSA/LSI (latent semantic indexing). The theory is that the "small value" vectors are really just "noisy" perturbations of the vector. – Skyway 25/6, 2009 at 22:5

Singular value decomposition is a method for taking an nxm matrix M and "decomposing" it into three matrices such that M=USV. S is a diagonal square (the only nonzero entries are on the diagonal from top-left to bottom-right) matrix containing the "singular values" of M. U and V are orthogonal, which leads to the geometric understanding of SVD, but that isn't necessary for noise reduction.

With M=USV, we still have the original matrix M with all its noise intact. However, if we only keep the k largest singular values (which is easy, since many SVD algorithms compute a decomposition where the entries of S are sorted in nonincreasing order), then we have an approximation of the original matrix. This works because we assume that the small values are the noise, and that the more significant patterns in the data will be expressed through the vectors associated with larger singular values.

In fact, the resulting approximation is the most accurate rank-k approximation of the original matrix (has the least squared error).

Dunkle answered 11/5, 2010 at 14:45 Comment(0)

To answer to the tittle question: SVD is a generalization of eigenvalues/eigenvectors to non-square matrices. Say, $X \in N \times p$, then the SVD decomposition of X yields X=UDV^T where D is diagonal and U and V are orthogonal matrices. Now X^TX is a square matrice, and the SVD decomposition of X^TX=VD^2V where V is equivalent to the eigenvectors of X^TX and D^2 contains the eigenvalues of X^TX.

Whitted answered 25/6, 2009 at 12:5 Comment(0)

SVD can also be used to greatly ease global (i.e. to all observations simultaneously) fitting of an arbitrary model (expressed in an formula) to data (with respect to two variables and expressed in a matrix).
For example, data matrix A = D * M^T where D represents the possible states of a system and M represents its evolution wrt some variable (e.g. time).
By SVD, A(x,y) = U(x) * S * V^T(y) and therefore D * M^T = U * S * V^T
then D = U * S * V^T * M^T+ where the "+" indicates a pseudoinverse.
One can then take a mathematical model for the evolution and fit it to the columns of V, each of which are a linear combination the components of the model (this is easy, as each column is a 1D curve). This obtains model parameters which generate M^? (the ? indicates it is based on fitting).
M * M^?+ * V = V^? which allows residuals R * S² = V - V^? to be minimized, thus determining D and M.

Pretty cool, eh?

The columns of U and V can also be inspected to glean information about the data; for example each inflection point in the columns of V typically indicates a different component of the model.

Finally, and actually addressing your question, it is import to note that although each successive singular value (element of the diagonal matrix S) with its attendant vectors U and V does have lower signal to noise, the separation of the components of the model in these "less important" vectors is actually more pronounced. In other words, if the data is described by a bunch of state changes that follow a sum of exponentials or whatever, the relative weights of each exponential get closer together in the smaller singular values. In other other words the later singular values have vectors which are less smooth (noisier) but in which the change represented by each component are more distinct.

Needlewoman answered 21/5, 2010 at 7:26 Comment(0)

Recommended topics

Hot tags