How to use princomp () function in R when covariance matrix has zero's?
Asked Answered
B

1

10

While using princomp() function in R, the following error is encountered : "covariance matrix is not non-negative definite".

I think, this is due to some values being zero (actually close to zero, but becomes zero during rounding) in the covariance matrix.

Is there a work around to proceed with PCA when covariance matrix contains zeros ?

[FYI : obtaining the covariance matrix is an intermediate step within the princomp() call. Data file to reproduce this error can be downloaded from here - http://tinyurl.com/6rtxrc3]

Buttonball answered 19/12, 2011 at 13:50 Comment(13)
Adding a sample input to make the the problem reproducible is useful to answerers.Dovev
If you look at stats:::princomp.default you'll see that the error occurs when you have negative eigenvalues in the covariance matrix.Dovev
@ Richie Cotton : I wish I can provide. My data is huge (10K x 10K) and I haven't figured out the part that is causing the error. I will be happy to know if there is a way in which I can extract troubling part of data and post it here !Buttonball
cv <- matrix(c(1, 2, 2, 1), nrow = 2); princomp(covmat = cv) reproduces the error. Don't know how relevant it is to your dataset.Dovev
Thanks ! I could reproduce it from a small portion of original matrix, 1K x 1K (file size 5.5 MB). I was wondering how to post it. I am sure some elements of covariance matrix will be zero (or close to it) as my input data has large chunks of identical values.Buttonball
You can throw it up onto github or another free data hosting site. Google brings up many that claim to offer this service, I don't have any experience with them though so can't vouch one way or another.Sussna
@Chase: Thanks, I put up the file online .Buttonball
@user1029725, it is not trivial to read the file using read.table. Do something like 'm = matrix(runif(4), nrow = 2, ncol = 2)' and then 'write.table(m, file = "bla")Bisayas
Updated input file. It is a symmetric matrix, so only lower triangle is uploaded !Buttonball
To transform a triangular matrix to full matrix (as required for princomp() function), I usually replace NA's to 0's first and then add the transpose of the matrix to original matrix (some thing like this - matrix <- matrix + t(matrix))Buttonball
It sounds like your matrix might be singular - i.e., some column is a linear combination of the others. Zeroes in the covariance matrix are a good thing, but those zeroes come from columns of data being independent. If, as you say, your data has large sections of identical values, then your covariance matrix will have large entries, not small ones. You might need to remove columns that are redundant before you can use princomp.Danielladanielle
@Wesley: It is quite possible that some of the columns are linear combination of others. But removing redundant columns is removing a variable from the data analysis, which is not desirable I guess. Anyways, I will give it a try. Btw, how to remove redundant columns ?Buttonball
@DWin has some good suggestions in his answer below.Danielladanielle
U
9

The first strategy might be to decrease the tolerance argument. Looks to me that princomp won't pass on a tolerance argument but that prcomp does accept a 'tol' argument. If not effective, this should identify vectors which have nearly-zero covariance:

nr0=0.001
which(abs(cov(M)) < nr0, arr.ind=TRUE)

And this would identify vectors with negative eigenvalues:

which(eigen(M)$values < 0)

Using the h9 example on the help(qr) page:

> which(abs(cov(h9)) < .001, arr.ind=TRUE)
      row col
 [1,]   9   4
 [2,]   8   5
 [3,]   9   5
 [4,]   7   6
 [5,]   8   6
 [6,]   9   6
 [7,]   6   7
 [8,]   7   7
 [9,]   8   7
[10,]   9   7
[11,]   5   8
[12,]   6   8
[13,]   7   8
[14,]   8   8
[15,]   9   8
[16,]   4   9
[17,]   5   9
[18,]   6   9
[19,]   7   9
[20,]   8   9
[21,]   9   9
> qr(h9[-9,-9])$rank  
[1] 7                  # rank deficient, at least at the default tolerance
> qr(h9[-(8:9),-(8:9)])$ take out only the vector  with the most dependencies
[1] 6                   #Still rank deficient
> qr(h9[-(7:9),-(7:9)])$rank
[1] 6

Another approach might be to use the alias function:

alias( lm( rnorm(NROW(dfrm)) ~ dfrm) )
Underclothing answered 19/12, 2011 at 16:31 Comment(1)
Nice. I hadn't come across alias before.Dovev

© 2022 - 2024 — McMap. All rights reserved.