I'm trying to do a PCA of my data in R, and I found this nice guide that uses prcomp and ggbiplot. My data consist of two sample types with three biological replicates each (i.e. 6 rows) and around 20,000 genes (i.e. variables). First, building the PCA model with the code described in the guide doesn't work:
> pca <- prcomp(data, center = T, scale. = T)
Error in prcomp.default(data, center = T, scale. = T) :
  cannot rescale a constant/zero column to unit variance
However, if I remove the scale. = T part, it works just fine and I get a model. Why is this, and is it the cause of the error below?
> summary(pca)
Importance of components:
                             PC1       PC2       PC3       PC4       PC5
Standard deviation     4662.8657 3570.7164 2717.8351 1419.3137 819.15844
Proportion of Variance    0.4879    0.2861    0.1658    0.0452   0.01506
Cumulative Proportion     0.4879    0.7740    0.9397    0.9849   1.00000
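A minimal sketch of why scale. = TRUE fails, on a small hypothetical matrix rather than the data in this question: with scale. = TRUE, prcomp() divides each centered column by its standard deviation, and a column that is all zeros (or any other constant) has a standard deviation of exactly 0. Centering alone never divides by anything, so it runs.

```r
# Hypothetical toy data: 6 samples (rows) x 4 genes (columns).
# Gene "g3" is zero in every sample.
m <- cbind(g1 = c(1, 2, 3, 4, 5, 6),
           g2 = c(2, 1, 4, 3, 6, 5),
           g3 = c(0, 0, 0, 0, 0, 0),
           g4 = c(5, 3, 8, 1, 2, 7))

# Centering only: the constant column just becomes a column of zeros.
pca_centered <- prcomp(m, center = TRUE, scale. = FALSE)

# Centering + scaling: g3 would be divided by a standard deviation of 0,
# so prcomp() stops with the "cannot rescale" error instead.
msg <- tryCatch(prcomp(m, center = TRUE, scale. = TRUE),
                error = function(e) conditionMessage(e))
print(msg)

# The columns that block scale. = TRUE are exactly those with zero sd.
col_sd <- apply(m, 2, sd)
print(names(col_sd)[col_sd == 0])  # "g3"
```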
Second, plotting the PCA: even with just the basic code, I get an error and an empty plot:
> ggbiplot(pca)
Error: invalid 'rot' value
What does this mean, and how can I fix it? Does it have something to do with not scaling the data when building the PCA, or is it something else? I think it must be something about my data, since if I use the standard example code (below) I get a really nice PCA plot.
> library(ggbiplot)
> data(wine)
> wine.pca <- prcomp(wine, scale. = T)
> print(ggbiplot(wine.pca, obs.scale = 1, var.scale = 1, groups = wine.class,
+                ellipse = TRUE, circle = TRUE))
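On the invalid 'rot' value error: 'rot' is the text-rotation argument in grid, the graphics system ggplot2 draws with, and grid refuses rotation angles that are not finite. One plausible mechanism, which I have not verified on this data: if I read the ggbiplot source correctly, it sets each variable label's angle to (180 / pi) * atan(yvar / xvar), so a variable whose loadings on both plotted components are exactly zero (as a constant column can end up with after centering) gives 0 / 0 = NaN. The sketch below reproduces only that step with made-up loadings, using base R and grid rather than ggbiplot itself.

```r
library(grid)

# Hypothetical loadings of three variables on the two plotted PCs;
# the second variable loads exactly 0 on both.
xvar <- c(0.5, 0, -0.2)
yvar <- c(0.1, 0,  0.3)

# Label angle as ggbiplot is assumed to compute it.
angle <- (180 / pi) * atan(yvar / xvar)
print(angle)  # the second entry is NaN, because 0 / 0 is NaN in R

# grid validates 'rot' when the text grob is created and stops on
# non-finite values, which is where the error message comes from.
msg <- tryCatch(textGrob(c("gene1", "gene2", "gene3"), rot = angle),
                error = function(e) conditionMessage(e))
print(msg)
```

If this is what is happening, sum(pca$rotation[, 1] == 0 & pca$rotation[, 2] == 0) on the real model should be greater than 0. Dropping constant columns before calling prcomp(), or passing var.axes = FALSE to ggbiplot() so the variable arrows and their labels are not drawn (with ~20,000 genes they would be unreadable anyway), should then avoid the error.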
[EDIT 1] I have tried subsetting my data in two ways: 1) removing all columns where all rows are 0, and 2) removing all columns where any row is 0. The first subset still gives me the scale error, but the second (with every column containing a 0 removed) does not. Why is this? How does this affect my PCA?
I also tried the base biplot command on both the original (non-scaled) data and the subsetted data above, and it works in both cases. So is the problem something to do with ggbiplot?
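A hedged sketch comparing the two filters from EDIT 1 with a third one that drops only zero-variance columns, which is exactly the condition prcomp() checks before scaling. The matrix and its column names are hypothetical: the first three rows stand for one sample type and the last three for the other.

```r
# Hypothetical 6 x 5 matrix: rows 1-3 = sample type A, rows 4-6 = type B.
m <- cbind(all_zero = c(0, 0, 0, 0, 0, 0),
           constant = c(7, 7, 7, 7, 7, 7),
           only_B   = c(0, 0, 0, 4, 6, 5),
           some_0   = c(1, 0, 2, 3, 2, 4),
           no_0     = c(2, 3, 1, 5, 6, 4))

# Filter 1 from EDIT 1: drop columns in which every row is 0.
keep_not_all_zero <- colSums(m != 0) > 0
# Filter 2 from EDIT 1: drop columns in which any row is 0.
keep_no_zero <- colSums(m == 0) == 0
# Alternative: drop only columns with zero variance.
keep_nonconstant <- apply(m, 2, sd) > 0

print(colnames(m)[keep_not_all_zero])  # "constant" "only_B" "some_0" "no_0"
print(colnames(m)[keep_no_zero])       # "constant" "no_0"
print(colnames(m)[keep_nonconstant])   # "only_B" "some_0" "no_0"

# Filter 1 keeps the constant column, so scaling still fails;
# the zero-variance filter removes exactly the columns that block scaling.
res1 <- try(prcomp(m[, keep_not_all_zero], scale. = TRUE), silent = TRUE)
pca_scaled <- prcomp(m[, keep_nonconstant], center = TRUE, scale. = TRUE)
print(summary(pca_scaled))
```

Note that filter 2 also throws away columns like only_B, which are zero in one sample type and non-zero in the other; in a two-group comparison those may be among the most informative genes. In this toy it also keeps the constant column, so on real data it only avoids the error if no constant non-zero column happens to survive it.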
[EDIT 2] I have uploaded a subset of my data that gives me the error when I don't remove all the zeroes and works when I do. I haven't used gist before, but I think this is it. Or this...
… dput of your dataset on gist? Or, if it is large, a subset that still produces the error? It is difficult to diagnose a problem that we can't reproduce. – Omentum
… prcomp and ggbiplot ran without error. – Omentum
… prcomp on this data as-is. What I'm interested in is a PCA with the 10k variables (or however many variables I subset to) with the 20 or so different sample types. Does prcomp on the transposed dataset work for you? – Religiose
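On the question of running prcomp on the transposed data: a sketch on random stand-in data (not the poster's) showing what transposing changes. With samples as rows, x holds one score per sample and rotation one loading per gene; with t(), genes become the observations instead. With only 6 samples as rows, centering leaves at most 5 non-trivial components, which fits the summary above reaching a cumulative proportion of 1.00000 at PC5.

```r
set.seed(1)
# Hypothetical stand-in: 6 samples (rows) x 50 genes (columns).
m <- matrix(rnorm(6 * 50), nrow = 6,
            dimnames = list(paste0("sample", 1:6), paste0("gene", 1:50)))

# Samples as observations, genes as variables (the layout in the question).
pca_samples <- prcomp(m)
print(dim(pca_samples$x))         # 6 6  : scores, one row per sample
print(dim(pca_samples$rotation))  # 50 6 : loadings, one row per gene

# Transposed: genes as observations, samples as variables.
pca_genes <- prcomp(t(m))
print(dim(pca_genes$x))           # 50 6 : scores, one row per gene
print(dim(pca_genes$rotation))    # 6 6  : loadings, one row per sample

# After centering 6 rows, the 6th component carries (numerically) no variance.
print(pca_samples$sdev)
```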