Pairwise Correlation Table
Asked Answered
K

4

16

I'm new to R, so I apologize if this is a straightforward question, however I've done quite a bit of searching this evening and can't seem to figure it out. I've got a data frame with a whole slew of variables, and what I'd like to do is create a table of the correlations among a subset of these, basically the equivalent of "pwcorr" in Stata, or "correlations" in SPSS. The one key to this is that not only do I want the r, but I also want the significance associated with that value.

Any ideas? This seems like it should be very simple, but I can't seem to figure out a good way.

Koerlin answered 21/11, 2012 at 3:42 Comment(3)
@mnel - ?cor does not give p-values though. rcorr from the Hmisc library will however do so.Hawsepipe
@Hawsepipe ?cor will give a link to cor.test which will (If the OP follows the suggestion, he should find that).Rathenau
@Rathenau - though to my knowledge cor.test does not operate on a matrix/data.frame simply like cor does.Hawsepipe
M
11

Bill Venables offers this solution in this answer from the R mailing list to which I've made some slight modifications:

cor.prob <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X)
  above <- row(R) < col(R)
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)
  R[above] <- 1 - pf(Fstat, 1, dfr)

  cor.mat <- t(R)
  cor.mat[upper.tri(cor.mat)] <- NA
  cor.mat
}

So let's test it out:

set.seed(123)
data <- matrix(rnorm(100), 20, 5)
cor.prob(data)

          [,1]      [,2]      [,3]      [,4] [,5]
[1,] 1.0000000        NA        NA        NA   NA
[2,] 0.7005361 1.0000000        NA        NA   NA
[3,] 0.5990483 0.6816955 1.0000000        NA   NA
[4,] 0.6098357 0.3287116 0.5325167 1.0000000   NA
[5,] 0.3364028 0.1121927 0.1329906 0.5962835    1

Does that line up with cor.test?

cor.test(data[,2], data[,3])

 Pearson's product-moment correlation
data:  data[, 2] and data[, 3] 
t = 0.4169, df = 18, p-value = 0.6817
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 -0.3603246  0.5178982 
sample estimates:
       cor 
0.09778865 

Seems to work ok.

Markman answered 21/11, 2012 at 5:21 Comment(4)
This definitely seems like it'll be a good solution, however, I'm having trouble figuring out how to first construct a matrix out of my data. As I said, I have a data frame with ~400 variables (columns) and 371 subjects (rows), but doing something like cor.prob(matrix(var10, var11)) seems to be giving me correlations between each row in those variables, instead of one coefficient representing the correlation between the two variables.Koerlin
So I've figured out a solution to my last comment—it looks like I should be doing something like this for my data: cor(cbind(data$var1, data$var2), use="complete.obs") However, I'm having trouble extending your code to include use="complete.obs". Any ideas on this?Koerlin
@Koerlin Try using na.omit on the data frame to create a new data frame and see if that fixes your issue.Markman
This worked well, and is producing the results I was looking for. Thanks again for your help!Koerlin
P
5

Here is something that I just made, I stumbled on this post because I was looking for a way to take every pair of variables, and get a tidy nX3 dataframe. Column 1 is a variable, Column 2 is a variable, and Column 3 and 4 are their absolute value and true correlation. Just pass the function a dataframe of numeric and integer values.

  pairwiseCor <- function(dataframe){
  pairs <- combn(names(dataframe), 2, simplify=FALSE)
  df <- data.frame(Vairable1=rep(0,length(pairs)), Variable2=rep(0,length(pairs)), 
                   AbsCor=rep(0,length(pairs)), Cor=rep(0,length(pairs)))
  for(i in 1:length(pairs)){
    df[i,1] <- pairs[[i]][1]
    df[i,2] <- pairs[[i]][2]
    df[i,3] <- round(abs(cor(dataframe[,pairs[[i]][1]], dataframe[,pairs[[i]][2]])),4)
    df[i,4] <- round(cor(dataframe[,pairs[[i]][1]], dataframe[,pairs[[i]][2]]),4)
  }
  pairwiseCorDF <- df
  pairwiseCorDF <- pairwiseCorDF[order(pairwiseCorDF$AbsCor, decreasing=TRUE),]
  row.names(pairwiseCorDF) <- 1:length(pairs)
  pairwiseCorDF <<- pairwiseCorDF
  pairwiseCorDF
  }

This is what the output is:

 > head(pairwiseCorDF)
             Vairable1        Variable2 AbsCor     Cor
    1        roll_belt     accel_belt_z 0.9920 -0.9920
    2 gyros_dumbbell_x gyros_dumbbell_z 0.9839 -0.9839
    3        roll_belt total_accel_belt 0.9811  0.9811
    4 total_accel_belt     accel_belt_z 0.9752 -0.9752
    5       pitch_belt     accel_belt_x 0.9658 -0.9658
    6 gyros_dumbbell_z  gyros_forearm_z 0.9491  0.9491
Prurient answered 24/8, 2014 at 4:20 Comment(0)
D
4

I've found that the R package picante does a nice job dealing with the problem that you have. You can easily pass your dataset to the cor.table function and get a table of correlations and p-values for all of your variables. You can specify Pearson's r or Spearman in the function. See this link for help: http://www.inside-r.org/packages/cran/picante/docs/cor.table

Also remember to remove any non-numeric columns from your dataset prior to running the function. Here's an example piece of code:

install.packages("picante")
library(picante)
#Insert the name of your dataset in the code below
cor.table(dataset, cor.method="pearson")
Disapprove answered 22/6, 2015 at 18:42 Comment(0)
W
3

You can use the sjt.corr function of the sjPlot-package, which gives you a nicely formatted correlation table, ready for use in your Office application.

Simplest function call is just to pass the data frame:

sjt.corr(df)

See examples here.

Walley answered 2/7, 2015 at 12:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.