how do i calculate correlation between corresponding columns of two matrices and not getting other correlations as output
Asked Answered
H

4

8

I have these data

> a
     a    b    c
1    1   -1    4
2    2   -2    6
3    3   -3    9
4    4   -4   12
5    5   -5    6

> b
     d    e    f
1    6   -5    7
2    7   -4    4
3    8   -3    3
4    9   -2    3
5   10   -1    9

> cor(a,b)
           d            e             f
a  1.0000000    1.0000000     0.1767767
b -1.0000000    -1.000000    -0.1767767
c  0.5050763    0.5050763    -0.6964286

The result I want is just:

cor(a,d) = 1
cor(b,e) = -1
cor(c,f) = -0.6964286
Hallowell answered 15/7, 2011 at 22:49 Comment(0)
S
4

I would probably personally just use diag:

> diag(cor(a,b))
[1]  1.0000000 -1.0000000 -0.6964286

But you could also use mapply:

> mapply(cor,a,b)
         a          b          c 
 1.0000000 -1.0000000 -0.6964286
Somite answered 15/7, 2011 at 23:33 Comment(0)
M
12

The first answer above calculates all pairwise correlations, which is fine unless the matrices are large, and the second one doesn't work. As far as I can tell, efficient computation must be done directly, such as this code borrowed from borrowed from the arrayMagic Bioconductor package, works efficiently for large matrices:

> colCors = function(x, y) { 
+   sqr = function(x) x*x
+   if(!is.matrix(x)||!is.matrix(y)||any(dim(x)!=dim(y)))
+     stop("Please supply two matrices of equal size.")
+   x   = sweep(x, 2, colMeans(x))
+   y   = sweep(y, 2, colMeans(y))
+   cor = colSums(x*y) /  sqrt(colSums(sqr(x))*colSums(sqr(y)))
+   return(cor)
+ }

> set.seed(1)
> a=matrix(rnorm(15),nrow=5)
> b=matrix(rnorm(15),nrow=5)
> diag(cor(a,b))
[1]  0.2491625 -0.5313192  0.5594564
> mapply(cor,a,b)
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> colCors(a,b)
[1]  0.2491625 -0.5313192  0.5594564
Mcintosh answered 15/11, 2011 at 20:33 Comment(1)
Is it possible to add p-values and also adjusted p-values for multiple comparisons?Gittens
S
4

I would probably personally just use diag:

> diag(cor(a,b))
[1]  1.0000000 -1.0000000 -0.6964286

But you could also use mapply:

> mapply(cor,a,b)
         a          b          c 
 1.0000000 -1.0000000 -0.6964286
Somite answered 15/7, 2011 at 23:33 Comment(0)
G
2

mapply works with data frames but not matrices. That is because in data frames each column is an element, while in matrices each entry is an element.

In the answer above mapply(cor,as.data.frame(a),as.data.frame(b)) works just fine.

set.seed(1)
a=matrix(rnorm(15),nrow=5)
b=matrix(rnorm(15),nrow=5)
diag(cor(a,b))
[1]  0.2491625 -0.5313192  0.5594564
mapply(cor,as.data.frame(a),as.data.frame(b))
    V1         V2         V3 
 0.2491625 -0.5313192  0.5594564 

This is much more efficient for large matrices.

Grandam answered 8/3, 2018 at 21:35 Comment(0)
P
1

For larger matrices Rfast::corpairs is a bit faster than mapply.

library(Rfast)

a <- matrix(rnorm(1e6), 1e3)
b <- matrix(rnorm(1e6), 1e3)

microbenchmark::microbenchmark(
  diag = diag(cor(a, b)),
  mapply = as.numeric(mapply(cor,as.data.frame(a),as.data.frame(b))),
  corpairs = corpairs(a, b),
  check = "equal"
)
#> Unit: milliseconds
#>      expr       min         lq       mean     median         uq       max neval
#>      diag 1294.1268 1303.72885 1313.41297 1306.84705 1314.44330 1426.1978   100
#>    mapply   49.4884   54.91380   58.52463   57.11995   60.49635  115.6425   100
#>  corpairs    8.6714   12.57265   15.29742   14.81435   16.18880   65.3170   100
Personnel answered 15/4, 2023 at 23:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.