Row-wise cor() on subset of columns using dplyr::mutate()

set.seed(8)
df <- data.frame(
  A = sample(1:3, 10, replace = TRUE),
  B = sample(1:3, 10, replace = TRUE),
  C = sample(1:3, 10, replace = TRUE),
  D = sample(1:3, 10, replace = TRUE),
  E = sample(1:3, 10, replace = TRUE),
  F = sample(1:3, 10, replace = TRUE))

I would like to pass a subset of columns into dplyr's mutate() and make a row-wise calculation, for instance cor() to get the correlation between columns A-C and D-F, but I cannot figure out how. I found inspiration on SO here, here and here, but nevertheless failed to produce working code. For instance, I tried this:

require(plyr)
require(dplyr)
df %>%
  rowwise() %>%
  mutate(c=cor(.[[1:3]],.[[4:6]]))
Ashton answered 2/3, 2015 at 10:17 Comment(2)
You'll probably have to use do() in order to run cor(). - Prim
@arun: it worked fine - I'll accept it if you put it as an answer. - Ashton

You could try

df %>% 
   rowwise() %>% 
   do(data.frame(., Cor=cor(unlist(.[1:3]), unlist(.[4:6]))))
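For what it's worth, do() is superseded in newer dplyr releases. Assuming dplyr 1.0.0 or later is installed (an assumption about your setup, not something the question states), the same row-wise calculation can be sketched with c_across(). The output column name Cor and the object name res are just illustrative choices.

```r
library(dplyr)

# Same example data as in the question
set.seed(8)
df <- data.frame(
  A = sample(1:3, 10, replace = TRUE),
  B = sample(1:3, 10, replace = TRUE),
  C = sample(1:3, 10, replace = TRUE),
  D = sample(1:3, 10, replace = TRUE),
  E = sample(1:3, 10, replace = TRUE),
  F = sample(1:3, 10, replace = TRUE))

# For each row, correlate the values in columns A-C with those in D-F.
# c_across() collects the selected columns of the current row into a vector.
res <- df %>%
  rowwise() %>%
  mutate(Cor = cor(c_across(A:C), c_across(D:F))) %>%
  ungroup()

res
```

Rows where either triple is constant (for example 2, 2, 2) have zero standard deviation, so cor() returns NA for them and emits a warning. The ungroup() at the end drops the row-wise grouping so later verbs behave column-wise again.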
Doggery answered 2/3, 2015 at 10:27 Comment(0)

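Here is another approach, from FAY (2017). Note that it computes the correlation between every pair of columns rather than a row-wise correlation.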

> library(dplyr)
> library(purrr)
> library(tidystringdist)
> comb <- tidy_comb_all(names(airquality))
> comb
# A tibble: 15 x 2
   V1      V2     
 * <chr>   <chr>  
 1 Ozone   Solar.R
 2 Ozone   Wind   
 3 Ozone   Temp   
 4 Ozone   Month  
 5 Ozone   Day    
 6 Solar.R Wind   
 7 Solar.R Temp   
 8 Solar.R Month  
 9 Solar.R Day    
10 Wind    Temp   
11 Wind    Month  
12 Wind    Day    
13 Temp    Month  
14 Temp    Day    
15 Month   Day    

This gives every pairwise combination of the column names. Next, run cor.test() on each pair and tidy the results:

> bulk_cor <-
+   comb %>%
+   pmap(~ cor.test(airquality[[.x]], airquality[[.y]])) %>%
+   map_df(broom::tidy) %>%
+   bind_cols(comb, .)
> bulk_cor
# A tibble: 15 x 10
   V1      V2      estimate statistic  p.value parameter conf.low conf.high method       alternative
   <chr>   <chr>      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <fct>        <fct>      
 1 Ozone   Solar.R  0.348      3.88   1.79e- 4       109   0.173     0.502  Pearson's p~ two.sided  
 2 Ozone   Wind    -0.602     -8.04   9.27e-13       114  -0.706    -0.471  Pearson's p~ two.sided  
 3 Ozone   Temp     0.698     10.4    2.93e-18       114   0.591     0.781  Pearson's p~ two.sided  
 4 Ozone   Month    0.165      1.78   7.76e- 2       114  -0.0183    0.337  Pearson's p~ two.sided  
 5 Ozone   Day     -0.0132    -0.141  8.88e- 1       114  -0.195     0.169  Pearson's p~ two.sided  
 6 Solar.R Wind    -0.0568    -0.683  4.96e- 1       144  -0.217     0.107  Pearson's p~ two.sided  
 7 Solar.R Temp     0.276      3.44   7.52e- 4       144   0.119     0.419  Pearson's p~ two.sided  
 8 Solar.R Month   -0.0753    -0.906  3.66e- 1       144  -0.235     0.0882 Pearson's p~ two.sided  
 9 Solar.R Day     -0.150     -1.82   7.02e- 2       144  -0.305     0.0125 Pearson's p~ two.sided  
10 Wind    Temp    -0.458     -6.33   2.64e- 9       151  -0.575    -0.323  Pearson's p~ two.sided  
11 Wind    Month   -0.178     -2.23   2.75e- 2       151  -0.328    -0.0202 Pearson's p~ two.sided  
12 Wind    Day      0.0272     0.334  7.39e- 1       151  -0.132     0.185  Pearson's p~ two.sided  
13 Temp    Month    0.421      5.70   6.03e- 8       151   0.281     0.543  Pearson's p~ two.sided  
14 Temp    Day     -0.131     -1.62   1.08e- 1       151  -0.283     0.0287 Pearson's p~ two.sided  
15 Month   Day     -0.00796   -0.0978 9.22e- 1       151  -0.166     0.151  Pearson's p~ two.sided  

Now you can use dplyr::filter to subset the results you want.
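As a rough sketch of that filtering step: the block below rebuilds a slimmed-down version of the table with base R's combn() standing in for tidy_comb_all() (so it runs with only dplyr loaded), then keeps the pairs that are significant and at least moderately strong. The 0.05 and 0.4 cut-offs and the name strong are arbitrary choices for illustration, not something taken from FAY (2017).

```r
library(dplyr)

# All unordered pairs of column names (base R stand-in for tidy_comb_all())
pairs <- t(combn(names(airquality), 2))

# One row per pair: Pearson estimate and p-value from cor.test()
bulk_cor <- bind_rows(lapply(seq_len(nrow(pairs)), function(i) {
  ct <- cor.test(airquality[[pairs[i, 1]]], airquality[[pairs[i, 2]]])
  data.frame(V1 = pairs[i, 1], V2 = pairs[i, 2],
             estimate = unname(ct$estimate), p.value = ct$p.value)
}))

# Keep pairs with p < 0.05 and an absolute correlation above 0.4
strong <- bulk_cor %>%
  filter(p.value < 0.05, abs(estimate) > 0.4)

strong
```

The same filter() call works unchanged on the full bulk_cor tibble built above with broom::tidy, since it also has estimate and p.value columns.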

Bibliography

FAY, Colin. 2017. “A Crazy Little Thing Called purrr - Part 6 : Doing Statistics.” https://colinfay.me/purrr-statistics/.

Stationary answered 15/6, 2018 at 9:34 Comment(0)
