How to use purrr's map function to perform row-wise prop.tests and add results to the dataframe?
I'm trying to solve the following problem in R: I have a dataframe with two variables (number of successes, and number of total trials).

# A tibble: 4 x 2
 Success     N
    <dbl> <dbl>
1     28.   40.
2     12.   40.
3     22.   40.
4      8.   40.

I would like to perform a prop.test or binom.test on each row and add the resulting list to the dataframe (or certain elements of it, like the p-value and CIs).

Ideally, I would like to add a third column with the p-values and the CI range. My attempts so far have been painfully unsuccessful. Here is a minimal coding example:

library(tidyverse)

Success <- c(38, 12, 27, 9)
N <- c(50, 50, 50, 50)
df <- as_tibble(cbind(Success, N))


df %>%
  map( ~ prop.test, x = .$Success, n = .$N)

This doesn't give the desired result. Any help would be much appreciated.

Cheers,

Luise

Bibliopegy answered 11/3, 2018 at 16:36 Comment(0)
S
6

If you want a new column, you'd use @akrun's approach but sprinkle in a little dplyr and broom amongst the purrr:

library(tidyverse) # for dplyr, purrr, tidyr & co.
library(broom)
    
analysis <- df %>%
  set_names(c("x","n")) %>% 
  mutate(result = pmap(., prop.test)) %>% 
  mutate(result = map(result, tidy)) 

That gives you the results in a tidy nested tibble. If you want to keep only certain variables, apply functions to the nested frame with the same mutate/map pattern, then unnest().

analysis %>% 
  mutate(result = map(result, ~select(.x, p.value, conf.low, conf.high))) %>% 
  unnest(cols = c(result))

# A tibble: 4 x 5
      x     n   p.value conf.low conf.high
  <dbl> <dbl>     <dbl>    <dbl>     <dbl>
1 38.0   50.0 0.000407    0.615      0.865
2 12.0   50.0 0.000407    0.135      0.385
3 27.0   50.0 0.671       0.395      0.679
4  9.00  50.0 0.0000116   0.0905     0.319
Skvorak answered 11/3, 2018 at 16:58 Comment(1)
Warning message: cols is now required. Please use cols = c(result), i.e. unnest(cols = c("result")). - Cattle
W
9

We can use pmap after renaming the columns to match the argument names of prop.test (x and n):

pmap(setNames(df, c("x", "n")), prop.test)

Or using map2

map2(df$Success, df$N, prop.test)

The problem with map is that a data frame is a list of column vectors, so map loops over the columns rather than the rows:

df %>%
   map(~ .x)
#$Success
#[1] 38 12 27  9

#$N
#[1] 50 50 50 50

So, we cannot do .x$Success or .x$N
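
To make the column-versus-row distinction concrete, here is a minimal sketch that rebuilds the question's data as a plain data frame. The names col_lengths and row_props are illustrative only and do not appear in the original post.

```r
library(purrr)

# Same numbers as in the question, built as a plain data frame
df <- data.frame(Success = c(38, 12, 27, 9), N = c(50, 50, 50, 50))

# map() walks the columns: two iterations, each receiving a whole column
col_lengths <- map_int(df, length)

# pmap() walks the rows: four iterations, each receiving one Success and one N,
# matched to the function's arguments by column name
row_props <- pmap_dbl(df, function(Success, N) Success / N)

col_lengths
row_props
```

Because pmap matches columns to arguments by name, renaming the columns to x and n is what lets prop.test be passed to it directly.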

Update

As @Steven Beaupre mentioned, if we need new columns with the p-value and confidence interval, we can create them inside mutate:

res <- df %>%
  mutate(newcol = map2(Success, N, prop.test),
         pval = map_dbl(newcol, ~ .x[["p.value"]]),
         CI = map(newcol, ~ as.numeric(.x[["conf.int"]]))) %>%
  select(-newcol)
# A tibble: 4 x 4
#   Success     N      pval CI       
#    <dbl> <dbl>     <dbl> <list>   
#1   38.0   50.0 0.000407  <dbl [2]>  
#2   12.0   50.0 0.000407  <dbl [2]>
#3   27.0   50.0 0.671     <dbl [2]>
#4    9.00  50.0 0.0000116 <dbl [2]>

The 'CI' column is a list column in which each element holds 2 values (lower and upper bound). It can be unnested to give 'long' format data, with two rows per test:

res %>%
   unnest(cols = c(CI))

Or create 3 columns

df %>% 
  mutate(newcol = map2(Success, N,  ~ prop.test(.x, n = .y) %>% 
                  {tibble(pvalue = .[["p.value"]],
                         CI_lower = .[["conf.int"]][[1]], 
                         CI_upper = .[["conf.int"]][[2]])})) %>%
  unnest(cols = c(newcol))
# A tibble: 4 x 5
#  Success     N    pvalue CI_lower CI_upper
#    <dbl> <dbl>     <dbl>    <dbl>    <dbl>
#1   38.0   50.0 0.000407    0.615     0.865
#2   12.0   50.0 0.000407    0.135     0.385
#3   27.0   50.0 0.671       0.395     0.679
#4    9.00  50.0 0.0000116   0.0905    0.319
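
The question also mentions binom.test. As a rough sketch, assuming the default null hypothesis of p = 0.5 is what is wanted, the same map2 pattern carries over, since binom.test also returns an object with p.value and conf.int elements. The names exact and res_exact are made up for this illustration.

```r
library(purrr)

success <- c(38, 12, 27, 9)
n <- c(50, 50, 50, 50)

# One exact binomial test per (success, n) pair
exact <- map2(success, n, ~ binom.test(.x, n = .y))

# Pull the p-value and both confidence limits into ordinary columns
res_exact <- data.frame(
  Success  = success,
  N        = n,
  pvalue   = map_dbl(exact, ~ .x[["p.value"]]),
  CI_lower = map_dbl(exact, ~ .x[["conf.int"]][[1]]),
  CI_upper = map_dbl(exact, ~ .x[["conf.int"]][[2]])
)

res_exact
```

The intervals will differ slightly from the prop.test results above, because binom.test computes exact (Clopper-Pearson) limits rather than the approximation prop.test uses.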
Wherefrom answered 11/3, 2018 at 16:38 Comment(1)
Handy tip: if you give map a string instead of a function, it works as an extractor, so pval = map_dbl(newcol, ~ .x[["p.value"]]) can actually be pval = map_dbl(newcol, "p.value"). - Skvorak
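
A small sketch of the extractor shorthand from the comment above, checking that both spellings return the same values. The names tests, pval_lambda and pval_string are illustrative only.

```r
library(purrr)

# One prop.test result per row of the question's data
tests <- map2(c(38, 12, 27, 9), c(50, 50, 50, 50), prop.test)

# Explicit lambda versus string extractor
pval_lambda <- map_dbl(tests, ~ .x[["p.value"]])
pval_string <- map_dbl(tests, "p.value")

# Compare the two results
all.equal(pval_lambda, pval_string)
```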
G
0

The question mentions prop.test and binom.test, but another alternative is binom::binom.confint, which, among other things, is useful for estimating confidence intervals when you have zero successes (see here, here, and here). If you use this function, the following might be useful:

library(tidyverse)
library(binom)
df %>% 
  rowwise() %>% 
  mutate(binom_test_var = list(binom.confint(x = Success, n = N, method = c("wilson")))) %>% 
  unnest(cols = c(binom_test_var))
# # A tibble: 4 × 8
#   Success     N method     x     n  mean  lower upper
#     <dbl> <dbl> <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl>
# 1      38    50 wilson    38    50  0.76 0.626  0.857
# 2      12    50 wilson    12    50  0.24 0.143  0.374
# 3      27    50 wilson    27    50  0.54 0.404  0.670
# 4       9    50 wilson     9    50  0.18 0.0977 0.308
Golightly answered 13/3 at 11:3 Comment(0)
