I'm trying to collate results from a simulation study. My results are saved as a list of data frames containing the output of several different classification algorithms, and I'm trying to summarize them with purrr and dplyr.
I'm trying to calculate:

- the number of objects assigned to each cluster
- the number of objects in each cluster that actually belong to it
- the number of true positives, false positives, false negatives, and true negatives for 3 different algorithms (KEEP1 to KEEP3)
- for the 2 algorithms that give a probability of being in the cluster, the same counts when that probability is compared to an alternative choice of alpha
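The true/false positive and negative counts come down to cross-tabulating a logical keep decision against `correct`. Below is a base-R sketch on made-up vectors; the helper names `count_both()` and `confusion_counts()` are hypothetical and only mirror the idea of `f()` in the code below. It also shows how thresholding a p-value-like score at a different alpha changes the counts.

```r
# Hypothetical toy vectors: the truth and one algorithm's keep/drop decision
correct <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
keep    <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)

# Same idea as f() in the question: count positions where both vectors are TRUE
count_both <- function(a, b) sum(a & b)

# Confusion counts for a logical decision vector against the truth
confusion_counts <- function(keep, correct) {
  c(TP = count_both(keep, correct),
    FN = count_both(!keep, correct),
    TN = count_both(!keep, !correct),
    FP = count_both(keep, !correct))
}

confusion_counts(keep, correct)
#> TP FN TN FP
#>  3  1  1  1

# A p-value-like score becomes a decision once alpha is chosen; a stricter
# alpha here moves one object from TP to FN and one from FP to TN
pval <- c(0.01, 0.20, 0.03, 0.04, 0.50, 0.001)
confusion_counts(pval < 0.02, correct)
#> TP FN TN FP
#>  2  2  2  0
```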
I found this dplyr issue, https://github.com/tidyverse/dplyr/issues/3101, and used its approach successfully on a single element of the list to get exactly what I wanted:
```r
library(dplyr)   # group_by(), summarize_at(), funs(), data_frame(), inner_join()
library(purrr)   # pmap(), reduce()
library(tibble)  # lst()

# Count positions where both logical vectors are TRUE
f <- function(.x, .y) {
  sum(.x & .y)
}

# Element i of .vars is summarized with element i of .funs
actions <- list(
  .vars = lst(
    c('correct'),
    c('KEEP1', 'KEEP2', 'KEEP3'),
    c('pval1', 'pval2')
  ),
  .funs = lst(
    # cluster size and number of objects that truly belong to the cluster
    funs(Nk = length, N_correct = sum),
    # confusion counts for the logical KEEP decisions
    funs(
      TP1 = f(., .y = correct),
      FN1 = f(!(.), .y = correct),
      TN1 = f(!(.), .y = !(correct)),
      FP1 = f(., .y = !(correct))
    ),
    # confusion counts after thresholding the p-values at alpha0
    funs(
      TP2 = f((. < alpha0), .y = correct),
      FN2 = f(!(. < alpha0), .y = correct),
      TN2 = f(!(. < alpha0), .y = !(correct)),
      FP2 = f((. < alpha0), .y = !(correct))
    )
  )
)

reproducible_data <- replicate(2,
  data_frame(
    k = factor(rep(1:10, each = 20)), # group/category
    correct = sample(x = c(TRUE, FALSE), 10 * 20, replace = TRUE, prob = c(.8, .2)),
    pval1 = rbeta(10 * 20, 1, 10),
    pval2 = rbeta(10 * 20, 1, 10),
    KEEP1 = pval1 < 0.05,
    KEEP2 = pval2 < 0.05,
    KEEP3 = runif(10 * 20) > .2,
    alpha0 = 0.05,
    alpha = 0.05 / 20 # divided by no. of objects in each group (k)
  ),
  simplify = FALSE)

# works
df1 <- reproducible_data[[1]]
pmap(actions, ~df1 %>% group_by(k) %>% summarize_at(.x, .y)) %>%
  reduce(inner_join, by = 'k')
```
Now I want to use `map` to do this for the entire list. However, I can no longer access the variable `correct` (the code fails before it reaches `alpha` or `alpha0`, but presumably the same issue will occur there). I'm still learning dplyr and purrr, and my experiments so far haven't helped.
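In the working call above, `df1` is picked up from the enclosing environment, while in the `~` formula `.x` (and also `.`) is the current element of `actions$.vars` and `.y` is the matching element of `actions$.funs`. A toy check of that binding, with placeholder strings standing in for the `funs()` objects (`toy_actions` is a hypothetical name):

```r
library(purrr)

# Placeholder stand-in for `actions`: two parallel lists named like the real one
toy_actions <- list(
  .vars = list("correct", c("KEEP1", "KEEP2")),
  .funs = list("count funs", "confusion funs")
)

# Inside a purrr `~` formula, `.` and `.x` both refer to the first argument
# (here the .vars element) and `.y` to the second (the .funs element)
seen <- pmap(toy_actions, ~ list(dot = ., x = .x, y = .y))

str(seen[[2]])
```

If that holds, then in a call like `pmap(actions, ~ as_tibble(.) %>% ...)`, `as_tibble(.)` receives a vector of column names rather than one of the simulated data frames.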
```r
# does not work
out_summary <- map(
  reproducible_data,
  pmap(actions, ~ as_tibble(.) %>% group_by("k") %>% summarize_at(.x, .y)) %>%
    reduce(inner_join, by = 'k')
)

# this doesn't either
out_summary <- map(
  reproducible_data,
  pmap(actions, ~ as_tibble(.) %>% group_by("k") %>% summarize_at(.x, .y, alpha = alpha, alpha0 = alpha0, correct = correct)) %>%
    reduce(inner_join, by = 'k')
)
```
Inside `map`, `group_by(k)` can't find the variable `k` unless I quote it as `group_by('k')`, but I don't need to quote it when I use `pmap` on its own. I've tried various ways of passing the right variables to these functions, but I haven't succeeded yet.
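A quick check on toy data of what the quoted form does (`toy` is a made-up table): `group_by("k")` does not refer to the column `k` at all. dplyr evaluates the string as a constant, adds a column holding `"k"` in every row, and groups by that, so it never fails to find `k`, but it also collapses everything into a single group.

```r
library(dplyr)

# Hypothetical toy data with 2 groups in column k
toy <- data.frame(
  k       = factor(c(1, 1, 2, 2, 2)),
  correct = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Bare name: groups by the existing column k, one row per level
by_column <- toy %>% group_by(k) %>% summarise(N_correct = sum(correct))

# Quoted string: groups by a new constant column, so there is only one group
by_string <- toy %>% group_by("k") %>% summarise(N_correct = sum(correct))

by_column
by_string
```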
One more note: the actual data is stored as a regular data frame, so I need `as_tibble()` in the `pmap` function. I ran into different errors when I removed it from this example, so I added it back to reproduce the same issues. Thanks!
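A minimal sketch of one way to apply the working single-data-frame code to every element of the list. It rests on two assumptions about the failing attempts: `pmap(...)` is passed as the second argument of `map()`, so it is evaluated once before `map()` starts rather than once per data frame, and inside its `~` formula `.` is the current element of `actions`, not a data frame. Wrapping the pipeline in a function that takes the data frame as an explicit argument sidesteps both. The names `sim_list`, `actions_simple`, and `summarise_one()` are hypothetical; `actions_simple` uses plain named lists of functions instead of `funs()` to stay self-contained, and `sim_list` holds plain `data.frame`s with no `as_tibble()` call, since dplyr verbs accept regular data frames directly.

```r
library(dplyr)
library(purrr)

set.seed(1)

# Hypothetical stand-in for reproducible_data: two plain data.frames (no tibbles)
sim_list <- lapply(1:2, function(i) {
  data.frame(
    k       = factor(rep(1:10, each = 20)),
    correct = sample(c(TRUE, FALSE), 200, replace = TRUE, prob = c(.8, .2)),
    KEEP1   = runif(200) > .2,
    KEEP3   = runif(200) > .2
  )
})

# Hypothetical, simplified actions list. The element names .vars/.funs matter:
# pmap() passes the elements of a named list to .f by name.
actions_simple <- list(
  .vars = list("correct", c("KEEP1", "KEEP3")),
  .funs = list(list(Nk = length, N_correct = sum), list(n_kept = sum))
)

# Summarise ONE data frame. `df` is an explicit argument, so nothing depends on
# what `.` or `.x` means inside a purrr formula.
summarise_one <- function(df) {
  grouped <- group_by(df, k)
  pmap(actions_simple, function(.vars, .funs) summarise_at(grouped, .vars, .funs)) %>%
    reduce(inner_join, by = "k")
}

# map() now receives a function and calls it once per data frame
out_summary <- map(sim_list, summarise_one)
out_summary[[1]]
```

With the original `funs()`-based `actions`, the same wrapper should work in dplyr versions that still provide `funs()`, because `correct`, `alpha0`, and `alpha` are then looked up among the columns of whichever data frame `summarise_one()` receives. The inner function's arguments need to be named `.vars` and `.funs` to match the names in `actions`.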