Difference between map() and map_dfr() in R

While playing around with the purrr package of the Tidyverse in R, I saw that the map() function returns a list.

library(tidyverse)
set.seed(123)

map(1:5, ~rnorm(3))
#> [[1]]
#> [1] -0.5604756 -0.2301775  1.5587083
#> 
#> [[2]]
#> [1] 0.07050839 0.12928774 1.71506499
#> 
#> [[3]]
#> [1]  0.4609162 -1.2650612 -0.6868529
#> ......

I want to convert this list to a data frame with 3 columns. One option is do.call(rbind, .). However, I also noticed that map_dfr() exists.

Calling map_dfr() the same way as map() raises an error.

map_dfr(1:5, ~rnorm(3))
#> Error: Argument 1 must have names.

Question

What are the differences between the map() and the map_dfr() functions that lead to this error? And how should you use the map_dfr() function to bind the rows directly in the mapping function?
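
For comparison, the do.call(rbind, .) route mentioned above can be sketched in base R as follows. This is a minimal illustration rather than a recommended approach; lapply() stands in for map() so that the block needs no packages, and the names draws, mat, and df are made up for the example.

```r
set.seed(123)

# One unnamed numeric vector of length 3 per iteration, like map(1:5, ~rnorm(3))
draws <- lapply(1:5, function(i) rnorm(3))

# rbind() stacks the vectors as the rows of a 5 x 3 numeric matrix
mat <- do.call(rbind, draws)

# as.data.frame() gives an unnamed matrix the default column names V1, V2, V3
df <- as.data.frame(mat)

dim(df)
names(df)
```

Because rbind() works by position, this route never needs names, which is the main contrast with map_dfr().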

Douse answered 8/3, 2021 at 15:57 Comment(2)
Try t(mapply(rnorm, 3, 1:5)) - Salpiglossis
A recent example of a map_dfr answer - https://mcmap.net/q/1164854/-loop-through-function-and-stack-the-output-into-a-dataset-in-r/10276092 - Parget

map_dfr() binds the results by row directly, provided each call to the mapped function returns a data frame or a named vector:

library(dplyr)
library(purrr)

map(1:5, ~as.data.frame(matrix(rnorm(3),nrow=1)))
[[1]]
        V1        V2        V3
1 1.326029 0.4581257 0.4367454

[[2]]
          V1         V2        V3
1 -0.3769822 -0.2488601 -1.441538

[[3]]
          V1       V2         V3
1 -0.4931225 1.145818 -0.6269974

[[4]]
         V1       V2       V3
1 -1.679398 1.035032 1.784175

[[5]]
         V1        V2        V3
1 0.5158901 -2.322314 -1.145897

map_dfr(1:5, ~as.data.frame(matrix(rnorm(3),nrow=1)))
           V1         V2          V3
1  0.29250530 -0.8325543  0.21013608
2  1.03348415  0.3333718 -0.08498664
3  1.01011329  0.6583516 -0.49360421
4 -0.06229409 -0.1200969  0.06078136
5 -1.92491929  0.3891900 -0.57046411

It is equivalent to:

map(1:5, ~as.data.frame(matrix(rnorm(3),nrow=1))) %>% bind_rows
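
One way to check this equivalence is to reset the seed before each call so that both approaches see the same random draws. This is only a sketch; make_row, via_map_dfr, and via_bind_rows are hypothetical names introduced for the comparison.

```r
library(purrr)
library(dplyr)

# Hypothetical helper: a one-row data frame with columns V1, V2, V3
make_row <- function(i) as.data.frame(matrix(rnorm(3), nrow = 1))

set.seed(42)
via_map_dfr <- map_dfr(1:5, make_row)

set.seed(42)  # same seed, so the same draws are generated again
via_bind_rows <- map(1:5, make_row) %>% bind_rows()

# Compare the two results
all.equal(via_map_dfr, via_bind_rows)
```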

You get the error because each call returns an unnamed vector, so map_dfr() has no column names to bind the rows by.
Naming the elements fixes it:

map_dfr(1:5, ~setNames(rnorm(3),LETTERS[1:3])) 
# A tibble: 5 x 3
       A      B      C
   <dbl>  <dbl>  <dbl>
1 -0.360 -1.36   1.40 
2  0.715  1.55   0.381
3  1.20  -0.179  0.315
4  0.126 -0.467  1.04 
5  1.31   0.375 -2.21 
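
To see both cases side by side without stopping a script, the failing call can be wrapped in tryCatch(). This is a sketch under the assumption that the unnamed call still fails in your installed versions (the exact error message differs between dplyr releases); unnamed_result and named_result are names made up here.

```r
library(purrr)

set.seed(123)

# Unnamed vectors: capture the error message instead of aborting
unnamed_result <- tryCatch(
  map_dfr(1:5, ~ rnorm(3)),
  error = function(e) conditionMessage(e)
)

# Named vectors: each vector becomes a row, and its names become the columns
named_result <- map_dfr(1:5, ~ setNames(rnorm(3), c("A", "B", "C")))

is.character(unnamed_result)  # TRUE when the unnamed call failed
dim(named_result)
```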
Miquelmiquela answered 8/3, 2021 at 16:13 Comment(1)
Interestingly, map_dfr binds columns if you provide a named input and return a vector: set_names(3:5) %>% map_dfr(~1:3) - Outpost

Here is an alternative way to use map_dfr(). I like it because it leaves room for possibilities beyond the scope of the OP's question.

library(tidyverse)

df <- map_dfr(1:5, function(x) {
  r <- rnorm(3)  # three draws for this iteration
  tibble(A = r[1], B = r[2], C = r[3])  # one-row tibble; map_dfr() stacks these
})
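
As one guess at what such other possibilities might look like (the answer does not spell them out), the function can return tibbles with differing numbers of rows and record which input produced each row. The column names input and draw and the object long are hypothetical.

```r
library(purrr)
library(tibble)

set.seed(123)

# Each call returns a tibble with x rows; map_dfr() stacks them all
long <- map_dfr(1:3, function(x) {
  tibble(input = x, draw = rnorm(x))  # input is recycled to length x
})

nrow(long)         # 1 + 2 + 3 = 6 rows in total
table(long$input)  # number of rows contributed by each input
```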
Thebault answered 23/7, 2022 at 20:18 Comment(1)
The answer doesn't disclose a difference between the map_dfr and map functions. Try to highlight use cases, performance issues, or any aspects that were not described in the answers above. - Grownup
