Row-wise iteration like apply with purrr

Asked 23/10, 2017 at 22:24 Answered 14/10, 2022 at 21:6

How do I achieve row-wise iteration using purrr::map?

Here's how I'd do it with a standard row-wise apply.

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

lst_result <- apply(df, 1, function(x){
            var1 <- (x[['a']] + x[['b']])
            var2 <- x[['c']]/2
            return(data.frame(var1 = var1, var2 = var2))
          })

However, this is not very elegant, and I would rather do it with purrr. It may (or may not) be faster, too.

Callipash asked 23/10, 2017 at 22:24 Comment(0)

You can use pmap for row-wise iteration. The columns are used as the arguments of whatever function you are using. In your example you would have a three-argument function.

For example, here is pmap using an anonymous function for the work you are doing. Because a data frame is a named list, each column is passed to the function as a named argument, so the argument names must match the column names.

pmap(df, function(a, b, c) {
     data.frame(var1 = a + b,
                var2 = c/2) 
     }  )
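As a small sketch of how the columns reach the function, the example below lists the arguments in a different order than the columns. It assumes only that purrr is installed and reuses the df from the question; with a named input such as a data frame, pmap should supply the arguments by name, so the order in the function signature does not matter.

```r
library(purrr)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# Arguments are deliberately listed in a different order than the columns;
# pmap matches them to the columns by name.
res <- pmap(df, function(c, a, b) {
  data.frame(var1 = a + b, var2 = c / 2)
})

res[[1]]  # first row: var1 is 1 + 11, var2 is 21 / 2
```

The result is a list with one single-row data frame per row of df.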

You can use the purrr tilde "short-hand" for an anonymous function by referring to the columns in order with numbers preceded by two dots.

pmap(df, ~data.frame(var1 = ..1 + ..2,
                var2 = ..3/2)  )

If you want to get these particular results as a data.frame instead of a list, you can use pmap_dfr.
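A minimal sketch of the pmap_dfr variant mentioned above, assuming purrr and dplyr are installed (pmap_dfr relies on dplyr to bind the rows). In purrr 1.0.0 and later the `_dfr` variants are marked as superseded, so treat this as an illustration rather than the recommended form for new code.

```r
library(purrr)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# Each call returns a one-row data frame; pmap_dfr row-binds them
# into a single data frame instead of returning a list.
res_df <- pmap_dfr(df, ~ data.frame(var1 = ..1 + ..2, var2 = ..3 / 2))

str(res_df)
```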

Trude answered 23/10, 2017 at 22:40 Comment(4)

In the first example, what do I do if the df has 100 columns and I only want to manipulate the 90th one? I understand I can refer to it by index number, but I would like to refer to it by name. – Callipash 29/10, 2017 at 18:34

@Callipash If you only want to use a single column, other tools might be more appropriate (e.g., dplyr::mutate). However, the documentation for pmap points out that you can always use ... to "absorb unused components of input [the] list". So if the column of interest was named "c", something like pmap(df, function(c, ...) {data.frame(var1 = c/2) }) would work. – Trude 1/11, 2017 at 23:22

what is ... used for? – Roice 19/4, 2021 at 3:5

@AlvaroMorales It takes all of the rest of the column names so you don't need to refer to every single column name in pmap(). There is an example in the documentation Examples section of the map family of functions that you might find useful! – Trude 19/4, 2021 at 14:22
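To make the role of ... in the comments above concrete, here is a short sketch using the df from the question. It names only the column c and lets ... absorb a and b; the second call omits ... to show that pmap then fails, because the remaining columns have no matching argument. The object names halved and no_dots are made up for this illustration.

```r
library(purrr)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# Only `c` is named; `...` silently absorbs the columns `a` and `b`.
halved <- pmap(df, function(c, ...) data.frame(var1 = c / 2))

# Without `...` the call fails, since `a` and `b` are passed as
# named arguments that the function does not accept.
no_dots <- tryCatch(
  pmap(df, function(c) data.frame(var1 = c / 2)),
  error = function(e) e
)
inherits(no_dots, "error")
```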

Note that your example uses only vectorized operations, so you could simply do:

df %>% dplyr::transmute(var1 = a+b,var2 = c/2)

(or in base R: transform(df,var1 = a+b,var2 = c/2)[4:5])

If you use non-vectorized functions such as median, you can use pmap as in @aosmith's answer, or use dplyr::rowwise.

rowwise is slower, and the package maintainers advise using the map family instead, but it is arguably easier on the eye than pmap in some cases. I personally still use it when speed isn't an issue:

library(dplyr)
df %>% transmute(var3 = pmap(.,~median(c(..1,..2,..3))))
df %>% rowwise %>% transmute(var3 = median(c(a,b,c)))

(To get back to a strictly unnamed list output, where res holds one of the results above: res %>% split(seq(nrow(.))) %>% unname)
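Note that the pmap call above stores var3 as a list column. If a plain numeric column is preferred, one option is pmap_dbl, sketched below under the assumption that dplyr and purrr are installed. The columns are passed explicitly as list(a, b, c) rather than through the `.` placeholder, which is a choice made for this illustration.

```r
library(dplyr)
library(purrr)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# pmap_dbl returns a numeric vector, so var3 is an ordinary column
# rather than a list column.
out <- df %>%
  transmute(var3 = pmap_dbl(list(a, b, c), function(...) median(c(...))))

str(out)
```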

Niels answered 27/10, 2017 at 9:57 Comment(0)

You can use pmap in combination with ..., which for me is the best solution because I don't need to specify the parameters.

library(tidyverse)  # provides pmap(), tibble() and %>%

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

lst_result <- df %>%
  pmap(function(...) {
    x <- tibble(...)  # collect the current row into a one-row tibble
    tibble(var1 = x$a + x$b, var2 = x$c / 2)
  })

Triumvirate answered 23/11, 2021 at 1:34 Comment(0)

You can always write a wrapper around a function you like.

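rmap <- function(.x, .f, ...) {
  # Mirror apply(): refuse inputs that have no dimensions
  if (is.null(dim(.x))) stop("dim(X) must have a positive length")
  # Transpose so that each original row becomes a column, then map over the columns
  .x <- as.data.frame(t(.x), stringsAsFactors = FALSE)
  purrr::map(.x = .x, .f = .f, ...)
}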

Apply the new function rmap (row-wise map), here to the df from the question:

rmap(df, ~{
    var1 <- (.x[[1]] + .x[[2]])
    var2 <- .x[[3]]/2
    return(data.frame(var1 = var1, var2 = var2))
    })

Additional info (evaluate from top to bottom):

df1 <- data.frame(a=1:3,b=1:3,c=1:3)
m   <- matrix(1:9,ncol=3)

apply(df1,1,sum)
rmap(df1,sum)

apply(m,1,sum)
rmap(m,sum)

apply(1:10,1,sum)  # intentionally throws an error
rmap(1:10,sum)     # intentionally throws an error

Pastry answered 25/7, 2018 at 9:45 Comment(0)

You can also use group_nest() to access each row as a one-row tibble:

library(tidyverse)
df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

df %>% 
    group_nest(row_number()) %>% 
    pull(data) %>% 
    map(function(x) transmute(x,
                                 var1 = a + b,
                                 var2 = c/2))

Grosz answered 12/1, 2022 at 14:8 Comment(0)

I like (and upvoted) the group_nest answer by @rasmus-larsen, but I think it's cleaner to use group_by and group_map:

library(tidyverse)
df <- data.frame(a = 1:10, b = 11:20, c = 21:30)
lst_result <- df %>% 
  group_by(row_number()) %>%
  group_map(function(x, i) {
    x %>% transmute(
      var1 = a + b,
      var2 = c/2
    )
  })

But why not simply do the following? It would be faster, since the operations are vectorized.

transmute(df, var1 = a + b, var2 = c/2)

Pedagogics answered 14/10, 2022 at 21:6 Comment(0)
