Add multiple output variables using purrr and a predefined function

Take this simple dataset and function (representative of more complex problems):

x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) (a + b) * n

Using base R's Map I could do this to add 2 new columns in a vectorised fashion:

ns <- 1:2
x[paste0("new",seq_along(ns))] <- Map(mult, x["a"], x["b"], n=ns)
x
#  a b new1 new2
#1 1 2    3    6
#2 2 3    5   10
#3 3 4    7   14

My purrr attempt via pmap gets close, but returns a list with one element per row:

library(purrr)
library(dplyr)
x %>% select(a,b) %>% pmap(mult, n=1:2)
#[[1]]
#[1] 3 6
#
#[[2]]
#[1]  5 10
#
#[[3]]
#[1]  7 14

My attempts from here with pmap_dfr etc. all seem to error out when I try to map this back to new columns.

How do I end up making 2 further variables which match my current "new1"/"new2"? I'm sure there is a simple incantation, but I'm clearly overlooking it or using the wrong *map* function.

There is some useful discussion here - How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs - but it seems overly hacky and inflexible for what I imagined was a simple problem.

Breadthways answered 23/8, 2018 at 4:0 Comment(5)
This is ugly, and I don't think it is what you are looking for: x %>% select(a,b) %>% pmap(mult, n=1:2) %>% bind_cols() %>% t() - Poetry
What about this one: x[paste0("new",seq_along(ns))] <- pmap(list(x['a'], x['b'], ns), mult)? - Grandniece
@Grandniece - that's a really good point that pmap/Map are directly analogous. The only problem is that I am trying to fit this into someone else's piped code. Is there a way to feed x into that pmap, like the (non-working) x %>% pmap(list(a,b,ns), mult)? I'm still lost at this stage. - Breadthways
I cannot think of a better way to feed a and b from x to pmap. The closest I can get is x %>% {list(.['a'], .['b'], ns)} %>% pmap(mult) %>% setNames(paste0('new', seq_along(ns))) %>% cbind(x). - Grandniece
@Grandniece - that's not too bad. Might be worth adding as a formal answer for completeness' sake. - Breadthways

Coming back to answer my own question many years later, after reading this question. The secret, I think, is to use a *map* function to iterate over each value in the vector, call the function on each value, and bind_cols the results back, e.g.:

library(dplyr)
library(purrr)
library(stringr)  # for str_c

x %>% 
    mutate(bind_cols(map(setNames(1:2, str_c("new_",1:2)), \(n) mult(a,b,n))))
##  a b new_1 new_2
##1 1 2     3     6
##2 2 3     5    10
##3 3 4     7    14
Breadthways answered 28/5 at 4:27 Comment(0)

Here is one possibility.

library(purrr)
library(dplyr)
library(tidyr)  # unnest() and spread() come from tidyr
n <- 1:2
x %>%
    mutate(val = pmap(., mult, n = n)) %>%
    unnest(val) %>%
    mutate(var = rep(paste0("new", n), nrow(.) / length(n))) %>%
    spread(var, val)
#  a b new1 new2
#1 1 2    3    6
#2 2 3    5   10
#3 3 4    7   14

Not pretty, so I'm also curious to see alternatives. A lot of excess comes about from unnesting the list column and spreading into new columns.
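One way to trim the unnest-and-spread step is sketched below. It assumes tidyr >= 1.0.0 (for unnest_wider) and names each per-row result before widening; the wrapper function and the `ns` and `out` variables are illustrative names, not part of the original answer.

```r
library(dplyr)
library(purrr)
library(tidyr)

x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a, b, n) (a + b) * n
ns <- 1:2

out <- x %>%
    # One named vector per row, stored in a list column
    mutate(val = pmap(list(a, b), function(a, b) {
        set_names(mult(a, b, ns), paste0("new", ns))
    })) %>%
    # Each name in the list column becomes its own column
    unnest_wider(val)

out
```

Because the names are attached inside the wrapper, mult itself stays unchanged and no separate renaming step is needed.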

Here is another possibility, using pmap_dfc plus an ugly as.data.frame(t(...)) call:

bind_cols(x, as.data.frame(t(pmap_dfc(x, mult, n = n))))
#  a b V1 V2
#1 1 2  3  6
#2 2 3  5 10
#3 3 4  7 14

Sample data

x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) (a + b) * n
Delmore answered 23/8, 2018 at 4:14 Comment(3)
Interesting. At least this answer can take the mult function as-is. I'll hold off and see if anyone else has a super-simple solution. - Breadthways
@Breadthways I also feel there must be an easier & more elegant way. I've played around with a double map call (see update); still not elegant (ugly as.data.frame(t(...)) and the names still need to be fixed). - Delmore
@Breadthways [update] Ok, the double map call was not necessary; still have the ugly as.data.frame(t(...))... - Delmore

The best approach I've found (which is still not terribly elegant) is to pipe into bind_cols. To get pmap_dfr to work correctly, the function should return a named list (which may or may not be a data frame):

library(tidyverse)

x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) as.list(set_names((a + b) * n, paste0('new', n)))

x %>% bind_cols(pmap_dfr(., mult, n = 1:2))
#>   a b new1 new2
#> 1 1 2    3    6
#> 2 2 3    5   10
#> 3 3 4    7   14

To avoid changing the definition of mult, you can wrap it in an anonymous function:

mult <- function(a,b,n) (a + b) * n

x %>% bind_cols(pmap_dfr(
    ., 
    ~as.list(set_names(
        mult(...), 
        paste0('new', 1:2)
    )), 
    n = 1:2
))
#>   a b new1 new2
#> 1 1 2    3    6
#> 2 2 3    5   10
#> 3 3 4    7   14

In this particular case, it's not actually necessary to iterate over rows, though, because you can vectorize the inputs from x and instead iterate over n. The advantage is that the number of rows usually exceeds the number of values of n, so the number of iterations will be [potentially much] lower (here 2 instead of 3). To be clear, whether such an approach is possible depends on which parameters of the function can accept vector arguments.

mult still needs to be called on the variables of x. The simplest way to do this is to pass them explicitly:

x %>% bind_cols(map_dfc(1:2, ~mult(x$a, x$b, .x)))
#>   a b V1 V2
#> 1 1 2  3  6
#> 2 2 3  5 10
#> 3 3 4  7 14

...but this loses the benefit of pmap that named variables will automatically get passed to the correct parameter. You can get that back by using purrr::lift, which is an adverb that changes the domain of a function so it accepts a list by wrapping it in do.call. The returned function can be called on x and the value of n for that iteration:

x %>% bind_cols(map_dfc(1:2, ~lift(mult)(x, n = .x)))

This is equivalent to

x %>% bind_cols(map_dfc(1:2, ~invoke(mult, x, n = .x)))

but the advantage of the former is that it returns a function which can be partially applied on x so it only has an n parameter left, and thus requires no explicit references to x and so pipes better:

x %>% bind_cols(map_dfc(1:2, partial(lift(mult), .)))

All return the same thing. Names can be fixed after the fact with %>% set_names(~sub('^V(\\d+)$', 'new\\1', .x)), if you like.
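As far as I know, lift and invoke are deprecated as of purrr 1.0.0, so here is a minimal sketch of the same idea using base do.call instead. It still iterates over n only and still matches the columns of x to mult's parameters by name; naming the input vector up front also avoids the renaming step. The `ns`, `new_cols` and `out` names are illustrative.

```r
library(dplyr)
library(purrr)

x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a, b, n) (a + b) * n
ns <- 1:2

# Iterate over n; do.call() passes the columns of x as named arguments,
# which is roughly what lift()/invoke() did
new_cols <- map(
    set_names(ns, paste0("new", ns)),
    function(n) do.call(mult, c(as.list(x), list(n = n)))
)

out <- x %>% bind_cols(as.data.frame(new_cols))
out
```

Note that this only works when every column of x corresponds to a parameter of the function; otherwise select the relevant columns first.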

Marion answered 23/8, 2018 at 4:14 Comment(5)
Thanks. Good to know I wasn't overlooking something really basic. I tried fudging around with set_names but wasn't having much luck. It's a bit unfortunate that the mult function needs editing beforehand. - Breadthways
You could wrap mult in an anonymous function instead: x %>% bind_cols(pmap_dfr(., ~as.list(set_names(mult(...), paste0('new', 1:2))), n = 1:2)), but at some point it's just really opaque code. - Marion
purrrlyr's by_row and invoke_rows used to offer alternative ways to do these sorts of tasks: x %>% purrrlyr::invoke_rows(mult, ., n = 1:2, .collate = 'cols'), but they are now sort-of deprecated. They're really interesting functions, though. - Marion
I've been thinking about this more, and provided the actual function is similar enough, it would be more efficient to vectorize over rows and iterate over n, which will usually result in a [potentially much] smaller number of iterations, here 2 instead of 3: x %>% bind_cols(map_dfc(1:2, partial(lift(mult), .))). It does depend on a function that can be vectorized across the row parameters, though. - Marion
Would you like to add that last comment to the answer? I think it is useful. - Breadthways

To mimic the input format for Map, we could call pmap from purrr in this way:

x[paste0("new",seq_along(ns))] <- pmap(list(x['a'], x['b'], ns), mult)

To fit this in a pipe:

x %>%
    {list(.['a'], .['b'], ns)} %>%
    pmap(mult) %>%
    setNames(paste0('new', seq_along(ns))) %>%
    cbind(x)

#   new1 new2 a b
# 1    3    6 1 2
# 2    5   10 2 3
# 3    7   14 3 4

Admittedly, this looks ugly compared to the concise base R code, but I could not think of a better way.

Grandniece answered 24/8, 2018 at 13:35 Comment(0)