Mutate columns based on value in a vector

Asked 28/5, 2024 at 2:1 Answered 28/5, 2024 at 4:20

I would like to mutate new columns onto a dataframe using a function where the inputs to the function come from a vector. Like this:

library(tidyverse)

# sessionInfo()
# R version 4.3.2 (2023-10-31)

set.seed(123)  # For reproducibility

mydf <- data.frame(user_id = 1:100) |> 
  expand_grid(data.frame(day = 1:365)) |> 
  mutate(logins = floor(abs(rnorm(1,10,10))))

lambdas <- c(0.01, 0.05, 0.1)

mydf <- mydf |> 
  mutate(
    lambda_logins_01 = logins * exp(-lambdas[1] * day),
    lambda_logins_05 = logins * exp(-lambdas[2] * day),
    lambda_logins_1 = logins * exp(-lambdas[3] * day),
    )

Except, instead of writing out each "lambda_logins_01 = ..." line inside mutate(), I want to do it more elegantly, using something like map.

I would like to use the native pipe and refer to cur_data() or an equivalent, as opposed to referring to mydf by name, i.e. I want to use the pipe in the traditional sense, operating on the current state of the data I'm working on.

The desired result would be the same as in my example code, with the new columns named after the values of lambda, except I wouldn't have to write out each mutate line manually.

Amatory answered 28/5, 2024 at 2:1 Comment(1)

I asked a very similar question years ago: #51978638 if you'd like to see some other approaches. – Sewole 28/5, 2024 at 3:27

You could use outer:

mydf %>%
   mutate(data.frame(logins * exp(outer(day, -lambdas))) %>% 
            setNames(str_c('lambda_logins_', lambdas)))

# A tibble: 36,500 × 6
   user_id   day logins lambda_logins_0.01 lambda_logins_0.05 lambda_logins_0.1
     <int> <int>  <dbl>              <dbl>              <dbl>             <dbl>
 1       1     1      4               3.96               3.80              3.62
 2       1     2      4               3.92               3.62              3.27
 3       1     3      4               3.88               3.44              2.96
 4       1     4      4               3.84               3.27              2.68
 5       1     5      4               3.80               3.12              2.43
 6       1     6      4               3.77               2.96              2.20
 7       1     7      4               3.73               2.82              1.99
 8       1     8      4               3.69               2.68              1.80
 9       1     9      4               3.66               2.55              1.63
10       1    10      4               3.62               2.43              1.47
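
To see what outer contributes here, the following base R sketch rebuilds the same calculation on three rows. The small day and logins vectors are illustrative stand-ins for the columns of mydf, not the real data.

```r
# Small stand-ins for the question's columns (values are illustrative only)
day     <- 1:3
logins  <- c(4, 4, 4)
lambdas <- c(0.01, 0.05, 0.1)

# outer() builds a length(day) x length(lambdas) matrix whose [i, j]
# entry is day[i] * -lambdas[j]
decay <- exp(outer(day, -lambdas))

# logins is recycled down each column, so row i is scaled by logins[i]
weighted <- logins * decay
colnames(weighted) <- paste0("lambda_logins_", lambdas)
weighted
```

Each column of the matrix corresponds to one lambda, which is why wrapping it in data.frame and renaming the columns is enough to hand it to mutate.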

Edit: If you do not mind using the superseded map_dfc:

mutate(mydf, map_dfc(set_names(lambdas), ~logins * exp(-.x*day)))

Enigmatic answered 28/5, 2024 at 2:59 Comment(6)

Nice, similar idea but much neater. – Soiree 28/5, 2024 at 3:5

Thanks, this works! Reading ?outer and trying to understand what's happening here. You can pass a df straight to mutate() without specifying name = value for a field? Huh, that's new for me. – Amatory 28/5, 2024 at 3:39

Great use of outer, but using data.frame, setNames within mutate doesn't feel like it is tidyverse, is there a way to make it more tidy? – Gender 28/5, 2024 at 8:4

@Gender consider mutate(mydf, map_dfc(set_names(lambdas), ~logins * exp(-.x*day))) – Enigmatic 28/5, 2024 at 8:12

Nice, this is much better (tidier), thank you. – Gender 28/5, 2024 at 8:13

@Gender though map_dfc is superseded. You can still get around that by using map + bind_cols – Enigmatic 28/5, 2024 at 8:14
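
On the point raised in the comments about passing a data frame to mutate() without a name: an unnamed argument that evaluates to a data frame or tibble has its columns added to the result. A minimal sketch, using a made-up three-row tibble and made-up column names (half, decayed):

```r
library(dplyr)

# Illustrative data, not the question's mydf
df <- tibble(day = 1:3, logins = c(4, 4, 4))

# The unnamed tibble is unpacked: each of its columns becomes a new
# column of the result. Columns of df are visible inside the call.
out <- df |>
  mutate(tibble(half = logins / 2, decayed = logins * exp(-0.1 * day)))

names(out)
```

This is what lets the answers above build all the lambda columns in one expression and drop them into mutate in a single step.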

This can be done without needing to iterate by taking advantage of vectorization and recycling.

library(tidyverse)

mydf |> 
  mutate(lambda_logins = data.frame(t(set_names(lambdas))),
         lambda_logins = logins * exp(-lambda_logins * day)) |>
  unpack(lambda_logins, names_sep = "_", names_repair = ~ sub("X0.", "", .x, fixed = TRUE))

# A tibble: 36,500 × 6
   user_id   day logins lambda_logins_01 lambda_logins_05 lambda_logins_1
     <int> <int>  <dbl>            <dbl>            <dbl>           <dbl>
 1       1     1      4             3.96             3.80            3.62
 2       1     2      4             3.92             3.62            3.27
 3       1     3      4             3.88             3.44            2.96
 4       1     4      4             3.84             3.27            2.68
 5       1     5      4             3.80             3.12            2.43
 6       1     6      4             3.77             2.96            2.20
 7       1     7      4             3.73             2.82            1.99
 8       1     8      4             3.69             2.68            1.80
 9       1     9      4             3.66             2.55            1.63
10       1    10      4             3.62             2.43            1.47
# ℹ 36,490 more rows
# ℹ Use `print(n = ...)` to see more rows
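
The recycling this relies on can be checked in base R on a tiny example. This is only a sketch of the mechanics: the three-element day and logins vectors are stand-ins, and the explicit row repetition imitates what mutate does when it recycles a one-row data frame.

```r
lambdas <- c(0.01, 0.05, 0.1)
day     <- 1:3
logins  <- c(4, 4, 4)

# One-row data frame of lambdas; data.frame() repairs the numeric
# names to X0.01, X0.05, X0.1
lam_row <- data.frame(t(setNames(lambdas, lambdas)))

# Repeat that row once per observation
lam_df <- lam_row[rep(1, length(day)), ]

# Arithmetic between a data frame and a vector with one element per
# row is applied column by column
res <- logins * exp(-lam_df * day)
names(res)
```

The X0. prefix in the repaired names is what the names_repair function in the answer strips off.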

Soiree answered 28/5, 2024 at 2:58 Comment(0)

I think this is nice and transferable to any type of function where you have to loop over a shorter vector:

It uses map to pass each element of lambdas in turn; that single value is then recycled against the other vectors the function needs. The columns are named by giving map a named vector of lambdas:

fun <- function(logins,lambdas,day) logins * exp(-lambdas * day)
mydf %>%
    mutate(bind_cols(map(setNames(lambdas, str_c("lambda_logins_", lambdas)),
                  \(x) fun(logins,x,day))))

### A tibble: 36,500 × 6
##   user_id   day logins lambda_logins_0.01 lambda_logins_0.05 lambda_logins_0.1
##     <int> <int>  <dbl>              <dbl>              <dbl>             <dbl>
## 1       1     1      4               3.96               3.80              3.62
## 2       1     2      4               3.92               3.62              3.27
## 3       1     3      4               3.88               3.44              2.96
## 4       1     4      4               3.84               3.27              2.68
## 5       1     5      4               3.80               3.12              2.43
## 6       1     6      4               3.77               2.96              2.20
## 7       1     7      4               3.73               2.82              1.99
## 8       1     8      4               3.69               2.68              1.80
## 9       1     9      4               3.66               2.55              1.63
##10       1    10      4               3.62               2.43              1.47
### ℹ 36,490 more rows
### ℹ Use `print(n = ...)` to see more rows
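
The names above keep the decimal point (lambda_logins_0.01), whereas the question used lambda_logins_01. If the question's style is wanted, one possible way to build the names is sketched below; suffix and col_names are hypothetical helper names, and the approach assumes every lambda lies strictly between 0 and 1.

```r
lambdas <- c(0.01, 0.05, 0.1)

# Drop the leading "0." from each value's printed form: "0.01" -> "01".
# Assumes every lambda lies strictly between 0 and 1 and is not printed
# in scientific notation (e.g. "1e-05" would pass through unchanged).
suffix    <- sub("0.", "", as.character(lambdas), fixed = TRUE)
col_names <- paste0("lambda_logins_", suffix)
col_names
```

Passing setNames(lambdas, col_names) to map in place of the str_c(...) call should then give the column names from the question.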

Sewole answered 28/5, 2024 at 4:20 Comment(1)

This also solves my own question from 5+ years ago far more neatly I think: https://mcmap.net/q/1204982/-add-multiple-output-variables-using-purrr-and-a-predefined-function – Sewole 28/5, 2024 at 4:30
