Mutate columns based on value in a vector
A

3

2

I would like to mutate new columns onto a dataframe using a function where the inputs to the function come from a vector. Like this:

library(tidyverse)

# sessionInfo()
# R version 4.3.2 (2023-10-31)

set.seed(123)  # For reproducibility

mydf <- data.frame(user_id = 1:100) |> 
  expand_grid(data.frame(day = 1:365)) |> 
  mutate(logins = floor(abs(rnorm(1,10,10))))

lambdas <- c(0.01, 0.05, 0.1)

mydf <- mydf |> 
  mutate(
    lambda_logins_01 = logins * exp(-lambdas[1] * day),
    lambda_logins_05 = logins * exp(-lambdas[2] * day),
    lambda_logins_1 = logins * exp(-lambdas[3] * day),
    )

Except, instead of writing out "mutate( lambda_logins_01 ..." I wanted to do it more elegantly using something like map.

I would like to use the native pipe and refer to cur_data() or an equivalent, rather than referring to mydf by name. That is, I want to use the pipe in the traditional sense, operating on the current state of the data I'm working on.

The desired result would be similar to my example code, with the new columns named after the values of lambda, except that I wouldn't have to write out each mutate line manually.

Amatory answered 28/5 at 2:1 Comment(1)
I asked a very similar question years ago: #51978638 if you'd like to see some other approaches.Sewole
E
7

You could use outer:

mydf %>%
   mutate(data.frame(logins * exp(outer(day, -lambdas))) %>% 
            setNames(str_c('lambda_logins_', lambdas)))

# A tibble: 36,500 × 6
   user_id   day logins lambda_logins_0.01 lambda_logins_0.05 lambda_logins_0.1
     <int> <int>  <dbl>              <dbl>              <dbl>             <dbl>
 1       1     1      4               3.96               3.80              3.62
 2       1     2      4               3.92               3.62              3.27
 3       1     3      4               3.88               3.44              2.96
 4       1     4      4               3.84               3.27              2.68
 5       1     5      4               3.80               3.12              2.43
 6       1     6      4               3.77               2.96              2.20
 7       1     7      4               3.73               2.82              1.99
 8       1     8      4               3.69               2.68              1.80
 9       1     9      4               3.66               2.55              1.63
10       1    10      4               3.62               2.43              1.47
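
To see what the one-liner above is doing, here is a minimal sketch on a toy data frame (`small`, `decay` and `extra` are names made up for this illustration, not part of the original answer). It assumes only that dplyr is installed. outer() produces a matrix with one row per day and one column per lambda, and a data frame passed to mutate() without a name is unpacked into its columns:

```r
library(dplyr)

lambdas <- c(0.01, 0.05, 0.1)
small <- data.frame(day = 1:3, logins = 4)  # toy stand-in for mydf

# outer() builds a 3 x 3 matrix: one row per day, one column per lambda
decay <- exp(outer(small$day, -lambdas))
dim(decay)

# Multiplying by logins recycles the vector down each column of the matrix
extra <- as.data.frame(small$logins * decay)
names(extra) <- paste0("lambda_logins_", lambdas)

# An unnamed data frame argument to mutate() is spliced in as new columns
result <- small |> mutate(extra)
names(result)
```

This is the same idea split into steps, so each intermediate object can be inspected.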

Edit: If you do not mind using the superseded map_dfc:

mutate(mydf, map_dfc(set_names(lambdas), ~logins * exp(-.x*day)))
Enigmatic answered 28/5 at 2:59 Comment(6)
Nice, similar idea but much neater.Soiree
Thanks, this works! I'm reading ?outer and trying to understand what's happening here. You can pass a df straight to mutate() without specifying name = value for a field? Huh, that's new to me.Amatory
Great use of outer, but using data.frame, setNames within mutate doesn't feel like it is tidyverse, is there a way to make it more tidy?Gender
@Gender consider mutate(mydf, map_dfc(set_names(lambdas), ~logins * exp(-.x*day)))Enigmatic
Nice, this is much better (tidier), thank you.Gender
@Gender though map_dfc is superseded. You can still get around that by using map + bind_colsEnigmatic
S
4

This can be done without needing to iterate by taking advantage of vectorization and recycling.

library(tidyverse)

mydf |> 
  mutate(lambda_logins = data.frame(t(set_names(lambdas))),
         lambda_logins = logins * exp(-lambda_logins * day)) |>
  unpack(lambda_logins, names_sep = "_", names_repair = ~ sub("X0.", "", .x, fixed = TRUE))

# A tibble: 36,500 × 6
   user_id   day logins lambda_logins_01 lambda_logins_05 lambda_logins_1
     <int> <int>  <dbl>            <dbl>            <dbl>           <dbl>
 1       1     1      4             3.96             3.80            3.62
 2       1     2      4             3.92             3.62            3.27
 3       1     3      4             3.88             3.44            2.96
 4       1     4      4             3.84             3.27            2.68
 5       1     5      4             3.80             3.12            2.43
 6       1     6      4             3.77             2.96            2.20
 7       1     7      4             3.73             2.82            1.99
 8       1     8      4             3.69             2.68            1.80
 9       1     9      4             3.66             2.55            1.63
10       1    10      4             3.62             2.43            1.47
# ℹ 36,490 more rows
# ℹ Use `print(n = ...)` to see more rows
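
The recycling step can be looked at in isolation with base R alone. This is only a sketch: `rates` and `decayed` are illustrative names, and the explicit row repetition stands in for the recycling that mutate() performs on a one-row data frame:

```r
lambdas <- c(0.01, 0.05, 0.1)
day <- 1:4

# A one-row data frame of rates, repeated so it has one row per day
rates <- data.frame(t(setNames(lambdas, lambdas)))
rates <- rates[rep(1, length(day)), ]

# Arithmetic between a data frame and a vector of length nrow()
# applies the vector down every column, so no explicit loop is needed
decayed <- exp(-rates * day)
dim(decayed)
```

Each column of `decayed` holds exp(-lambda * day) for one lambda, which is what the second mutate() line computes before multiplying by logins.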
Soiree answered 28/5 at 2:58 Comment(0)
S
3

I think this is neat and transferable to any function where you have to loop over a shorter vector.

It uses map to pass in each element of lambdas, which is then used as a single value recycled against the other vectors the function needs. The columns are named by using a named vector of lambdas:

fun <- function(logins,lambdas,day) logins * exp(-lambdas * day)
mydf %>%
    mutate(bind_cols(map(setNames(lambdas, str_c("lambda_logins_", lambdas)),
                  \(x) fun(logins,x,day))))

### A tibble: 36,500 × 6
##   user_id   day logins lambda_logins_0.01 lambda_logins_0.05 lambda_logins_0.1
##     <int> <int>  <dbl>              <dbl>              <dbl>             <dbl>
## 1       1     1      4               3.96               3.80              3.62
## 2       1     2      4               3.92               3.62              3.27
## 3       1     3      4               3.88               3.44              2.96
## 4       1     4      4               3.84               3.27              2.68
## 5       1     5      4               3.80               3.12              2.43
## 6       1     6      4               3.77               2.96              2.20
## 7       1     7      4               3.73               2.82              1.99
## 8       1     8      4               3.69               2.68              1.80
## 9       1     9      4               3.66               2.55              1.63
##10       1    10      4               3.62               2.43              1.47
### ℹ 36,490 more rows
### ℹ Use `print(n = ...)` to see more rows
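
Since the question asked for the native pipe, the same pattern can be wrapped in a small helper. This is a sketch only: `add_decay_cols` and its `prefix` argument are hypothetical names introduced here, and it assumes R >= 4.1 (for `|>` and `\(x)`) with dplyr and purrr installed, plus columns called logins and day in the input:

```r
library(dplyr)
library(purrr)

# Hypothetical helper: adds one decayed column per rate, named after the rate
add_decay_cols <- function(df, rates, prefix = "lambda_logins_") {
  named_rates <- set_names(rates, paste0(prefix, rates))
  df |>
    # map() returns a named list of vectors; bind_cols() turns it into a
    # tibble, which mutate() splices in because it is passed unnamed
    mutate(bind_cols(map(named_rates, \(r) logins * exp(-r * day))))
}

toy <- data.frame(day = 1:3, logins = 4)  # toy stand-in for mydf
out <- toy |> add_decay_cols(c(0.01, 0.05, 0.1))
names(out)
```

The helper never refers to the data frame by a fixed name, so it can sit anywhere in a pipeline.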
Sewole answered 28/5 at 4:20 Comment(1)
This also solves my own question from 5+ years ago far more neatly I think: https://mcmap.net/q/1204982/-add-multiple-output-variables-using-purrr-and-a-predefined-functionSewole
