Use dynamic name for new column/variable in `dplyr`
V

10

296

I want to use dplyr::mutate() to create multiple new columns in a data frame. The column names and their contents should be dynamically generated.

Example data from iris:

library(dplyr)
iris <- as_tibble(iris)

I've created a function to mutate my new columns from the Petal.Width variable:

multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    df <- mutate(df, varname = Petal.Width * n)  ## problem arises here
    df
}

Now I create a loop to build my columns:

for(i in 2:5) {
    iris <- multipetal(df=iris, n=i)
}

However, since mutate thinks varname is a literal variable name, the loop only creates one new variable (called varname) instead of four (called petal.2 - petal.5).

How can I get mutate() to use my dynamic name as variable name?

Valladares answered 23/9, 2014 at 19:51 Comment(8)
The vignette doesn't even mention mutate_, and it really isn't obvious from the other functions how to use it.Spann
I have struggled to understand the quosure et al. documentation for years. While the vignette link above no longer works, that comment led me to this summary of tidy evaluation: shipt.tech/…. I finally understand! Thank you.Fastening
dplyr provides this excellent vignette on Programming with dplyr that covers this.Biliary
Compared to base R, it seems overly complex in dplyr to get dynamic names when looping ...Cothran
If you need the same operation several times it usually tells you that your data format is not optimal. You want a longer format with n being a column in the data.frame (see my answer below)Antiperiodic
@MarioReutter optimal for what? I disagree that e.g. replacing 10 columns of 1 million rows with a key-value column of 10 million rows is unambiguously an improvement. Looping over column names has always been and should remain a perfectly acceptable thing to do, "tidiness" be damned.Sterilant
@Cothran think of Dplyr/Rlang as implementing a succinct macro/metaprogramming DSL on top of the base R functionality of as.symbol, substitute, etc. which can be clunky and verbose. I like what they've done, but I really dislike all the new terminology, the constant churn in design, and the overly-complicated descriptions in the documentation, as if this were an arcane and obscure thing that people shouldn't need to do.Sterilant
@Sterilant If you need the slightly better performance for large data sets or a function that will be used myriads of times, I also encourage you to use the more efficient, wider data frame and a for loop. In many cases, however, the better readability of your code will save more working time than the lower run time of your script. Edit: Reading your other comments, I realize you are a far more advanced programmer than I am. It seems like you know what you are doing and you can pick the solution that suits your problem best. For beginners, however, I argue that tidiness can help learning.Antiperiodic
C
374

Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:

multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    df[[varname]] <- with(df, Petal.Width * n)
    df
}

The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.
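
As a rough sketch of what "building the named argument" can look like in current dplyr/rlang (an illustration added here, not one of the version-specific recipes below), you can create a named list holding the expression and splice it into mutate() with `!!!`. The helper name `new_col` is made up for this example.

```r
library(dplyr)

multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  # A one-element named list: the name is the new column name,
  # the element is the unevaluated expression Petal.Width * <value of n>
  new_col <- setNames(list(rlang::expr(Petal.Width * !!n)), varname)
  # `!!!` splices the list into mutate() as if it were typed as a named argument
  mutate(df, !!!new_col)
}

res <- multipetal(datasets::iris, 4)
names(res)
```

The sections below show shorter ways to get the same effect, depending on your dplyr version.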


dplyr version >= 1.0

With dplyr 1.0 or later, you can use glue-package syntax in the name on the left-hand side of :=. The {} in the name is replaced by the result of evaluating the expression inside it.

multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}
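
To tie this back to the loop in the question, here is a minimal sketch of using that function for n = 2 to 5. The name `iris_multi` is only for illustration; it starts from a fresh copy so the built-in iris is left untouched.

```r
library(dplyr)

multipetal <- function(df, n) {
  # "petal.{n}" is a glue string: {n} is replaced by the value of n
  mutate(df, "petal.{n}" := Petal.Width * n)
}

iris_multi <- as_tibble(datasets::iris)
for (i in 2:5) {
  iris_multi <- multipetal(iris_multi, i)
}

names(iris_multi)
```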

If you are passing a column name to your function, you can use {{}} in the string as well as for the column itself:

meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
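
Note that meanofcol() expects a bare column name. If you have the column name as a character string instead, the comments below suggest the .data pronoun. A minimal sketch of that variant, with the hypothetical name `meanofcol_chr`:

```r
library(dplyr)

# Variant of meanofcol() that takes the column name as a string
meanofcol_chr <- function(df, col) {
  # {col} inserts the string into the new name;
  # .data[[col]] looks up the column with that name
  mutate(df, "Mean of {col}" := mean(.data[[col]]))
}

res <- meanofcol_chr(datasets::iris, "Petal.Width")
names(res)
```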


dplyr version >= 0.7

dplyr starting with version 0.7 allows you to use := to dynamically assign parameter names. You can write your function as:

# --- dplyr version 0.7+---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    mutate(df, !!varname := Petal.Width * n)
}
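
If you are wondering why := is needed at all: R's parser only accepts a plain name or a string on the left of = inside a call, so `!!varname = ...` fails before dplyr ever sees it. A small sketch to illustrate (the variable names `bad` and `out` are just for this example):

```r
library(dplyr)

# `!!varname = ...` is rejected by the R parser itself
bad <- try(parse(text = "mutate(df, !!varname = 1)"), silent = TRUE)
inherits(bad, "try-error")

# `:=` parses as an ordinary operator call, so the left-hand side can be unquoted
varname <- "petal.2"
out <- mutate(datasets::iris, !!varname := Petal.Width * 2)
names(out)
```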

For more information, see the documentation available from vignette("programming", "dplyr").


dplyr (>=0.3 & <0.7)

Slightly earlier versions of dplyr (>=0.3, <0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).

So here, the answer is to use mutate_() rather than mutate() and do:

# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    varval <- lazyeval::interp(~Petal.Width * n, n=n)
    mutate_(df, .dots= setNames(list(varval), varname))
}

dplyr < 0.3

Note that this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:

# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
    do.call("mutate", pp)
}
Chondrite answered 23/9, 2014 at 20:15 Comment(16)
do.call() probably doesn't do what you think it does: rpubs.com/hadley/do-call2. See also the nse vignette in the dev version of dplyr.Jessiajessica
So if I understand your point @hadley, I've updated the do.call above to use do.call("mutate") and to quote df in the list. Is that what you were suggesting? And when the lazyeval version of dplyr is the released version, then mutate_(df, .dots= setNames(list(~Petal.Width * n), varname)) would be a better solution?Chondrite
@Chondrite yes and yes :)Jessiajessica
What if I need the variable column header not only on the left hand side of the assignment but also on the right? e.g. mutate(df, !!newVar := (!!var1 + !!var2) / 2) does not work :(Antiperiodic
@Mario Reutter: Did you ever get an answer to your comment? I asked the same question here and would love to get it solved!Dialysis
@Dialysis Look at the meanofcol() example in the answer. It uses column names on both sides.Chondrite
@MrFlick: Yes, but I would like to combine a column name with a string to call a column, such as mutate(df, "Mean of {{col}}" := mean("{{col}} for mean")), The column name is a composite of another column's name {{col}} and a suffix or prefix " for mean". Ideally, some quick and easy code.Dialysis
@Dialysis If you need to use string for column names, use the .data pronoun. Maybe mean(.data[[glue::glue("{{col}} for mean")]])Chondrite
This link can be helpful. You should use .data[[col]] if col contains the name of the column as a character string. You should use {{col}} if col holds a promise as in the post above, and as explained in the link.Telecommunication
@MarioReutter @msgisrocker Try: mutate(df, !!newVar := (df[ ,var1] + df[,var2]) / 2)Duumvirate
Thank you, @OliverHumphreys! But if I have to rereference the df with base R within mutate, I can just use base R entirely : df[,newVar] = (df[,var1] + df[,var2]) / 2 (unless it shall be used in a pipe).Antiperiodic
mutate(outvar = {{varname1}} - {{varname2}}) results in "Error in local_error_context(dots = dots, .index = i, mask = mask) : promise already under evaluation: recursive default argument reference or earlier problems?" whereas mutate(outvar = get(varname1) - get(varname2)) "just works" for me. (dplyr v1.0.9)Meraree
@BrianD if get() works then your values are likely strings and not symbols as {{}} would expect. If you have an issue with promises already under evaluation then you have an issue with variable names possibly being reused and matching column names where you don’t expect. Hard to say without seeing any code. But if you have an issue, you should create your own question.Chondrite
For anyone reading this coming from other questions marked as duplicate, these techniques with {{ and !! are the official and correct tools for injecting a variable name into any Dplyr function, not only assignment with :=. It might be helpful to add a note of this to the answer, so that this answer can become "canonical" with respect to marking other questions as duplicates.Sterilant
also, MrFlick, may I suggest to also add a note (this comment section is already far too long) that the only version that seems to work with non-anonymous functions in loops is the one with a single curly operator. code was too long for comment, see github.com/tidyverse/dplyr/issues/6784Fluker
Note in the solution for "dplyr version >= 1.0", col is NOT a string, that is, it equals Petal.Width, not "Petal.Width". If you put the latter it won't work, as the " will become part of the code.Guide
S
83

In the new release of dplyr (0.6.0, expected in April 2017), we can also assign with := and pass a variable as the column name by unquoting it with !!, so that its value is used rather than its name:

 library(dplyr)
 multipetalN <- function(df, n){
      varname <- paste0("petal.", n)
      df %>%
         mutate(!!varname := Petal.Width * n)
 }

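 data(iris)
 # Assumes multipetal() from @MrFlick's dplyr >= 0.7 answer is also defined
 iris1 <- as_tibble(iris)
 iris2 <- as_tibble(iris)
 for(i in 2:5) {
     iris1 <- multipetal(df=iris1, n=i)
     iris2 <- multipetalN(df=iris2, n=i)
 }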

Checking the output based on @MrFlick's multipetal applied on 'iris1'

identical(iris1, iris2)
#[1] TRUE
Suburbanize answered 14/4, 2017 at 21:1 Comment(2)
why do you need to use := for assignments when unquoting a variable (i.e. !!varname)?Baumann
this is a good solution, but it only works when we're passing character vectors, not numerics directly (it works here because of your paste step before).Fluker
O
51

After a lot of trial and error, I found the pattern !!rlang::sym("my variable") (in earlier rlang versions: UQ(rlang::sym("my variable"))) really useful for working with strings and dplyr verbs. It seems to work in a lot of surprising situations.

Here's an example with mutate. We want to create a function that adds together two columns, where you pass the function both column names as strings. We can use this pattern, together with the assignment operator :=, to do this.

## Take column `name1`, add it to column `name2`, and call the result `new_name`
mutate_values <- function(new_name, name1, name2){
  mtcars %>% 
    mutate(!!rlang::sym(new_name) :=  !!rlang::sym(name1) + !!rlang::sym(name2))
}
mutate_values('test', 'mpg', 'cyl')

The pattern works with other dplyr functions as well. Here's filter:

## filter a column by a value 
filter_values <- function(name, value){
  mtcars %>% 
    filter(!!rlang::sym(name) != value)
}
filter_values('gear', 4)

Or arrange:

## transform a variable and then sort by it 
arrange_values <- function(name, transform){
  mtcars %>% 
    arrange((!!rlang::sym(name)) %>% (!!rlang::sym(transform)))
}
arrange_values('mpg', 'sin')

For select, you don't need to use the pattern. Instead you can use !!:

## select a column 
select_name <- function(name){
  mtcars %>% 
    select(!!name)
}
select_name('mpg')
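
In dplyr 1.0 and later, another option for select is the tidyselect helper all_of(), which is designed for character vectors of column names. A minimal sketch, assuming a recent dplyr (this is an alternative, not what the answer originally used):

```r
library(dplyr)

select_name <- function(name) {
  # all_of() takes a character vector and errors if a column is missing
  mtcars %>%
    select(all_of(name))
}

head(select_name(c("mpg", "cyl")))
```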
Overmatch answered 16/3, 2018 at 1:24 Comment(10)
Your tip works very well, but I have a small issue. I change an initial column myCol to a URL (for example) and copy the old column, under the new name myColInitialValue, to the end of the data frame df. But which(colnames(df)=='myCol') returns the column number of myColInitialValue. I haven't filed an issue yet because I haven't found a reprex. My goal is to use this for the escape parameter of DT::datatable(); I'm using escape=FALSE in the meantime. It doesn't work with constants either, and the DT package also seems to get the wrong column number. :)Unintelligible
My question "dplyr - mutate: use dynamic variable names" works well, but it doesn't seem to work with the column number for escape in DT::datatable.Unintelligible
It seems that dynamic variables are not the cause. (btw reprex added)Unintelligible
Thanks for this answer! Here is a super-simple example of how I used it: varname = sym("Petal.Width"); ggplot(iris, aes(x=!!varname)) + geom_histogram()Yuen
This worked for me inside a formula where !!varname wasn't working.Lir
sym() is included by default in dplyr. see help(sym).Staphyloplasty
which library does UQ come from?Osier
UQ comes from rlang but is deprecatedQuieten
You should now use !!rlang::sym(name) instead of UQ(rlang::sym(name))Quieten
Is there something simpler than UQ(rlang::sym("string")) for mutate?Circus
H
26

With rlang 0.4.0 we have the curly-curly operator ({{}}), which makes this very easy. When a dynamic column name shows up on the left-hand side of an assignment, use :=.

library(dplyr)
library(rlang)

iris1 <- as_tibble(iris)  # tbl_df() is deprecated in dplyr >= 1.0

multipetal <- function(df, n) {
   varname <- paste("petal", n , sep=".")
   mutate(df, {{varname}} := Petal.Width * n)
}

multipetal(iris1, 4)

# A tibble: 150 x 6
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.4
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
# 1          5.1         3.5          1.4         0.2 setosa      0.8
# 2          4.9         3            1.4         0.2 setosa      0.8
# 3          4.7         3.2          1.3         0.2 setosa      0.8
# 4          4.6         3.1          1.5         0.2 setosa      0.8
# 5          5           3.6          1.4         0.2 setosa      0.8
# 6          5.4         3.9          1.7         0.4 setosa      1.6
# 7          4.6         3.4          1.4         0.3 setosa      1.2
# 8          5           3.4          1.5         0.2 setosa      0.8
# 9          4.4         2.9          1.4         0.2 setosa      0.8
#10          4.9         3.1          1.5         0.1 setosa      0.4
# … with 140 more rows

We can also pass quoted/unquoted variable names to be assigned as column names.

multipetal <- function(df, name, n) {
   mutate(df, {{name}} := Petal.Width * n)
}

multipetal(iris1, temp, 3)

# A tibble: 150 x 6
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  temp
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
# 1          5.1         3.5          1.4         0.2 setosa  0.6  
# 2          4.9         3            1.4         0.2 setosa  0.6  
# 3          4.7         3.2          1.3         0.2 setosa  0.6  
# 4          4.6         3.1          1.5         0.2 setosa  0.6  
# 5          5           3.6          1.4         0.2 setosa  0.6  
# 6          5.4         3.9          1.7         0.4 setosa  1.2  
# 7          4.6         3.4          1.4         0.3 setosa  0.900
# 8          5           3.4          1.5         0.2 setosa  0.6  
# 9          4.4         2.9          1.4         0.2 setosa  0.6  
#10          4.9         3.1          1.5         0.1 setosa  0.3  
# … with 140 more rows

It works the same with

multipetal(iris1, "temp", 3)
Hulahula answered 7/12, 2019 at 8:22 Comment(2)
or mutate(df, 'petal.{n}' := Petal.Width * n)Stomachic
NB this approach fails when passing character vectors to the function within a loop - then a column will be created with the name of the function argumentFluker
S
14

Here's another version, and it's arguably a bit simpler.

multipetal <- function(df, n) {
    varname <- paste("petal", n, sep=".")
    df<-mutate_(df, .dots=setNames(paste0("Petal.Width*",n), varname))
    df
}

for(i in 2:5) {
    iris <- multipetal(df=iris, n=i)
}

> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.2 petal.3 petal.4 petal.5
1          5.1         3.5          1.4         0.2  setosa     0.4     0.6     0.8       1
2          4.9         3.0          1.4         0.2  setosa     0.4     0.6     0.8       1
3          4.7         3.2          1.3         0.2  setosa     0.4     0.6     0.8       1
4          4.6         3.1          1.5         0.2  setosa     0.4     0.6     0.8       1
5          5.0         3.6          1.4         0.2  setosa     0.4     0.6     0.8       1
6          5.4         3.9          1.7         0.4  setosa     0.8     1.2     1.6       2
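
Since mutate_() was deprecated in later dplyr releases (0.7.0 onward), here is a rough sketch of the same "build the expression as a string" idea with current tools. It uses rlang::parse_expr() to turn the string into an expression; this is a swapped-in technique, not part of the original answer.

```r
library(dplyr)

multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  # parse_expr() turns the string into an unevaluated expression
  new_val <- rlang::parse_expr(paste0("Petal.Width * ", n))
  # unquote the name on the left and the expression on the right
  mutate(df, !!varname := !!new_val)
}

iris_multi <- datasets::iris
for (i in 2:5) {
  iris_multi <- multipetal(df = iris_multi, n = i)
}
head(iris_multi)
```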
Sands answered 24/9, 2015 at 13:12 Comment(0)
O
5

You may enjoy package friendlyeval which presents a simplified tidy eval API and documentation for newer/casual dplyr users.

You are creating strings that you wish mutate to treat as column names. So using friendlyeval you could write:

multipetal <- function(df, n) {
  varname <- paste("petal", n , sep=".")
  df <- mutate(df, !!treat_string_as_col(varname) := Petal.Width * n)
  df
}

for(i in 2:5) {
  iris <- multipetal(df=iris, n=i)
}

Under the hood, this calls rlang functions that check that varname is legal as a column name.

friendlyeval code can be converted to equivalent plain tidy eval code at any time with an RStudio addin.

Overwork answered 24/6, 2018 at 9:54 Comment(2)
I don't think this package is available anymoreOsanna
it is and it works better than anything above ! When everything failed with case_when() this worked flawlesslyIslander
K
4

I am also adding an answer that augments this a little bit because I came to this entry when searching for an answer, and this had almost what I needed, but I needed a bit more, which I got via @MrFlick's answer and the R lazyeval vignettes.

I wanted to make a function that could take a dataframe and a vector of column names (as strings) that I want to be converted from a string to a Date object. I couldn't figure out how to make as.Date() take an argument that is a string and convert it to a column, so I did it as shown below.

Below is how I did this via SE mutate (mutate_()) and the .dots argument. Criticisms that make this better are welcome.

library(dplyr)

dat <- data.frame(a="leave alone",
                  dt="2015-08-03 00:00:00",
                  dt2="2015-01-20 00:00:00")

# This function takes a dataframe and list of column names
# that have strings that need to be
# converted to dates in the data frame
convertSelectDates <- function(df, dtnames=character(0)) {
    for (col in dtnames) {
        varval <- sprintf("as.Date(%s)", col)
        df <- df %>% mutate_(.dots= setNames(list(varval), col))
    }
    return(df)
}

dat <- convertSelectDates(dat, c("dt", "dt2"))
dat %>% str
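
For readers on dplyr 1.0 or later, where mutate_() is deprecated, the same conversion can probably be written with across() and all_of(). A minimal sketch under that assumption; stringsAsFactors = FALSE is added so the example behaves the same on older R versions, and `dat2` is just a name for the result:

```r
library(dplyr)

dat <- data.frame(a = "leave alone",
                  dt = "2015-08-03 00:00:00",
                  dt2 = "2015-01-20 00:00:00",
                  stringsAsFactors = FALSE)

# Convert every column named in dtnames from string to Date
convertSelectDates <- function(df, dtnames = character(0)) {
  mutate(df, across(all_of(dtnames), as.Date))
}

dat2 <- convertSelectDates(dat, c("dt", "dt2"))
str(dat2)
```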
Kiri answered 29/7, 2015 at 1:54 Comment(0)
C
4

While I enjoy using dplyr interactively, I find it extraordinarily tricky to do this with dplyr because you have to jump through hoops with lazyeval::interp(), setNames, and similar workarounds.

Here is a simpler version using base R, in which it seems more intuitive, to me at least, to put the loop inside the function, and which extends @MrFlick's solution.

multipetal <- function(df, n) {
   for (i in 1:n){
      varname <- paste("petal", i , sep=".")
      df[[varname]] <- with(df, Petal.Width * i)
   }
   df
}
multipetal(iris, 3) 
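
If you prefer to avoid the explicit loop, the same base R idea can be sketched by assigning a list of new columns in one step. This is only a variation on the function above; `varnames` and `res` are names chosen for the example.

```r
multipetal <- function(df, n) {
  varnames <- paste("petal", seq_len(n), sep = ".")
  # One list element per new column; they are matched to varnames by position
  df[varnames] <- lapply(seq_len(n), function(i) df$Petal.Width * i)
  df
}

res <- multipetal(datasets::iris, 3)
head(res)
```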
Collbaith answered 22/1, 2017 at 15:1 Comment(1)
+1, although I still use dplyr a lot in non-interactive settings, using it with variable input inside a function uses very clunky syntax.Bandwidth
W
1

Another alternative: use {} inside quotation marks to easily create dynamic names. This is similar to other solutions but not exactly the same, and I find it easier.

library(dplyr)
library(tibble)

iris <- as_tibble(iris)

multipetal <- function(df, n) {
  df <- mutate(df, "petal.{n}" := Petal.Width * n)  ## {n} is filled in with the value of n
  df
}

for(i in 2:5) {
  iris <- multipetal(df=iris, n=i)
}
iris

I think this comes from dplyr 1.0.0, but I'm not sure (I also have rlang 0.4.7, if it matters).

Waistband answered 20/7, 2020 at 7:58 Comment(0)
A
0

If you need the same operation several times it usually tells you that your data format is not optimal. You want a longer format with n being a column in the data.frame that can be achieved by a cross join:

library(tidyverse)
iris %>% mutate(identifier = 1:n()) %>% #necessary to disambiguate row 102 from row 143 (complete duplicates)
   full_join(tibble(n = 1:5), by=character()) %>% #cross join for long format
   mutate(petal = Petal.Width * n) %>% #calculation in long format
   pivot_wider(names_from=n, values_from=petal, names_prefix="petal.width.") #back to wider format (if desired)

Result:

# A tibble: 150 x 11
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species identifier petal.width.1 petal.width.2 petal.width.3
          <dbl>       <dbl>        <dbl>       <dbl> <fct>        <int>         <dbl>         <dbl>         <dbl>
 1          5.1         3.5          1.4         0.2 setosa           1           0.2           0.4           0.6
 2          4.9         3            1.4         0.2 setosa           2           0.2           0.4           0.6
 3          4.7         3.2          1.3         0.2 setosa           3           0.2           0.4           0.6
 4          4.6         3.1          1.5         0.2 setosa           4           0.2           0.4           0.6
 5          5           3.6          1.4         0.2 setosa           5           0.2           0.4           0.6
 6          5.4         3.9          1.7         0.4 setosa           6           0.4           0.8           1.2
 7          4.6         3.4          1.4         0.3 setosa           7           0.3           0.6           0.9
 8          5           3.4          1.5         0.2 setosa           8           0.2           0.4           0.6
 9          4.4         2.9          1.4         0.2 setosa           9           0.2           0.4           0.6
10          4.9         3.1          1.5         0.1 setosa          10           0.1           0.2           0.3
# ... with 140 more rows, and 2 more variables: petal.width.4 <dbl>, petal.width.5 <dbl>
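
In dplyr 1.1.0 and later, full_join(..., by = character()) is deprecated in favor of cross_join(). A sketch of the same pipeline under that assumption, loading only dplyr and tidyr; `iris_wide` is just a name for the result:

```r
library(dplyr)
library(tidyr)

iris_wide <- datasets::iris %>%
  mutate(identifier = row_number()) %>%   # keeps duplicate rows distinct
  cross_join(tibble(n = 1:5)) %>%         # pair every row with every n
  mutate(petal = Petal.Width * n) %>%     # calculation in long format
  pivot_wider(names_from = n, values_from = petal,
              names_prefix = "petal.width.")  # back to wide format

dim(iris_wide)
```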
Antiperiodic answered 12/3, 2022 at 9:56 Comment(0)
