Create new variables with mutate_at while keeping the original ones
Asked Answered
D

1

89

Consider this simple example:

library(dplyr)
library(tibble)

dataframe <- tibble(helloo = c(1,2,3,4,5,6),
                        ooooHH = c(1,1,1,2,2,2),
                        ahaaa = c(200,400,120,300,100,100))

# A tibble: 6 x 3
  helloo ooooHH ahaaa
   <dbl>  <dbl> <dbl>
1      1      1   200
2      2      1   400
3      3      1   120
4      4      2   300
5      5      2   100
6      6      2   100

Here I want to apply the function ntile to all the columns that contains oo, but I would like these new columns to be called cat + the corresponding column.

I know I can do this

dataframe %>% mutate_at(vars(contains('oo')), .funs = funs(ntile(., 2)))
# A tibble: 6 x 3
  helloo ooooHH ahaaa
   <int>  <int> <dbl>
1      1      1   200
2      1      1   400
3      1      1   120
4      2      2   300
5      2      2   100
6      2      2   100

But what I need is this

# A tibble: 8 x 5
  helloo   ooooHH   ahaaa cat_helloo cat_ooooHH
     <dbl>    <dbl> <dbl>    <int>    <int>
1        1        1   200        1        1
2        2        1   400        1        1
3        3        1   120        1        1
4        4        2   300        2        2
5        5        2   100        2        2
6        5        2   100        2        2
7        6        2   100        2        2
8        6        2   100        2        2

Is there a solution that does NOT require to store the intermediate data, and merge back to the original dataframe?

Disney answered 29/8, 2017 at 20:25 Comment(0)
S
141

Update 2020-06 for dplyr 1.0.0

Starting in dplyr 1.0.0, the across() function supersedes the "scoped variants" of functions such as mutate_at(). The code should look pretty familiar within across(), which is nested inside mutate().

Adding a name to the function(s) you give in the list adds the function name as a suffix.

dataframe %>%
     mutate( across(contains('oo'), 
                    .fns = list(cat = ~ntile(., 2))) )

# A tibble: 6 x 5
  helloo ooooHH ahaaa helloo_cat ooooHH_cat
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2

Changing the new columns names is a little easier in 1.0.0 with the .names argument in across(). Here is an example of adding the function name as a prefix instead of a suffix. This uses glue syntax.

dataframe %>%
     mutate( across(contains('oo'), 
                    .fns = list(cat = ~ntile(., 2)),
                    .names = "{fn}_{col}" ) )

# A tibble: 6 x 5
  helloo ooooHH ahaaa cat_helloo cat_ooooHH
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2

Original answer with mutate_at()

Edited to reflect changes in dplyr. As of dplyr 0.8.0, funs() is deprecated and list() with ~ should be used instead.

You can give names to the functions to the list you pass to .funs to make new variables with the names as suffixes attached.

dataframe %>% mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2)))

# A tibble: 6 x 5
  helloo ooooHH ahaaa helloo_cat ooooHH_cat
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2

If you want it as a prefix instead, you could then use rename_at to change the names.

dataframe %>% 
     mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2))) %>%
     rename_at( vars( contains( "_cat") ), list( ~paste("cat", gsub("_cat", "", .), sep = "_") ) )

# A tibble: 6 x 5
  helloo ooooHH ahaaa cat_helloo cat_ooooHH
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2

Previous code with funs() from earlier versions of dplyr:

dataframe %>% 
     mutate_at(vars(contains('oo')), .funs = funs(cat = ntile(., 2))) %>%
     rename_at( vars( contains( "_cat") ), funs( paste("cat", gsub("_cat", "", .), sep = "_") ) )
Soulful answered 29/8, 2017 at 20:31 Comment(5)
I guess one can always write some regex stuff to change the name of the col_cat variables?Crustaceous
@Disney Yep. Possibly in rename_at for convenience; added example in edit.Soulful
The renaming appears to only append if there's more than one column that contains a match. Is there a way to make it append for a single match too? Example: dataframe %>% mutate_at(vars(contains('ah')), .funs = funs(cat = ntile(., 2))) Lophophore
@Lophophore Not that I know of, but you might ask a new question. For a single variable you could write a function using mutate and set the variable names based on the function input. See the "Setting variable names" section of Programming with dplyrSoulful
If you want to know what exactly do the ~ntile(., 2)this function map smallest inputs to smallest outputs so in this case the col ooooHH the first three values 1,2,3 rank as 1 and the 4,5,6 rank as 2. (e.g compare hello column with cat_hello if I rank ~ntile(., 5)) use .fns = list(cat = ~(.)*10) if you only want to mutate based on a product of row values.Crooked

© 2022 - 2024 — McMap. All rights reserved.