Rename data frame columns, specified by column index, as a function of those indices

Asked 20/3, 2023 at 21:44 Answered 21/3, 2023 at 0:4

As part of a pipeline, I'd like to take a data frame or tibble and rename a subset of columns, specified by a vector of position indices, with the new column names as a function of their index rather than their names. I don't want to leave the pipeline, store an intermediate result, store the vector of indices, or have to type the vector of indices twice (an accident waiting to happen if I ever want to change them).

I can achieve my goal by piping into a horrible anonymous function, using dplyr::rename_with or rlang::set_names. But surely there's a cleaner way to do this than what I came up with?

library(tidyverse)
# Base R does what I want: but not pipe-friendly
temp <- starwars |>
  head(c(2, 6))
idx <- c(2, 4:6)
colnames(temp)[idx] <- str_c("col_", idx, "_new")
print(temp)
#> # A tibble: 2 × 6
#>   name           col_2_new  mass col_4_new col_5_new col_6_new
#>   <chr>              <int> <dbl> <chr>     <chr>     <chr>    
#> 1 Luke Skywalker       172    77 blond     fair      blue     
#> 2 C-3PO                167    75 <NA>      gold      yellow

# Can repeat the vector of selected indices in the .fn argument of rename_with
# but surely there's a way to avoid writing c(2, 4:6) twice?
starwars |>
  head(c(2, 6)) |>
  rename_with(.cols = c(2, 4:6), ~ str_c("col_", c(2, 4:6), "_new"))
#> # A tibble: 2 × 6
#>   name           col_2_new  mass col_4_new col_5_new col_6_new
#>   <chr>              <int> <dbl> <chr>     <chr>     <chr>    
#> 1 Luke Skywalker       172    77 blond     fair      blue     
#> 2 C-3PO                167    75 <NA>      gold      yellow

# rename_with doesn't *quite* do what I want here
# Can specify cols by index, but .x is the column name not its index
starwars |>
  head(c(2, 6)) |>
  rename_with(.cols = c(2, 4:6), ~ str_c("col_", .x, "_new"))
#> # A tibble: 2 × 6
#>   name           col_height_new  mass col_hair_color_new col_skin_colo…¹ col_e…²
#>   <chr>                   <int> <dbl> <chr>              <chr>           <chr>  
#> 1 Luke Skywalker            172    77 blond              fair            blue   
#> 2 C-3PO                     167    75 <NA>               gold            yellow 
#> # … with abbreviated variable names ¹col_skin_color_new, ²col_eye_color_new

# Anonymous function avoids repeating c(2, 4:6) - supplying the external vector
# means using all_of() or any_of() depending on whether you want an error if
# an index is missing.
# But surely there's an easier way than this?
starwars |>
  head(c(2, 6)) |>
  (\(tbl, idx) rename_with(tbl, .cols = all_of(idx),
                           ~ str_c("col_", idx, "_new")))(c(2, 4:6))
#> # A tibble: 2 × 6
#>   name           col_2_new  mass col_4_new col_5_new col_6_new
#>   <chr>              <int> <dbl> <chr>     <chr>     <chr>    
#> 1 Luke Skywalker       172    77 blond     fair      blue     
#> 2 C-3PO                167    75 <NA>      gold      yellow

# There's also rlang::set_names ... but this is even uglier
starwars |>
  head(c(2, 6)) |>
  (\(tbl, idx) set_names(tbl, ifelse(seq_along(tbl) %in% idx,
                                     str_c("col_", seq_along(tbl), "_new"),
                                     colnames(tbl))))(c(2, 4:6))
#> # A tibble: 2 × 6
#>   name           col_2_new  mass col_4_new col_5_new col_6_new
#>   <chr>              <int> <dbl> <chr>     <chr>     <chr>    
#> 1 Luke Skywalker       172    77 blond     fair      blue     
#> 2 C-3PO                167    75 <NA>      gold      yellow

If there's no "obvious" one-liner to do it, a functional approach might be better. I'll self-answer a purrr solution, but I'm sure others can do better.

Related questions, but not duplicates as they don't ask for the new name to be a function of the index: R: dplyr - Rename column name by position instead of name and How to dplyr rename a column, by column index?

Comrade answered 20/3, 2023 at 21:44 Comment(0)

I don't think there is a canonical or clean way of doing this without either (i) writing the index values twice, (ii) storing them in a temporary variable, or (iii) using a hacky approach that stores the values on the fly in a temporary variable or function and then reuses them.

I'd say a canonical way would be to create a lookup vector and use this inside rename(all_of()). When coming back some time later to this code it is easy to understand how the column names have been recoded.

library(tidyverse)

idx <- c(2, 4:6)
lookup_vec <- setNames(idx, str_c("col_", idx, "_new"))

starwars |>
  head(c(2, 6)) |>
  rename(all_of(lookup_vec))

#> # A tibble: 2 × 6
#>   name           col_2_new  mass col_4_new col_5_new col_6_new
#>   <chr>              <int> <dbl> <chr>     <chr>     <chr>    
#> 1 Luke Skywalker       172    77 blond     fair      blue     
#> 2 C-3PO                167    75 <NA>      gold      yellow

If you apply this kind of operation a lot and want to avoid temporary variables at all costs, then a helper function might do the trick:

rename_at_idx <- function(df, idx, before = "", after = "") {
  # Named vector: names are the new column names, values are the positions
  rename(df, all_of(setNames(idx, str_c(before, idx, after))))
}

starwars |>
  head(c(2, 6)) |>
  rename_at_idx(c(2, 4:6), "col_", "_new")
#> same output
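
If a prefix and suffix are not flexible enough, the same idea could be generalised so that the caller passes an arbitrary function of the indices. The following is only a sketch in base R under that assumption; `rename_idx_with` is a hypothetical name, not part of dplyr, and the small data frame is made up for illustration.

```r
# Hypothetical helper (not a dplyr function): rename the columns at
# positions `idx`, using `fn(idx)` to build the new names.
rename_idx_with <- function(df, idx, fn) {
  # Fail early if any requested position does not exist
  stopifnot(all(idx %in% seq_along(df)))
  names(df)[idx] <- fn(idx)
  df
}

# Small made-up data frame so the example runs without extra packages
df <- data.frame(name = "Luke", height = 172, mass = 77, hair = "blond")

out <- df |>
  rename_idx_with(c(2, 4), \(i) paste0("col_", i, "_new"))

names(out)
#> [1] "name"      "col_2_new" "mass"      "col_4_new"
```

The index vector is written once, at the call site, and the naming function receives the positions rather than the old names.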

Created on 2023-03-20 with reprex v2.0.2

Shahaptian answered 20/3, 2023 at 21:53 Comment(1)

Thanks - was rather hoping there's a neat one-liner somewhere in dplyr but maybe not! The difficulty here seems to be finding a "nice" way to give the naming function access to the column index - in the end I decided splitting the indices out and storing them in a temporary list, along with the data frame, didn't feel too hacky from a purrr perspective (not much different to using eg array_branch() to split something up before recombining) – Comrade 21/3, 2023 at 0:25

A functional approach with purrr::pmap.

library(tidyverse)
starwars |>
  head(c(2, 6)) |>
  (\(tbl) list(idx = seq_along(tbl), col = tbl, old_name = names(tbl), 
               new_name = str_c("col_", seq_along(tbl), "_new")))() |>
  pmap(function(idx, col, old_name, new_name) {
    set_names(tibble(col), if (idx %in% c(2, 4:6)) new_name else old_name)
    }) |>
  bind_cols()
#> # A tibble: 2 × 6
#>   name           col_2_new  mass col_4_new col_5_new col_6_new
#>   <chr>              <int> <dbl> <chr>     <chr>     <chr>    
#> 1 Luke Skywalker       172    77 blond     fair      blue     
#> 2 C-3PO                167    75 <NA>      gold      yellow

purrr::imap sounds like a good idea ("i" is for index!), but data frames are named, and according to the purrr documentation:

imap(x, ...), an indexed map, is short hand for map2(x, names(x), ...) if x has names, or map2(x, seq_along(x), ...) if it does not.

So imap here would use the column names, not the indices we want to give the naming function access to. However, an anonymous function to make a list of the data frame and its indices (and maybe the proposed names too) doesn't seem too dirty, and would let me pipe into map2 (or pmap)... pre-computing the new names has the advantage of keeping the function inside the map nice and simple, so that's what I went with.
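
To make the imap behaviour concrete, here is a minimal sketch on a made-up three-column data frame (`df`, `idx` and `new_names` are illustrative names, not taken from the code above). It shows that imap passes names for named input and positions for unnamed input, and that map2 with seq_along() hands the positions to the naming function explicitly.

```r
library(purrr)

# Made-up data frame for illustration
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Named input: the second argument is the column name, not its position
by_name <- imap_chr(df, \(col, i) as.character(i))

# Unnamed input: imap() falls back to positions
by_pos <- imap_chr(unname(as.list(df)), \(col, i) as.character(i))

# map2() with seq_along() passes the position explicitly
idx <- c(1, 3)
new_names <- map2_chr(
  names(df), seq_along(df),
  \(nm, i) if (i %in% idx) paste0("col_", i, "_new") else nm
)
names(df) <- unname(new_names)
```

Here the naming logic sees both the old name and the position, so it can keep the old name for columns outside `idx`.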

Comrade answered 21/3, 2023 at 0:4 Comment(0)
