As part of a pipeline, I'd like to take a data frame or tibble and rename a subset of columns, specified by a vector of position indices, with the new column names as a function of their index rather than their names. I don't want to leave the pipeline, store an intermediate result, store the vector of indices, or have to type the vector of indices twice (an accident waiting to happen if I ever want to change them).
I can achieve my goal by piping into a horrible anonymous function, using dplyr::rename_with or rlang::set_names. But surely there's a cleaner way to do this than what I came up with?
library(tidyverse)
# Base R does what I want, but it isn't pipe-friendly
temp <- starwars |>
  head(c(2, 6))
idx <- c(2, 4:6)
colnames(temp)[idx] <- str_c("col_", idx, "_new")
print(temp)
#> # A tibble: 2 × 6
#> name col_2_new mass col_4_new col_5_new col_6_new
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
# Can repeat the vector of selected indices in the .fn argument of rename_with
# but surely there's a way to avoid writing c(2, 4:6) twice?
starwars |>
  head(c(2, 6)) |>
  rename_with(.cols = c(2, 4:6), ~ str_c("col_", c(2, 4:6), "_new"))
#> # A tibble: 2 × 6
#> name col_2_new mass col_4_new col_5_new col_6_new
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
# rename_with doesn't *quite* do what I want here
# Can specify cols by index, but .x is the column name not its index
starwars |>
  head(c(2, 6)) |>
  rename_with(.cols = c(2, 4:6), ~ str_c("col_", .x, "_new"))
#> # A tibble: 2 × 6
#> name col_height_new mass col_hair_color_new col_skin_colo…¹ col_e…²
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
#> # … with abbreviated variable names ¹col_skin_color_new, ²col_eye_color_new
# Anonymous function avoids repeating c(2, 4:6) - supplying the external vector
# means using all_of() or any_of() depending on whether you want an error if
# an index is missing.
# But surely there's an easier way than this?
starwars |>
  head(c(2, 6)) |>
  (\(tbl, idx) rename_with(tbl, .cols = all_of(idx),
                           ~ str_c("col_", idx, "_new")))(c(2, 4:6))
#> # A tibble: 2 × 6
#> name col_2_new mass col_4_new col_5_new col_6_new
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
# There's also rlang::set_names ... but this is even uglier
starwars |>
  head(c(2, 6)) |>
  (\(tbl, idx) set_names(tbl, ifelse(seq_along(tbl) %in% idx,
                                     str_c("col_", seq_along(tbl), "_new"),
                                     colnames(tbl))))(c(2, 4:6))
#> # A tibble: 2 × 6
#> name col_2_new mass col_4_new col_5_new col_6_new
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
If there's no "obvious" one-liner to do it, a functional approach might be better. I'll self-answer with a purrr solution, but I'm sure others can do better.
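For illustration, one minimal sketch of such a functional approach is to wrap the base R assignment from the first example in a small named helper, so the index vector is typed only once and the pipeline stays intact. Note that rename_by_index is a hypothetical name I've made up here, not a function from dplyr or rlang, and the sketch assumes R 4.1 or later for the native pipe and the \(i) lambda syntax.

```r
library(dplyr)    # for the starwars data set
library(stringr)  # for str_c()

# Hypothetical helper (not part of dplyr or rlang): renames the columns at
# positions `idx`, using `.fn(idx)` to build the new names from the indices.
rename_by_index <- function(tbl, idx, .fn) {
  names(tbl)[idx] <- .fn(idx)
  tbl
}

# The index vector c(2, 4:6) appears only once in the pipeline.
# Columns 2, 4, 5 and 6 become col_2_new, col_4_new, col_5_new, col_6_new.
starwars |>
  head(c(2, 6)) |>
  rename_by_index(c(2, 4:6), \(i) str_c("col_", i, "_new"))
```

This is really just the base R version moved into a function, so it trades the inline anonymous function for a definition that has to live somewhere outside the pipeline.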
Related questions, but not duplicates because they don't ask for the new name to be a function of the index: "R: dplyr - Rename column name by position instead of name" and "How to dplyr rename a column, by column index?"
dplyr but maybe not! The difficulty here seems to be finding a "nice" way to give the naming function access to the column index. In the end I decided that splitting the indices out and storing them in a temporary list, along with the data frame, didn't feel too hacky from a purrr perspective (not much different from using e.g. array_branch() to split something up before recombining) – Comrade