Combining column values with column names for some select columns using tidyr unite
Asked Answered
C

1

2

Given a dataframe:

df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), col4 = c(100:103))

I want to combine column with their column names. I know I can use unite from tidyr and get the following output.

df %>% unite(NewCol, c(Col1, Col4), remove = F)

  Col1 Col2 Col3 Col4 NewCol
1    A    W    1  100  A_100
2    B    X    2  101  B_101
3    C    Y    3  102  C_102
4    D    Z    4  103  D_103

But I want to have the column name next to the value of the column as follows (the separator _ is really not that important):

  Col1 Col2 Col3 Col4 NewCol
1    A    W    1  100  Col1_A_Col4_100
2    B    X    2  101  Col1_B_Col4_101
3    C    Y    3  102  Col1_C_Col4_102
4    D    Z    4  103  Col1_D_Col4_103

I tried the solution posted here which does give the desired output but it creates a separate output.

imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>%
  unite(NewCol, sep = "_")

  NewCol         
  <chr>          
1 Col1_A_Col4_100
2 Col1_B_Col4_101
3 Col1_C_Col4_102
4 Col1_D_Col4_103

Would I simply use bind_cols() to combine both? How do I know the sequence of the rows is preserved between the two? Is there another way that I can create NewCol within the same dataframe similar to unite in the first case?

Campfire answered 22/6, 2023 at 3:52 Comment(5)
I don’t understand why you do not test your strategy???Levon
Is it the bind_cols() implementation you're having trouble with? df %>% bind_cols(imap_dfr(df %>% select(Col1, Col4), ~ paste(.y, .x, sep = "_")) %>% unite(newcol, sep = "_")) should work as expected. As for "How do I know the sequence of the rows is preserved between the two?" - I can't think of any way that 'newcol' would be in a different order to 'df'Handspring
You could also create the 'colname + value' columns, then unite, then left_join to get the 'overwritten' columns, e.g. df %>% mutate(across(c(Col1, Col4), ~paste0(cur_column(), "_", .x))) %>% unite(newcol, c(Col1, Col4), sep = "_") %>% left_join(df); maybe that would suit better?Handspring
@Handspring - df %>% mutate(across(c(Col1, Col4), ~paste0(cur_column(), "_", .x))) %>% unite(newcol, c(Col1, Col4), sep = "_") %>% left_join(df) gives me the desired output with the added peace that row order is preserved. I didn't know about the existence of cur_column() until now. I thought the solution posted here was the way to go.Campfire
@Handspring - Please feel free to post it as solution. I feel your response is a safer way to achieve the desired output (atleast compared to what I was trying to do).Campfire
H
2

One option is to create temporary 'colname + value' columns, then unite them in a second step, e.g.

## Load libraries
library(tidyverse)

## Load example data
df <- data.frame(Col1 = LETTERS[1:4], Col2 = LETTERS[23:26], Col3 = c(1:4), Col4 = c(100:103))

## Expected outcome
df %>% bind_cols(imap_dfr(df %>% select(Col1, Col4),
                          ~ paste(.y, .x, sep = "_")) %>%
                   unite(newcol, sep = "_"))
#>   Col1 Col2 Col3 Col4          newcol
#> 1    A    W    1  100 Col1_A_Col4_100
#> 2    B    X    2  101 Col1_B_Col4_101
#> 3    C    Y    3  102 Col1_C_Col4_102
#> 4    D    Z    4  103 Col1_D_Col4_103

## With a small number of columns
df %>%
  mutate(tmp_Col1 = paste0("Col1", "_", Col1),
         tmp_Col4 = paste0("Col4", "_", Col4)) %>%
  unite(newcol, c(tmp_Col1, tmp_Col4), sep = "_")
#>   Col1 Col2 Col3 Col4          newcol
#> 1    A    W    1  100 Col1_A_Col4_100
#> 2    B    X    2  101 Col1_B_Col4_101
#> 3    C    Y    3  102 Col1_C_Col4_102
#> 4    D    Z    4  103 Col1_D_Col4_103

## With a large number of columns
df %>%
  mutate(across(c(Col1, Col4),
                ~paste0(cur_column(), "_", .x))) %>%
  unite(newcol, c(Col1, Col4), sep = "_") %>%
  left_join(df)
#> Joining with `by = join_by(Col2, Col3)`
#>            newcol Col2 Col3 Col1 Col4
#> 1 Col1_A_Col4_100    W    1    A  100
#> 2 Col1_B_Col4_101    X    2    B  101
#> 3 Col1_C_Col4_102    Y    3    C  102
#> 4 Col1_D_Col4_103    Z    4    D  103

Created on 2023-06-22 with reprex v2.0.2

If you have a large number of columns you want to transform, using across() allows you to employ tidyselect functions, such as starts_with(), to select columns of interest without having to specify each column by name.

Handspring answered 22/6, 2023 at 5:40 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.