I have the following data:
df <- structure(list(id = c("1358792", "1358792", "333482", "333482", "747475", "747475"),
x = c("123", "123", "456", "456", NA, NA),
all_x = list("123", "123",
c("456", "789"),
c("456", "789"),
list(),
list())),
row.names = c(NA, -6L),
class = "data.frame")
       id    x    all_x
1 1358792  123      123
2 1358792  123      123
3  333482  456 456, 789
4  333482  456 456, 789
5  747475 <NA>     NULL
6  747475 <NA>     NULL
The all_x column is a list column whose elements are either empty (NULL), a single character string, or a character vector.
I want to create a new column (tidyverse style) with the following logic: when the all_x element has one or no value, just take the value from x. If it has more than one value (i.e. is a character vector), group by id and take the element that corresponds to the row number within the group, i.e. for the first row of an id, take the first element of the character vector, for the second row of that id, take the second element, and so on.
Desired output would be an additional character column with the respective values, i.e.
       id    x    all_x   x2
1 1358792  123      123  123
2 1358792  123      123  123
3  333482  456 456, 789  456
4  333482  456 456, 789  789
5  747475 <NA>     NULL <NA>
6  747475 <NA>     NULL <NA>
I have tried tons of variants with if_else, ifelse, unlisting and indexing, but I always get errors due to the mixed structure of the all_x column.
Here's the closest I got:
library(tidyverse)
df |>
  mutate(x2 = if_else(lengths(all_x) > 1, all_x[[1]][row_number()], x), .by = id)
However, this does not give the desired result.
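For reference, the logic described above could be sketched row by row, so that each all_x element is inspected on its own instead of being passed through a vectorised if_else. This is only a sketch of the intended behaviour, not a confirmed solution: the helper name pick_x2 is hypothetical, it assumes dplyr 1.1.0 or later (for .by) together with purrr, and falling back to x when the row number exceeds the number of elements is my own assumption.

```r
library(dplyr)
library(purrr)

df <- structure(list(id = c("1358792", "1358792", "333482", "333482", "747475", "747475"),
                     x = c("123", "123", "456", "456", NA, NA),
                     all_x = list("123", "123",
                                  c("456", "789"),
                                  c("456", "789"),
                                  list(),
                                  list())),
                row.names = c(NA, -6L),
                class = "data.frame")

# Hypothetical helper: choose the value for a single row.
# candidates: one element of all_x (empty, length 1, or a character vector)
# position:   row number within the id group
# fallback:   the value of x for that row
pick_x2 <- function(candidates, position, fallback) {
  if (length(candidates) > 1 && position <= length(candidates)) {
    candidates[[position]]
  } else {
    # Assumption: use x when there is nothing to index into
    fallback
  }
}

res <- df |>
  mutate(x2 = pmap_chr(list(all_x, row_number(), x), pick_x2), .by = id)

res$x2
```

Because pmap_chr calls the helper once per row, the empty, length-one and longer elements never have to be combined into a single vector, which is where the if_else attempts run into type problems.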
all_x as a list column? Wouldn't it be better to make it character by doing lapply(all_x, toString)? In other words, I would appreciate a reference to why and when keeping a list column is considered to be good practice? – Pulque
lapply(all_x, toString) still keeps the column as a list column and working with the code still leads to these type-inconsistency errors. – Namnama
df0$all_x = vapply(df0$all_x, toString, character(1L)). – Pulque
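To illustrate the difference between the two calls discussed in the comments, here is a small base R sketch. The standalone list all_x below is made up for illustration and mirrors the three kinds of elements in the question; it is not the df0 object from the comment.

```r
# Illustrative list with the three kinds of elements from the question
all_x <- list("123", c("456", "789"), list())

# lapply() always returns a list, so the column would stay a list column
as_list <- lapply(all_x, toString)

# vapply() with character(1L) returns a plain character vector
as_chr <- vapply(all_x, toString, character(1L))

is.list(as_list)
is.character(as_chr)
as_chr
```

Note that with this approach an empty element becomes the empty string "" rather than NA, and multiple values are collapsed into one comma-separated string, so the individual elements can no longer be indexed by position.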