My question has strong similarities with this one and this other one, but my dataset is a little bit different and I can't seem to make those solutions work. Please excuse me if I misunderstood something and this question is redundant.
I have a dataset such as this one:
df <- data.frame(
  id = 1:5,
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA)
)
#   id conditionA conditionB conditionC conditionD
# 1  1          1         NA         NA         NA
# 2  2         NA          1         NA         NA
# 3  3         NA         NA          1         NA
# 4  4         NA         NA         NA          1
# 5  5          1         NA         NA         NA
(Note that apart from these columns, I have a lot of other columns that shouldn't be affected by the current manipulation.)
So, I observe that conditionA, conditionB, conditionC and conditionD are mutually exclusive and would be better represented as a single categorical variable, i.e. a factor, that should look like this:
#   id       type
# 1  1 conditionA
# 2  2 conditionB
# 3  3 conditionC
# 4  4 conditionD
# 5  5 conditionA
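One way to get there in base R, as a minimal sketch: it assumes the indicator columns are exactly those whose names start with "condition" and that each row has at most one 1. Treating NA as 0 turns every row into a 0/1 vector, max.col() then returns the position of the 1 in each row, and rows with no condition set are mapped back to NA.

```r
# Example data from the question
df <- data.frame(
  id = 1:5,
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA)
)

# Assumption: the indicator columns are the ones named "condition..."
cond_cols <- grep("^condition", names(df), value = TRUE)

# Indicator matrix with NA treated as 0 (condition absent)
ind <- as.matrix(df[cond_cols])
ind[is.na(ind)] <- 0

# max.col() gives, for each row, the column index of the row maximum (the 1)
type <- cond_cols[max.col(ind, ties.method = "first")]
# Rows where no condition is set would otherwise get the first column
type[rowSums(ind) == 0] <- NA

df$type <- factor(type, levels = cond_cols)

# Drop the indicator columns; any other columns are kept as they are
df <- df[setdiff(names(df), cond_cols)]
print(df)
```

Because only the indicator columns are selected and dropped, any other columns in the real dataset pass through unchanged.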
I have investigated using gather or unite from tidyr, but neither seems to fit this case (with unite, we lose the information carried by the variable name).
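The step that unite loses is turning the column name itself into the value. A minimal base-R sketch of that reshape (roughly what tidyr's gather(..., na.rm = TRUE) does) follows; it assumes id uniquely identifies rows and that the indicator columns are those named "condition...". Merging back on id leaves any other columns untouched.

```r
df <- data.frame(
  id = 1:5,
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA)
)

cond_cols <- grep("^condition", names(df), value = TRUE)

# Long form: one row per (id, indicator column) pair; the column name is kept as "type"
long <- data.frame(
  id = rep(df$id, times = length(cond_cols)),
  type = rep(cond_cols, each = nrow(df)),
  value = unlist(df[cond_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only the cells where the condition is set
long <- long[!is.na(long$value), c("id", "type")]

# Join back on id (assumed unique); non-indicator columns are carried over as is
result <- merge(df[setdiff(names(df), cond_cols)], long, by = "id", all.x = TRUE)
result$type <- factor(result$type, levels = cond_cols)
print(result)
```

Using all.x = TRUE keeps rows that have no condition set, with type left as NA.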
I tried using kimisc::coalesce.na, as suggested in the first referred answer, but 1. I first need to set a factor value based on the name of each column, and 2. it doesn't work as expected, only picking up the first column:
library(kimisc)
library(magrittr) # provides the %>% pipe used below
# first, factor each condition with a specific label
df$conditionA <- df$conditionA %>%
  factor(levels = 1, labels = "conditionA")
df$conditionB <- df$conditionB %>%
  factor(levels = 1, labels = "conditionB")
df$conditionC <- df$conditionC %>%
  factor(levels = 1, labels = "conditionC")
df$conditionD <- df$conditionD %>%
  factor(levels = 1, labels = "conditionD")
# now coalesce.na to merge into a single variable
df$type <- coalesce.na(df$conditionA, df$conditionB, df$conditionC, df$conditionD)
df
#   id conditionA conditionB conditionC conditionD       type
# 1  1 conditionA       <NA>       <NA>       <NA> conditionA
# 2  2       <NA> conditionB       <NA>       <NA>       <NA>
# 3  3       <NA>       <NA> conditionC       <NA>       <NA>
# 4  4       <NA>       <NA>       <NA> conditionD       <NA>
# 5  5 conditionA       <NA>       <NA>       <NA> conditionA
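A likely explanation for the NAs in type: each factor has only its own label as a level, and assigning a value that is not an existing level into a factor produces NA (with an "invalid factor level" warning). Assuming coalesce.na fills the NA positions of its first argument by subset assignment, "conditionB" and the others can never be stored in a factor whose only level is "conditionA". The sketch below reproduces that behaviour, then coalesces plain character vectors instead. coalesce_chr is a hypothetical helper written for illustration, not part of kimisc.

```r
# Reproduce the problem with two single-level factors
a <- factor(c(1, NA), levels = 1, labels = "conditionA")
b <- factor(c(NA, 1), levels = 1, labels = "conditionB")
x <- a
# "conditionB" is not a level of x, so R stores NA here (and warns)
suppressWarnings(x[is.na(x)] <- b[is.na(x)])
print(x) # x is unchanged: conditionA, then NA

# Workaround: coalesce character vectors, convert to a factor only at the end
df <- data.frame(
  id = 1:5,
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA)
)
cond_cols <- grep("^condition", names(df), value = TRUE)

# For each indicator column: its name where the value is set, NA otherwise
labelled <- lapply(cond_cols, function(col) ifelse(is.na(df[[col]]), NA_character_, col))

# Hypothetical helper: fill the NAs of x with the matching values of y
coalesce_chr <- function(x, y) {
  x[is.na(x)] <- y[is.na(x)]
  x
}

# Reduce() applies the helper left to right across all indicator columns
df$type <- factor(Reduce(coalesce_chr, labelled), levels = cond_cols)
print(df[c("id", "type")])
```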
I also tried the other suggestions from the second question, but none of them gave me the expected result...
NA/1 instead of 0/1 has no upside that I know of. I've been seeing this a lot on SO lately. – Abell1
each time a condition was satisfied (and didn't bother to fill the rest with 0). I'm not sure if I should call that a dummy variable (but that's the term I've been encountering)... – Generalist
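To illustrate the point about 0/1 coding: once the indicators hold 0 instead of NA, both checking the mutual-exclusivity assumption and recovering the category become plain arithmetic. This is a minimal sketch that again assumes the indicator columns are the ones named "condition...". When each row has exactly one 1, the matrix product with 1, 2, ..., k gives the position of that 1.

```r
df <- data.frame(
  id = 1:5,
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA)
)
cond_cols <- grep("^condition", names(df), value = TRUE)

# Recode the indicators from NA/1 to 0/1 (0 = condition absent)
df[cond_cols] <- lapply(df[cond_cols], function(v) ifelse(is.na(v), 0, v))

# Mutual exclusivity: exactly one indicator set per row
stopifnot(all(rowSums(df[cond_cols]) == 1))

# With exactly one 1 per row, this dot product is the index of that column
idx <- drop(as.matrix(df[cond_cols]) %*% seq_along(cond_cols))
df$type <- factor(cond_cols[idx], levels = cond_cols)
print(df)
```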