I am working with a large dataset where much of the data was entered twice. This means that many of the variables are represented by pairs of columns: column.1
with the data entered by one person, and column.2
where the same data was entered by a different person. I want to create a "master" column called simply column
that first draws from column.1
and then, if column.1
is NA
, draws from column.2
.
Here is an example of what I am trying to do with made-up data:
mydata <- data.frame(name = c("Sarah","Ella","Carmen","Dinah","Billie"),
cheese.1 = c(1,4,NA,6,NA),
cheese.2 = c(1,4,3,5,NA),
milk.1 = c(NA,2,0,4,NA),
milk.2 = c(1,2,1,4,2),
tofu.1 = c("yum","yum",NA,"gross", NA),
tofu.2 = c("gross", "yum", "yum", NA, "gross"))
For example, the code below shows an example of what I want to do for a single pair of columns.
mydata %>% mutate(cheese = ifelse(is.na(cheese.1), cheese.2, cheese.1))
#OUTPUT:
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 cheese
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 4
3 Carmen NA 3 0 1 <NA> yum 3
4 Dinah 6 5 4 4 gross <NA> 6
5 Billie NA NA NA 2 <NA> gross NA
However, I want to automate the process rather than doing each manually. Below is my attempt at automating the process, using a list (col.list
) of the column pairs for which I want to create new "master" columns:
col.list = c("cheese","milk","tofu")
lapply(col.list, FUN = function(x) {
v <- as.name({{x}})
v.1 <- as.name(paste0({{x}}, ".1"))
v.2 <- as.name(paste0(({{x}}), ".2"))
mydata %>% mutate(v = ifelse(is.na({{v.1}}), {{v.2}}, {{v.1}}))
})
#OUTPUT:
[[1]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 v
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 4
3 Carmen NA 3 0 1 <NA> yum 3
4 Dinah 6 5 4 4 gross <NA> 6
5 Billie NA NA NA 2 <NA> gross NA
[[2]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 v
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 2
3 Carmen NA 3 0 1 <NA> yum 0
4 Dinah 6 5 4 4 gross <NA> 4
5 Billie NA NA NA 2 <NA> gross 2
[[3]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 v
1 Sarah 1 1 NA 1 yum gross yum
2 Ella 4 4 2 2 yum yum yum
3 Carmen NA 3 0 1 <NA> yum yum
4 Dinah 6 5 4 4 gross <NA> gross
5 Billie NA NA NA 2 <NA> gross gross
The problems with this attempt are:
- the new columns are not correctly named (they should be named
cheese
,milk
andtofu
rather than all be calledv
) - the new columns are not added to the original data frame. What I want is for the program to add a series of new "master" columns to my dataframe (one new column for each pair of columns identified in
col.list
).
coalesce
which I couldn't use with "dynamic" column names. Perhaps someone reading this could give some insights? – Ovidacoalesce
. – Townswomanexec
andcoalesce
. It has dynamic names, but only with the help ofstr_glue()
. – Danseuse