Let's say I have a data frame with several rows like the following:
df <- data.frame(a = c(NA,20,NA),
date1 = c("2016-03-01", "2016-02-01", "2016-02-01"),
b = c(50,NA, NA),
date2 = c("2016-02-01", "2016-03-01", "2016-03-01"),
c = c(10,10, 10),
date3 = c("2016-01-01","2016-01-01", "2016-01-01"))
For each row, I want to get the latest value which is not a NA
between a
, b
, and c
according to the dates
(so I respectively look at date1
, date2
, or date3
and pick the most recent one).
Basically, date1
gives the date corresponding to the value a
,
date2
gives the date corresponding to the value b
,
date3
gives the date corresponding to the value c
.
If date1 > date2
& date1 > date3
, I will want to take the value a
However, if the value a
is NA
(which is the case in my example), I will compare date2
and date3
. In my example, date2 > date3
, and since value b
is not NA
but 50
, I will take 50
as my final result.
Now I want to do this for all the rows in my dataframe
Since I am using dplyr
, I tried to use the case_when
function by using the rank function (in my example, I look a the first ranked date, and then look at the linked value. If it is a NA, I look at the 2nd best ranked, etc...)
However, I can't just put, as I'd like to do, :
df <- df %>%
mutate(result = case_when(is.na(a) & is.na(b) & is.na(c) ~ NA_integer_,
rev(rank(date1, date2, date3))[1] == 3 & !is.na(a) ~ a,
rev(rank(date1, date2, date3))[2] == 3 & !is.na(b) ~ b,
rev(rank(date1, date2, date3))[3] == 3 & !is.na(a) ~ c,
rev(rank(date1, date2, date3))[1] == 2 & !is.na(a) ~ a,
rev(rank(date1, date2, date3))[2] == 2 & !is.na(b) ~ b,
rev(rank(date1, date2, date3))[3] == 2 & !is.na(a) ~ c,
rev(rank(date1, date2, date3))[1] == 1 & !is.na(a) ~ a,
rev(rank(date1, date2, date3))[2] == 1 & !is.na(b) ~ b,
rev(rank(date1, date2, date3))[3] == 1 & !is.na(a) ~ c))
Because the rank
function needs a unique vector as argument (but I can't put c(date1, date2, date3)
neither because it would give me the whole order of this vector and not the rank for each row)
In my example the result I would like to have would be
res
a date1 b date2 c date3 result
NA 2016-03-01 50 2016-02-01 10 2016-01-01 50
20 2016-02-01 NA 2016-03-01 10 2016-01-01 20
NA 2016-02-01 NA 2016-03-01 10 2016-01-01 10
Does anyone have an idea or even an entirely different approach to this problem ?