I am working with cancer registry data. In the example data below (ex_data), the variables id and diagnosis_yr are the patient ID and the year of cancer diagnosis, respectively. Columns x_2005 to x_2010 and y_2005 to y_2010 hold the x and y status for each year from 2005 to 2010. My actual working data has many more year columns (2005-2020).

For each id, I would like to derive the variables in "wanted" while skipping NAs: x_earliest is the x value from the earliest year in which x is available, y_latest is the y value from the latest year in which y is available, and x_at_diagnosis and y_at_diagnosis are the x and y values in the diagnosis year. For id 1, for example, x is NA in 2005 and 2006, so x_earliest is 1 (from 2007), and y is NA in 2010, so y_latest is "e" (from 2009). If x or y is NA in the diagnosis year, I would like to skip it and use the most recent non-missing value from a preceding year instead. For id 3, diagnosed in 2010, x_at_diagnosis is therefore 3 (from 2009) and y_at_diagnosis is "c" (from 2007). If nothing is available in or before the diagnosis year, the result should be NA (as for x in id 5). How can I create these variables in R?
library(tidyverse)
#example data
ex_data <- tribble(
~id,~diagnosis_yr,~x_2005,~x_2006,~x_2007,~x_2008,~x_2009,~x_2010,~y_2005,~y_2006,~y_2007,~y_2008,~y_2009,~y_2010,
1, 2007, NA, NA, 1, 2, 2, 3, "a", "b", "c", "d", "e", NA,
2, 2008, 1, 3, 1, NA, 1, 2, NA, "b", "b", "e", "d", "d",
3, 2010, NA, 2, 2, 2, 3, NA, "a", "b", "c", NA, NA, NA,
4, 2009, 1, 3, 1, NA, 1, 2, NA, NA, NA, NA, NA, NA,
5, 2005, NA, 1, 1, 2, 2, 3, "a", "b", "c", "d", "e", "e"
)
#wanted variables
wanted <- tribble(
~id,~diagnosis_yr,~x_earliest,~y_latest,~x_at_diagnosis,~y_at_diagnosis,
1, 2007, 1, "e", 1, "c",
2, 2008, 1, "d", 1, "e",
3, 2010, 2, "c", 3, "c",
4, 2009, 1, NA, 1, NA,
5, 2005, 1, "e", NA, "a"
)
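One possible approach, sketched here rather than offered as a definitive solution, is to reshape the data to long format with tidyr::pivot_longer() so that each row is one id-year with separate x and y columns, and then summarise per id. The year columns are selected with a regular expression, which assumes every year column is named x_YYYY or y_YYYY. The helpers first_non_na() and last_non_na() are hypothetical names defined in the block itself, not part of any package. The example data is repeated so the block runs on its own.

```r
library(tidyverse)

# Example data from the question
ex_data <- tribble(
  ~id, ~diagnosis_yr, ~x_2005, ~x_2006, ~x_2007, ~x_2008, ~x_2009, ~x_2010,
  ~y_2005, ~y_2006, ~y_2007, ~y_2008, ~y_2009, ~y_2010,
  1, 2007, NA, NA, 1, 2, 2, 3, "a", "b", "c", "d", "e", NA,
  2, 2008, 1, 3, 1, NA, 1, 2, NA, "b", "b", "e", "d", "d",
  3, 2010, NA, 2, 2, 2, 3, NA, "a", "b", "c", NA, NA, NA,
  4, 2009, 1, 3, 1, NA, 1, 2, NA, NA, NA, NA, NA, NA,
  5, 2005, NA, 1, 1, 2, 2, 3, "a", "b", "c", "d", "e", "e"
)

# Hypothetical helpers: first / last non-missing element of a vector.
# Indexing an empty vector with [1] returns a typed NA
# (NA_real_ for x, NA_character_ for y), so all-NA groups give NA.
first_non_na <- function(v) v[!is.na(v)][1]
last_non_na  <- function(v) rev(v[!is.na(v)])[1]

result <- ex_data %>%
  # Long format: one row per id and year, with separate x and y columns
  pivot_longer(
    cols = matches("^[xy]_[0-9]{4}$"),
    names_to = c(".value", "year"),
    names_sep = "_",
    names_transform = list(year = as.integer)
  ) %>%
  # Make sure years are in chronological order within each id
  arrange(id, year) %>%
  group_by(id, diagnosis_yr) %>%
  summarise(
    x_earliest     = first_non_na(x),
    y_latest       = last_non_na(y),
    # Most recent non-NA value in the diagnosis year or any earlier year
    x_at_diagnosis = last_non_na(x[year <= diagnosis_yr]),
    y_at_diagnosis = last_non_na(y[year <= diagnosis_yr]),
    .groups = "drop"
  )

result
```

On the example data this should reproduce the values in wanted. Because the columns are picked by pattern rather than by name, the same code should also cover the 2011-2020 columns in the full data, provided they follow the same naming scheme.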