I have a column in my datasets where time periods (Time
) are integers ranging from a-b. Sometimes there might be missing time periods for any given group. I'd like to fill in those rows with NA
. Below is example data for 1 (of several 1000) group(s).
structure(list(Id = c(1, 1, 1, 1), Time = c(1, 2, 4, 5), Value = c(0.568780482159894,
-0.7207749516298, 1.24258192959273, 0.682123081696789)), .Names = c("Id",
"Time", "Value"), row.names = c(NA, 4L), class = "data.frame")
Id Time Value
1 1 1 0.5687805
2 1 2 -0.7207750
3 1 4 1.2425819
4 1 5 0.6821231
As you can see, Time 3 is missing. Often one or more could be missing. I can solve this on my own but am afraid I wouldn't be doing this the most efficient way. My approach would be to create a function that:
Generate a sequence of time periods from min(Time)
to max(Time)
Then do a setdiff
to grab missing Time
values.
Convert that vector to a data.frame
Pull unique identifier variables (Id
and others not listed above), and add that to this data.frame.
Merge the two.
Return from function.
So the entire process would then get executed as below:
# Split the data into individual data.frames by Id.
temp_list <- dlply(original_data, .(Id))
# pad each data.frame
tlist2 <- llply(temp_list, my_pad_function)
# collapse the list back to a data.frame
filled_in_data <- ldply(tlist2)
Better way to achieve this?
expand.grid
and thenmerge
withall = TRUE
. Not sure splitting by Id first is necessary, really. – FullerTime
and setValue
toNA
and pad the rest. So then it becomesdata_to_merge <- data.frame(id=unique(data$id),...)
(which is a really long line and not portable if the structure of the data changes). Wish I could just merge missing Time, add NA and grab all the rest from the original data efficiently. – Pothead