Limit na.locf in zoo package

I would like to carry the last observation forward for a variable, but for at most 2 observations. That is, for gaps of 3 or more NAs, I would carry the last observation forward only into the next 2 positions and leave the rest as NA.

With zoo::na.locf, the maxgap parameter means that if a gap is longer than 2, none of its NAs are replaced, not even the first 2. Is there an alternative?

x <- c(NA,3,4,5,6,NA,NA,NA,7,8)
zoo::na.locf(x, maxgap = 2) # Does not fill the first 2 NAs after the 6, because that NA gap has length 3.
Desired_output <- c(NA,3,4,5,6,6,6,NA,7,8)
Pires asked 13/9, 2018 at 13:47

A solution using base R:

ave(x, cumsum(!is.na(x)), FUN = function(i){ i[1:pmin(length(i), 3)] <- i[1]; i })
# [1] NA  3  4  5  6  6  6 NA  7  8

cumsum(!is.na(x)) puts each run of NAs in the same group as the most recent non-NA value before it.

function(i){ i[1:pmin(length(i), 3)] <- i[1]; i } sets the first three elements of each group (the leading non-NA value plus at most two NAs) to that leading value, so at most two NAs per gap are filled.
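A small sketch of the intermediate steps may make the grouping clearer. It prints the group ids, shows the pieces that ave() passes to FUN, and applies the per-group function to one group. The name fill_first is hypothetical; it uses min() instead of pmin() because both arguments are single numbers.

```r
x <- c(NA, 3, 4, 5, 6, NA, NA, NA, 7, 8)

# Each non-NA value starts a new group; the NAs that follow it share its id
grp <- cumsum(!is.na(x))
grp
# [1] 0 1 2 3 4 4 4 4 5 6

# The pieces that ave() hands to FUN, one per group
split(x, grp)

# Keep the leading value plus at most 2 following NAs (3 = 1 + 2)
fill_first <- function(i) { i[seq_len(min(length(i), 3))] <- i[1]; i }
fill_first(c(6, NA, NA, NA))
# [1]  6  6  6 NA
```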

Caiman answered 13/9, 2018 at 14:26
Comments:
Genetic: Nice. A small simplification might be to use this as FUN: function(x) ifelse(seq_along(x) <= 2+1, x[1], NA)
Caiman: @G.Grothendieck, good suggestion. My original function is quite clumsy.
Pires: I think it is very elegant to have a one-liner, although I am not very familiar with ave and pmin. How could you carry backwards instead? @Caiman @G.Grothendieck
Caiman: @user3507584, one possible way is: 1) reverse the vector; 2) transform the reversed vector with the current answers; 3) reverse the transformed values.
Pires: @Caiman Thanks for the guidance, I think I got it (using @G.Grothendieck's suggestion for the function): rev(ave(rev(x), cumsum(!is.na(rev(x))), FUN = function(z) ifelse(seq_along(z) <= 3, z[1], NA)))
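The forward and backward versions from this thread can be wrapped into one function. This is a sketch: locf_limited is a hypothetical name, and the fromLast argument only mirrors the naming of zoo::na.locf; it is implemented here with the reverse, transform, reverse idea from the comments.

```r
# Hypothetical wrapper: limited LOCF in either direction, built on the ave() answer
locf_limited <- function(x, n = 2, fromLast = FALSE) {
  # Backward fill = forward fill on the reversed vector, reversed back
  if (fromLast) return(rev(locf_limited(rev(x), n = n)))
  ave(x, cumsum(!is.na(x)),
      FUN = function(z) ifelse(seq_along(z) <= n + 1, z[1], NA))
}

x <- c(NA, 3, 4, 5, 6, NA, NA, NA, 7, 8)
locf_limited(x, 2)
# [1] NA  3  4  5  6  6  6 NA  7  8
locf_limited(x, 2, fromLast = TRUE)
# [1]  3  3  4  5  6 NA  7  7  7  8
```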

First apply na.locf0 with maxgap = 2, giving x0, and define a grouping variable g using rleid from the data.table package. For each group, use ave to apply keeper, which returns c(1, 1, NA, ..., NA) if the group is all NA and all 1s otherwise. Multiply na.locf0(x) by the result.

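library(data.table)  # rleid()
library(zoo)         # na.locf0()

x <- c(NA, 3, 4, 5, 6, NA, NA, NA, 7, 8)
mg <- 2                                  # maximum number of NAs to fill per gap
x0 <- na.locf0(x, maxgap = mg)           # gaps longer than mg are left as NA
g <- rleid(is.na(x0))                    # one id per run of NA / non-NA in x0
keeper <- function(x) if (all(is.na(x)))  ifelse(seq_along(x) <= mg, 1, NA) else 1
na.locf0(x) * ave(x0, g, FUN = keeper)   # full fill, masked beyond mg
## [1] NA  3  4  5  6  6  6 NA  7  8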
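If zoo and data.table are not available, a similar "fill, then limit" result can be sketched in base R by measuring how far each NA is from the most recent non-NA value. Note this swaps in a different technique (cummax over positions) rather than the rleid/ave mask used above, and locf_mask is a hypothetical name.

```r
# Hypothetical base-R alternative: fill an NA only if it is at most n
# positions after the most recent non-NA value
locf_mask <- function(x, n = 2) {
  idx <- seq_along(x)
  # index of the most recent non-NA value at or before each position (0 if none yet)
  last <- cummax(ifelse(is.na(x), 0L, idx))
  fill <- is.na(x) & last > 0 & (idx - last) <= n
  out <- x
  out[fill] <- x[last[fill]]
  out
}

x <- c(NA, 3, 4, 5, 6, NA, NA, NA, 7, 8)
locf_mask(x, 2)
# [1] NA  3  4  5  6  6  6 NA  7  8
```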
Genetic answered 13/9, 2018 at 14:18
Comments:
Pires: Thanks! What are rleid, ave and keeper doing in this case?
Genetic: rleid creates a vector the same length as its input that assigns 1 to the first run, 2 to the second run, and so on. keeper is described in the answer. ave splits its first argument into groups defined by the second argument, applies the specified function to each group, and then puts the results back together. See the help pages for further details.
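To see those intermediate values without data.table, the run ids can be reproduced with base rle() and rep(). This is a sketch: rleid_base is a hypothetical stand-in that only covers a single vector, not the full rleid interface. For this example x0 equals x, because the leading NA has nothing to carry and the only interior gap is longer than 2, so it is written out directly.

```r
# Hypothetical base-R stand-in for data.table::rleid on one vector:
# number each run of identical consecutive values 1, 2, 3, ...
rleid_base <- function(v) {
  r <- rle(v)  # rle() treats each NA as its own run, so pass an NA-free vector such as is.na(x0)
  rep(seq_along(r$lengths), r$lengths)
}

mg <- 2
keeper <- function(x) if (all(is.na(x))) ifelse(seq_along(x) <= mg, 1, NA) else 1

x0 <- c(NA, 3, 4, 5, 6, NA, NA, NA, 7, 8)  # same as na.locf0(x, maxgap = 2) here
g <- rleid_base(is.na(x0))
g
# [1] 1 2 2 2 2 3 3 3 4 4
ave(x0, g, FUN = keeper)  # the mask that multiplies na.locf0(x)
# [1]  1  1  1  1  1  1  1 NA  1  1
```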

© 2022 - 2024 — McMap. All rights reserved.