I'm looking for something similar to na.locf() in the zoo package, but instead of always using the previous non-NA value I'd like to use the nearest non-NA value. Some example data:
dat <- c(1, 3, NA, NA, 5, 7)
Replacing NA with na.locf (3 is carried forward):
library(zoo)
na.locf(dat)
# 1 3 3 3 5 7
and na.locf with fromLast set to TRUE (5 is carried backwards):
na.locf(dat, fromLast = TRUE)
# 1 3 5 5 5 7
But I want the nearest non-NA value to be used. In my example this means that the 3 should be carried forward to the first NA, and the 5 should be carried backwards to the second NA:
1 3 3 5 5 7
I have a solution coded up, but wanted to make sure that I wasn't reinventing the wheel. Is there something already floating around?
FYI, my current code is as follows. Perhaps if nothing else, someone can suggest how to make it more efficient. I feel like I'm missing an obvious way to improve this:
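# Wrapper name is a placeholder; the snippet is the body of a function.
na.nearest <- function(dat) {
  na.pos <- which(is.na(dat))
  # Nothing to fill (no NAs) or nothing to fill from (all NAs)
  if (length(na.pos) %in% c(0, length(dat))) {
    return(dat)
  }
  non.na.pos <- setdiff(seq_along(dat), na.pos)
  # For each NA, index of the closest non-NA position (ties go left)
  nearest.non.na.pos <- sapply(na.pos, function(x) {
    which.min(abs(non.na.pos - x))
  })
  dat[na.pos] <- dat[non.na.pos[nearest.non.na.pos]]
  dat
}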
To answer smci's questions below:
- No, any entry can be NA
- If all are NA, leave them as is
- No. My current solution defaults to the left-hand nearest value on a tie, but it doesn't matter
- These rows are a few hundred thousand elements typically, so in theory the upper bound would be a few hundred thousand. In reality it'd be no more than a few here & there, typically a single one.
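To pin down the behaviour described above (nearest non-NA value, ties going left, all-NA input left untouched), here is a minimal vectorized sketch in base R. The name fill_nearest is hypothetical and not part of zoo; it assumes a plain atomic vector and uses findInterval() to locate the non-NA neighbours on either side of each position instead of looping over the NA positions.

```r
# Hypothetical helper (not from zoo): fill each NA with the nearest non-NA value.
fill_nearest <- function(x) {
  ok <- which(!is.na(x))                  # positions holding real values
  if (length(ok) == 0L) return(x)         # all NA: leave as is
  idx <- seq_along(x)
  # Index into ok of the last non-NA position at or before each element (0 if none)
  left <- findInterval(idx, ok)
  right <- pmin(left + 1L, length(ok))    # next non-NA position, clamped at the end
  left <- pmax(left, 1L)                  # clamp for leading NAs
  # Strict comparison, so a tie keeps the left-hand neighbour
  use_right <- abs(ok[right] - idx) < abs(idx - ok[left])
  x[ok[ifelse(use_right, right, left)]]
}

fill_nearest(c(1, 3, NA, NA, 5, 7))
# 1 3 3 5 5 7
```

Non-NA elements map to themselves, since their left-hand neighbour is at distance zero, so the whole vector can be rebuilt in one indexing step.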
Update: It turns out that we're going in a different direction altogether, but this was still an interesting discussion. Thanks all!
fromLast looks like it may do what you want. – Smallage
rle(which(is.na(dat))). Not saying that's the most efficient but it's an improvement. See also "How can I count runs in R?" which needs a tweak rle.na() to handle NAs. – Hermitage
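The last comment appears to point at run-length encoding as a way to find the NA gaps. As a rough sketch of that idea, and assuming the goal is just to measure how long each run of NAs is, base R's rle() can be applied to the logical vector is.na(dat) rather than to the NA positions themselves. The variable names here are illustrative only.

```r
dat <- c(1, 3, NA, NA, 5, 7)

# Run-length encode the NA mask: values are TRUE for NA runs, FALSE otherwise
runs <- rle(is.na(dat))

# Keep only the lengths of the runs that consist of NAs
na_run_lengths <- runs$lengths[runs$values]
na_run_lengths
# 2
```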