I have a data frame like this:
> head(df1)
  iso year var1 var2 var3
1 XXX 2005  165   29 2151
2 XXX 2006  160   21 2139
3 XXX 2007   NA   NA   NA
4 XXX 2008  184    9 3640
5 XXX 2009   NA   NA   NA
6 YYY 2005  206  461 8049
I want to replace the NAs in intermediate years by interpolating from the surrounding years, and the NAs at the beginning and end of each range by carrying the outermost non-NA observation backward and forward.
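To make the intended result concrete, here is a minimal base-R sketch on a single vector. The helper name fill_na is hypothetical and is not part of zoo or dplyr. It assumes the values are ordered by year and evenly spaced, and it relies on approx() with rule = 2, which interpolates interior gaps linearly and repeats the nearest observed value beyond either end.

```r
# Hypothetical helper (an assumption for illustration, not from any package):
# fill interior NAs by linear interpolation and edge NAs with the
# nearest non-NA observation.
fill_na <- function(x) {
  ok <- !is.na(x)
  if (sum(ok) == 0) return(x)                      # nothing to fill from
  if (sum(ok) == 1) return(rep(x[ok], length(x)))  # one value: carry it everywhere
  # rule = 2 repeats the closest data extreme outside the observed range
  approx(which(ok), x[ok], xout = seq_along(x), rule = 2)$y
}

# var1 for iso XXX from the sample above
fill_na(c(165, 160, NA, 184, NA))
# 165 160 172 184 184
```

The 2007 gap becomes the midpoint of 160 and 184, and the trailing 2009 gap repeats the 2008 value.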
My code to do this for one column is:
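library(dplyr)  # group_by(), mutate(), %>%
library(zoo)    # na.approx(), na.locf()

df1 %>%
  group_by(iso) %>%
  mutate(var1 = na.approx(var1, na.rm = FALSE, rule = 1)) %>%          # interpolate interior NAs
  mutate(var1 = na.locf(var1, na.rm = FALSE)) %>%                      # carry last value forward
  mutate(var1 = na.locf(var1, na.rm = FALSE, fromLast = TRUE))         # carry first value backward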
This works, so now I want to do this for all columns in one go (there are more than 3, and they are not numbered as in my example). I pieced the following together from the answers to this question. I omitted the two calls to na.locf.
columnnames <- c("var1", "var2", "var3")
df1 %>%
  group_by(iso) %>%
  mutate_at(.vars = vars(columnnames), .funs = funs(na.approx(., na.rm = FALSE, rule = 1)))
This throws me an error and a warning:
Error in approx(x[!na], y[!na], xout, ...) :
  need at least two non-NA values to interpolate
In addition: Warning message:
In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion
I think I understand the error, but I did not get it when I used the first piece of code on var1. I don't follow the warning. How can I apply my code to all columns in my data frame? I also tried putting everything in a loop over columnnames, but that didn't work either (and it is probably not the best way to go about this).
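For illustration only, a per-column, per-group application could be sketched in base R as below. This is not the zoo/dplyr pipeline from the post: fill_na is a hypothetical helper, the data frame is toy data loosely based on the sample (the YYY rows after 2005 are made up), and rows are assumed to be sorted by year within each iso. The helper guards against groups with fewer than two non-NA values, which is the situation the approx() error message describes.

```r
# Hypothetical helper, not from zoo or dplyr: interpolate interior NAs and
# repeat the nearest observation at the edges.
fill_na <- function(x) {
  ok <- !is.na(x)
  if (sum(ok) == 0) return(x)                      # all NA: leave as is
  if (sum(ok) == 1) return(rep(x[ok], length(x)))  # approx() needs >= 2 points
  approx(which(ok), x[ok], xout = seq_along(x), rule = 2)$y
}

# Toy data (assumption): XXX rows follow the sample, YYY rows are invented.
df1 <- data.frame(
  iso  = rep(c("XXX", "YYY"), each = 5),
  year = rep(2005:2009, times = 2),
  var1 = c(165, 160, NA, 184, NA, 206, NA, NA, NA, NA),
  var2 = c(29, 21, NA, 9, NA, NA, 461, NA, 465, NA)
)

columnnames <- c("var1", "var2")

filled <- df1
for (col in columnnames) {
  # ave() applies FUN within each iso group and keeps the original row order
  filled[[col]] <- ave(df1[[col]], df1$iso, FUN = fill_na)
}
filled
```

Because ave() works on one column at a time, the loop only needs the vector of column names, so the columns do not have to follow any naming pattern.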
mutate_at :-) – Bacillary