How can I replace an NA with the mean of its previous and next rows efficiently?
name grade
1 A 56
2 B NA
3 C 70
4 D 96
so that B's grade becomes 63, the mean of 56 and 70.
You may try na.approx from package zoo: "Missing values (NAs) are replaced by linear interpolation"
library(zoo)
x <- c(56, NA, 70, 96)
na.approx(x)
# [1] 56 63 70 96
This also works if you have more than one consecutive NA:
vals <- c(1, NA, NA, 7, NA, 10)
na.approx(vals)
# [1] 1.0 3.0 5.0 7.0 8.5 10.0
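To make the interpolation rule concrete, here is a minimal base-R sketch of the same idea. interp_na is a hypothetical helper written for illustration, not zoo's implementation. Each interior NA at position i is placed on the straight line between its nearest observed neighbours x0 < i < x1, that is v[x0] + (i - x0) * (v[x1] - v[x0]) / (x1 - x0). With a single NA between two values, (i - x0) / (x1 - x0) = 1/2, so the result is exactly the mean of the previous and next rows, which is what the question asks for. Leading and trailing NAs are left as NA in this sketch.

```r
# Hypothetical helper (not part of zoo): fill interior NAs by linear
# interpolation between the nearest observed neighbours.
interp_na <- function(v) {
  known <- which(!is.na(v))              # positions with observed values
  for (i in which(is.na(v))) {
    left  <- known[known < i]
    right <- known[known > i]
    # leading or trailing NA: no value on one side, so leave it as NA
    if (length(left) == 0 || length(right) == 0) next
    x0 <- max(left)
    x1 <- min(right)
    # straight line through (x0, v[x0]) and (x1, v[x1]), evaluated at i
    v[i] <- v[x0] + (i - x0) * (v[x1] - v[x0]) / (x1 - x0)
  }
  v
}

interp_na(c(56, NA, 70, 96))
# [1] 56 63 70 96
interp_na(c(1, NA, NA, 7, NA, 10))
# [1]  1.0  3.0  5.0  7.0  8.5 10.0
interp_na(c(NA, 2, NA, 4))
# [1] NA  2  3  4
```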
na.approx is based on the base R function approx, which may be used instead:
vals <- c(1, NA, NA, 7, NA, 10)
xout <- seq_along(vals)
x <- xout[!is.na(vals)]
y <- vals[!is.na(vals)]
approx(x = x, y = y, xout = xout)$y
# [1] 1.0 3.0 5.0 7.0 8.5 10.0
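One detail worth knowing about approx: with its default rule = 1 it returns NA for any xout outside the range of the observed x, so a leading or trailing NA is not filled. Setting rule = 2 uses the value at the closest end of the data instead. The vector below is made up for illustration.

```r
# Leading and trailing NAs have an observed value on one side only.
vals <- c(NA, 4, NA, 8, NA)
xout <- seq_along(vals)
ok   <- !is.na(vals)

# rule = 1 (the default): positions outside the observed range stay NA
approx(x = xout[ok], y = vals[ok], xout = xout)$y
# [1] NA  4  6  8 NA

# rule = 2: positions outside the range take the nearest observed value
approx(x = xout[ok], y = vals[ok], xout = xout, rule = 2)$y
# [1] 4 4 6 8 8
```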
Assume you have a data.frame df like this:
> df
name grade
1 A 56
2 B NA
3 C 70
4 D 96
5 E NA
6 F 95
Then you can use the following:
> ind <- which(is.na(df$grade))
> df$grade[ind] <- sapply(ind, function(i) with(df, mean(c(grade[i-1], grade[i+1]))))
> df
name grade
1 A 56.0
2 B 63.0
3 C 70.0
4 D 96.0
5 E 95.5
6 F 95.0
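The sapply version relies on each NA sitting between two observed values. If two NAs are adjacent, or the last row is NA, grade[i-1] or grade[i+1] is itself NA and mean() returns NA for that row. A minimal base-R sketch that averages the nearest non-missing value on each side is below; fill_neighbour_mean is a hypothetical helper, not a function from any package. For a run of NAs this is not interpolation: every NA in the run gets the same value.

```r
# Hypothetical helper: replace each NA with the mean of the nearest
# non-missing value before it and the nearest non-missing value after it.
# If only one side exists (leading/trailing NA), that single value is used.
fill_neighbour_mean <- function(x) {
  n   <- length(x)
  idx <- seq_len(n)
  ok  <- !is.na(x)

  # position of the last observed value at or before each index (0 = none)
  prev_i <- cummax(ifelse(ok, idx, 0))
  # position of the first observed value at or after each index (Inf = none)
  next_i <- rev(cummin(rev(ifelse(ok, idx, Inf))))

  prev_v <- ifelse(prev_i > 0, x[pmax(prev_i, 1)], NA)
  next_v <- ifelse(is.finite(next_i), x[pmin(next_i, n)], NA)

  # rowMeans(na.rm = TRUE) averages whichever neighbours exist
  fill <- rowMeans(cbind(prev_v, next_v), na.rm = TRUE)
  x[!ok] <- fill[!ok]
  x
}

df <- data.frame(name = LETTERS[1:6], grade = c(56, NA, 70, 96, NA, 95))
df$grade <- fill_neighbour_mean(df$grade)
df$grade
# [1] 56.0 63.0 70.0 96.0 95.5 95.0

fill_neighbour_mean(c(1, NA, NA, 7, NA, 10))
# [1]  1.0  4.0  4.0  7.0  8.5 10.0
```

For the run of two NAs this gives 4 and 4, whereas na.approx gives 3 and 5.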
For values x < -100 instead of NA:
ind <- which(df$grade < (-100))
df$grade[ind:ind+2] <- sapply(ind, function(i) with(df, mean(c(grade[i-1], grade[i+3]))))
– Dulcet
Instead of the sapply call, you could also use:
df$grade[ind] <- with(df, (grade[ind-1] + grade[ind+1]) / 2)
– Griff
An alternative solution, using the median instead of the mean, is the na.roughfix function of the randomForest package.
As described in the documentation, it works with a data frame or numeric matrix. Specifically, for numeric variables, NAs are replaced with column medians. For factor variables, NAs are replaced with the most frequent levels (breaking ties at random). If object contains no NAs, it is returned unaltered.
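The rule described above is simple enough to sketch in base R, which may help if you do not want to depend on randomForest. rough_fix_sketch is a hypothetical helper, not the package's code, and it differs in one respect: ties between equally frequent factor levels are broken by taking the first such level rather than at random. It only handles data frames, not matrices.

```r
# Hypothetical base-R sketch of the rule described for na.roughfix
# (not randomForest's implementation): numeric columns get the column
# median, factor columns get the most frequent level. Ties between levels
# go to the first level, NOT at random as in na.roughfix.
rough_fix_sketch <- function(df) {
  for (j in seq_along(df)) {
    col  <- df[[j]]
    miss <- is.na(col)
    if (!any(miss)) next                  # column without NAs: unchanged
    if (is.numeric(col)) {
      col[miss] <- median(col, na.rm = TRUE)
    } else if (is.factor(col)) {
      counts <- table(col)                # table() drops NA by default
      col[miss] <- names(counts)[which.max(counts)]
    }
    df[[j]] <- col
  }
  df
}

d <- data.frame(
  grade = c(56, NA, 70, 96),
  group = factor(c("a", "b", NA, "b"))
)
r <- rough_fix_sketch(d)
r
```

Here grade[2] becomes 70 (the median of 56, 70 and 96) and group[3] becomes "b", the most frequent level.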
Using the same examples as @Henrik,
library(randomForest)
x <- c(56, NA, 70, 96)
na.roughfix(x)
#[1] 56 70 70 96
or with a larger matrix:
y <- matrix(1:50, nrow = 10)
y[sample(1:length(y), 4, replace = FALSE)] <- NA
y
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 11 21 31 41
# [2,] 2 12 22 32 42
# [3,] 3 NA 23 33 NA
# [4,] 4 14 24 34 44
# [5,] 5 15 25 35 45
# [6,] 6 16 NA 36 46
# [7,] 7 17 27 37 47
# [8,] 8 18 28 38 48
# [9,] 9 19 29 39 49
# [10,] 10 20 NA 40 50
na.roughfix(y)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 11 21.0 31 41
# [2,] 2 12 22.0 32 42
# [3,] 3 16 23.0 33 46
# [4,] 4 14 24.0 34 44
# [5,] 5 15 25.0 35 45
# [6,] 6 16 24.5 36 46
# [7,] 7 17 27.0 37 47
# [8,] 8 18 28.0 38 48
# [9,] 9 19 29.0 39 49
#[10,] 10 20 24.5 40 50