The existing answers do not seem to cover replacing multiple values at once or handling factors, so here is one that does.
Consider a data frame dat with columns of various classes.
dat
#    character integer       Date factor               POSIX
# 1                  4 2022-07-10      B 2022-07-10 20:08:10
# 2                  1 2022-07-11    FOO 2022-07-10 21:08:10
# 3                 -2 2022-07-12        2022-07-10 22:08:10
# 4                  2 2022-07-13      B 2022-07-10 23:08:10
# 5          a       3 2022-07-14        2022-07-11 00:08:10
# 6          c       1 2022-07-15        2022-07-11 01:08:10
# 7          a      -1 2022-07-16    FOO 2022-07-11 02:08:10
# 8          a      -1 2022-07-17      A 2022-07-11 03:08:10
# 9                  4 2022-07-18    FOO 2022-07-11 04:08:10
# 10                 0 2022-07-19    FOO 2022-07-11 05:08:10
# 11         b      -2 2022-07-20      B 2022-07-11 06:08:10
# 12         c      -2 2022-07-21      A 2022-07-11 07:08:10
We can put everything we want to convert to NA in a list To_NA,
To_NA <- list('', -1, -2, 'c', 'FOO', as.Date('2022-07-17'), as.POSIXct('2022-07-11 00:08:10'))
and use it in a small function make_na based on replace. If the variable is a factor (checked with is.factor), we may also want to use droplevels to drop the levels of the values that have just been replaced.
make_na <- \(x, z) {x <- replace(x, x %in% z, NA); if (is.factor(x)) droplevels(x) else x}
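To check the factor branch in isolation, here is a minimal sketch on a hypothetical toy factor f (not part of dat). It restates make_na with function() instead of \(), which is an assumption made only so the snippet also runs on R versions before 4.1; the logic is the same.

```r
# Same logic as make_na above, written with function() so it also runs on R < 4.1.
make_na <- function(x, z) {
  x <- replace(x, x %in% z, NA)           # set every value found in z to NA
  if (is.factor(x)) droplevels(x) else x  # drop levels that are no longer used
}

# Hypothetical toy factor, for illustration only.
f <- factor(c("A", "FOO", "", "B"))
res <- make_na(f, list("", "FOO"))

is.na(res)   # which elements were replaced
# [1] FALSE  TRUE  TRUE FALSE
levels(res)  # levels remaining after droplevels()
# [1] "A" "B"
```

Without droplevels the levels "" and "FOO" would still be listed even though no element uses them any more.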
We can apply it on a vector,
make_na(dat$character, To_NA)
# [1] NA NA NA NA "a" NA "a" "a" NA NA "b" NA
or loop over the columns using lapply.
dat[] <- lapply(dat, make_na, To_NA)
This gives:
dat
#    character integer       Date factor               POSIX
# 1       <NA>       4 2022-07-10      B 2022-07-10 20:08:10
# 2       <NA>       1 2022-07-11   <NA> 2022-07-10 21:08:10
# 3       <NA>      NA 2022-07-12   <NA> 2022-07-10 22:08:10
# 4       <NA>       2 2022-07-13      B 2022-07-10 23:08:10
# 5          a       3 2022-07-14   <NA>                <NA>
# 6       <NA>       1 2022-07-15   <NA> 2022-07-11 01:08:10
# 7          a      NA 2022-07-16   <NA> 2022-07-11 02:08:10
# 8          a      NA       <NA>      A 2022-07-11 03:08:10
# 9       <NA>       4 2022-07-18   <NA> 2022-07-11 04:08:10
# 10      <NA>       0 2022-07-19   <NA> 2022-07-11 05:08:10
# 11         b      NA 2022-07-20      B 2022-07-11 06:08:10
# 12      <NA>      NA 2022-07-21      A 2022-07-11 07:08:10
The column classes are preserved:
str(dat)
# 'data.frame': 12 obs. of 5 variables:
# $ character: chr NA NA NA NA ...
# $ integer : int 4 1 NA 2 3 1 NA NA 4 0 ...
# $ Date : Date, format: "2022-07-10" "2022-07-11" "2022-07-12" ...
# $ factor : Factor w/ 2 levels "A","B": 2 NA NA 2 NA NA NA 1 NA NA ...
# $ POSIX : POSIXct, format: "2022-07-10 20:08:10" "2022-07-10 21:08:10" "2022-07-10 22:08:10" ...
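The lapply call above overwrites every column. If only some columns should be cleaned, the same pattern can be restricted to a subset. The following is a minimal sketch under assumptions: the data frame toy and the vector cols are hypothetical names made up for illustration, and make_na is restated with function() so the snippet is self-contained.

```r
# Same logic as make_na above, restated so this snippet runs on its own.
make_na <- function(x, z) {
  x <- replace(x, x %in% z, NA)
  if (is.factor(x)) droplevels(x) else x
}

# Hypothetical toy data, for illustration only.
toy <- data.frame(id = c("a", "c", ""),
                  score = c(-1L, 3L, -2L),
                  stringsAsFactors = FALSE)

# Only the columns named in cols are passed through make_na;
# all other columns are left untouched.
cols <- "score"
toy[cols] <- lapply(toy[cols], make_na, list("", -1, -2, "c"))

toy$id     # unchanged, although "" and "c" are in the list
toy$score  # -1 and -2 replaced by NA, still an integer vector
```

Indexing with toy[cols] on both sides keeps the result a data frame, in the same way that dat[] does for the full set of columns.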
Data:
dat <- structure(list(character = c("", "", "", "", "a", "c", "a", "a",
"", "c", "b", "c"), integer = c(4L, 1L, -2L, 2L, 3L, 1L, -1L,
-1L, 4L, 0L, -2L, -2L), Date = structure(c(19183, 19184, 19185,
19186, 19187, 19188, 19189, 19190, 19191, 19192, 19193, 19194
), class = "Date"), factor = structure(c(3L, 4L, 1L, 3L, 1L,
1L, 4L, 2L, 4L, 4L, 3L, 2L), levels = c("", "A", "B", "FOO"), class = "factor"),
POSIX = structure(c(1657476490L, 1657480090L, 1657483690L,
1657487290L, 1657490890L, 1657494490L, 1657498090L, 1657501690L,
1657505290L, 1657508890L, 1657512490L, 1657516090L), class = c("POSIXct",
"POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA,
-12L))