To add to the alternatives, you can also use replace
instead of the typical blah[index] <- NA
approach. replace
would look like:
df <- replace(df, df == "NA", NA)
Another alternative to consider is type.convert
. This is the function that R uses when reading data in to automatically convert column types. Thus, the result is different from your current approach in that, for instance, the second column gets converted to numeric.
df[] <- lapply(df, function(x) type.convert(as.character(x), na.strings = "NA"))
df
Here's a performance comparison. The sample data is from @roland's answer.
Here are the functions to test:
funop <- function() {
df[df == "NA"] <- NA
df
}
funr <- function() {
ind <- which(vapply(df, function(x) class(x) %in% c("character", "factor"), FUN.VALUE = TRUE))
as.data.table(df)[, names(df)[ind] := lapply(.SD, function(x) {
is.na(x) <- x == "NA"
x
}), .SDcols = ind][]
}
funam1 <- function() replace(df, df == "NA", NA)
funam2 <- function() {
df[] <- lapply(df, function(x) type.convert(as.character(x), na.strings = "NA"))
df
}
Here's the benchmarking:
library(microbenchmark)
microbenchmark(funop(), funr(), funam1(), funam2(), times = 10)
# Unit: seconds
# expr min lq mean median uq max neval
# funop() 3.629832 3.750853 3.909333 3.855636 4.098086 4.248287 10
# funr() 3.074825 3.212499 3.320430 3.279268 3.332304 3.685837 10
# funam1() 3.714561 3.899456 4.238785 4.065496 4.280626 5.512706 10
# funam2() 1.391315 1.455366 1.623267 1.566486 1.606694 2.253258 10
replace
would be the same as @roland's approach, which is the same as @jgozal's. However, the type.convert
approach would result in different column types.
all.equal(funop(), setDF(funr()))
all.equal(funop(), funam())
str(funop())
# 'data.frame': 10000000 obs. of 3 variables:
# $ vect1: Factor w/ 3 levels "BANANA","HELLO",..: 2 2 NA 2 1 1 1 NA 1 1 ...
# $ vect2: Factor w/ 3 levels "1","5","NA": NA 2 1 NA 1 NA NA 1 NA 2 ...
# $ vect3: Factor w/ 1 level "NA": NA NA NA NA NA NA NA NA NA NA ...
str(funam2())
# 'data.frame': 10000000 obs. of 3 variables:
# $ vect1: Factor w/ 2 levels "BANANA","HELLO": 2 2 NA 2 1 1 1 NA 1 1 ...
# $ vect2: int NA 5 1 NA 1 NA NA 1 NA 5 ...
# $ vect3: logi NA NA NA NA NA NA ...
na.strings
argument. Otherwise this should help, but replace the period with NA – Polyhedrontype.convert
after having read the data in, as I did in my answer. – Donatello