R: replacing NAs in a data.frame with values in the same position in another dataframe

Asked 15/12, 2016 at 23:41 Answered 19/1, 2023 at 9:26

I have a dataframe with some NA values:

dfa <- data.frame(a=c(1,NA,3,4,5,NA),b=c(1,5,NA,NA,8,9),c=c(7,NA,NA,NA,2,NA))
dfa

I would like to replace the NAs with values in the same position in another dataframe:

dfrepair <- data.frame(a=c(2:7),b=c(6:1),c=c(8:3))
dfrepair

I tried:

dfa1 <- dfa

dfa1 <- ifelse(dfa == NA, dfrepair, dfa)
dfa1

but this did not work.

Bassarisk answered 15/12, 2016 at 23:41 Comment(0)

You can do:

dfa <- data.frame(a=c(1,NA,3,4,5,NA),b=c(1,5,NA,NA,8,9),c=c(7,NA,NA,NA,2,NA))
dfrepair <- data.frame(a=c(2:7),b=c(6:1),c=c(8:3))
dfa[is.na(dfa)] <- dfrepair[is.na(dfa)]
dfa

  a b c
1 1 1 7
2 3 5 7
3 3 4 6
4 4 3 5
5 5 8 2
6 7 9 3

Presbytery answered 15/12, 2016 at 23:51 Comment(2)

If creating a new table instead of modifying dfa: replace(dfa, is.na(dfa), dfrepair[is.na(dfa)]) – Sulphurbottom 16/12, 2016 at 0:15

In case the data.frame has different types, this way might change the type of the columns. Try it with dfrepair <- data.frame(a=letters[2:7],b=c(6:1),c=c(8:3)) what will change all columns of dfa from numeric to character. – Hindquarter 19/1, 2023 at 9:18

In the tidyverse, you can use purrr::map2_df, which is a strictly bivariate version of mapply that simplifies to a data.frame, and dplyr::coalesce, which replaces NA values in its first argument with the corresponding ones in the second.

library(tidyverse)

dfrepair %>% 
    mutate_all(as.numeric) %>%    # coalesce is strict about types
    map2_df(dfa, ., coalesce)

## # A tibble: 6 × 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1     1     7
## 2     3     5     7
## 3     3     4     6
## 4     4     3     5
## 5     5     8     2
## 6     7     9     3

Sergius answered 15/12, 2016 at 23:53 Comment(0)

We can use Map from base R to do a columnwise comparison between the two datasets

dfa[] <- Map(function(x,y) {x[is.na(x)] <- y[is.na(x)]; x}, dfa, dfrepair)
dfa
#  a b c
#1 1 1 7
#2 3 5 7
#3 3 4 6
#4 4 3 5
#5 5 8 2
#6 7 9 3

Cnut answered 16/12, 2016 at 0:1 Comment(0)

dfa <- data.frame(a=c(1,NA,3,4,5,NA),b=c(1,5,NA,NA,8,9),c=c(7,NA,NA,NA,2,NA))
dfa
dfrepair <- data.frame(a=c(2:7),b=c(6:1),c=c(8:3))
dfrepair 
library(dplyr)
coalesce(as.numeric(dfa), as.numeric(dfrepair))

  a b c
1 1 1 7
2 3 5 7
3 3 4 6
4 4 3 5
5 5 8 2
6 7 9 3

As the code in dplyr is written in C++ it is faster in most cases. An other important advantage is that coalesce as well as many other dplyr functions are the same in SQL. Using dplyr you learn SQL by coding in R. ;-)

Oliguria answered 16/12, 2016 at 7:30 Comment(5)

The as.numerics error out for me, though coalesce(dfa, dfrepair) surprisingly does work (the docs only talk about vectors, not whole data.frames) and doesn't complain about the different types. – Sergius 19/12, 2016 at 18:56

Thank you for the as.numeric. ;-) A data.frame is a list of list and vectors. In the case above as well as in many other cases it is just a list of vectors containing numbers. Therefore in our case it is very similar to a matrix. If the data.frame contains recursive lists my code will not run. – Oliguria 20/12, 2016 at 10:45

A data.frame may look like a matrix, but the underlying structure is very different. A data.frame is a list of one-dimensional vectors, whereas a matrix is a single vector with two dimensions. Most functions designed for vectors don't work on data.frames; the subsetting in coalesce must be fortuitous. – Sergius 20/12, 2016 at 10:56

A <- matrix(1:9, nrow = 3) is.vector(A) renders FALSE. According to this test a matrix is not a single vector in R. – Oliguria 21/12, 2016 at 8:25

From the language definition: "Matrices and arrays are simply vectors with the attribute dim and optionally dimnames attached to the vector." E.g. x <- diag(3); attributes(x) <- NULL; x Lists are vectors too (generic, not atomic), e.g. is.vector(list(1)), making the word confusing. is.vector just checks for attributes other than names, which is why is.vector(iris) returns FALSE even though typeof(iris) returns "list". A better check: purrr::is_vector(diag(3)); purrr::is_atomic(diag(3)): TRUE; TRUE. – Sergius 21/12, 2016 at 15:4

In case there are different types the replacement should be done columnwise. Another simple way allowing in place exchange might be.

for(i in seq_along(dfa)) {
    . <- is.na(dfa[[i]])
    dfa[[i]][.] <- dfrepair[[i]][.]
}

Or using in addition which which might improve speed / memory usage in some cases.

for(i in seq_along(dfa)) {
    . <- which(is.na(dfa[[i]]))
    dfa[[i]][.] <- dfrepair[[i]][.]
}

Benchmark of columnwise base options.

dfa <- data.frame(a=c("A",NA,"B","C","D",NA),b=c(1,5,NA,NA,8,9),c=c(7,NA,NA,NA,2,NA))
dfrepair <- data.frame(a=letters[2:7],b=c(6:1),c=c(8:3))

bench::mark(
akrun = local({dfa[] <- Map(function(x,y) {x[is.na(x)] <- y[is.na(x)]; x}, dfa, dfrepair); dfa}),
GKi = local({for(i in seq_along(dfa)) {. <- is.na(dfa[[i]])
                 dfa[[i]][.] <- dfrepair[[i]][.]}
                 dfa})
)
#  expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr> <bch:tm> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 akrun        64.4µs 70.3µs    12895.      280B     26.7  5793    12      449ms
#2 GKi          54.8µs   60µs    16347.      280B     28.7  7395    13      452ms

Hindquarter answered 19/1, 2023 at 9:26 Comment(0)

Recommended topics

Hot tags