Replace NAs with missing values in sequence (R)

Asked 6/8, 2021 at 3:28 Answered 7/8, 2021 at 12:40

I have a DF like

Now I want to replace The Col B = NA with 15 since that is the missing value. Col C first NA with 14 and second NA with 15. Col D first NA with 13, second NA with 14 and third NA with 15. So the numbers follow a sequence up to down or down to up.

Reproducible Sample Data

structure(list(`Col A` = c(11, 12, 13, 14, 15), `Col B` = c(NA, 
11, 12, 13, 14), `Col C` = c(NA, NA, 11, 12, 13), `Col D` = c(NA, 
NA, NA, 11, 12)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

Abode answered 6/8, 2021 at 3:28 Comment(0)

I think you can use the following solution in tidyverse:

library(dplyr)
library(purrr)

df[1] %>%
  bind_cols(map_dfc(2:length(df), function(x) {
    df[[x]][which(is.na(df[[x]]))] <- setdiff(df[[1]], df[[x]][!is.na(df[[x]])])
    df[x]
  }))

# A tibble: 5 x 4
  `Col A` `Col B` `Col C` `Col D`
    <dbl>   <dbl>   <dbl>   <dbl>
1      11      15      14      13
2      12      11      15      14
3      13      12      11      15
4      14      13      12      11
5      15      14      13      12

Or in base R we could do:

do.call(cbind, Reduce(function(x, y) {
  i <- which(is.na(df[[y]]))
  df[[y]][i] <- sort(setdiff(x, df[[y]]))
  df[[y]]
}, init = df[[1]], 2:length(df), accumulate = TRUE)) |>
  as.data.frame() |>
  setNames(paste0("Col", LETTERS[1:length(df)]))

  ColA ColB ColC ColD
1   11   15   14   13
2   12   11   15   14
3   13   12   11   15
4   14   13   12   11
5   15   14   13   12

Allix answered 6/8, 2021 at 5:15 Comment(1)

fantastic answer! dear friend – Leucine 6/8, 2021 at 6:23

You can try:

df[is.na(df)] <- head({tm <- toeplitz(rev(df$ColA))}[upper.tri(tm, diag = TRUE)], sum(is.na(df)))

Which gives:

  ColA ColB ColC ColD
1   11   15   14   13
2   12   11   15   14
3   13   12   11   15
4   14   13   12   11
5   15   14   13   12

Dottiedottle answered 6/8, 2021 at 6:47 Comment(0)

Try the code below

df[-1] <- lapply(
  df[-1],
  function(x) {
    replace(x, is.na(x), df[[1]][is.na(match(df[[1]], x))])
  }
)

and you will get

> df
# A tibble: 5 x 4
  `Col A` `Col B` `Col C` `Col D`
    <dbl>   <dbl>   <dbl>   <dbl>
1      11      15      14      13
2      12      11      15      14
3      13      12      11      15
4      14      13      12      11
5      15      14      13      12

Evansville answered 7/8, 2021 at 7:24 Comment(1)

Truly elegant and concise. – Allix 7/8, 2021 at 7:39

If the sequence is constant (in this case from 11 to 15), then one option using dplyr could be:

df %>%
 mutate(across(everything(), ~ if_else(is.na(.), max(., na.rm = TRUE) + cumsum(is.na(.)), .)))

  `Col A` `Col B` `Col C` `Col D`
    <dbl>   <dbl>   <dbl>   <dbl>
1      11      15      14      13
2      12      11      15      14
3      13      12      11      15
4      14      13      12      11
5      15      14      13      12

If the sequence may vary, then one option could be:

df %>%
 mutate(across(-1, 
               ~ if_else(is.na(.),
                         cumsum(is.na(.)) - 1 + last(pull(select(cur_data(), which(names(cur_data()) == cur_column()) - 1))),
                         .)))

Or:

df %>%
 mutate(across(-1, 
               ~ if_else(is.na(.),
                         cumsum(is.na(.)) - 1 + last(get(paste0("Col ", LETTERS[which(names(cur_data()) == cur_column()) - 1]))),
                         .)))

Longheaded answered 7/8, 2021 at 12:40 Comment(0)

Recommended topics

Hot tags