NA filling only if "sandwiched" by the same value using dplyr

Asked 6/8, 2018 at 9:9 Answered 6/8, 2018 at 10:4

Ok, here is yet another missing value filling question.

I am looking for a way to fill NAs based on both the previous and next existent values in a column. Standard filling in a single direction is not sufficient for this task.

If the previous and next valid values in a column are not the same, then the chunk remains as NA.

The code for the sample data frame is:

library(dplyr)  # provides tibble(), mutate(), if_else() and %>%

df_in <- tibble(id   = 1:12,
                var1 = letters[1:12],
                var2 = c(NA, rep("A", 2), rep(NA, 2), rep("A", 2), rep(NA, 2), rep("B", 2), NA))

Thanks,

Absolve answered 6/8, 2018 at 9:9 Comment(3)

"If the previous and next valid values in a column are not the same, then the chunk remains as NA." Based on your rule, I don't see how rows 4 and 5 get filled. In row 4 the previous value is A and the next value NA. Therefore it should stay NA. Similarly for row 5. Could you please clarify? – Keffer 6/8, 2018 at 9:22

@MauritsEvers I think 'valid' should be interpreted as 'not NA' here. – Mopboard 6/8, 2018 at 9:24

@Mopboard Hmm, yes I think you're right:-) – Keffer 6/8, 2018 at 9:27

Compare na.locf() (last observation carried forward) with na.locf(fromLast = TRUE) (next observation carried backward). Where the two fills agree, keep that value; otherwise leave NA:

library(dplyr)

mutate(df_in, 
       var_new = if_else(
         # forward fill equals backward fill => the NA run is "sandwiched"
         zoo::na.locf(var2, na.rm = FALSE) == 
           zoo::na.locf(var2, na.rm = FALSE, fromLast = TRUE),
         zoo::na.locf(var2, na.rm = FALSE),
         NA_character_
       ))

# # A tibble: 12 x 4
#       id var1  var2  var_new
#    <int> <chr> <chr> <chr>  
#  1     1 a     NA    NA     
#  2     2 b     A     A      
#  3     3 c     A     A      
#  4     4 d     NA    A      
#  5     5 e     NA    A      
#  6     6 f     A     A      
#  7     7 g     A     A      
#  8     8 h     NA    NA     
#  9     9 i     NA    NA     
# 10    10 j     B     B      
# 11    11 k     B     B      
# 12    12 l     NA    NA
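The same comparison can be wrapped in a small helper so that it works on any vector type without hard-coding NA_character_. This is only a sketch: the name fill_sandwiched is hypothetical and not part of zoo or dplyr, and it assumes zoo is installed.

```r
library(zoo)

# Hypothetical helper (not part of any package): fill an NA only when the
# nearest non-NA value before it equals the nearest non-NA value after it.
fill_sandwiched <- function(x) {
  fwd <- na.locf(x, na.rm = FALSE)                   # carry last value forward
  bwd <- na.locf(x, na.rm = FALSE, fromLast = TRUE)  # carry next value backward
  # Positions where both fills exist and agree
  agree <- !is.na(fwd) & !is.na(bwd) & fwd == bwd
  out <- x
  out[agree] <- fwd[agree]
  out
}

var2 <- c(NA, rep("A", 2), rep(NA, 2), rep("A", 2), rep(NA, 2), rep("B", 2), NA)
fill_sandwiched(var2)
```

Inside a pipeline this would be used as mutate(df_in, var_new = fill_sandwiched(var2)). Leading and trailing NAs stay NA because one of the two fills has nothing to carry.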

Chinchin answered 6/8, 2018 at 10:4 Comment(2)

Quick follow up question (and slightly unrelated): I noticed that if the mutate is applied to the original variable "var2", a simple NA is sufficient at the end of the if_else statement. Why is NA_character_ required instead of plain NA at the end of the if_else statement when mutating to a new variable? – Absolve 6/8, 2018 at 23:44

For me it doesn't matter whether I assign to var2 or var_new: I get an error with plain NA, because dplyr::if_else() (with an _) checks types. Note, though, that I've edited my answer; I originally used base::ifelse() (no _), which doesn't check types and coerces afterwards – Gladisgladney 7/8, 2018 at 8:1
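A small illustration of the type difference discussed in these comments, as a sketch rather than a statement about every dplyr version: plain NA is a logical constant, NA_character_ is a character one, and base::ifelse() simply coerces. Whether if_else() rejects a plain NA depends on the installed dplyr release (newer ones may accept it), so the example only uses the typed NA, which is safe either way.

```r
library(dplyr)

class(NA)             # "logical"
class(NA_character_)  # "character"

x <- c("A", NA, "B")

# base::ifelse() does no type checking; the result is coerced to character
base_res <- ifelse(!is.na(x), x, NA)

# dplyr::if_else() with a typed missing value
dplyr_res <- if_else(!is.na(x), x, NA_character_)

identical(base_res, dplyr_res)
```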

Something like this?

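df_in %>% mutate(var_new = {
       tmp <- var2
       # rle() puts every NA in its own run, so use a placeholder string
       tmp[is.na(tmp)] <- "NA"
       rl <- rle(tmp)
       # one row per run, with the values of the neighbouring runs
       tibble(before = c(NA, head(rl$values, -1)),
              value  = rl$values,
              after  = c(tail(rl$values, -1), NA),
              lengths = rl$lengths) %>%
       # fill a placeholder run only if both neighbours agree,
       # then turn the remaining placeholders back into NA
       mutate(value = ifelse(value == "NA" & before == after, before, value),
              value = ifelse(value == "NA", NA, value)) %>%
       select(value, lengths) %>%
       unname() %>%
       # expand the runs back to full length: rep(value, lengths)
       do.call(rep, .)})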

# # A tibble: 12 x 4
#       id var1  var2  var_new
#    <int> <chr> <chr> <chr>  
#  1     1 a     NA    <NA>   
#  2     2 b     A     A      
#  3     3 c     A     A      
#  4     4 d     NA    A      
#  5     5 e     NA    A      
#  6     6 f     A     A      
#  7     7 g     A     A      
#  8     8 h     NA    <NA>   
#  9     9 i     NA    <NA>   
# 10    10 j     B     B      
# 11    11 k     B     B      
# 12    12 l     NA    <NA>

Explanation

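Convert NA to the string "NA", because rle() treats every NA as its own run of length 1 instead of grouping consecutive NAs.
Create a run-length-encoded representation of tmp.
Now you can look at the values before and after each "NA" block.
Replace the block with the neighbouring value if both neighbours match; otherwise turn it back into NA.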
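To see why the first step is needed, here is a short base R check on a made-up vector (x is an illustrative example, not the question's data):

```r
x <- c(NA, "A", "A", NA, NA, "A")

# Each NA becomes its own run, so the two consecutive NAs are not grouped
raw_lengths <- rle(x)$lengths
raw_lengths   # 1 2 1 1 1

# With the "NA" placeholder string, consecutive missing values form one run
tmp <- x
tmp[is.na(tmp)] <- "NA"
placeholder_lengths <- rle(tmp)$lengths
placeholder_lengths   # 1 2 2 1
```

One caveat of the placeholder trick: a column that already contains the literal string "NA" as a real value would be treated as missing.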

Kathrynekathy answered 6/8, 2018 at 9:34 Comment(3)

I get the same values for var_new as for var2. Nothing seems to get filled. Can you double-check? – Keffer 6/8, 2018 at 9:35

Sorry, I forgot a line in my code; it's updated now. Basically, you need to transform NA to "NA" because rle does not group consecutive NAs but treats them as distinct values – Kathrynekathy 6/8, 2018 at 9:41

Nice one @Kathrynekathy +1 – Keffer 6/8, 2018 at 9:42
