Fill missing values rowwise (right / left)

Asked 14/4, 2019 at 1:13 Answered 14/4, 2019 at 2:58

I'm looking for a way to "fill" NAs to the right (as opposed to down/up) with dplyr. In other words, I would like to convert d into d2 without having to explicitly reference any columns in a mutate call.

My real dataframe has several 10s of fields with staggered blocks of NAs spanning variable numbers of columns. I'm curious whether there's a short way to globally inherit the first non-NA value to the left, regardless of what field it occurs in.

d<-data.frame(c1=c("a",1:4), c2=c(NA,2,NA,4,5), c3=c(NA,3,4,NA,6))
d2<-data.frame(c1=c("a",1:4), c2=c("a",2,2,4,5), c3=c("a",3,4,4,6))
d
d2

Etz answered 14/4, 2019 at 1:13 Comment(2)

Wouldn't this cause an imbalance in type as the first column is factor, If 'a' replace the NA is first row, then the other column types also change. – Brine 14/4, 2019 at 1:17

That would certainly be a problem for many use cases but, for this one, I'm treating everything as text prior to a dplyr chain that will gather/spread/mutate the data to a nice "long" format with a mix of numeric & character columns. I'm working with data from a pdf exported from an excel file that used merged cells for multiple column headers. The number of columns that form a block of data related to one location varies but the column with a value always seems to be the left-most one from the block luckily. – Etz 14/4, 2019 at 12:7

We can do a gather into 'long' format, do the fill grouped by the row number and then spread back to 'wide' format

library(tidyverse)
rownames_to_column(d, 'rn') %>% 
    gather(key, val, -rn) %>%
    group_by(rn) %>% 
    fill(val) %>% 
    spread(key, val) %>%
    ungroup %>%
    select(-rn)
# A tibble: 5 x 3
#  c1    c2    c3   
#  <chr> <chr> <chr>
#1 a     a     a    
#2 1     2     3    
#3 2     2     4    
#4 3     4     4    
#5 4     5     6

or another option without reshaping would be doing rowwise fill with na.locf

library(zoo)
d %>% 
    mutate(c1 = as.character(c1)) %>%
    pmap_dfr(., ~ na.locf(c(...)) %>%
                      as.list %>%
                      as_tibble)

Also, if we use na.locf, it run columnwise, so the data can be transposed and apply na.locf directly

d[] <- t(na.locf(t(d)))
d
#  c1 c2 c3
#1  a  a  a
#2  1  2  3
#3  2  2  4
#4  3  4  4
#5  4  5  6

As @G.Grothendieck mentioned in the comments, inorder to take care of the elements that are NA at the beginning of the row, use na.locf0 instead of na.locf

Brine answered 14/4, 2019 at 1:19 Comment(9)

super. I will try both approaches. I like the simplicity of the zoo::na.loc and I should have thought of the gather/spread on row number myself. The whole reason I need to do the filling operation is to make future gather/spread operations work properly. – Etz 14/4, 2019 at 12:13

Might want to use.na.locf0 here in case first elements in a row are NA. – Gorgeous 14/4, 2019 at 15:19

This worked beautifully for my task at hand. My source data were structured in a table that was wrapped to the page (i.e. table header repeated 4 to 5 times per page) and there were ~20 pages. I also realized that there were a few NAs in my data that I wanted preserve. My strategy was simply to add row_number as a field, create one df by filtering on field headers that I wanted to use na.locf on and create a 2nd df containing the actual data. I then bound the tables by row, sorted on the original row header and then moved on with my tidyr chain to convert to "long" format. – Etz 15/4, 2019 at 13:40

@DanStrobridge Glad to know it is working for you. Regarding the other comments, I am not following it. Are you saying that there is some issue – Brine 15/4, 2019 at 14:21

@DanStrobridge You can subset those data by the ids you are using and then do thena.locf on the other dataset – Brine 15/4, 2019 at 14:25

No issues. Everything is working now. I'm glad to have learned about na.locf. Thanks so much. – Etz 16/4, 2019 at 15:36

This calculations works for me but has one glitch: My first column contains names and all the data is shifted over one column to the left. So now the name column has data from column 2; column 2 has data from column 3; and my last column of data is now named name. There are no NAs in the name column by the way. – Bootjack 31/7, 2022 at 1:59

edit: I used d[] <- t(na.locf(t(d))) – Bootjack 31/7, 2022 at 2:6

@Bootjack perhaps you meant to do d[-1] <- t(na.locf(t(d[-1]))) – Brine 31/7, 2022 at 17:20

We can apply zoo::na.locf row-wise using apply

d[] <- t(apply(d, 1, zoo::na.locf))

d
#  c1 c2 c3
#1  a  a  a
#2  1  2  3
#3  2  2  4
#4  3  4  4
#5  4  5  6

Cresting answered 14/4, 2019 at 2:58 Comment(1)

See na.locf0 comment on other answer. – Gorgeous 14/4, 2019 at 15:19

Recommended topics

Hot tags