Fill missing values rowwise (right / left)
Asked Answered
E

2

10

I'm looking for a way to "fill" NAs to the right (as opposed to down/up) with dplyr. In other words, I would like to convert d into d2 without having to explicitly reference any columns in a mutate call.

My real dataframe has several 10s of fields with staggered blocks of NAs spanning variable numbers of columns. I'm curious whether there's a short way to globally inherit the first non-NA value to the left, regardless of what field it occurs in.

d<-data.frame(c1=c("a",1:4), c2=c(NA,2,NA,4,5), c3=c(NA,3,4,NA,6))
d2<-data.frame(c1=c("a",1:4), c2=c("a",2,2,4,5), c3=c("a",3,4,4,6))
d
d2
Etz answered 14/4, 2019 at 1:13 Comment(2)
Wouldn't this cause an imbalance in type as the first column is factor, If 'a' replace the NA is first row, then the other column types also change.Brine
That would certainly be a problem for many use cases but, for this one, I'm treating everything as text prior to a dplyr chain that will gather/spread/mutate the data to a nice "long" format with a mix of numeric & character columns. I'm working with data from a pdf exported from an excel file that used merged cells for multiple column headers. The number of columns that form a block of data related to one location varies but the column with a value always seems to be the left-most one from the block luckily.Etz
B
10

We can do a gather into 'long' format, do the fill grouped by the row number and then spread back to 'wide' format

library(tidyverse)
rownames_to_column(d, 'rn') %>% 
    gather(key, val, -rn) %>%
    group_by(rn) %>% 
    fill(val) %>% 
    spread(key, val) %>%
    ungroup %>%
    select(-rn)
# A tibble: 5 x 3
#  c1    c2    c3   
#  <chr> <chr> <chr>
#1 a     a     a    
#2 1     2     3    
#3 2     2     4    
#4 3     4     4    
#5 4     5     6 

or another option without reshaping would be doing rowwise fill with na.locf

library(zoo)
d %>% 
    mutate(c1 = as.character(c1)) %>%
    pmap_dfr(., ~ na.locf(c(...)) %>%
                      as.list %>%
                      as_tibble)

Also, if we use na.locf, it run columnwise, so the data can be transposed and apply na.locf directly

d[] <- t(na.locf(t(d)))
d
#  c1 c2 c3
#1  a  a  a
#2  1  2  3
#3  2  2  4
#4  3  4  4
#5  4  5  6

As @G.Grothendieck mentioned in the comments, inorder to take care of the elements that are NA at the beginning of the row, use na.locf0 instead of na.locf

Brine answered 14/4, 2019 at 1:19 Comment(9)
super. I will try both approaches. I like the simplicity of the zoo::na.loc and I should have thought of the gather/spread on row number myself. The whole reason I need to do the filling operation is to make future gather/spread operations work properly.Etz
Might want to use.na.locf0 here in case first elements in a row are NA.Gorgeous
This worked beautifully for my task at hand. My source data were structured in a table that was wrapped to the page (i.e. table header repeated 4 to 5 times per page) and there were ~20 pages. I also realized that there were a few NAs in my data that I wanted preserve. My strategy was simply to add row_number as a field, create one df by filtering on field headers that I wanted to use na.locf on and create a 2nd df containing the actual data. I then bound the tables by row, sorted on the original row header and then moved on with my tidyr chain to convert to "long" format.Etz
@DanStrobridge Glad to know it is working for you. Regarding the other comments, I am not following it. Are you saying that there is some issueBrine
@DanStrobridge You can subset those data by the ids you are using and then do thena.locf on the other datasetBrine
No issues. Everything is working now. I'm glad to have learned about na.locf. Thanks so much.Etz
This calculations works for me but has one glitch: My first column contains names and all the data is shifted over one column to the left. So now the name column has data from column 2; column 2 has data from column 3; and my last column of data is now named name. There are no NAs in the name column by the way.Bootjack
edit: I used d[] <- t(na.locf(t(d)))Bootjack
@Bootjack perhaps you meant to do d[-1] <- t(na.locf(t(d[-1])))Brine
C
6

We can apply zoo::na.locf row-wise using apply

d[] <- t(apply(d, 1, zoo::na.locf))

d
#  c1 c2 c3
#1  a  a  a
#2  1  2  3
#3  2  2  4
#4  3  4  4
#5  4  5  6
Cresting answered 14/4, 2019 at 2:58 Comment(1)
See na.locf0 comment on other answer.Gorgeous

© 2022 - 2024 — McMap. All rights reserved.