Remove trailing NA by group in a data.frame

Asked 13/2, 2019 at 10:8 Answered 13/2, 2019 at 10:35

I have a data.frame with a grouping variable, and some NAs in the value column.

df = data.frame(group=c(1,1,2,2,2,2,2,3,3), value1=1:9, value2=c(NA,4,9,6,2,NA,NA,1,NA))

I can use zoo::na.trim to remove NA at the end of a column: this will remove the last line of the data.frame:

library(zoo)
library(dplyr)
df %>% na.trim(sides="right")

Now I want to remove the trailing NAs by group; how can I achieve this using dplyr?

Expected output for value2 column: c(NA, 4,9,6,2,1)

Lyn answered 13/2, 2019 at 10:8 Comment(2)

Can you also have NAs in value1 that could cause the row to be removed or is it explicitly for value2? – Brominate 13/2, 2019 at 10:37

in my use case there are NA in one numeric column only... but yes in theory could be more – Lyn 13/2, 2019 at 10:46

Using lapply, loop through group:

do.call("rbind", lapply(split(df, df$group), na.trim, sides = "right"))

#     group value1 value2
# 1.1     1      1     NA
# 1.2     1      2      4
# 2.3     2      3      9
# 2.4     2      4      6
# 2.5     2      5      2
# 3       3      8      1

Or using by, as mentioned by @Henrik:

do.call("rbind", by(df, df$group, na.trim, sides = "right"))

Suzettesuzi answered 13/2, 2019 at 10:18 Comment(6)

or in a pipe indeed: df %>% split(.$group) %>% lapply(na.trim, sides="right") %>% do.call('rbind', .) – Lyn 13/2, 2019 at 10:50

@Lyn ouch, not the best use of pipes. Pretty sure there is a prettier purrr solution to this. – Suzettesuzi 13/2, 2019 at 11:0

Yes... one question, if I have not 1 grouping variable but more, I would need to create a custom split variable right? – Lyn 13/2, 2019 at 11:4

Also, save some characters with good'ol by instead of lapply(split( – Alphosis 13/2, 2019 at 11:19

@Lyn no, we just create list, see this post – Suzettesuzi 13/2, 2019 at 11:22

@Alphosis Thanks, I always forget "by", more comfortable with "*apply". – Suzettesuzi 13/2, 2019 at 11:26

You could write a little helper function that checks for trailing NAs of a vector and then use group_by and filter.

f <- function(x) { rev(cumsum(!is.na(rev(x)))) != 0 }

library(dplyr)
df %>% 
  group_by(group) %>% 
  filter(f(value2))
# A tibble: 6 x 3
# Groups:   group [3]
  group value1 value2
  <dbl>  <int>  <dbl>
1     1      1     NA
2     1      2      4
3     2      3      9
4     2      4      6
5     2      5      2
6     3      8      1

edit

If we need to remove both leading and trailing zero we need to extend that function a bit.

f1 <- function(x) { cumsum(!is.na(x)) != 0 & rev(cumsum(!is.na(rev(x)))) != 0 }

Given df1

df1 = data.frame(group=c(1,1,2,2,2,2,2,3,3), value1=1:9, value2=c(NA,4,9,NA,2,NA,NA,1,NA))
df1
#  group value1 value2
#1     1      1     NA
#2     1      2      4
#3     2      3      9
#4     2      4     NA
#5     2      5      2
#6     2      6     NA
#7     2      7     NA
#8     3      8      1
#9     3      9     NA

We get this result

df1 %>% 
  group_by(group) %>% 
  filter(f1(value2))
# A tibble: 5 x 3
# Groups:   group [3]
  group value1 value2
  <dbl>  <int>  <dbl>
1     1      2      4
2     2      3      9
3     2      4     NA
4     2      5      2
5     3      8      1

Vibraharp answered 13/2, 2019 at 10:35 Comment(2)

thanks, would it work if I want also to remove leading NAs? – Lyn 13/2, 2019 at 11:3

No. Would you want to remove both leading and trailing zeros? – Vibraharp 13/2, 2019 at 11:5