Remove trailing NA by group in a data.frame
Asked Answered
L

2

6

I have a data.frame with a grouping variable, and some NAs in the value column.

df = data.frame(group=c(1,1,2,2,2,2,2,3,3), value1=1:9, value2=c(NA,4,9,6,2,NA,NA,1,NA))

I can use zoo::na.trim to remove NA at the end of a column: this will remove the last line of the data.frame:

library(zoo)
library(dplyr)
df %>% na.trim(sides="right")

Now I want to remove the trailing NAs by group; how can I achieve this using dplyr?

Expected output for value2 column: c(NA, 4,9,6,2,1)

Lyn answered 13/2, 2019 at 10:8 Comment(2)
Can you also have NAs in value1 that could cause the row to be removed or is it explicitly for value2?Brominate
in my use case there are NA in one numeric column only... but yes in theory could be moreLyn
S
3

Using lapply, loop through group:

do.call("rbind", lapply(split(df, df$group), na.trim, sides = "right"))

#     group value1 value2
# 1.1     1      1     NA
# 1.2     1      2      4
# 2.3     2      3      9
# 2.4     2      4      6
# 2.5     2      5      2
# 3       3      8      1

Or using by, as mentioned by @Henrik:

do.call("rbind", by(df, df$group, na.trim, sides = "right"))
Suzettesuzi answered 13/2, 2019 at 10:18 Comment(6)
or in a pipe indeed: df %>% split(.$group) %>% lapply(na.trim, sides="right") %>% do.call('rbind', .)Lyn
@Lyn ouch, not the best use of pipes. Pretty sure there is a prettier purrr solution to this.Suzettesuzi
Yes... one question, if I have not 1 grouping variable but more, I would need to create a custom split variable right?Lyn
Also, save some characters with good'ol by instead of lapply(split(Alphosis
@Lyn no, we just create list, see this postSuzettesuzi
@Alphosis Thanks, I always forget "by", more comfortable with "*apply".Suzettesuzi
V
7

You could write a little helper function that checks for trailing NAs of a vector and then use group_by and filter.

f <- function(x) { rev(cumsum(!is.na(rev(x)))) != 0 }

library(dplyr)
df %>% 
  group_by(group) %>% 
  filter(f(value2))
# A tibble: 6 x 3
# Groups:   group [3]
  group value1 value2
  <dbl>  <int>  <dbl>
1     1      1     NA
2     1      2      4
3     2      3      9
4     2      4      6
5     2      5      2
6     3      8      1

edit

If we need to remove both leading and trailing zero we need to extend that function a bit.

f1 <- function(x) { cumsum(!is.na(x)) != 0 & rev(cumsum(!is.na(rev(x)))) != 0 }

Given df1

df1 = data.frame(group=c(1,1,2,2,2,2,2,3,3), value1=1:9, value2=c(NA,4,9,NA,2,NA,NA,1,NA))
df1
#  group value1 value2
#1     1      1     NA
#2     1      2      4
#3     2      3      9
#4     2      4     NA
#5     2      5      2
#6     2      6     NA
#7     2      7     NA
#8     3      8      1
#9     3      9     NA

We get this result

df1 %>% 
  group_by(group) %>% 
  filter(f1(value2))
# A tibble: 5 x 3
# Groups:   group [3]
  group value1 value2
  <dbl>  <int>  <dbl>
1     1      2      4
2     2      3      9
3     2      4     NA
4     2      5      2
5     3      8      1
Vibraharp answered 13/2, 2019 at 10:35 Comment(2)
thanks, would it work if I want also to remove leading NAs?Lyn
No. Would you want to remove both leading and trailing zeros?Vibraharp
S
3

Using lapply, loop through group:

do.call("rbind", lapply(split(df, df$group), na.trim, sides = "right"))

#     group value1 value2
# 1.1     1      1     NA
# 1.2     1      2      4
# 2.3     2      3      9
# 2.4     2      4      6
# 2.5     2      5      2
# 3       3      8      1

Or using by, as mentioned by @Henrik:

do.call("rbind", by(df, df$group, na.trim, sides = "right"))
Suzettesuzi answered 13/2, 2019 at 10:18 Comment(6)
or in a pipe indeed: df %>% split(.$group) %>% lapply(na.trim, sides="right") %>% do.call('rbind', .)Lyn
@Lyn ouch, not the best use of pipes. Pretty sure there is a prettier purrr solution to this.Suzettesuzi
Yes... one question, if I have not 1 grouping variable but more, I would need to create a custom split variable right?Lyn
Also, save some characters with good'ol by instead of lapply(split(Alphosis
@Lyn no, we just create list, see this postSuzettesuzi
@Alphosis Thanks, I always forget "by", more comfortable with "*apply".Suzettesuzi

© 2022 - 2024 — McMap. All rights reserved.