Remove rows where all cells are empty except for the first column
Asked Answered
S

2

5

I have this dataframe (df):

df <- structure(list(id = c("A", "B", "C", "D", "E", "F", "G", "H", 
"I", "J", "K", "L"), name = c("Sam", "Dave", NA, "Alan", "Ted", 
"Ana", NA, NA, "Max", NA, "Walt", NA), age = c(28, 32, NA, NA, 
23, NA, 21, NA, NA, 22, 21, NA), height = c(NA, 161, NA, 168, 
167, NA, 166, NA, 158, 171, NA, NA)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -12L))

I want to remove rows that have all cells empty except for the id column. There could be any number of columns in the data. This df has only 3 columns (id excluding). This is the final data frame final:

final <- structure(list(id = c("A", "B", "D", "E", "F", "G", "I", "J", 
"K"), name = c("Sam", "Dave", "Alan", "Ted", "Ana", NA, "Max", 
NA, "Walt"), age = c(28, 32, NA, 23, NA, 21, NA, 22, 21), height = c(NA, 
161, 168, 167, NA, 166, 158, 171, NA)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -9L))

Is it possible with base R or the tidyverse package?

Shall answered 29/12, 2023 at 0:33 Comment(0)
B
4

In tidyverse you can use if_all inside filter

library(tidyverse)

df %>% 
  filter(!if_all(-id, is.na))
#> # A tibble: 9 x 4
#>   id    name    age height
#>   <chr> <chr> <dbl>  <dbl>
#> 1 A     Sam      28     NA
#> 2 B     Dave     32    161
#> 3 D     Alan     NA    168
#> 4 E     Ted      23    167
#> 5 F     Ana      NA     NA
#> 6 G     <NA>     21    166
#> 7 I     Max      NA    158
#> 8 J     <NA>     22    171
#> 9 K     Walt     21     NA

In base R, you can use apply:

df[!apply(df[-1], 1, \(x) all(is.na(x))),]
#> # A tibble: 9 x 4
#>   id    name    age height
#>   <chr> <chr> <dbl>  <dbl>
#> 1 A     Sam      28     NA
#> 2 B     Dave     32    161
#> 3 D     Alan     NA    168
#> 4 E     Ted      23    167
#> 5 F     Ana      NA     NA
#> 6 G     <NA>     21    166
#> 7 I     Max      NA    158
#> 8 J     <NA>     22    171
#> 9 K     Walt     21     NA
Brebner answered 29/12, 2023 at 0:41 Comment(2)
"except for the first column" <- surely there needs to be a check for the first column?Interception
I know it is deprecated (dplyr 1.0.8), but before if_all() or if_any() we used to do: filter(across(-id, ~is.na(.)))Partly
P
4

Another base R using rowSums:

df[rowSums(!is.na(df[-1])) > 0, ]

output:

 id    name    age height
  <chr> <chr> <dbl>  <dbl>
1 A     Sam      28     NA
2 B     Dave     32    161
3 D     Alan     NA    168
4 E     Ted      23    167
5 F     Ana      NA     NA
6 G     NA       21    166
7 I     Max      NA    158
8 J     NA       22    171
9 K     Walt     21     NA
Partly answered 29/12, 2023 at 4:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.