I am trying to exclude rows have missing values (NA
) in all columns for that row AND for which all subsequent rows have only missing values (or is the last empty row itself), i.e. I want to remove trailing "all-NA
" rows.
I came up with the solution below, which works but is too slow (I am using this function on thousands of tables), probably because of the while
loop.
## Aux function to remove NA rows below table
remove_empty_row_last <- function(dt){
dt[ , row_empty := rowSums(is.na(dt)) == ncol(dt)]
while (dt[.N, row_empty] == TRUE) {
dt <- dt[1:(.N-1)]
}
dt %>% return()
}
d <- data.table(a = c(1,NA,3,NA,5,NA,NA), b = c(1,NA,3,4,5,NA,NA))
remove_empty_row_last(d)
#EDIT2: adding more test cases
d2 <- data.table(A = c(1,NA,3,NA,5,1 ,NA), B = c(1,NA,3,4,5,NA,NA))
remove_empty_row_last(d2)
d3 <- data.table(A = c(1,NA,3,NA,5,NA,NA), B = c(1,NA,3,4,5,1,NA))
remove_empty_row_last(d3)
#Edit3:adding no NA rows test case
d4 <- data.table(A = c(1,2,3,NA,5,NA,NA), B = c(1,2,3,4,5,1,7))
d4 %>% remove_empty_row_last()
dt %>% return()
is really bad. – Lordanzoo::na.trim
:na.trim(d, is.na = "all", sides = "right")
– Henkel