r - check if every column is na
Asked Answered
L

5

6

I have a list of columns within a dataframe which where i want to check if all those columns are NA and create a new column which tells me if they are NA or not.

Here is an example of it working with one column, where Any_Flag is my new column:

ItemStats_2014$Any_Flag <- ifelse(is.na(ItemStats_2014$Item_Flag_A), "Y", "N")

When i try to run the check over multiple columns, I am getting what i expect:

ItemStats_2014$Any_Flag <- ifelse(all(is.na(ItemStats_2014[ ,grep("Flag", names(ItemStats_2014), value = T)])), "Y", "N")

It returns everything to be false or "N".

Lewie answered 6/4, 2018 at 18:3 Comment(0)
O
9

Data

set.seed(1)
data <- c(LETTERS, NA)
df <- data.frame(Flag_A = sample(data), Flag_B = sample(data), 
                 C = sample(data), D = sample(data), Flag_E = sample(data))

df <- rbind(NA, df)

Code

Identifying all NAs per row:

> df$All_NA <- apply(df[, grep("Flag", names(df))], 1, function(x) all(is.na(x)))
> head(df)
  Flag_A Flag_B    C    D Flag_E All_NA
1   <NA>   <NA> <NA> <NA>   <NA>   TRUE
2      H      K    B    T      Y  FALSE
3      J      W    C    K      P  FALSE
4      O      I    H    I   <NA>  FALSE
5      V      L    M    S      R  FALSE
6      E      N    P    E      I  FALSE

Identifying at least one NA per row:

> df$Any_NA <- apply(df[, grep("Flag", names(df))], 1, function(x) anyNA(x))
> head(df)
  Flag_A Flag_B    C    D Flag_E Any_NA
1   <NA>   <NA> <NA> <NA>   <NA>   TRUE
2      H      K    B    T      Y  FALSE
3      J      W    C    K      P  FALSE
4      O      I    H    I   <NA>   TRUE
5      V      L    M    S      R  FALSE
6      E      N    P    E      I  FALSE
Outlawry answered 6/4, 2018 at 19:49 Comment(7)
this it what i have been trying to do, but i keep getting this error: Error in apply(ItemStats_2014[, grep("Item_Flag", names(ItemStats_2014))], : dim(X) must have a positive lengthLewie
Can you paste the output of str(ItemStats_2014, list.len = 10) ?Langer
i assume you are looking for the structure of those fields. there are many columns. The ones i am trying to use in the function are characters. i am actually using an if statement to do the opposite of you any flag output Just so your request makes a little more sense.Lewie
` $ Item_Flag_A : chr NA NA NA NA ... $ Item_Flag_B : chr NA NA NA NA ... $ Item_Flag_C : chr NA NA NA NA ... $ Item_Flag_N : chr NA NA NA NA ... $ Item_Flag_O : chr NA NA NA NA ... $ Item_Flag_P : chr NA NA NA NA ... $ Item_Flag_R : chr "R" NA NA NA ... $ Item_Flag_V : chr NA NA NA NA ... $ Item_Flag_Z : chr NA NA NA NA ...`Lewie
Make sure your object is a data.frame or matrix, not a list. Can you verify that pls? Str() function usually tells that as the first info, but I missed that in your pasting. Just see dim() or class()...Langer
str(ItemStats_2014) Classes ‘data.table’ and 'data.frame': 19435 obs. of 151 variables:Lewie
That's it! You're dealing with 'data.table' object, so it require dt[, <cols_selection>, with = F]. Just add , with = F after the grep, yet inside the brackets.Langer
P
3

I'm not sure what the grep part is supposed to do, but here's a simpler way to accomplish what you want:

 apply(ItemStats_2014[, 2:10], MARGIN = 1, FUN = function(x) all(is.na(x)))

Replace 2:10 with whatever columns you want to check.

Amendment: If you want to detect which columns contain the word "Flag" rather than hard coding their indices -- which would be better anyway! -- I like the package stringr for working with text. You could do this to select your columns:

 library(stringr)
 MyCols <- which(str_detect(names(ItemStats_2014), "Flag"))

Now, replace 2:10 with MyCols in the apply(... code above.

Probabilism answered 6/4, 2018 at 18:11 Comment(2)
the grep is intended to call out columns where 'Flag' is in the name. is there any way to format to [, 2:10] to find the columns with 'flag' in the nameLewie
here is he error i get with apply: Error in apply(ItemStats_2014[, grep("Flag", names(ItemStats_2014))], : dim(X) must have a positive lengthLewie
S
3

And a data.table way without any apply is:

library(arsenal)
library(data.table)

# dummy data
set.seed(1)
data = c(LETTERS, NA)
dt = data.table(Flag_A=sample(data), Flag_B = sample(data), C=sample(data), D=sample(data), Flag_E=sample(data))
dt = rbind(NA, dt)

# All-NA/Any-NA check
columns_to_check = names(dt)[grep('Flag', names(dt))]
dt[, AllNA:=allNA(.SD), by=1:nrow(dt), .SDcols = columns_to_check]
dt[, AnyNA:=anyNA(.SD), by=1:nrow(dt), .SDcols = columns_to_check]
Sodality answered 22/11, 2018 at 18:43 Comment(0)
P
2

I think that you are trying to test if a row (not a column) contains at least one NA.

Here a dataset

x = c(1:10, NA)
df = data.frame(A = sample(x), B = sample(x), C = sample(x))

And here a loop that test that with anyNA

df$Any_na = apply(df[,2:3], 1, anyNA)
df

    A  B  C Any_na
1  NA  8  9  FALSE
2   5  9 NA   TRUE
3   9  3 10  FALSE
4   7  5  1  FALSE
5   4  2  3  FALSE
6  10  4  6  FALSE
7   3  1  2  FALSE
8   6  6  5  FALSE
9   1 10  7  FALSE
10  2 NA  8   TRUE
11  8  7  4  FALSE
Pisolite answered 6/4, 2018 at 18:12 Comment(1)
right a row but not of a whole data set. just the part of the dataset where 'Flag' is in the column name.Lewie
S
2

This might help you get started :

# Sample dataframe
dfx <- data.frame(
x = c(21L, 21L, 21L, 22L, 22L, NA),
y = c(1449, 1814, 582, 582, 947, 183),
s = c(26.4, 28.7, 32, 25.3, NA, 25.7),
z = c(NA,NA,NA,NA,NA,NA)
)

# Sapply works well here 
ifelse(sapply(dfx, function(x)all(is.na(x))) == TRUE, "Y","N")

output :

 x   y   s   z 
"N" "N" "N" "Y"
Starcrossed answered 6/4, 2018 at 18:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.