Remove columns from dataframe where some of values are NA
Asked Answered
P

8

46

I have a dataframe where some of the values are NA. I would like to remove these columns.

My data.frame looks like this

    v1   v2 
1    1   NA 
2    1    1 
3    2    2 
4    1    1 
5    2    2 
6    1   NA

I tried to estimate the col mean and select the column means !=NA. I tried this statement, it does not work.

data=subset(Itun, select=c(is.na(colMeans(Itun))))

I got an error,

error : 'x' must be an array of at least two dimensions

Can anyone give me some help?

Pluviometer answered 17/9, 2012 at 7:4 Comment(1)
Please add an example of what you would like to have as a result. It would also be really helpful to have a fully reproducible example.Ahead
C
71

The data:

Itun <- data.frame(v1 = c(1,1,2,1,2,1), v2 = c(NA, 1, 2, 1, 2, NA)) 

This will remove all columns containing at least one NA:

Itun[ , colSums(is.na(Itun)) == 0]

An alternative way is to use apply:

Itun[ , apply(Itun, 2, function(x) !any(is.na(x)))]
Clayson answered 17/9, 2012 at 7:25 Comment(8)
This will remove rows with NAs, not columns.Melonymelos
@Backlin, but to Sven's benefit, the whole question is really poorly worded and it's not clear what exactly the OP wants to do. Drop the columns? Convert something to zero?Ekg
True. But he never says anything about rows and uses subset(..., select=...) so I figured he wants to extract all rows for certain columns.Melonymelos
@SvenHohenstein: Sorry for my poorly organized words. I would like to extract columns without NAs from a dataframe.Pluviometer
doesn't this return a logical array, without subsetting the data?Granulite
should it be Itun[ , colSums(is.na(Itun)) == 0, with = FALSE]?Granulite
@Granulite Itun is a data.frame, not a data.table.Clayson
@SvenHohenstein sorry I thought I had read data.table in the answer. My mistakeGranulite
B
47

Here's a convenient way to do it using the dplyr function select_if(). Combine not (!), any() and is.na(), which is equivalent to selecting all columns that don't contain any NA values.

library(dplyr)
Itun %>%
    select_if(~ !any(is.na(.)))
Barfly answered 27/10, 2017 at 16:48 Comment(5)
I was wondering if you can extract the column names of the removed columns simultaneously. Is this possible?Detonation
I'd split that into two operations. Use Itun %>% select_if(~ any(is.na(.))) %>% names(). Then remove columns in second operation using code above.Barfly
great solution. for the cases that collumns should be remove that only have NAs you can use select_if(~ !all(is.na(.))Pepi
This solution is very nice but very slow. Itun[ , colSums(is.na(Itun)) == 0] by @Sven-hohenstein is much faster.Su
What does it return though if I wanted to have the columns that have NA/NULL ? When I ran the opposite (i.e. without !), it returned a bunch of columns that didn't have NAs; the column that had NA was returned along with them, though.Victoria
H
18

Alternatively, select(where(~FUNCTION)) can be used:

library(dplyr)

(df <- data.frame(x = letters[1:5], y = NA, z = c(1:4, NA)))
#>   x  y  z
#> 1 a NA  1
#> 2 b NA  2
#> 3 c NA  3
#> 4 d NA  4
#> 5 e NA NA

# Remove columns where all values are NA
df %>% 
  select(where(~!all(is.na(.))))
#>   x  z
#> 1 a  1
#> 2 b  2
#> 3 c  3
#> 4 d  4
#> 5 e NA
  
# Remove columns with at least one NA  
df %>% 
  select(where(~!any(is.na(.))))
#>   x
#> 1 a
#> 2 b
#> 3 c
#> 4 d
#> 5 e
Hack answered 4/9, 2020 at 17:28 Comment(0)
I
14

You can use transpose twice:

newdf <- t(na.omit(t(df)))
Inelastic answered 1/4, 2016 at 19:13 Comment(0)
M
6
data[,!apply(is.na(data), 2, any)]
Melonymelos answered 17/9, 2012 at 7:27 Comment(2)
Shouldn't the data.frame version be the same as the matrix version, just without the first comma? I get an error (undefined columns selected) with your code as it is.Ekg
However, apply converts the input to a matrix prior to applying the function, so I prefer to use sapply or lapply on data frames. Then again so does is.na so in this case the input is already a matrix and my first example was actually incorrect! Perhaps the conceptually nices solution is sapply(data, function(x) !any(is.na(x))), but this is really nitpicking.Melonymelos
A
2

A base R method related to the apply answers is

Itun[!unlist(vapply(Itun, anyNA, logical(1)))]
  v1
1  1
2  1
3  2
4  1
5  2
6  1

Here, vapply is used as we are operating on a list, and, apply, it does not coerce the object into a matrix. Also, since we know that the output will be logical vector of length 1, we can feed this to vapply and potentially get a little speed boost. For the same reason, I used anyNA instead of any(is.na()).

Ablebodied answered 3/2, 2017 at 19:30 Comment(0)
H
2

Another alternative with the dplyr package would be to make use of the Filter function

Filter(function(x) !any(is.na(x)), Itun)

with data.table would be a little more cumbersome

setDT(Itun)[,.SD,.SDcols=setdiff((1:ncol(Itun)),
                                which(colSums(is.na(Itun))>0))]
Hendecahedron answered 15/7, 2019 at 15:44 Comment(0)
S
0

You can also try:

df <- df[,colSums(is.na(df))<nrow(df)]
Sakovich answered 17/8, 2022 at 4:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.