Remove columns from dataframe where some of the values are NA

Asked 17/9, 2012 at 7:4 Answered 17/8, 2022 at 4:10

I have a dataframe where some of the values are NA. I would like to remove these columns.

My data.frame looks like this

    v1   v2 
1    1   NA 
2    1    1 
3    2    2 
4    1    1 
5    2    2 
6    1   NA

I tried to compute the column means and select the columns whose mean is not NA. I tried this statement, but it does not work.

data=subset(Itun, select=c(is.na(colMeans(Itun))))

I got an error,

error : 'x' must be an array of at least two dimensions

Can anyone give me some help?

Pluviometer answered 17/9, 2012 at 7:4 Comment(1)

Please add an example of what you would like to have as a result. It would also be really helpful to have a fully reproducible example. – Ahead 17/9, 2012 at 7:43

The data:

Itun <- data.frame(v1 = c(1,1,2,1,2,1), v2 = c(NA, 1, 2, 1, 2, NA))

This will remove all columns containing at least one NA:

Itun[ , colSums(is.na(Itun)) == 0]

An alternative way is to use apply:

Itun[ , apply(Itun, 2, function(x) !any(is.na(x)))]
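
One thing to watch for with the example data: only one column survives, and in that case `[` with a comma simplifies the result to a plain vector. The following is a minimal sketch of how to keep a data.frame, assuming that is what is wanted; the names keep, as_vector and as_frame are only illustrative.

```r
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# Logical vector: TRUE for every column that contains no NA
keep <- colSums(is.na(Itun)) == 0

# With a single surviving column, `[` drops the result to a vector
as_vector <- Itun[, keep]

# drop = FALSE preserves the data.frame structure
as_frame <- Itun[, keep, drop = FALSE]
```

Indexing without the comma, as in Itun[keep], also returns a data.frame, because it treats the data frame as a list of columns.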

Clayson answered 17/9, 2012 at 7:25 Comment(8)

This will remove rows with NAs, not columns. – Melonymelos 17/9, 2012 at 7:30

@Backlin, but to Sven's benefit, the whole question is really poorly worded and it's not clear what exactly the OP wants to do. Drop the columns? Convert something to zero? – Ekg 17/9, 2012 at 7:32

True. But he never says anything about rows and uses subset(..., select=...) so I figured he wants to extract all rows for certain columns. – Melonymelos 17/9, 2012 at 7:36

@SvenHohenstein: Sorry for my poorly organized words. I would like to extract columns without NAs from a dataframe. – Pluviometer 18/9, 2012 at 0:48

doesn't this return a logical array, without subsetting the data? – Granulite 23/8, 2017 at 13:36

should it be Itun[ , colSums(is.na(Itun)) == 0, with = FALSE]? – Granulite 23/8, 2017 at 13:40

@Granulite Itun is a data.frame, not a data.table. – Clayson 23/8, 2017 at 13:42

@SvenHohenstein sorry I thought I had read data.table in the answer. My mistake – Granulite 23/8, 2017 at 13:55

Here's a convenient way to do it using the dplyr function select_if(). Combine not (!), any() and is.na(), which is equivalent to selecting all columns that don't contain any NA values.

library(dplyr)
Itun %>%
    select_if(~ !any(is.na(.)))

Barfly answered 27/10, 2017 at 16:48 Comment(5)

I was wondering if you can extract the column names of the removed columns simultaneously. Is this possible? – Detonation 12/12, 2017 at 17:10

I'd split that into two operations. Use Itun %>% select_if(~ any(is.na(.))) %>% names(). Then remove columns in second operation using code above. – Barfly 13/12, 2017 at 19:55

Great solution. For cases where only the columns consisting entirely of NAs should be removed, you can use select_if(~ !all(is.na(.))) – Pepi 23/2, 2018 at 9:29

This solution is very nice but very slow. Itun[ , colSums(is.na(Itun)) == 0] by @Sven-hohenstein is much faster. – Su 13/8, 2020 at 8:27

What does it return though if I wanted to have the columns that have NA/NULL ? When I ran the opposite (i.e. without !), it returned a bunch of columns that didn't have NAs; the column that had NA was returned along with them, though. – Victoria 15/11, 2021 at 16:59

Alternatively, select(where(~FUNCTION)) can be used:

library(dplyr)

(df <- data.frame(x = letters[1:5], y = NA, z = c(1:4, NA)))
#>   x  y  z
#> 1 a NA  1
#> 2 b NA  2
#> 3 c NA  3
#> 4 d NA  4
#> 5 e NA NA

# Remove columns where all values are NA
df %>% 
  select(where(~!all(is.na(.))))
#>   x  z
#> 1 a  1
#> 2 b  2
#> 3 c  3
#> 4 d  4
#> 5 e NA
  
# Remove columns with at least one NA  
df %>% 
  select(where(~!any(is.na(.))))
#>   x
#> 1 a
#> 2 b
#> 3 c
#> 4 d
#> 5 e

Hack answered 4/9, 2020 at 17:28 Comment(0)

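You can use transpose twice (note that t() coerces the data frame to a matrix, so the result is a matrix rather than a data frame):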

newdf <- t(na.omit(t(df)))
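
To see what the double transpose does to the column types, here is a small sketch on made-up data (the data frame below is hypothetical and not from the question). It assumes a mixed-type data frame, which is the case where the coercion matters most.

```r
# Hypothetical example data with mixed column types
df <- data.frame(x = letters[1:3], y = c(1, NA, 3), z = 4:6)

# t() turns the data frame into a matrix; because of the character
# column, every value is stored as a character string
newdf <- t(na.omit(t(df)))

is.matrix(newdf)      # the result is a matrix, not a data.frame
colnames(newdf)       # column y, which contained an NA, is gone

# Converting back gives a data.frame, but the columns stay character
back <- as.data.frame(newdf)
```

If the original column types matter, one of the subsetting approaches that work on the data frame directly is probably the safer choice.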

Inelastic answered 1/4, 2016 at 19:13 Comment(0)

data[,!apply(is.na(data), 2, any)]

Melonymelos answered 17/9, 2012 at 7:27 Comment(2)

Shouldn't the data.frame version be the same as the matrix version, just without the first comma? I get an error (undefined columns selected) with your code as it is. – Ekg 17/9, 2012 at 7:44

However, apply converts the input to a matrix prior to applying the function, so I prefer to use sapply or lapply on data frames. Then again, so does is.na, so in this case the input is already a matrix and my first example was actually incorrect! Perhaps the conceptually nicest solution is sapply(data, function(x) !any(is.na(x))), but this is really nitpicking. – Melonymelos 17/9, 2012 at 8:5

A base R method related to the apply answers is

Itun[!unlist(vapply(Itun, anyNA, logical(1)))]
  v1
1  1
2  1
3  2
4  1
5  2
6  1

Here, vapply is used as we are operating on a list, and, unlike apply, it does not coerce the object into a matrix. Also, since we know that the output will be a logical vector of length 1, we can feed this to vapply and potentially get a little speed boost. For the same reason, I used anyNA instead of any(is.na()).
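
The difference between apply and vapply can be checked on a small mixed-type data frame. This is only a sketch; the data frame mixed and the other names are made up for illustration.

```r
# Hypothetical data frame with one character and one numeric column
mixed <- data.frame(x = c("a", "b"), y = c(1, NA))

# apply() first converts the data frame to a character matrix,
# so both columns are reported as character
classes_apply <- apply(mixed, 2, class)

# vapply() visits each column as it is, so the classes are preserved
classes_vapply <- vapply(mixed, class, character(1))

# Column-wise NA check and list-style subsetting (no comma),
# which always returns a data.frame
keep <- !vapply(mixed, anyNA, logical(1))
result <- mixed[keep]
```

Since vapply already returns an atomic logical vector here, the unlist() call in the snippet above should not be strictly necessary.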

Ablebodied answered 3/2, 2017 at 19:30 Comment(0)

Another alternative, which needs only base R, is the Filter function:

Filter(function(x) !any(is.na(x)), Itun)

With data.table, it is a little more cumbersome:

setDT(Itun)[,.SD,.SDcols=setdiff((1:ncol(Itun)),
                                which(colSums(is.na(Itun))>0))]

Hendecahedron answered 15/7, 2019 at 15:44 Comment(0)

You can also try the following, which removes only the columns in which every value is NA:

df <- df[,colSums(is.na(df))<nrow(df)]
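
The two conditions, "not all NA" and "no NA at all", give different results, which the following sketch illustrates on the small df used in an earlier answer. The variable names and the use of drop = FALSE are my own additions.

```r
df <- data.frame(x = letters[1:5], y = NA, z = c(1:4, NA))

# Number of NA values per column
na_counts <- colSums(is.na(df))

# Keep columns that are not entirely NA: drops y, keeps x and z
not_all_na <- df[, na_counts < nrow(df), drop = FALSE]

# Keep columns with no NA at all: drops y and z, keeps x
no_na <- df[, na_counts == 0, drop = FALSE]
```

For the question as asked (remove every column that has at least one NA), the second condition is the one that applies.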

Sakovich answered 17/8, 2022 at 4:10 Comment(0)
