R: Remove multiple empty columns of character variables

Asked 16/7, 2013 at 9:21 Answered 30/12, 2019 at 10:51

I have a data frame where all the variables are of character type. Many of the columns are completely empty, i.e. only the variable headers are there, but no values. Is there any way to subset out the empty columns?

Footcloth answered 16/7, 2013 at 9:21 Comment(1)

Btw, by empty, do you mean NA or ""? – Antecedent 16/7, 2013 at 9:33

If your empty columns are really empty character columns, something like the following should work. It will need to be modified if your "empty" character columns include, say, spaces.

Sample data:

mydf <- data.frame(
  A = c("a", "b"),
  B = c("y", ""),
  C = c("", ""),
  D = c("", ""),
  E = c("", "z")
)
mydf
#   A B C D E
# 1 a y      
# 2 b       z

Identifying and removing the "empty" columns.

mydf[!sapply(mydf, function(x) all(x == ""))]
#   A B E
# 1 a y  
# 2 b   z

Alternatively, as recommended by @Roland:

> mydf[, colSums(mydf != "") != 0]
  A B E
1 a y  
2 b   z

Mastic answered 16/7, 2013 at 9:25 Comment(6)

I just recommend to extend the condition in case there are NA values instead of "": all(x == "" || is.na(x)). But it will loose a bit of the elegance though :) – Icy 16/7, 2013 at 9:36

mydf[,colSums(mydf!="")!=0] might be faster. – Complexion 16/7, 2013 at 9:43

@Tomas, my guess here was that some data got read in incorrectly, leading to empty character columns, and NA is different than a read error, but good suggestion anyway. – Mastic 16/7, 2013 at 9:45

Hi Ananda... Actually, the reading-in is correct. The external data is in the form of many blocks appended together. I have to parse out the individual blocks and throw out the irrelevant columns. – Footcloth 16/7, 2013 at 9:49

@AnandaMahto Maybe mydf[,colSums(mydf!="")>0] is more to your taste? – Complexion 16/7, 2013 at 9:51

@user702432, Cool. Glad it worked out for you. In the future, please do try to make a reproducible example, as I have done in this answer. – Mastic 16/7, 2013 at 9:52

If you're talking about columns where all values are NA, use remove_empty("cols") from the janitor package.

If you have character vectors where every value is the empty string "", you can first convert those values to NA throughout your data.frame with na_if from the dplyr package:

dat <- data.frame(
  x = c("a", "b", "c"),
  y = c("", "", ""),
  z = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

dat
#>   x y  z
#> 1 a   NA
#> 2 b   NA
#> 3 c   NA

library(dplyr)
library(janitor)

dat %>%
  mutate_all(funs(na_if(., ""))) %>%
  remove_empty("cols")
#>   x
#> 1 a
#> 2 b
#> 3 c

Allx answered 27/1, 2017 at 13:23 Comment(2)

Neat solution! But depreciated. Package says: This function is deprecated, use remove_empty("cols") instead. – Celebrity 7/8, 2018 at 6:7

@radek good catch, thanks. I updated the answer accordingly. – Allx 8/8, 2018 at 19:3

You can do either of the following:

emptycols <- sapply(df, function (k) all(is.na(k)))
df <- df[!emptycols]

or:

emptycols <- colSums(is.na(df)) == nrow(df)
df <- df[!emptycols]

If by empty you mean they are "", the second approach can be adapted like so:

emptycols <- colSums(df == "") == nrow(df)

Antecedent answered 16/7, 2013 at 9:26 Comment(3)

This is throwing an error. I think is.na() works only on numeric variables. – Footcloth 16/7, 2013 at 9:29

user702432, the error thrown is not because of what you're saying, is.na will work just fine on character vectors as well. It's because he's passing an anonymous function to another. @asb, I think you can't do that. you'll have to do sapply(df, function(x) all(is.na(x))). – Lan 16/7, 2013 at 9:30

Oversight. Just fixed it. – Antecedent 16/7, 2013 at 9:31

I have a similar situation -- I'm working with a large public records database but when I whittle it down to just the date range and category that I need, there are a ton of columns that aren't in use. Some are blank and some are NA.

The selected answer: https://mcmap.net/q/496473/-r-remove-multiple-empty-columns-of-character-variables didn't work for me, but this did:

df[!sapply(df, function (x) all(is.na(x) | x == ""))]

Myrica answered 12/5, 2016 at 5:19 Comment(0)

It depends what you mean by empty: Is it NA or "", or can it even be " "? Something like this might work:

df[,!apply(df, 2, function(x) all(gsub(" ", "", x)=="", na.rm=TRUE))]

Ineradicable answered 16/7, 2013 at 9:35 Comment(0)

This can also be done by dplyr and select_if

`select_if(df,function(x){any(!is.na(x))})`

or with is.null() or x=="" depending on how empty values are defined in your data.

Buckram answered 11/10, 2019 at 16:49 Comment(0)

A simple solution using the purrr package:

purrr::discard(my_data_frame, ~all(is.na(.)))

Hosbein answered 30/12, 2019 at 10:51 Comment(0)

Here is something that can be modified to exclude columns containing any variables specied.

newdf= df[, apply(df, 2, function(x) !any({is.na(x) | x== "" | 
x== "-4"} ) )]

Pterous answered 28/7, 2018 at 5:44 Comment(0)

-1

If you know the column indices, you can use

df[,-c(3, 5, 7)]

This will omit columns 3, 5, 7.

Icy answered 16/7, 2013 at 9:23 Comment(1)

Unfortunately I don't. There are 400+ variables, and the empty columns are all randomly placed. – Footcloth 16/7, 2013 at 9:24

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags