I have a data frame where all the variables are of character type. Many of the columns are completely empty, i.e. only the variable headers are there, but no values. Is there any way to subset out the empty columns?
If your empty columns are really empty character columns, something like the following should work. It will need to be modified if your "empty" character columns include, say, spaces.
Sample data:
mydf <- data.frame(
A = c("a", "b"),
B = c("y", ""),
C = c("", ""),
D = c("", ""),
E = c("", "z")
)
mydf
# A B C D E
# 1 a y
# 2 b z
Identifying and removing the "empty" columns.
mydf[!sapply(mydf, function(x) all(x == ""))]
# A B E
# 1 a y
# 2 b z
Alternatively, as recommended by @Roland:
> mydf[, colSums(mydf != "") != 0]
A B E
1 a y
2 b z
all(x == "" || is.na(x))
. But it will loose a bit of the elegance though :) –
Icy mydf[,colSums(mydf!="")!=0]
might be faster. –
Complexion NA
is different than a read error, but good suggestion anyway. –
Mastic mydf[,colSums(mydf!="")>0]
is more to your taste? –
Complexion If you're talking about columns where all values are NA
, use remove_empty("cols")
from the janitor package.
If you have character vectors where every value is the empty string ""
, you can first convert those values to NA
throughout your data.frame with na_if
from the dplyr package:
dat <- data.frame(
x = c("a", "b", "c"),
y = c("", "", ""),
z = c(NA, NA, NA),
stringsAsFactors = FALSE
)
dat
#> x y z
#> 1 a NA
#> 2 b NA
#> 3 c NA
library(dplyr)
library(janitor)
dat %>%
mutate_all(funs(na_if(., ""))) %>%
remove_empty("cols")
#> x
#> 1 a
#> 2 b
#> 3 c
remove_empty("cols")
instead. –
Celebrity You can do either of the following:
emptycols <- sapply(df, function (k) all(is.na(k)))
df <- df[!emptycols]
or:
emptycols <- colSums(is.na(df)) == nrow(df)
df <- df[!emptycols]
If by empty you mean they are ""
, the second approach can be adapted like so:
emptycols <- colSums(df == "") == nrow(df)
is.na
will work just fine on character vectors as well. It's because he's passing an anonymous function to another. @asb, I think you can't do that. you'll have to do sapply(df, function(x) all(is.na(x)))
. –
Lan I have a similar situation -- I'm working with a large public records database but when I whittle it down to just the date range and category that I need, there are a ton of columns that aren't in use. Some are blank and some are NA.
The selected answer: https://mcmap.net/q/496473/-r-remove-multiple-empty-columns-of-character-variables didn't work for me, but this did:
df[!sapply(df, function (x) all(is.na(x) | x == ""))]
It depends what you mean by empty: Is it NA or ""
, or can it even be " "
? Something like this might work:
df[,!apply(df, 2, function(x) all(gsub(" ", "", x)=="", na.rm=TRUE))]
This can also be done by dplyr
and select_if
`select_if(df,function(x){any(!is.na(x))})`
or with is.null()
or x==""
depending on how empty values are defined in your data.
A simple solution using the purrr
package:
purrr::discard(my_data_frame, ~all(is.na(.)))
Here is something that can be modified to exclude columns containing any variables specied.
newdf= df[, apply(df, 2, function(x) !any({is.na(x) | x== "" |
x== "-4"} ) )]
If you know the column indices, you can use
df[,-c(3, 5, 7)]
This will omit columns 3, 5, 7.
© 2022 - 2024 — McMap. All rights reserved.
NA
or""
? – Antecedent