Eliminating NAs from a ggplot
R

7

48

Very basic question here, as I'm just starting to use R. I'm trying to create a bar plot of factor counts in ggplot2, but when plotting I get 14 little colored blips representing my actual levels and then a massive grey bar at the end representing the 5000-ish NAs in the sample (it's survey data from a question that only applies to about 5% of the sample). I've tried the following code to no avail:

ggplot(data = MyData,aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
   geom_bar(stat="bin") 

The addition of the na.rm argument here has no apparent effect.

Meanwhile,

ggplot(data = na.omit(MyData),aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
   geom_bar(stat="bin") 

gives me

"Error: Aesthetics must either be length one, or the same length as the data"

as does affixing the na.omit() to the_variable, or both MyData and the_variable.

All I want to do is eliminate the giant NA bar from my graph. Can someone please help me do this?

Reclinate answered 20/6, 2013 at 14:29 Comment(3)
It's really impossible to help without having your data. You need to provide a small example that we can actually run, so we are able to look at your actual data structure. – Bordy
Without seeing your data, you may be able to subset down to just the non-NA values for plotting purposes, i.e. MyData.sub <- MyData[!is.na(MyData$the_variable), ], then just plot the subset. I often do something similar to remove zeros. – Domicile
Would it work to just refactor your fill variable? fill = factor(the_variable) – Casia
S
60

You can use the subset() function inside the ggplot() call. Try this:

library(ggplot2)

data("iris")
iris$Sepal.Length[5:10] <- NA # create some NAs for this example

ggplot(data = subset(iris, !is.na(Sepal.Length)), aes(x = Sepal.Length)) +
  geom_bar(stat = "bin")
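
The same subset() idea carries over to the factor case in the question, where the NA bar comes from a discrete variable rather than a numeric one. Below is a minimal sketch, assuming a toy data frame (MyData and the_variable here are made-up stand-ins for the asker's data) and using geom_bar() with its default count stat:

```r
library(ggplot2)

# Hypothetical stand-in for the asker's survey data: a factor with many NAs
MyData <- data.frame(
  the_variable = factor(c("a", "b", "b", "c", NA, NA, NA, NA))
)

# Keep only the rows where the plotted column is not NA
MyData_noNA <- subset(MyData, !is.na(the_variable))

# geom_bar() counts rows per level by default (stat = "count")
p <- ggplot(MyData_noNA, aes(x = the_variable, fill = the_variable)) +
  geom_bar()
p
```

Because only the plotted column is checked, rows that have NAs in other columns are kept.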
Spaulding answered 21/4, 2016 at 19:37 Comment(5)
Unfortunately, iris has no NAs .) – Womera
Ha! That's a nice way to treat the comment)) I guess for almost any case there is a well-suited dataset among the R built-in ones. – Womera
@Womera Thanks for that table. A hasNAs column would have been very helpful though :) – Dorcus
@mad If you are creating a plot with two columns, make sure to remove the NA values in both of them. Example: subset(iris, !is.na(Sepal.Length) & !is.na(Sepal.Width)) – Spaulding
That's a great way to deal with NAs within the ggplot(). Thanks @Spaulding – Foggia
W
32

Just an update to the answer of @rafa.pereira. Since ggplot2 is part of tidyverse, it makes sense to use the convenient tidyverse functions to get rid of NAs.

library(tidyverse)
airquality %>% 
        drop_na(Ozone) %>%
        ggplot(aes(x = Ozone))+
        geom_bar(stat="bin")

Note that you can also call drop_na() without specifying any columns; then every row with an NA in any column will be removed.
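
To make the difference concrete, here is a small sketch on the built-in airquality data, where both Ozone and Solar.R contain NAs. The object names ozone_only and all_cols are just illustrative:

```r
library(tidyr)

# airquality ships with R; Ozone and Solar.R both contain NAs
ozone_only <- drop_na(airquality, Ozone)  # drops rows where Ozone is NA
all_cols   <- drop_na(airquality)         # drops rows with an NA in any column

nrow(airquality)  # 153
nrow(ozone_only)  # 116
nrow(all_cols)    # 111
```

Naming the column keeps the five rows that are missing only Solar.R, which matters if other layers or summaries still need them.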

Womera answered 1/12, 2017 at 21:41 Comment(1)
I like this approach because it addresses the problem before it ever manifests into an actual problem; simply remove the NA values from the onset and you needn't worry about them any more. – Warfore
R
27

Additionally, adding na.rm= TRUE to your geom_bar() will work.

ggplot(data = MyData,aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
   geom_bar(stat="bin", na.rm = TRUE)

I ran into this issue with a loop in a time series and this fixed it. The missing data is removed and the results are otherwise unaffected.
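
For a discrete x like the factor in the question, another option that leaves the data untouched is to tell the scale not to display NA. The following is only a sketch on made-up data (MyData and the_variable are placeholders). na.translate is an argument of ggplot2's discrete scales, and na.rm = TRUE in geom_bar() is used here just to silence the warning about dropped rows:

```r
library(ggplot2)

# Hypothetical data: a factor with three NAs
MyData <- data.frame(
  the_variable = factor(c("a", "b", "b", "c", NA, NA, NA))
)

p <- ggplot(MyData, aes(x = the_variable, fill = the_variable)) +
  geom_bar(na.rm = TRUE) +                  # na.rm only suppresses the warning
  scale_x_discrete(na.translate = FALSE)    # do not give NA a position on the x axis
p
```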

Radiator answered 15/6, 2018 at 17:28 Comment(1)
This suggestion does not work as intended ... the geom_bar help suggests it does, but it does not and apparently that is what the developers intended. – Martyrdom
F
12

Not sure if you have solved the problem yet. For this issue, you can use the filter() function from the dplyr package. The idea is to keep only the observations/rows where the variable of interest is not NA, and then make the graph from those filtered observations. My code is below; note that the names of the data frame and the variable are copied from your question. I also assume you are familiar with the pipe operator.

library(tidyverse) 

MyData %>%
   filter(!is.na(the_variable)) %>%
   ggplot(aes(x = the_variable, fill = the_variable)) +
      geom_bar(stat="bin")

This should remove the annoying NAs from your plot. Hope this works :)

Foretopmast answered 11/1, 2018 at 2:45 Comment(0)
B
12

Try remove_missing() instead, with vars = "the_variable" (vars takes a character vector of column names). It is very important that you set the vars argument; otherwise remove_missing() will remove all rows that contain an NA in any column! Setting na.rm = TRUE will suppress the warning message.

ggplot(data = remove_missing(MyData, na.rm = TRUE, vars = "the_variable"),aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
       geom_bar(stat="bin") 
Bothy answered 28/8, 2018 at 16:38 Comment(1)
I think the vars argument needs to be a character vector, e.g. vars = "the_variable". See help file: "vars Character vector of variables to check for missings in" – Campbellite
C
0

From my point of view, the error "Error: Aesthetics must either be length one, or the same length as the data" refers to the arguments passed to aes(x, y). I tried na.omit() and it worked just fine for me.
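
One plausible way to hit that error, offered as a guess since the asker's data is not shown: na.omit() drops every row with an NA in any column, so a vector referenced from outside the cleaned data frame no longer has the same length as the data. A small illustration on made-up data (df, other and cleaned are hypothetical names):

```r
# Made-up data: NAs in two different columns
df <- data.frame(
  the_variable = factor(c("a", "b", NA, "a")),
  other        = c(1, NA, 3, 4)
)

cleaned <- na.omit(df)   # drops row 2 (NA in other) and row 3 (NA in the_variable)

nrow(cleaned)            # 2
length(df$the_variable)  # 4, so aes(x = df$the_variable) would not match `cleaned`
```

Referring to the column by its bare name inside aes() lets ggplot2 look it up in whatever data frame is passed as data, which avoids the mismatch.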

Cotta answered 29/5, 2017 at 18:52 Comment(0)
L
0

Another option is using the function complete.cases like this:

library(ggplot2)
# With NA
ggplot(airquality, aes(x = Ozone))+
  geom_bar(stat="bin")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Removed 37 rows containing non-finite values (stat_bin).

# Remove NA using complete.cases
airquality_complete <- airquality[complete.cases(airquality), ]
ggplot(airquality_complete, aes(x = Ozone))+
  geom_bar(stat="bin")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2022-08-25 with reprex v2.0.2

Laicize answered 25/8, 2022 at 9:35 Comment(0)
