NA's are being plotted in boxplot ggplot2

I'm trying to plot a very simple boxplot in ggplot2: species richness vs. land-use class. However, I have 2 NAs in my data, and for some reason they are being plotted, even though R recognizes them as NA. Any suggestions for removing them?

The code I'm using is:

ggplot(data, aes(x = luse, y = rich)) +
  geom_boxplot(
    mapping = NULL, data = NULL, stat = "boxplot", position = "dodge",
    outlier.colour = "red", outlier.shape = 16, outlier.size = 2,
    notch = FALSE, notchwidth = 0.5
  ) +
  scale_x_discrete("luse", drop = TRUE) +
  geom_smooth(method = "loess", aes(group = 1))

However, the graph includes the 2 NAs for luse. Unfortunately I cannot post images, but imagine that an NA box is being added to my graph.

Amateurism answered 17/6, 2013 at 11:17 Comment(3)
ggplot(na.omit(data), aes(x=luse, y=rich)) + ... - Designer
For a more general case: if the data contain variables other than the two being plotted, na.omit(data) will remove observations with missing values on any variable. This can have unintended consequences for your graphs and/or analysis. One could use data=na.omit(data[,c("var1","var2",...)]), where var1, var2, ... are the variables you require for your graph. - Goshorn
+1 for @Maxim.K. I ran into this exact problem with a large data frame in which one of the variables had an extremely high proportion of NA values. I couldn't quite work out the syntax to get rid of the NAs in just my variable of interest. But note: if you are only interested in one variable, as I was, the code above returns a vector. You must select at least 2 columns of the data.frame for it to work as written. - Optimist
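The caveat raised in these comments can be seen on a small made-up data frame. This is only a sketch in base R: the data frame `df` and its `notes` column are hypothetical and invented for illustration, and `drop = FALSE` is one possible way around the single-column case, not something the commenters proposed.

```r
# Hypothetical data: `notes` is an unrelated column that is mostly NA.
df <- data.frame(
  luse  = c("forest", "crop", NA, "forest", "crop"),
  rich  = c(12, 7, 9, NA, 5),
  notes = c(NA, NA, "x", NA, "y")
)

# na.omit() on the whole data frame drops every row with an NA in ANY column.
all_cols <- na.omit(df)

# Restricting to the plotted columns keeps the rows complete for those two.
plot_cols <- na.omit(df[, c("luse", "rich")])

# Selecting a single column with [, "luse"] collapses it to a vector;
# drop = FALSE keeps it as a one-column data frame.
one_col_vec <- na.omit(df[, "luse"])
one_col_df  <- na.omit(df[, "luse", drop = FALSE])

nrow(all_cols)
nrow(plot_cols)
is.data.frame(one_col_vec)
is.data.frame(one_col_df)
```

Here the unrelated `notes` column costs two otherwise usable rows when `na.omit()` is applied to the whole data frame.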

You could use the subset() function in the first line of your code:

ggplot(data=subset(data, !is.na(luse)), aes(x=luse, y=rich))+

as suggested in: Eliminating NAs from a ggplot
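As a quick check of what that subset() call passes on to ggplot, here is a minimal base R sketch. The data frame `df` is hypothetical; only the column names luse and rich come from the question.

```r
# Hypothetical data with two missing land-use classes.
df <- data.frame(
  luse = c("forest", NA, "crop", NA),
  rich = c(12, 9, 7, 4)
)

# Keep only the rows where luse is not missing.
kept <- subset(df, !is.na(luse))

# The same rows selected with bracket indexing, for comparison.
kept_idx <- df[!is.na(df$luse), ]

nrow(kept)
anyNA(kept$luse)
```

Only the rows with a missing luse are dropped. Rows where some other column is NA would be kept, unlike with na.omit(data).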

Scrubland answered 27/11, 2016 at 12:57 Comment(0)

Here is a formal answer using the comments above to incorporate !is.na() with filter() from tidyverse/dplyr. If you have a basic tidyverse operation such as filtering NAs, you can do it right in the ggplot call, as suggested, to avoid making a new data frame:

ggplot(data %>% filter(!is.na(luse)), aes(x = luse, y = rich)) + geom_boxplot()

Farcical answered 15/5, 2019 at 16:38 Comment(3)
How is this different from previous answers? - Documentation
It does not use is.na() == FALSE or subsetting, which are less direct. - Farcical
How is subset less direct? Also, you're using the pipe, which has nothing to do with this question. - Documentation

You can also use the filter() function in dplyr/tidyverse:

data %>% filter(is.na(luse) == FALSE) %>% 
   ggplot(aes(x=luse, y=rich)) +
   geom_boxplot()

This way you don't have to create a new object.

Biweekly answered 28/2, 2018 at 1:25 Comment(2)
Did you maybe mean !is.na() instead? Or do you want all the NAs? ;) Also, you do not necessarily need to specify is.na(x) == TRUE, because it evaluates to a logical vector anyway, which will then be used by filter(). P.S. Welcome to SOF - Norling
Oh, yep. Typo, sorry. Thanks for catching that. Also, cool. I did not know you could just use is.na() directly. - Biweekly