confidence interval error bars for ggplot

I want to add confidence interval error bars to a ggplot bar plot.

I have a dataset and I am plotting it with ggplot as:

library(ggplot2)

df <- data.frame(
  Sample = c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5"),
  Weight = c(10.5, NA, 4.9, 7.8, 6.9))

# The upper limit must exceed the largest Weight (10.5); otherwise that bar is dropped
p <- ggplot(data = df, aes(x = Sample, y = Weight)) +
  geom_bar(stat = "identity", fill = "black") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 12)) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # closing parenthesis was missing

p

I am new to adding error bars. I looked at some options using geom_bar but I could not make it work.

I would appreciate any help adding confidence interval error bars to the bar plot. Thank you!

Lieberman answered 1/10, 2019 at 18:38 Comment(6)
You only have one observation per sample. - Seek
How are you meant to estimate the error or confidence interval you want to plot? You need to make a statistical modeling assumption in order to produce an interval. If you just ask for my age, there is just one true value; there's not a "good" way to give error bars for my age. - Maryammaryann
Actually, each weight observation is an average of eight observations. - Lieberman
Do you have the original Weight values? If so, compute the mean and the standard error of each set of 8 values, and then you can calculate an interval (mean +/- 2 * se for an approximate 95% interval, for example). - Navarro
Can you show the raw data? Do you want standard errors? - Words
Although the given comments and answers provide solid solutions to your problem, allow me to suggest an entirely different way to visualise your data. If you have only eight measurements, summary statistics may be somewhat error-prone. Why not show box plots, or even the actual values, e.g. with geom_point? This will give you a much better idea of the actual measurements. Bar graphs are very misleading in this case and are actually better used for count statistics. - Heilungkiang
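
The suggestion in the last comment, showing the measurements themselves, could look roughly like the sketch below. The raw data here is an assumption: the `raw` data frame is simulated (8 values per sample, centred on the means from the question, with an arbitrary standard deviation of 1), because the original observations were never posted.

```r
library(ggplot2)

# ASSUMPTION: simulated stand-in for the raw data, 8 observations per sample.
# The real measurements are not available in the question.
set.seed(1)
raw <- data.frame(
  Sample = rep(c("Sample1", "Sample3", "Sample4", "Sample5"), each = 8),
  Weight = rnorm(32, mean = rep(c(10.5, 4.9, 7.8, 6.9), each = 8), sd = 1)
)

# Box plots summarise each sample; jittered points show every measurement.
# outlier.shape = NA avoids drawing outliers twice (once per layer).
p_raw <- ggplot(raw, aes(x = Sample, y = Weight)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1, height = 0) +
  theme_classic()
p_raw
```

With only eight values per group, a plot like this makes the spread visible directly instead of compressing it into a bar and an interval.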

Add a layer of error bars with geom_errorbar

df <- data.frame(
  Sample=c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5"), 
  Average.Weight=c(10.5, NA, 4.9, 7.8, 6.9),
  # arbitrarily make up some Standard Errors for each mean:
  SE = c(1, NA, .3, .25, .2)) # JUST MAKING THINGS UP HERE
 

Now you have a data frame with a column of Average Weight and the SE for each sample in your study. Use ggplot to plot:

library(ggplot2)

ggplot(data = na.omit(df)) + # don't bother plotting the NA
  geom_bar(stat = "identity", aes(x = Sample,y = Average.Weight)) +
  geom_errorbar(
    aes(x=Sample, 
        ymin = Average.Weight - 1.96*SE, 
        ymax = Average.Weight + 1.96*SE), 
    color = "red"
  )
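
If the raw observations are available, the standard errors do not need to be made up. The following is a minimal sketch under stated assumptions: `raw` is simulated data (8 values per sample, not the asker's measurements), and the column names `Mean`, `N`, `SE` and `HalfWidth` are illustrative. It uses a t-based multiplier rather than 1.96, which is the more usual choice for small samples.

```r
library(ggplot2)

# ASSUMPTION: simulated stand-in for the raw data, 8 observations per sample
set.seed(1)
raw <- data.frame(
  Sample = rep(c("Sample1", "Sample3", "Sample4", "Sample5"), each = 8),
  Weight = rnorm(32, mean = rep(c(10.5, 4.9, 7.8, 6.9), each = 8), sd = 1)
)

# Per-sample mean, sample size, and standard error of the mean.
# sd() and var() use the n - 1 denominator, so sd(x) / sqrt(n) equals sqrt(var(x) / n).
summ <- do.call(rbind, lapply(split(raw$Weight, raw$Sample), function(x) {
  n <- length(x)
  data.frame(Mean = mean(x), N = n, SE = sd(x) / sqrt(n))
}))
summ$Sample <- rownames(summ)

# Half-width of a 95% interval from the t distribution with n - 1 degrees of freedom
summ$HalfWidth <- qt(0.975, df = summ$N - 1) * summ$SE

ggplot(summ, aes(x = Sample, y = Mean)) +
  geom_col(fill = "black") +
  geom_errorbar(aes(ymin = Mean - HalfWidth, ymax = Mean + HalfWidth),
                width = 0.2, color = "red") +
  theme_classic()
```

For 8 observations the t multiplier is about 2.36, so these intervals come out somewhat wider than the 1.96 * SE version.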


Jennifer answered 1/10, 2019 at 19:13 Comment(7)
You might want to clarify that df$SE = 1/1:nrow(df) just creates some placeholder values for the standard errors. - Gwyngwyneth
Hello, I want to add significant differences at alpha 0.05 above the bars. It is a data frame object and I did the same thing with ANOVA using: lsmeans(df, pairwise~Weight, adjust='Tukey'). But I am not sure how I can put the differences (asterisks) in this dataset. Thank you! - Lieberman
Looks like there's a similar question asked here: #17085066. There are some packages called ggsignif and ggpubr that might be helpful. - Jennifer
Hello @Dij, I used your code for adding error bars but I am getting really small error bars that are hard to believe. I am not sure what is going on. Thank you! - Lieberman
@Lieberman What are your standard errors and how did you compute them? It looks like your means range from about 5 to 10. I really can't tell you how accurate your confidence intervals are unless you share some data. In your question you only provided means, not the distribution that produced those means, so we have no way of knowing what the SE is of any mean estimate. - Jennifer
@Dij, I have calculated the standard error using: SE <-function(df) sqrt(var(df)/length(df)). Then I just added the column as you suggested earlier SE = 1/1:nrow(df). - Lieberman
Oh, I'm sorry for the confusion. I randomly generated SE, one for each mean in your data frame, merely by arbitrarily dividing 1 by the row, in order to ensure that each row (i.e., each mean in the column of means) had a corresponding SE. This is not part of the necessary code to plot error bars, or calculate standard error. Unfortunately, your SE function is incorrect, however, because the variance should be estimated by dividing the sum of squared deviations by n - 1. I will edit my answer to make this more clear. - Jennifer

It is possible to generate error bars automatically with a tool called superb (summary plot with error bars).

First, let's create more than one point per sample using random data:

df2 <- data.frame(
  Sample = rep(c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5"),5),
  Weight = rep(c(10.5, NA, 4.9, 7.8, 6.9),5)+rnorm(25)
)

Then load the library and ask for the plot (default will be to show 95% confidence intervals):

library(superb)
superb( Weight ~ Sample, df2)

[Figure: mean plot of the 5 samples]

You can ask for standard error (SE) rather than the default confidence intervals (CI) and add any additional formatting options, for example:

p <- superb( Weight ~ Sample, df2,
  errorbar = "SE" # or CI for confidence intervals
) + 
scale_y_continuous(expand = c(0,0), limits = c(0, 12)) + 
theme_classic() + 
theme(axis.text.x = element_text(angle = 45, hjust = 1))
p

[Figure: mean plot of the 5 samples with standard error and formatting]

Note that I am the creator of superb.
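
For comparison, a similar summary plot can be sketched with ggplot2 alone, using stat_summary() to compute the means and error bars from raw data. This is a sketch, not part of superb: the `raw` data frame is simulated for illustration, and `mult = 1.96` gives a normal-approximation interval (mean +/- 1.96 * SE) rather than a t-based one.

```r
library(ggplot2)

# ASSUMPTION: simulated raw data, 8 observations per sample
set.seed(1)
raw <- data.frame(
  Sample = rep(c("Sample1", "Sample3", "Sample4", "Sample5"), each = 8),
  Weight = rnorm(32, mean = rep(c(10.5, 4.9, 7.8, 6.9), each = 8), sd = 1)
)

# mean_se() returns y (mean), ymin and ymax (mean -/+ mult * standard error)
p_summary <- ggplot(raw, aes(x = Sample, y = Weight)) +
  stat_summary(fun = mean, geom = "bar", fill = "black") +
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96),
               geom = "errorbar", width = 0.2, color = "red") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p_summary
```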

Gravitation answered 19/10 at 22:15 Comment(0)
