confidence interval error bars for ggplot

Asked 1/10, 2019 at 18:38 Answered 19/10, 2024 at 22:15

I want to add confidence-interval error bars to a ggplot bar chart.

I have a dataset and I am plotting it with ggplot as:

library(ggplot2)

df <- data.frame(
  Sample = c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5"),
  Weight = c(10.5, NA, 4.9, 7.8, 6.9))

p <- ggplot(data = df, aes(x = Sample, y = Weight)) +
  geom_bar(stat = "identity", fill = "black") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 8)) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p

I am new to adding error bars. I looked at some options using geom_bar but I could not make it work.

I would appreciate any help adding confidence-interval error bars to the bar plot. Thank you!

Lieberman asked 1/10, 2019 at 18:38 Comment(6)

You only have one observation per sample – Seek 1/10, 2019 at 18:42

How are you meant to estimate the error or confidence interval you want to plot? You need to make a statistical modeling assumption in order to produce an interval. If you just ask for my age, there is just one true value; there's not a "good" way to give error bars for my age. – Maryammaryann 1/10, 2019 at 18:45

Actually, each weight observation is an average of eight observations. – Lieberman 1/10, 2019 at 18:52

Do you have the original Weight values? If so, compute the mean and the standard error of each set of 8 values; you can then calculate an interval (for example, mean +/- 2 * se for an approximate 95% interval). – Navarro 1/10, 2019 at 19:09

can you show the raw data? do you want standard errors? – Words 1/10, 2019 at 19:11

Although the given comments and answers provide solid solutions to your problem, allow me to suggest an entirely different way to visualise your data. If you have only eight measurements, summary statistics may be somewhat error-prone. Why not show box plots, or even the actual values, e.g. with geom_point? This will give you a much better idea of the actual measurements. Bar graphs are very misleading in this case and are actually better used for count statistics. – Heilungkiang 1/10, 2019 at 21:24
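The suggestion in the last comment could look roughly like the sketch below. The raw values are not part of the question, so the `raw` data frame here is simulated (eight hypothetical measurements per sample, centred on the reported means); only the plotting calls matter.

```r
library(ggplot2)

# Hypothetical raw data: 8 simulated measurements per sample.
# These numbers are invented for illustration and are NOT the asker's data.
set.seed(42)
raw <- data.frame(
  Sample = rep(c("Sample1", "Sample3", "Sample4", "Sample5"), each = 8),
  Weight = rnorm(32, mean = rep(c(10.5, 4.9, 7.8, 6.9), each = 8), sd = 1)
)

# Box plot per sample with the individual measurements drawn on top
p_raw <- ggplot(raw, aes(x = Sample, y = Weight)) +
  geom_boxplot(outlier.shape = NA) +                   # hide outlier markers; points are drawn below
  geom_jitter(width = 0.1, height = 0, alpha = 0.6) +  # jitter horizontally only
  theme_classic()

p_raw
```

Jittering only along the x axis keeps each point at its true Weight value while reducing overlap.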

Add a layer of error bars with geom_errorbar

df <- data.frame(
  Sample=c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5"), 
  Average.Weight=c(10.5, NA, 4.9, 7.8, 6.9),
  # arbitrarily make up some Standard Errors for each mean:
  SE = c(1, NA, .3, .25, .2)) # JUST MAKING THINGS UP HERE

Now you have a data frame with a column of Average Weight and the SE for each sample in your study. Use ggplot to plot:

ggplot(data = na.omit(df)) + #don't bother plotting the NA
  geom_bar(stat = "identity", aes(x = Sample,y = Average.Weight)) +
  geom_errorbar(
    aes(x=Sample, 
        ymin = Average.Weight - 1.96*SE, 
        ymax = Average.Weight + 1.96*SE), 
    color = "red"
  )
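The SE values above are placeholders. If the raw measurements are available, the mean, standard error, and interval half-width can be computed per sample instead. The following is a minimal sketch under the assumption of eight measurements per sample; the `raw` data frame is simulated and the names `summ`, `CI`, and `p_ci` are made up for this example. It uses a t quantile rather than 1.96, since with n = 8 the t-based multiplier (about 2.36) gives a noticeably wider interval than the normal approximation.

```r
library(ggplot2)

# Hypothetical raw data: 8 simulated measurements per sample (not the asker's data)
set.seed(1)
raw <- data.frame(
  Sample = rep(c("Sample1", "Sample3", "Sample4", "Sample5"), each = 8),
  Weight = rnorm(32, mean = rep(c(10.5, 4.9, 7.8, 6.9), each = 8), sd = 1)
)

# Per-sample mean, standard error, and t-based 95% half-width
summ <- do.call(rbind, lapply(split(raw, raw$Sample), function(d) {
  n  <- nrow(d)
  se <- sd(d$Weight) / sqrt(n)          # sd() uses the n - 1 denominator
  data.frame(
    Sample         = d$Sample[1],
    Average.Weight = mean(d$Weight),
    SE             = se,
    CI             = qt(0.975, df = n - 1) * se
  )
}))

# Same plot as above, with the computed half-widths instead of made-up SEs
p_ci <- ggplot(summ, aes(x = Sample, y = Average.Weight)) +
  geom_bar(stat = "identity") +
  geom_errorbar(
    aes(ymin = Average.Weight - CI, ymax = Average.Weight + CI),
    width = 0.2, color = "red"
  )

p_ci
```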

Jennifer answered 1/10, 2019 at 19:13 Comment(7)

You might want to clarify that df$SE = 1/1:nrow(df) just creates some place holder values for the standard errors. – Gwyngwyneth 1/10, 2019 at 19:22

Hello, I want to add significant differences at alpha 0.05 above the bars. It is a data frame object and I did the same thing with ANOVA using: lsmeans(df, pairwise~Weight, adjust='Tukey'). But I am not sure how to put the differences (asterisks) on the plot for this dataset. Thank you! – Lieberman 25/10, 2019 at 14:11

Looks like there's a similar question asked here: #17085066. There are some packages called ggsignif and ggpubr that might be helpful. – Jennifer 25/10, 2019 at 16:29

Hello @Dij, I used your code for adding error bars but I am getting really small error bars that are hard to believe. I am not sure what is going on. Thank you! – Lieberman 31/10, 2019 at 19:21

@Lieberman What are your standard errors and how did you compute them? It looks like your means range from about 5 to 10. I really can't tell you how accurate your confidence intervals are unless you share some data. In your question you only provided means, not the distribution that produced those means, so we have no way of knowing what the SE is of any mean estimate. – Jennifer 31/10, 2019 at 19:28

@Dij, I have calculated the standard error using: SE <-function(df) sqrt(var(df)/length(df)). Then I just added the column as you suggested earlier SE = 1/1:nrow(df). – Lieberman 1/11, 2019 at 18:27

Oh, I'm sorry for the confusion. I randomly generated SE, one for each mean in your data frame, merely by arbitrarily dividing 1 by the row, in order to ensure that each row (i.e., each mean in the column of means) had a corresponding SE. This is not part of the necessary code to plot error bars, or calculate standard error. Unfortunately, your SE function is incorrect, however, because the variance should be estimated by dividing the sum of squared deviations by n - 1. I will edit my answer to make this more clear. – Jennifer 1/11, 2019 at 23:59
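For reference, the SE function quoted in the comments can be checked on a small vector. The values in `x` are invented for this check. In base R, `var()` and `sd()` divide the sum of squared deviations by n - 1, so for a plain numeric vector without missing values the function gives the same result as `sd(x) / sqrt(n)`. It would not behave this way if it were handed a whole data frame or a vector containing NA.

```r
# Standard error of the mean for a numeric vector x (function from the comments,
# with the argument renamed from df to x to make clear it expects a vector)
SE <- function(x) sqrt(var(x) / length(x))

# Hypothetical measurements for one sample (invented for this check)
x <- c(9.8, 10.9, 11.2, 10.1, 10.6, 9.9, 11.0, 10.5)
n <- length(x)

# Sample variance computed by hand with the n - 1 denominator
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Both comparisons use all.equal() to allow for floating-point rounding
same_var <- isTRUE(all.equal(var(x), manual_var))
same_se  <- isTRUE(all.equal(SE(x), sd(x) / sqrt(n)))

print(c(same_var = same_var, same_se = same_se))
```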

It is possible to generate error bars automatically with a tool called superb (summary plot with error bars).

First, let's generate more than one observation per sample using random data:

set.seed(123)  # assumption: any fixed seed, only to make the random example reproducible
df2 <- data.frame(
  Sample = rep(c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5"), 5),
  Weight = rep(c(10.5, NA, 4.9, 7.8, 6.9), 5) + rnorm(25)
)

Then load the library and ask for the plot (default will be to show 95% confidence intervals):

library(superb)
superb( Weight ~ Sample, df2)

You can ask for standard error (SE) rather than the default confidence intervals (CI) and add any additional formatting options, for example:

p <- superb( Weight ~ Sample, df2,
  errorbar = "SE" # or CI for confidence intervals
) + 
scale_y_continuous(expand = c(0,0), limits = c(0, 12)) + 
theme_classic() + 
theme(axis.text.x = element_text(angle = 45, hjust = 1))
p

Note that I am the creator of superb.

Gravitation answered 19/10, 2024 at 22:15 Comment(0)
