Plot hline at mean with geom_bar and stat="identity"
R

I have a barplot where the exact bar heights are in the dataframe.

df <- data.frame(x=LETTERS[1:6], y=c(1:6, 1:6 + 1), g=rep(x = c("a", "b"), each=6))

ggplot(df, aes(x=x, y=y, fill=g, group=g)) + 
  geom_bar(stat="identity", position="dodge")

Now I want to add two hlines displaying the mean of all bars per group. All I get with

ggplot(df, aes(x=x, y=y, fill=g, group=g)) + 
  geom_bar(stat="identity", position="dodge") +
  stat_summary(fun.y=mean, aes(yintercept=..y.., group=g), geom="hline")

is

[plot: many horizontal lines instead of one per group]

Since I want to do this for an arbitrary number of groups as well, I would appreciate a solution that uses ggplot only.

I want to avoid a solution like this, because it does not rely purely on the dataset passed to ggplot, has redundant code and is not flexible in the number of groups:

ggplot(df, aes(x=x, y=y, fill=g, group=g)) + 
  geom_bar(stat="identity", position="dodge") +
  geom_hline(yintercept=mean(df$y[df$g=="a"]), col="red") +
  geom_hline(yintercept=mean(df$y[df$g=="b"]), col="green")

Thanks in advance!

Edits:

  • added dataset
  • comment on resulting code
  • changed the data and plots to clarify the question
Reduplicate answered 14/9, 2018 at 10:43 Comment(3)
Any reproducible dataset? - Autocade
I am not sure what we are aiming for... Do you want a single command for that ggplot() + geom_bar()? Or what? - Middlesworth
Sorry, I forgot to add the dataset. I would like to have a solution like ggplot() + geom_bar() + stat_summary(geom="hline") - Reduplicate

If I understand your question correctly, your first approach is almost there:

ggplot(df, aes(x = x, y = y, fill = g, group = g)) + 
  geom_col(position="dodge") + # geom_col is equivalent to geom_bar(stat = "identity")
  stat_summary(fun.y = mean, aes(x = 1, yintercept = ..y.., group = g), geom = "hline")

According to the help file for stat_summary:

stat_summary operates on unique x; ...

In this case, stat_summary has inherited the top level aesthetic mappings of x = x and group = g by default, so it would calculate the mean y value at each x for each value of g, resulting in a lot of horizontal lines. Adding x = 1 to stat_summary's mapping overrides x = x (while retaining group = g), so we get a single mean y value for each value of g instead.
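For readers on a newer ggplot2, here is a minimal sketch of the same idea, assuming version 3.3.0 or later, where `fun` replaces the deprecated `fun.y` and `after_stat(y)` replaces the `..y..` notation. Both spellings refer to the `y` value computed by the stat (here, the group mean) rather than the `y` column of the data. The names `group_means` and `p` are only illustrative; the `tapply()` call is there solely as a sanity check on where the lines should land.

```r
library(ggplot2)

# Same data as in the question
df <- data.frame(x = LETTERS[1:6],
                 y = c(1:6, 1:6 + 1),
                 g = rep(c("a", "b"), each = 6))

# Per-group means computed directly, for comparison with the plotted lines
group_means <- tapply(df$y, df$g, mean)
group_means  # a = 3.5, b = 4.5

p <- ggplot(df, aes(x = x, y = y, fill = g, group = g)) +
  geom_col(position = "dodge") +
  stat_summary(
    fun = mean,                       # `fun` instead of the older `fun.y`
    aes(x = 1,                        # constant x, so the mean is taken over all bars of a group
        yintercept = after_stat(y),   # computed mean; older spelling: ..y..
        group = g),
    geom = "hline"
  )

p
```

The two horizontal lines should sit at the values reported by `group_means`, and because the summary is driven by `group = g`, the same call should carry over to any number of groups.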

Thermy answered 14/9, 2018 at 16:8 Comment(4)
Hey! Just to chime in: when doing this with a datetime variable on the x-axis, setting x=1 or x=NULL or x=lubridate::today() all result in Error: Invalid input: time_trans works with objects of class POSIXct only. Any ideas? - Montagu
Replace x=1 with x=as.POSIXct("2020-01-01") - Mercaptopurine
Or even better: replace x=1 with mean(.data[[x]],na.rm=TRUE), so that the reference point falls within your data - Mercaptopurine
Why are dots added to ..y.. in stat_summary? - Coatee
