ggplot barplot : How to display small positive numbers with log scaled y-axis

> print(df)
       type         mean           sd           se snp
V7    outer 1.596946e-07 2.967432e-06 1.009740e-08   A
V8    outer 7.472417e-07 6.598652e-06 2.245349e-08   B
V9    outer 1.352327e-07 2.515771e-06 8.560512e-09   C
V10   outer 2.307726e-07 3.235821e-06 1.101065e-08   D
V11   outer 4.598375e-06 1.653457e-05 5.626284e-08   E
V12   outer 5.963164e-07 5.372226e-06 1.828028e-08   F
V71  middle 2.035414e-07 3.246161e-06 1.104584e-08   A
V81  middle 9.000131e-07 7.261463e-06 2.470886e-08   B
V91  middle 1.647716e-07 2.875840e-06 9.785733e-09   C
V101 middle 3.290817e-07 3.886779e-06 1.322569e-08   D
V111 middle 6.371170e-06 1.986268e-05 6.758752e-08   E
V121 middle 8.312429e-07 6.329386e-06 2.153725e-08   F
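To make the plotting code reproducible, here is one way to rebuild df from the printed output. Treat it as a sketch: the values are copied from the printout, but storing type and snp as factors (rather than character vectors) is an assumption about how the original data frame was built.

```r
# Rebuild the printed data frame. Values are copied from print(df);
# making type and snp factors is an assumption (the manual-dodging code
# in the answer calls as.numeric() on them, which needs factors).
df <- data.frame(
  type = factor(rep(c("outer", "middle"), each = 6)),
  mean = c(1.596946e-07, 7.472417e-07, 1.352327e-07, 2.307726e-07,
           4.598375e-06, 5.963164e-07,
           2.035414e-07, 9.000131e-07, 1.647716e-07, 3.290817e-07,
           6.371170e-06, 8.312429e-07),
  sd   = c(2.967432e-06, 6.598652e-06, 2.515771e-06, 3.235821e-06,
           1.653457e-05, 5.372226e-06,
           3.246161e-06, 7.261463e-06, 2.875840e-06, 3.886779e-06,
           1.986268e-05, 6.329386e-06),
  se   = c(1.009740e-08, 2.245349e-08, 8.560512e-09, 1.101065e-08,
           5.626284e-08, 1.828028e-08,
           1.104584e-08, 2.470886e-08, 9.785733e-09, 1.322569e-08,
           6.758752e-08, 2.153725e-08),
  snp  = factor(rep(LETTERS[1:6], times = 2)),
  row.names = c("V7", "V8", "V9", "V10", "V11", "V12",
                "V71", "V81", "V91", "V101", "V111", "V121")
)
```

Because type is a factor here, its levels sort alphabetically (middle, outer), and that order decides which side of each snp position each type lands on when dodging by hand.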

ggplot(data=df, aes(x=snp, y=mean, fill=type)) +
  geom_bar(stat="identity", position=position_dodge(), width=0.5) +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.3, position=position_dodge(.45))

ggplot(data=df, aes(x=snp, y=mean, fill=type)) +
  geom_bar(stat="identity", position=position_dodge(), width=0.5) +
  scale_y_log10() +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.3, position=position_dodge(.45))

Here's a bit of hacking to show what happens if you try to get bars that start at zero on a log scale. I've used geom_segment for illustration, so that I can create "bars" (wide line segments, actually) extending over arbitrary ranges. To make this work, I've also had to do all the dodging manually, which is why the x mapping looks weird.

In the example below, the scale goes from y=1e-20 to y=1. The y-axis is log scaled, meaning that the physical distance from, say, 1e-20 to 1e-19 is the same as the physical distance from, say, 1e-8 to 1e-7, even though the magnitudes of those intervals differ by a factor of one trillion (10^12).
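A quick base-R calculation of that claim: both intervals span exactly one decade in log10 units, so they get the same height on the axis, yet their linear widths differ by a factor of about 10^12.

```r
lo <- c(1e-20, 1e-19)
hi <- c(1e-8, 1e-7)

# Height on a log10 axis is proportional to the difference of log10 values:
decades_lo <- diff(log10(lo))   # one decade
decades_hi <- diff(log10(hi))   # also one decade

# Linear widths of the two intervals, compared directly
ratio <- diff(hi) / diff(lo)    # roughly one trillion
```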

Bars that go down to zero can't be displayed, because zero on the log scale is an infinite distance below the bottom of the graph. We could get closer to zero by, for example, changing 1e-20 to 1e-100 in the code below. But that will just make the already-small physical distances between the data values even smaller and thus even harder to distinguish.
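The same point in numbers: log10(0) is -Inf, so there is no finite place to start a bar. Lowering the artificial floor only adds empty decades. Assuming the data occupy roughly 1e-7 to 1e-5 (about two decades, which covers all the means in the table), this sketch shows how much of the axis the data get:

```r
# Zero has no finite position on a log10 axis
log10(0)   # -Inf

# Decades spanned by the data vs. by the whole axis
data_decades <- log10(1e-5) - log10(1e-7)
axis_20  <- log10(1) - log10(1e-20)    # axis from 1e-20 to 1
axis_100 <- log10(1) - log10(1e-100)   # axis from 1e-100 to 1

# Fraction of the axis height left for the data
share_20  <- data_decades / axis_20
share_100 <- data_decades / axis_100
```

Going from a 1e-20 floor to a 1e-100 floor shrinks the data's share of the axis from a tenth to a fiftieth.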

The bars are also misleading in another way. As @hrbrmstr pointed out, our brains read distance along a bar linearly, but in the example below the magnitude represented by each increment of distance along the bar changes by a factor of 10 every few millimeters. The bars simply aren't encoding meaningful information about the data.
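To put a number on it, compare the smallest mean in the table (snp C, outer) with the largest (snp E, middle), assuming bars drawn up from the 1e-20 floor used in the plot below. The values differ by a factor of about 47, but the bar lengths differ by only about 13%:

```r
small    <- 1.352327e-07   # smallest mean (snp C, outer)
large    <- 6.371170e-06   # largest mean (snp E, middle)
baseline <- 1e-20          # artificial bar floor in the geom_segment plot

# How many times larger the larger value actually is
value_ratio <- large / small   # about 47

# How many times longer its bar is on the log10 axis
bar_ratio <- (log10(large) - log10(baseline)) /
             (log10(small) - log10(baseline))   # about 1.13
```

The bar ratio also depends entirely on the arbitrary floor: move the baseline and the apparent comparison changes, even though the data don't.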

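library(ggplot2)

# Manual dodge: shift each of the two types by +/-0.15 around its snp position.
# factor() guards against character columns (as.numeric() on letters gives NA).
ggplot(data=df, aes(x=as.numeric(factor(snp)) + 0.3*(as.numeric(factor(type)) - 1.5), 
                    y=mean, colour=type)) +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.3) +
  # "Bars" are thick segments from an arbitrary floor (1e-20) up to the mean.
  # linewidth replaced size for lines in ggplot2 3.4.0; use size=5 on older versions.
  geom_segment(aes(xend=as.numeric(factor(snp)) + 0.3*(as.numeric(factor(type)) - 1.5),
                   y=1e-20, yend=mean), linewidth=5) +
  scale_y_log10(limits=c(1e-20, 1), breaks=10^(-100:0), expand=c(0,0)) +
  scale_x_continuous(breaks=1:6, labels=LETTERS[1:6])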
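The x mapping above does the dodging by hand. Here is the same arithmetic for a single snp as a small base-R sketch: with two type levels coded 1 and 2, 0.3*(level - 1.5) shifts them by -0.15 and +0.15. Note that the formula assumes exactly two types.

```r
# Two rows at the same snp ("A"), one per type
type <- factor(c("outer", "middle"))   # levels sort alphabetically: middle, outer
snp  <- factor(c("A", "A"))

# Same offset formula as in the ggplot x mapping
x <- as.numeric(snp) + 0.3 * (as.numeric(type) - 1.5)
# outer (level 2) lands right of 1, middle (level 1) lands left of 1

# Without factor(), as.numeric() cannot convert letters and returns NA
bad <- suppressWarnings(as.numeric("A"))
```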

If you want to stick with a log scale, maybe plotting points would be a better approach:

pd <- position_dodge(.5)
ggplot(data=df, aes(x=snp,y=mean,fill=type))+
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se, colour=type), width=.3, position=pd) +
  geom_point(aes(colour=type), position=pd) +
  scale_y_log10(limits=c(1e-7, 1e-5), breaks=10^(-10:0)) +
  annotation_logticks(sides="l")
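One caveat with fixed limits: by default, ggplot2 treats values outside a continuous scale's limits as out of bounds and drops them (with a "Removed ... rows" warning) rather than clipping them, so an error bar end below 1e-7 would make that error bar disappear. A quick base-R check, using the mean and se columns from the table, confirms that everything fits inside c(1e-7, 1e-5):

```r
means <- c(1.596946e-07, 7.472417e-07, 1.352327e-07, 2.307726e-07,
           4.598375e-06, 5.963164e-07, 2.035414e-07, 9.000131e-07,
           1.647716e-07, 3.290817e-07, 6.371170e-06, 8.312429e-07)
ses   <- c(1.009740e-08, 2.245349e-08, 8.560512e-09, 1.101065e-08,
           5.626284e-08, 1.828028e-08, 1.104584e-08, 2.470886e-08,
           9.785733e-09, 1.322569e-08, 6.758752e-08, 2.153725e-08)
lims  <- c(1e-7, 1e-5)

# Lowest and highest error-bar ends across all 12 rows
lowest  <- min(means - ses)
highest <- max(means + ses)

# TRUE when no error bar end would fall outside the y limits
inside <- lowest >= lims[1] && highest <= lims[2]
```

If you change the data or the limits, rerunning a check like this is cheaper than hunting for silently missing error bars.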
