Using ggplot2, can I insert a break in the axis?
I want to make a bar plot where one of the values is much bigger than all other values. Is there a way of having a discontinuous y-axis? My data is as follows:

df <- data.frame(a = c(1,2,3,500), b = c('a1', 'a2','a3', 'a4'))

p <- ggplot(data = df, aes(x = b, y = a)) + geom_bar() 
p <- p + opts(axis.text.x=theme_text(angle= 90, hjust=1))  + coord_flip()
p

enter image description here

Is there a way that I can make my axis run from 1-10, then 490-500? I can't think of any other way of plotting the data (aside from transforming it, which I don't want to do).

[Edit 2019-05-06]:

Eight years later, the code above needs to be amended to work with version 3.1.1 of ggplot2 and produce the same chart:

library(ggplot2)
ggplot(df) + 
  aes(x = b, y = a) +
  geom_col() +
  coord_flip()
Albert answered 25/8, 2011 at 17:33 Comment(6)
I don't think you can introduce breaks in ggplot2. An alternative would be to use a log scale, which would make the graph easier to read.Huberman
I realize it would make it easier to read on a log scale, but I don't want to show the information in this way, as there is significant variation amongst the small values which would be hidden when they are transformed.Albert
What about a combination of facet_wrap() with scales = "free_x"?Equuleus
Could also approach this problem with a custom transformation... I'll write up an answer when I have a minuteFassett
Consider the following stackoverflow thread.Indecipherable
related #69534748Deepseated
M
55

As noted elsewhere, this isn't something that ggplot2 will handle well, since broken axes are generally considered questionable.

Other strategies are often considered better solutions to this problem. Brian mentioned a few (faceting, two plots focusing on different sets of values). One other option that people too often overlook, particularly for barcharts, is to make a table:

enter image description here
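For reference, a table like this takes only a few lines of base R. This is a minimal sketch using the df from the question; the column names category and value and the descending sort are choices made here for illustration, not part of the original answer.

```r
# Data from the question
df <- data.frame(a = c(1, 2, 3, 500), b = c("a1", "a2", "a3", "a4"),
                 stringsAsFactors = FALSE)

# Sort by value (largest first) and put the category column first
tab <- df[order(df$a, decreasing = TRUE), c("b", "a")]
names(tab) <- c("category", "value")   # illustrative column names

# Print without row names so only the two columns are shown
print(tab, row.names = FALSE)
```

For a report, the same data frame could be passed to a table formatter such as knitr::kable(tab) instead of print().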

Looking at the actual values, the 500 doesn't obscure the differences in the other values! For some reason tables don't get enough respect as a data visualization technique. You might object that your data has many, many categories, which become unwieldy in a table. If so, it's likely that your bar chart will have too many bars to be sensible as well.

And I'm not arguing for tables all the time. But they are definitely something to consider if you are making barcharts with relatively few bars. And if you're making barcharts with tons of bars, you might need to rethink that anyway.

Finally, there is also the axis.break function in the plotrix package which implements broken axes. However, from what I gather you'll have to specify the axis labels and positions yourself, by hand.
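To illustrate what "by hand" means here, the following is a rough sketch rather than a recommended recipe. It uses base graphics plus plotrix::axis.break(); the gap limits (10 and 490), the helper squash(), and the tick values are assumptions chosen for this example. Note that only the axis gets a break mark; the a4 bar itself is simply drawn shorter.

```r
# Data from the question
df <- data.frame(a = c(1, 2, 3, 500), b = c("a1", "a2", "a3", "a4"),
                 stringsAsFactors = FALSE)

# Hypothetical gap: hide the range 10..490 on the value axis
gap_from <- 10
gap_to   <- 490

# Shift everything above the gap down by the width of the gap
squash <- function(v) ifelse(v > gap_from, v - (gap_to - gap_from), v)

heights <- squash(df$a)                      # bar heights in squashed units
ticks   <- c(0, 2, 4, 6, 8, 492, 496, 500)   # tick labels in original units

if (requireNamespace("plotrix", quietly = TRUE)) {
  # Draw the bars without the default axis
  barplot(heights, names.arg = df$b, axes = FALSE, ylim = c(0, 22))
  # Place the labels by hand: squashed positions, original values as text
  axis(2, at = squash(ticks), labels = ticks, las = 1)
  # Mark the discontinuity on the value axis
  plotrix::axis.break(axis = 2, breakpos = gap_from, style = "zigzag")
}
```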

Mabelmabelle answered 25/8, 2011 at 18:6 Comment(5)
Joran, I do have mixed feelings about using this type of plot. You are correct - a table is probably the best way of showing this.Albert
@celenius - I didn't mean to sound preachy or like I was scolding you. I just feel like tables don't get much love, and sometimes I get worked up about it. ;)Mabelmabelle
@joran, I agree that for this particular data, a table is the optimal representation (and what I would recommend if just given the question of how to represent it best). In my answer, I was too narrowly focused on the exact question to think to answer the broader one that should have been asked.Dispersive
"broken axes are generally considered questionable" --- by who? I agree that tables are often useful, but sometimes you have too much data to display in a table, and plots are in any case a useful visual tool to understand your data. Suppose that you want to show a histogram with 100 bins, and that largest bin has twenty or fifty times the observations of the next-largest. A broken axis -- ideally combined with a broken bar -- is perfectly reasonable then, and a table would be suboptimal.Cognizant
@Cognizant I said "questionable" not "always, unequivocally bad in all cases". Broken axes are sort of like dual y axes or truncating your y axis to a narrow range: sometimes ok if you are very careful about it, but in general a very common source of bad or misleading graphics.Mabelmabelle
L
55

Eight years later, the ggforce package offers a facet_zoom() extension which is an implementation of Hadley Wickham's suggestion to show two plots (as referenced in Brian Diggs' answer).

Zoom facet

library(ggforce)
ggplot(df) + 
  aes(x = b, y = a) +
  geom_col() +
  facet_zoom(ylim = c(0, 10))

enter image description here

Unfortunately, the current version 0.2.2 of ggforce throws an error with coord_flip() so only vertical bars can be shown.

The zoomed facet shows the variations of the small values but still contains the large - now cropped - a4 bar. The zoom.data parameter controls which values appear in the zoomed facet:

library(ggforce)
ggplot(df) + 
  aes(x = b, y = a) +
  geom_col() +
  facet_zoom(ylim = c(0, 10), zoom.data = ifelse(a <= 10, NA, FALSE))

enter image description here

Two plots

Hadley Wickham suggested

I think it's much more appropriate to show two plots - one of all the data, and one of just the small values.

This code creates two plots

library(ggplot2)
g1 <- ggplot(df) + 
  aes(x = b, y = a) +
  geom_col() +
  coord_flip()
g2 <- ggplot(df) + 
  aes(x = b, y = a) +
  geom_col() +
  coord_flip() +
  ylim(NA, 10)

which can be combined into one plot by

cowplot::plot_grid(g1, g2) # or ggpubr::ggarrange(g1, g2)

enter image description here

or

gridExtra::grid.arrange(g1, g2) # or egg::ggarrange(g1, g2)

enter image description here

Two facets

This was suggested in a comment by Chase and also by Brian Diggs, who in his answer interpreted Hadley's suggestion as

faceted plots, one with all the data, one zoomed in a particular region

but so far no code has been supplied for this approach.

As there is no simple way to scale facets separately (see related question, e.g.) the data needs to be manipulated:

library(dplyr)
library(ggplot2)
ggplot() + 
  aes(x = b, y = a) +
  geom_col(data = df %>% mutate(subset = "all")) +
  geom_col(data = df %>% filter(a <= 10) %>% mutate(subset = "small")) +
  coord_flip() + 
  facet_wrap(~ subset, scales = "free_x")

enter image description here

Liddie answered 6/5, 2019 at 7:52 Comment(3)
The error with facet_zoom(), geom_col(), and coord_flip() has been reported to github.com/thomasp85/ggforce/issues/143.Liddie
thanks, I really like the ggforce facet zoom function! Do you know if one can zoom into two ranges, one on each side of the plot? Or just have a vertical line with just the y-axis and one plot on either side with plots in two different ranges of values of y?Claytor
Thanks @Liddie - was re-reading answers to this question 11 years later and you've outlined useful and interesting suggestions. My personal preference is the multiple charts; while facet_zoom() is nifty, I fear it might be difficult to explain the chart to someone unfamiliar with it.Albert
D
28

No, not using ggplot. See the discussion in the thread at http://groups.google.com/group/ggplot2/browse_thread/thread/8d2acbfc59d2f247 where Hadley explains why it is not possible but gives a suggested alternative (faceted plots, one with all the data, one zoomed in a particular region).

Dispersive answered 25/8, 2011 at 17:50 Comment(0)
M
26

Not with ggplot, but with plotrix you can easily do that:

library(plotrix)
gap.barplot(df$a, gap = c(5, 495), horiz = TRUE)  # hide the range 5..495 on the value axis
Mesics answered 6/6, 2012 at 13:31 Comment(1)
This would work better: gap.barplot(y=df$a, gap=c(5,495),horiz=T, xaxlab=df$b, ylim=c(0,12), ytics=df$a, yaxlab=df$a)Fieldwork
T
19

One option is the ggbreak package and its scale_y_cut() or scale_x_cut() functions. These cut a ggplot object into parts and let you specify which parts are zoomed in or out. Here is a reproducible example, with the normal plot on the left and the cut plot on the right:

df <- data.frame(a = c(1,2,3,500), b = c('a1', 'a2','a3', 'a4'))

library(ggplot2)
library(ggbreak)
library(patchwork)
p1 <- ggplot(df) + 
  aes(x = b, y = a) +
  geom_col() 
p2 <- ggplot(df) + 
  aes(x = b, y = a) +
  geom_col() +
  scale_y_cut(breaks=c(4, 30), which=c(1, 3), scales=c(0.5, 3)) 
  
p1 + p2

Created on 2022-08-22 with reprex v2.0.2

As you can see from the example, some parts are zoomed in and others are zoomed out. This can be changed by using different arguments.

Arguments used:

  • breaks: a numeric value or numeric vector giving the points at which the axis is cut.

  • which: integer, the positions of the subplots to scale, counted from left to right or top to bottom.

  • scales: numeric, the relative width or height of those subplots.

To change the space between the subplots, you can use the argument space.

For some extra information and examples check this tutorial.

Tetrahedron answered 22/8, 2022 at 16:25 Comment(2)
Wow, this package is very impressive, thanks!Immemorial
It's sensational news that ggplot2 plots can now be given a truncated axis. This answer goes beyond ChatGPT 3 and 4.Fan
T
16

No, unfortunately not

The fear is that allowing discontinuous axes will lead to deceit of the audience. However, there are cases where not having a discontinuous axis leads to distortion.

For example, if the axis is truncated but would usually span some interval (say [0,1]), the audience may not notice the truncation and may draw distorted conclusions about the data. In this case, an explicitly discontinuous axis would be more appropriate and transparent.

Compare:

Example of good use of continuous vs discontinuous axis

Teasel answered 23/2, 2016 at 16:51 Comment(0)
P
4

As of 2022-06-01, we have the elegant-looking ggbreak package, which appears to answer the OP's question. Although I haven't tried it on my own data, it looks to be compatible with much or all other ggplot2 functionality. It also offers differential scaling, which may be useful for the OP's and similar use cases.

library(ggplot2)
library(ggbreak) 

set.seed(2019-01-19)  # note: R evaluates this as arithmetic, i.e. set.seed(1999)
d <- data.frame(x = 1:20,
   y = c(rnorm(5) + 4, rnorm(5) + 20, rnorm(5) + 5, rnorm(5) + 22))
 
p1 <- ggplot(d, aes(y, x)) + geom_col(orientation="y") + 
theme_minimal()
p1 + scale_x_break(c(7, 17), scales = 1.5) + scale_x_break(c(18, 21), scales=2)

enter image description here
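Applied to the data from the question, the same idea might look like the sketch below. It assumes ggbreak is installed and uses scale_y_break() with the 10 to 490 range the OP asked about; vertical bars are used here because the behaviour in combination with coord_flip() was not checked.

```r
library(ggplot2)
library(ggbreak)

# Data from the question
df <- data.frame(a = c(1, 2, 3, 500), b = c("a1", "a2", "a3", "a4"),
                 stringsAsFactors = FALSE)

# Cut the y-axis between 10 and 490 so the small bars stay readable
p_break <- ggplot(df, aes(x = b, y = a)) +
  geom_col() +
  scale_y_break(c(10, 490))

p_break
```

The scales argument shown in the example above can then be used to change the relative height of the part after the break.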

Phemia answered 19/8, 2022 at 3:14 Comment(0)
M
2

A clever ggplot solution is provided by Jörg Steinkamp, using facet_grid. Simplified, it is something like this:

library("tidyverse")
df <- data.frame(myLetter=LETTERS[1:4], myValue=runif(12) + rep(c(4,0,0),2))  # cluster a few values well above 1
df$myFacet <- df$myValue > 3
(ggplot(df, aes(y=myLetter, x=myValue)) 
  + geom_point() 
  + facet_grid(. ~ myFacet, scales="free", space="free")
  + scale_x_continuous(breaks = seq(0, 5, .25)) # this gives both facets equal interval spacing.
  + theme(strip.text.x = element_blank()) # get rid of the facet labels
)

enter image description here

Magistracy answered 10/12, 2020 at 21:56 Comment(3)
Is there a way to control the proportion of the left and the right panel?Empyreal
@MiaoCai - One way to control the proportions is with the "grid" package, per the nicely detailed example at https://mcmap.net/q/225358/-how-to-adjust-facet-size-manuallyMagistracy
this is a nice solution, but it assumes there are no data points within the skipped sectionInflux
N
1

I doubt there's anything off the shelf in R, but you could show the data as a series of 3D partial cubes. 500 is only 5*10*10, so it would scale well. The exact value could be a label.

This probably should only be used if you must have a graphic representation for some reason.

Nephew answered 30/9, 2011 at 23:39 Comment(0)
D
0

One strategy is to plot the axis on a log scale. Each factor of 10 then takes up the same distance on the axis, so values that are orders of magnitude larger no longer dwarf the small ones.
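A minimal sketch of this with the data from the question, assuming ggplot2 is installed. Points are used instead of bars as a choice for this example, because bars anchored at 0 have no sensible starting position on a log scale.

```r
library(ggplot2)

# Data from the question
df <- data.frame(a = c(1, 2, 3, 500), b = c("a1", "a2", "a3", "a4"),
                 stringsAsFactors = FALSE)

# Plot the values on a log10 axis; coord_flip() keeps the horizontal layout
p_log <- ggplot(df, aes(x = b, y = a)) +
  geom_point(size = 3) +
  scale_y_log10() +
  coord_flip()

p_log
```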

Derward answered 20/10, 2018 at 14:21 Comment(1)
A good option for some graphs, but it won't work well for bar graphs that start at 0.Fassett
