Hollow histogram or binning for geom_step

I would like to draw a hollow histogram that has no vertical bars drawn inside it, just an outline. I couldn't find any way to do this with geom_histogram. The geom_step + stat_bin combination seemed like it could do the job. However, the bins drawn by geom_step + stat_bin are shifted by half a bin to the right or to the left, depending on the value of geom_step's direction= parameter. It seems to place its "steps" relative to the bin centers. Is there any way to change this behavior so that the steps happen at the bin edges?

Here's an illustration:

library(ggplot2)

d <- data.frame(x=rnorm(1000))
qplot(x, data=d, geom="histogram",
      breaks=seq(-4,4,by=.5), color=I("red"), fill = I("transparent")) +
geom_step(stat="bin", breaks=seq(-4,4,by=.5), color="black", direction="vh")
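
A small base-R sketch of where the half-bin shift comes from, assuming stat_bin reports its bins the way hist() does (the ggplot_build answer further down relies on the same thing): each bin is identified by its center, so a step drawn through those x values turns half a bin away from the true edges, i.e. 0.25 for bins of width 0.5. The data values are placeholders.

```r
# Bin edges from the question; the data values are illustrative placeholders
breaks <- seq(-4, 4, by = 0.5)
x <- c(-1.2, 0.3, 0.4, 2.7)

# plot = FALSE returns the binning without drawing anything
h <- hist(x, breaks = breaks, plot = FALSE)

# Distance from each bin's left edge to its reported center
offset <- h$mids - head(h$breaks, -1)
print(unique(offset))  # [1] 0.25
```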


Aboard asked 15/5, 2014 at 17:59. Comment:
There is now direction = "mid", which does exactly that (see my answer below). – Heartburning

I propose making a new Geom like so:

# Requires ggplot2 < 2.0.0: this relies on the old proto-based Geom system
library(ggplot2)
library(proto)

geom_stephist <- function(mapping = NULL, data = NULL, stat="bin", position="identity", ...) {
  GeomStepHist$new(mapping=mapping, data=data, stat=stat, position=position, ...)
}

GeomStepHist <- proto(ggplot2:::Geom, {
  objname <- "stephist"

  default_stat <- function(.) StatBin
  default_aes <- function(.) aes(colour="black", size=0.5, linetype=1, alpha = NA)

  reparameterise <- function(., df, params) {
    transform(df,
              ymin = pmin(y, 0), ymax = pmax(y, 0),
              xmin = x - width / 2, xmax = x + width / 2, width = NULL
    )
  }

  draw <- function(., data, scales, coordinates, ...) {
    data <- as.data.frame(data)[order(data$x), ]

    n <- nrow(data)
    i <- rep(1:n, each=2)
    newdata <- rbind(
      transform(data[1, ], x=xmin, y=0),
      transform(data[i, ], x=c(rbind(data$xmin, data$xmax))),
      transform(data[n, ], x=xmax, y=0)
    )
    rownames(newdata) <- NULL

    GeomPath$draw(newdata, scales, coordinates, ...)
  }
  guide_geom <- function(.) "path"
})

This also works for non-uniform breaks. To illustrate the usage:

d <- data.frame(x=runif(1000, -5, 5))
ggplot(d, aes(x)) +
  geom_histogram(breaks=seq(-4,4,by=.5), color="red", fill=NA) +
  geom_stephist(breaks=seq(-4,4,by=.5), color="black")
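
The vertex layout that draw() builds can be sketched in base R, assuming the bin edges and counts are already known (stephist_outline() is a hypothetical helper, not part of ggplot2). It also shows why non-uniform breaks work: every bin contributes its own left and right edge instead of positions inferred from bin centers.

```r
# Hypothetical helper mirroring the vertex layout in draw():
# start on the baseline at the first left edge, walk across the top of
# every bin, and finish on the baseline at the last right edge.
stephist_outline <- function(breaks, counts) {
  stopifnot(length(breaks) == length(counts) + 1)
  xmin <- head(breaks, -1)
  xmax <- tail(breaks, -1)
  data.frame(
    # c(rbind(xmin, xmax)) interleaves the edges: xmin1, xmax1, xmin2, xmax2, ...
    x = c(xmin[1], c(rbind(xmin, xmax)), xmax[length(xmax)]),
    y = c(0, rep(counts, each = 2), 0)
  )
}

# Non-uniform breaks: bins [0, 1], [1, 3], [3, 4]
outline <- stephist_outline(breaks = c(0, 1, 3, 4), counts = c(2, 5, 1))
print(outline)
```

In base graphics, lines(outline$x, outline$y) would draw this path; in ggplot2 the same data frame could be passed to geom_path().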


Amazed answered 15/5, 2014 at 21:29. Comments:
That's a nice seamless hack! It even allows the usual simple faceting and default binning. But the most natural solution would probably be to add a parameter to geom_histogram for disabling the inner vertical bars. – Aboard
@VadimKhotilovich The parameter option is difficult, I think, because geom_histogram is built around stat_bin and geom_bar, and geom_bar isn't really set up to selectively include or exclude only portions of its vertical edges. – Cordeelia
@joran: such technical difficulties cannot overturn the fact that "a histogram is not a bar chart" (a quote straight from "The Grammar of Graphics"). Generally speaking, histograms represent distributions, and bar charts are for comparing categories. ggplot2 implements a histogram as a trivial alias over bar + bin, but it doesn't have to stay that way. And I would add that a histogram is not a step chart either. – Aboard
@VadimKhotilovich There's no need to lecture me, I'm well aware of all that. I was simply explaining why such a change might be more work than is feasible given limited developer time, that's all. – Cordeelia
@joran: thanks for clarifying. It's sometimes hard to guess people's intentions from short posts... If I ever have time to dig deeper into the ggplot2 source and proto, I would like to contribute to improving the histogram. Some things in it have been bugging me for a while. – Aboard
@VadimKhotilovich No problem. In fact, I should apologize: I wrote that comment under the cloud of some extremely irritating things going on offline and let that influence me too much. – Cordeelia
I used to rely on geom_stephist a lot, but it doesn't work anymore with the ggproto system of ggplot2 v2 (aka ggplot2_2.0.0). It would be really helpful if someone could use this as an example to illustrate creating new geoms in ggplot2_2.0.0. Thanks! – Wickman

This isn't ideal, but it's the best I can come up with:

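# plot=FALSE computes the bins without drawing a base-graphics histogram
h <- hist(d$x,breaks=seq(-4,4,by=.5),plot=FALSE)
# x holds all break points; the trailing NA pads the counts to the same length
d1 <- data.frame(x = h$breaks,y = c(h$counts,NA))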

ggplot() + 
    geom_histogram(data = d,aes(x = x),breaks = seq(-4,4,by=.5),
                                 color = "red",fill = "transparent") + 
    geom_step(data = d1,aes(x = x,y = y),stat = "identity")
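
A possible variant of d1, assuming you also want the outer vertical edges of the outline: pad the counts with a zero on each side instead of using the trailing NA. The data values below are placeholders for d$x.

```r
# Placeholder values standing in for d$x
x <- c(-1.3, -0.2, 0.1, 0.4, 0.45, 1.7)
breaks <- seq(-4, 4, by = 0.5)

h <- hist(x, breaks = breaks, plot = FALSE)

# The first point sits on the baseline at the leftmost edge and the last
# point drops back to the baseline at the rightmost edge, so a step path
# through these points also draws the two outer vertical bars.
d1 <- data.frame(
  x = c(h$breaks[1], h$breaks),
  y = c(0, h$counts, 0)
)
```

Passed to geom_step(data = d1, aes(x = x, y = y)) as above, the default "hv" direction should then draw a closed outline, rising from the baseline at -4, running across every bin, and coming back down at 4.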


Cordeelia answered 15/5, 2014 at 18:37. Comment:
@Henrik I like all three of these solutions, frankly. – Cordeelia

Yet another one: use ggplot_build to build the plot object of the histogram that is used for rendering, and extract its x and y values for geom_step. Since those x values are bin centers, by (the bin width) is used to shift them left by half a bin.

by <- 0.5
p1 <- ggplot(data = d, aes(x = x)) +
  geom_histogram(breaks = seq(from = -4, to = 4, by = by),
                 color = "red", fill = "transparent")

df <- ggplot_build(p1)$data[[1]][ , c("x", "y")]

p1 +
  geom_step(data = df, aes(x = x - by/2, y = y))


Edit, following a comment from @Vadim Khotilovich (thanks!):

The xmin column of the plot data can be used instead, so no offset adjustment is needed:

df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y")]

p1 +
  geom_step(data = df, aes(x = xmin, y = y))

Blocky answered 15/5, 2014 at 19:0. Comments:
Thanks for pointing me to ggplot_build. It provides lots of potentially useful data! In this particular case, though, I would subset it with [ , c("xmin", "y")] to get the lower edges directly. – Aboard
You are welcome. Yes, when you run out of 'normal' ggplot options, it can be quite fruitful to walk the ggplot_build path. You can also manipulate the data within the plot object and then plot it using grid functions. – Blocky

An alternative, also less than ideal:

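# round(x * 2 - .5) / 2 maps each value (roughly) to the lower edge of its
# 0.5-wide bin, and length counts the rows in each group.
# In ggplot2 >= 3.3.0, fun.y is deprecated in favour of fun.
qplot(x, data=d, geom="histogram", breaks=seq(-4,4,by=.5), color=I("red"), fill = I("transparent")) +
  stat_summary(aes(x=round(x * 2 - .5) / 2, y=1), fun.y=length, geom="step")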

This misses some bins, which you can probably add back with a bit of tweaking. Its only (somewhat meaningless) advantage is that it stays more within ggplot than @Joran's answer, though even that is debatable.
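
The missing bins in this approach are the empty ones: stat_summary() only sees groups that contain at least one value. A sketch of one way to keep them, using base R's findInterval() and tabulate() (bin_counts() is a hypothetical helper). Note that findInterval() uses left-closed bins, whereas hist() and stat_bin() close bins on the right by default, so values lying exactly on an edge can end up in a different bin.

```r
# Hypothetical helper: count observations per bin, keeping empty bins as 0.
# findInterval() returns i such that breaks[i] <= x < breaks[i + 1]
# (left-closed bins); values outside the range get 0 or length(breaks),
# which tabulate() silently ignores.
bin_counts <- function(x, breaks) {
  idx <- findInterval(x, breaks)
  data.frame(
    xmin  = head(breaks, -1),
    count = tabulate(idx, nbins = length(breaks) - 1)
  )
}

counts <- bin_counts(c(0.1, 0.49, 0.5, 1.2, -0.1), breaks = seq(-1, 2, by = 0.5))
print(counts)
```

The result could then be drawn with geom_step(aes(x = xmin, y = count)), including the zero-count bins.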


Waddle answered 15/5, 2014 at 18:46

Answering my own comment from earlier today: here is a modified version of @RosenMatev's answer, updated for ggplot2 v2 (ggplot2_2.0.0) using ggproto:

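library(ggplot2)

# Outline geom: draws one path that starts and ends on the baseline and runs
# across the top of every bin, using each bin's center x and its width
GeomStepHist <- ggproto("GeomStepHist", GeomPath,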
                        required_aes = c("x"),

                        draw_panel = function(data, panel_scales, coord, direction) {
                          data <- as.data.frame(data)[order(data$x), ]

                          n <- nrow(data)
                          i <- rep(1:n, each=2)
                          newdata <- rbind(
                            transform(data[1, ], x=x - width/2, y=0),
                            transform(data[i, ], x=c(rbind(data$x-data$width/2, data$x+data$width/2))),
                            transform(data[n, ], x=x + width/2, y=0)
                          )
                          rownames(newdata) <- NULL

                          GeomPath$draw_panel(newdata, panel_scales, coord)
                        }
)


geom_step_hist <- function(mapping = NULL, data = NULL, stat = "bin",
                           direction = "hv", position = "stack", na.rm = FALSE, 
                           show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomStepHist,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      direction = direction,
      na.rm = na.rm,
      ...
    )
  )
}

Wickman answered 18/1, 2016 at 16:16

TL;DR: use geom_step(..., direction = "mid")

This has become much easier since Daniel Mastropietro and Dewey Dunnington implemented "mid" as an additional option for the direction argument of geom_step in ggplot2 v3.3.0:

library(ggplot2)

set.seed(1)
d <- data.frame(x = rnorm(1000))
ggplot(d, aes(x)) + 
  geom_histogram(breaks = seq(-4, 4, by=.5), color="red", fill = "transparent") +
  geom_step(stat="bin", breaks=seq(-4, 4, by=.5), color = "black", direction = "mid")

For reference, here is the code from the question, formatted like the answer above:

ggplot(d, aes(x)) + 
  geom_histogram(breaks = seq(-4, 4, by=.5), color = "red", fill = "transparent") +
  geom_step(stat="bin", breaks = seq(-4, 4, by=.5), color = "black", direction = "vh")

Created on 2020-09-02 by the reprex package (v0.3.0)
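
According to the geom_step() documentation, "mid" places each step half-way between adjacent x values. Since stat_bin() reports bin centers, a quick base-R check (the helper names are hypothetical) suggests that this lands exactly on the bin edges for equal-width bins but not for non-uniform breaks, where an approach that works from the bin edges themselves may still be preferable.

```r
# Bin centers for a set of breaks (what stat_bin reports as x)
centers <- function(breaks) (head(breaks, -1) + tail(breaks, -1)) / 2

# Where direction = "mid" puts the risers: half-way between adjacent centers
mid_risers <- function(breaks) {
  cen <- centers(breaks)
  (head(cen, -1) + tail(cen, -1)) / 2
}

# The edges a hollow histogram should step at (excluding the two outer ones)
interior_edges <- function(breaks) breaks[-c(1, length(breaks))]

uniform <- seq(-4, 4, by = 0.5)
uneven  <- c(0, 1, 3, 4)

all.equal(mid_risers(uniform), interior_edges(uniform))  # [1] TRUE
mid_risers(uneven)      # [1] 1.25 2.75
interior_edges(uneven)  # [1] 1 3
```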

Heartburning answered 2/9, 2020 at 16:51

A simple way to do something similar to @Rosen Matev's answer (which does not work with ggplot2_2.0.0, as mentioned by @julou) is to 1) manually calculate the bin counts with a small function, as shown below, and 2) plot them with geom_step(). Hope this helps!

library(ggplot2)

# Count the values of d$y in consecutive half-open bins [bin - binw, bin) and
# return one row per bin, keyed by the bin's left edge (xx) with its count (yy).
geom_step_hist <- function(d, binw) {
  bin <- min(d$y)              # first bin [min - binw, min) is empty: gives the left vertical bar
  upper <- max(d$y) + binw * 2 # run past the data so the last bin is empty: gives the right vertical bar
  xx <- NULL
  yy <- NULL
  while (bin <= upper) {
    n <- length(d$y[which(d$y < bin & d$y >= (bin - binw))])
    yy <- c(yy, n)
    xx <- c(xx, bin - binw)
    bin <- bin + binw
  }
  data.frame(xx, yy)
}

# Example usage with simulated data
dd <- geom_step_hist(data.frame(y = rnorm(1000)), binw = 0.5)
p <- ggplot(dd, aes(x = xx, y = yy)) +
  geom_step()
p
Conspiracy answered 25/10, 2016 at 13:51
