We use `geom_segment` to create the "bars" for the histogram and also to create the rug plots. Adjust the `size` parameter to change the "bar" widths in the histogram (in ggplot2 3.4.0 and later, `linewidth` is the preferred name for this parameter). In the example below, the bar heights equal the proportion of values that fall within a given x range, computed separately for each value of `y`. If you want to change the absolute heights of the bars, multiply `n/sum(n)` by a scaling factor when you create the `h` data frame of histogram counts.
To generate histogram counts for the plot, we pre-summarize the data. Note the `ifelse` statement in the `mutate` function, which adjusts the values of `pct` so that the bars point upward when `y` is 0 and downward when `y` is 1. You can do this in the plot code itself, but then you need two separate calls to `geom_segment`.
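To make the `ifelse` adjustment and the scaling factor concrete, here is a minimal base-R sketch on a made-up four-row table. The names `toy` and `scale_factor` and the counts are hypothetical, chosen only for illustration; the logic mirrors the `mutate` step described above.

```r
# Hypothetical pre-binned counts: two bins for each value of y
toy <- data.frame(y      = c(0, 0, 1, 1),
                  breaks = c(1, 2, 1, 2),
                  n      = c(30, 70, 20, 80))

# Proportion of values in each bin, computed within each y group
toy$prop <- toy$n / ave(toy$n, toy$y, FUN = sum)

# Assumed scaling factor: 0.5 makes every bar half as tall
scale_factor <- 0.5

# y == 0 bars run upward from 0; y == 1 bars run downward from 1
toy$pct <- ifelse(toy$y == 0,
                  toy$prop * scale_factor,
                  1 - toy$prop * scale_factor)

print(toy)
```

With these numbers the `y == 0` bars end at 0.15 and 0.35, while the `y == 1` bars end at 0.9 and 0.6, so the two sets of bars grow toward each other from opposite edges of the plot.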
```r
library(dplyr)
library(ggplot2)

# Fake data
set.seed(1926)
dat = data.frame(y = sample(0:1, 1000, replace=TRUE))
dat$x1 = rnorm(1000, 5, 2) * (dat$y+1)

# Summarise data to create histogram counts
h = dat %>% group_by(y) %>%
  # Bin x1 into 0.5-wide intervals, labelled by their midpoints
  mutate(breaks = cut(x1, breaks=seq(-2,20,0.5), labels=seq(-1.75,20,0.5),
                      include.lowest=TRUE),
         breaks = as.numeric(as.character(breaks))) %>%
  group_by(y, breaks) %>%
  summarise(n = n()) %>%
  # Proportion within each y; flipped for y == 1 so those bars hang down from 1
  mutate(pct = ifelse(y==0, n/sum(n), 1 - n/sum(n)))

ggplot() +
  # Histogram "bars"; size sets the bar width
  geom_segment(data=h, size=4, show.legend=FALSE,
               aes(x=breaks, xend=breaks, y=y, yend=pct, colour=factor(y))) +
  # Rug plots along the bottom (y == 0) and the top (y == 1)
  geom_segment(data=dat[dat$y==0,], aes(x=x1, xend=x1, y=0, yend=-0.02), size=0.2, colour="grey30") +
  geom_segment(data=dat[dat$y==1,], aes(x=x1, xend=x1, y=1, yend=1.02), size=0.2, colour="grey30") +
  # Fitted logistic regression curve
  geom_line(data=data.frame(x=seq(-2,20,0.1),
                            y=predict(glm(y ~ x1, family="binomial", data=dat),
                                      newdata=data.frame(x1=seq(-2,20,0.1)),
                                      type="response")),
            aes(x,y), colour="grey50", lwd=1) +
  scale_y_continuous(limits=c(-0.02,1.02)) +
  scale_x_continuous(limits=c(-1,20)) +
  theme_bw(base_size=12)
```
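For comparison, the alternative mentioned earlier, doing the flip in the plot code rather than in the summary step, might look roughly like the sketch below. The data frame `h2`, its `prop` column, and the colours are hypothetical; `prop` stands in for the unflipped `n/sum(n)` values.

```r
library(ggplot2)

# Hypothetical per-bin proportions (prop = n/sum(n) within each y), not flipped
h2 <- data.frame(
  y      = c(0, 0, 0, 1, 1, 1),
  breaks = c(2.25, 4.25, 6.25, 6.25, 10.25, 14.25),
  prop   = c(0.2, 0.5, 0.3, 0.3, 0.5, 0.2)
)

p <- ggplot() +
  # First call: y == 0 bars rise from 0 up to prop
  geom_segment(data = h2[h2$y == 0, ], size = 4, colour = "tomato",
               aes(x = breaks, xend = breaks, y = 0, yend = prop)) +
  # Second call: y == 1 bars hang down from 1 to 1 - prop
  geom_segment(data = h2[h2$y == 1, ], size = 4, colour = "steelblue",
               aes(x = breaks, xend = breaks, y = 1, yend = 1 - prop)) +
  scale_y_continuous(limits = c(0, 1)) +
  theme_bw(base_size = 12)

print(p)
```

The trade-off is one extra layer in exchange for a summary table that keeps the raw proportions, which can be convenient if the same table is reused elsewhere.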
`geom_segment` for `geom_rug(dat=dat[dat$y==0,], sides = "b")`, and the third for `geom_rug(dat=dat[dat$y==1,], sides = "t")` – Oospore
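The `geom_rug` suggestion in the comment above could be sketched as follows. This is only an illustration under assumptions: it uses a smaller fake data set than the answer, leaves out the histogram bars, and the names `curve_df` and `p_rug` are made up here. Note that `geom_rug` draws its ticks inside the panel edge rather than at explicit y positions such as -0.02 or 1.02.

```r
library(ggplot2)

# Smaller fake data set, built the same way as in the answer
set.seed(1926)
dat <- data.frame(y = sample(0:1, 200, replace = TRUE))
dat$x1 <- rnorm(200, 5, 2) * (dat$y + 1)

# Fitted logistic curve, so the plot has a y axis to anchor the rugs
fit <- glm(y ~ x1, family = "binomial", data = dat)
curve_df <- data.frame(x1 = seq(-2, 20, 0.1))
curve_df$p <- predict(fit, newdata = curve_df, type = "response")

p_rug <- ggplot() +
  geom_line(data = curve_df, aes(x1, p), colour = "grey50") +
  # Bottom rug for y == 0, top rug for y == 1
  geom_rug(data = dat[dat$y == 0, ], aes(x = x1), sides = "b", colour = "grey30") +
  geom_rug(data = dat[dat$y == 1, ], aes(x = x1), sides = "t", colour = "grey30") +
  theme_bw(base_size = 12)

print(p_rug)
```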