ggplot2: How to combine histogram, rug plot, and logistic regression prediction in a single graph

I am trying to plot combined graphs for logistic regressions, like those produced by the function logi.hist.plot, but I would like to do it with ggplot2 (for aesthetic reasons).

The problem is that only one of the histograms should have the scale_y_reverse().

Is there any way to specify this in a single plot (see code below) or to overlap the two histograms by using coordinates that can be passed to the previous plot?

ggplot(dat) + 
    geom_point(aes(x=ind, y=dep)) + 
    stat_smooth(aes(x=ind, y=dep), method=glm, method.args=list(family="binomial"), se=FALSE) + 
    geom_histogram(data=dat[dat$dep==0,], aes(x=ind)) +
    geom_histogram(data=dat[dat$dep==1,], aes(x=ind)) ## + scale_y_reverse()

This final plot is what I have been trying to achieve:

[Image: desirable outcome]

Unilobed answered 12/2, 2016 at 15:38 Comment(0)

We use geom_segment to create the "bars" for the histogram and also to create the rug plots. Adjust the size parameter to change the "bar" widths in the histogram. In the example below, each bar's height equals the proportion of values, within that level of y, that fall in a given x range. If you want to change the absolute heights of the bars, just multiply n/sum(n) by a scaling factor when you create the h data frame of histogram counts.

To generate the histogram values, we pre-summarize the data before plotting. Note the ifelse statement in the mutate function, which adjusts the values of pct so that the bars point upward from 0 when y is 0 and downward from 1 when y is 1. You can do this in the plot code itself, but then you need two separate calls to geom_segment.

library(dplyr)
library(ggplot2)

# Fake data
set.seed(1926)
dat = data.frame(y = sample(0:1, 1000, replace=TRUE))
dat$x1 = rnorm(1000, 5, 2) * (dat$y+1)

# Summarise data to create histogram counts
h = dat %>% group_by(y) %>%
  mutate(breaks = cut(x1, breaks=seq(-2,20,0.5), labels=seq(-1.75,20,0.5), 
                      include.lowest=TRUE),
         breaks = as.numeric(as.character(breaks))) %>%
  group_by(y, breaks) %>% 
  summarise(n = n()) %>%
  mutate(pct = ifelse(y==0, n/sum(n), 1 - n/sum(n))) 

ggplot() +
  geom_segment(data=h, size=4, show.legend=FALSE,
               aes(x=breaks, xend=breaks, y=y, yend=pct, colour=factor(y))) +
  # Rug marks: below 0 for y == 0, above 1 for y == 1
  geom_segment(data=dat[dat$y==0,], aes(x=x1, xend=x1, y=0, yend=-0.02), size=0.2, colour="grey30") +
  geom_segment(data=dat[dat$y==1,], aes(x=x1, xend=x1, y=1, yend=1.02), size=0.2, colour="grey30") +
  geom_line(data=data.frame(x=seq(-2,20,0.1), 
                            y=predict(glm(y ~ x1, family="binomial", data=dat), 
                                      newdata=data.frame(x1=seq(-2,20,0.1)),
                                      type="response")), 
            aes(x,y), colour="grey50", lwd=1) +
  scale_y_continuous(limits=c(-0.02,1.02)) +
  scale_x_continuous(limits=c(-1,20)) +
  theme_bw(base_size=12)
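
The scaling factor mentioned above can be sketched as follows. This is only an illustration: bar_scale, h_scaled, and height are hypothetical names that are not part of the original answer, and count() is used here as a shorthand for the group_by/summarise step.

```r
library(dplyr)

# Same fake data as in the answer
set.seed(1926)
dat <- data.frame(y = sample(0:1, 1000, replace = TRUE))
dat$x1 <- rnorm(1000, 5, 2) * (dat$y + 1)

bar_scale <- 2  # hypothetical scaling factor; 1 gives the original heights

h_scaled <- dat %>%
  mutate(breaks = cut(x1, breaks = seq(-2, 20, 0.5),
                      labels = seq(-1.75, 20, 0.5), include.lowest = TRUE),
         breaks = as.numeric(as.character(breaks))) %>%
  count(y, breaks) %>%                 # one row per (y, bin) with count n
  group_by(y) %>%
  mutate(height = bar_scale * n / sum(n),              # scaled proportion
         pct = ifelse(y == 0, height, 1 - height)) %>% # flip bars for y == 1
  ungroup()
```

Passing h_scaled in place of h to the first geom_segment call should then draw bars twice as tall. With large factors the upward and downward bars can start to overlap, so the factor probably needs some trial and error.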

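
For comparison, here is a rough sketch of the alternative mentioned in the answer, where the flip is done in the plot code with two geom_segment calls instead of an ifelse in the summary. The names h_raw and prop and the two colours are assumptions made for this illustration. The sketch uses linewidth, which ggplot2 3.4.0 and later expect for line widths; on older versions use size as in the answer.

```r
library(dplyr)
library(ggplot2)

# Same fake data as in the answer
set.seed(1926)
dat <- data.frame(y = sample(0:1, 1000, replace = TRUE))
dat$x1 <- rnorm(1000, 5, 2) * (dat$y + 1)

# Plain proportions per bin, with no flipping applied
h_raw <- dat %>%
  mutate(breaks = cut(x1, breaks = seq(-2, 20, 0.5),
                      labels = seq(-1.75, 20, 0.5), include.lowest = TRUE),
         breaks = as.numeric(as.character(breaks))) %>%
  count(y, breaks) %>%
  group_by(y) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%
  filter(!is.na(breaks))  # drop values that fell outside the bins

p <- ggplot() +
  # y == 0: bars grow upward from 0
  geom_segment(data = filter(h_raw, y == 0),
               aes(x = breaks, xend = breaks, y = 0, yend = prop),
               linewidth = 4, colour = "tomato") +
  # y == 1: bars hang downward from 1
  geom_segment(data = filter(h_raw, y == 1),
               aes(x = breaks, xend = breaks, y = 1, yend = 1 - prop),
               linewidth = 4, colour = "steelblue") +
  theme_bw(base_size = 12)
```

The trade-off is the one the answer describes: the summary step gets simpler, but the plot needs one layer per outcome.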

Intreat answered 12/2, 2016 at 16:21 Comment(1)
You can also swap out the second geom_segment for geom_rug(data=dat[dat$y==0,], aes(x=x1), sides="b"), and the third for geom_rug(data=dat[dat$y==1,], aes(x=x1), sides="t"). – Oospore
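
A minimal sketch of the geom_rug variant from the comment above, shown with only the rugs and the fitted curve. It assumes an explicit aes(x = x1) is needed because ggplot() is called without a default mapping. fit and curve_df are hypothetical names, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Same fake data as in the answer
set.seed(1926)
dat <- data.frame(y = sample(0:1, 1000, replace = TRUE))
dat$x1 <- rnorm(1000, 5, 2) * (dat$y + 1)

# Fitted logistic curve on a grid of x1 values
fit <- glm(y ~ x1, family = "binomial", data = dat)
curve_df <- data.frame(x1 = seq(-2, 20, 0.1))
curve_df$p <- predict(fit, newdata = curve_df, type = "response")

p <- ggplot() +
  # Rug for y == 0 along the bottom edge, y == 1 along the top edge
  geom_rug(data = dat[dat$y == 0, ], aes(x = x1), sides = "b", colour = "grey30") +
  geom_rug(data = dat[dat$y == 1, ], aes(x = x1), sides = "t", colour = "grey30") +
  geom_line(data = curve_df, aes(x = x1, y = p), colour = "grey50", linewidth = 1) +
  theme_bw(base_size = 12)
```

Because geom_rug draws its ticks inside the panel edges, this version should not need the y-axis limits to extend to -0.02 and 1.02.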
