R - emulate the default behavior of hist() with ggplot2 for bin width

I'm trying to plot a histogram for one variable with ggplot2. Unfortunately, the default binwidth of ggplot2 leaves something to be desired:

default ggplot2 output

I've tried to play with binwidth, but I am unable to get rid of that ugly "empty" bin:

ggplot2 output with tweaked binwidth

Amusingly (to me), the default hist() function of R seems to produce a much better "segmentation" of the bins:

default output of hist

Since I'm doing all my other graphs with ggplot2, I'd like to use it for this one as well - for consistency. How can I produce the same bin "segmentation" of the hist() function with ggplot2?

I tried to input hist at the terminal, but I only got

function (x, ...) 
UseMethod("hist")
<bytecode: 0x2f44940>
<environment: namespace:graphics>

which bears no information for my problem.

I am producing my histograms in ggplot2 with the following code:

ggplot(mydata, aes(x=myvariable)) +
    geom_histogram(color="darkgray", fill="white", binwidth=61378) +
    scale_x_continuous("My variable") +
    scale_y_continuous("Subjects", breaks=c(0,2.5,5,7.5,10,12.5), limits=c(0,12.5)) +
    theme(axis.text=element_text(size=14), axis.title=element_text(size=16,face="bold"))

One thing I should add is that, looking at the histogram produced by hist(), the bins seem to have a width of 50000 (e.g. from 1400000 to 1600000 there are exactly two bins). However, setting binwidth to 50000 in ggplot2 does not produce the same graph: the ggplot2 plot still has the same gap.

Waxwing answered 5/8, 2014 at 18:52 Comment(1)
I really wish that geom_histogram would have the same default behavior as the native hist. - Impeccant

Without sample data, it's always difficult to get reproducible results, so I've created a sample dataset:

library(ggplot2)

set.seed(16)
mydata <- data.frame(myvariable=rnorm(500, 1500000, 10000))

#base histogram
hist(mydata$myvariable)

As you've learned, hist() is a generic function. If you want to see the different implementations, you can type methods(hist). Most of the time you'll be running hist.default. So if we borrow the break-finding logic from that function, we come up with

brx <- pretty(range(mydata$myvariable), 
    n = nclass.Sturges(mydata$myvariable),min.n = 1)

which is how hist() by default calculates the breaks. We can then use these breaks with the ggplot command

ggplot(mydata, aes(x=myvariable)) + 
    geom_histogram(color="darkgray",fill="white", breaks=brx) + 
    scale_x_continuous("My variable") + 
    theme(axis.text=element_text(size=14),axis.title=element_text(size=16,face="bold"))

The plot below shows the two results side by side; as you can see, they are quite similar.

base hist() output and ggplot2 output side by side
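
One way to check that these breaks really are the ones hist() picks is to compare them with the breaks component that hist() returns when called with plot = FALSE. This is only a minimal sketch in base R on the simulated data; the names x, h and breaks_match are just for illustration:

```r
# Same simulated data as above
set.seed(16)
x <- rnorm(500, 1500000, 10000)

# Breaks computed by hand, mirroring the default (Sturges) rule
brx <- pretty(range(x), n = nclass.Sturges(x), min.n = 1)

# Breaks that hist() itself computes; plot = FALSE skips the drawing
h <- hist(x, plot = FALSE)

# TRUE if both approaches give the same bin boundaries
breaks_match <- isTRUE(all.equal(h$breaks, brx))
print(breaks_match)
```

Since h$breaks already holds the bin boundaries, it could also be passed directly as breaks = h$breaks to geom_histogram().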

Also, that empty bin was probably caused by your y-axis limits. If a shape extends beyond the limits you specify in scale_y_continuous, it simply gets dropped from the plot. It looks like that bin wanted to be 14 tall, but you clipped y at 12.5.
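
If the goal is only to zoom the y-axis, a possible alternative is coord_cartesian(), which changes the visible region without discarding any bars. The sketch below contrasts the two approaches on the simulated data. The cutoff of 50 is an arbitrary value chosen so that some bars exceed it, and p_dropped and p_zoomed are illustrative names:

```r
library(ggplot2)

set.seed(16)
mydata <- data.frame(myvariable = rnorm(500, 1500000, 10000))
brx <- pretty(range(mydata$myvariable),
              n = nclass.Sturges(mydata$myvariable), min.n = 1)

p <- ggplot(mydata, aes(x = myvariable)) +
    geom_histogram(color = "darkgray", fill = "white", breaks = brx)

# Scale limits: bars taller than 50 fall outside the scale and are removed
# (ggplot2 warns about removed rows when this plot is drawn)
p_dropped <- p + scale_y_continuous("Subjects", limits = c(0, 50))

# Coordinate limits: all bars are kept, the view is just cut off at 50
p_zoomed <- p + scale_y_continuous("Subjects") +
    coord_cartesian(ylim = c(0, 50))
```

Printing p_dropped should reproduce the "empty bin" effect, while p_zoomed should show the tall bars running off the top of the panel.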

Igal answered 5/8, 2014 at 19:35 Comment(1)
How can one abstract this into a geom_base_histogram()? I.e. how to extract the x variable inside such a function? - Fourdimensional
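
One possible way to approach the question in the comment is to wrap the whole plot rather than just the geom, because a geom_*() call does not see the data at the time it is constructed. The sketch below is only an illustration: base_breaks() and gg_base_hist() are hypothetical helper names, and the column is assumed to be passed as a string and to contain no missing values.

```r
library(ggplot2)

# Hypothetical helper: the breaks hist() would choose by default
base_breaks <- function(x) {
    pretty(range(x), n = nclass.Sturges(x), min.n = 1)
}

# Hypothetical wrapper: `var` is the column name as a string;
# extra arguments in ... are passed on to geom_histogram()
gg_base_hist <- function(data, var, ...) {
    ggplot(data, aes(x = .data[[var]])) +
        geom_histogram(breaks = base_breaks(data[[var]]), ...) +
        xlab(var)
}

set.seed(16)
mydata <- data.frame(myvariable = rnorm(500, 1500000, 10000))
p <- gg_base_hist(mydata, "myvariable", color = "darkgray", fill = "white")
```

Further layers (scales, themes) can be added to the returned plot with + as usual.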

My solution is similar to the one pointed out by @MrFlick.

You can define a function that will generate the width of the bins. For instance, if we use the number of classes used by the Sturges method (default of hist) the function looks as follows:

bins_sturges <- function(x) diff(range(x)) / nclass.Sturges(x)
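
Note that this width is the raw range divided by the Sturges class count, whereas hist() additionally rounds its breaks to "pretty" values, so the two widths need not agree exactly. A small base-R sketch to compare them on simulated data (x, w_sturges and w_hist are illustrative names):

```r
bins_sturges <- function(x) diff(range(x)) / nclass.Sturges(x)

set.seed(16)
x <- rnorm(500, 1500000, 10000)

# Width from the function above: range split into nclass.Sturges(x) equal parts
w_sturges <- bins_sturges(x)

# Width hist() ends up with after rounding its breaks via pretty()
w_hist <- diff(hist(x, plot = FALSE)$breaks)[1]

print(c(sturges = w_sturges, hist = w_hist))
```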

Using the same data as in the previous examples, we have:

library(ggplot2)

set.seed(16)
mydata <- data.frame(myvariable = rnorm(500, 1500000, 10000))

ggplot(mydata) +
   geom_histogram(aes(x = myvariable), 
                  color = "darkgray",
                  fill = "white", 
                  binwidth = bins_sturges)

And the result is:

ggplot2 histogram with Sturges-based bin width

I like this solution better because we do not have to redefine the breaks separately for every variable we want to plot as a histogram. It also works well with facet_wrap and facet_grid (unlike the previous solution).
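
To illustrate the faceting point, here is a rough sketch with two groups on very different scales. The data frame facet_data and its columns are made up for the example. As far as I understand, ggplot2 evaluates a binwidth function on the x values within each panel, so each facet should get its own width:

```r
library(ggplot2)

bins_sturges <- function(x) diff(range(x)) / nclass.Sturges(x)

# Illustrative data: two groups on very different scales
set.seed(16)
facet_data <- data.frame(
    group = rep(c("a", "b"), each = 500),
    value = c(rnorm(500, 1500000, 10000), rnorm(500, 50, 5))
)

# binwidth is given as a function, so it is computed from the data
# instead of being fixed to one number for all panels
p_facets <- ggplot(facet_data, aes(x = value)) +
    geom_histogram(color = "darkgray", fill = "white",
                   binwidth = bins_sturges) +
    facet_wrap(~ group, scales = "free_x")
```

With a fixed vector of breaks, by contrast, the same boundaries would be applied to both panels, which is unlikely to suit both scales.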

Cook answered 14/6, 2023 at 13:57 Comment(0)
