The ggplot2
package has 3 functions that work well for these tasks:
cut_number()
: Makes n groups with (approximately) equal numbers of observation
cut_interval()
: Makes n groups with equal range
cut_width
: Makes groups of width width
My go-to is cut_number()
because this uses evenly spaced quantiles for binning observations. Here's an example with skewed data.
library(tidyverse)
skewed_tbl <- tibble(
counts = c(1:100, 1:50, 1:20, rep(1:10, 3),
rep(1:5, 5), rep(1:2, 10), rep(1, 20))
) %>%
mutate(
counts_cut_number = cut_number(counts, n = 4),
counts_cut_interval = cut_interval(counts, n = 4),
counts_cut_width = cut_width(counts, width = 25)
)
# Data
skewed_tbl
#> # A tibble: 265 x 4
#> counts counts_cut_number counts_cut_interval counts_cut_width
#> <dbl> <fct> <fct> <fct>
#> 1 1 [1,3] [1,25.8] [-12.5,12.5]
#> 2 2 [1,3] [1,25.8] [-12.5,12.5]
#> 3 3 [1,3] [1,25.8] [-12.5,12.5]
#> 4 4 (3,13] [1,25.8] [-12.5,12.5]
#> 5 5 (3,13] [1,25.8] [-12.5,12.5]
#> 6 6 (3,13] [1,25.8] [-12.5,12.5]
#> 7 7 (3,13] [1,25.8] [-12.5,12.5]
#> 8 8 (3,13] [1,25.8] [-12.5,12.5]
#> 9 9 (3,13] [1,25.8] [-12.5,12.5]
#> 10 10 (3,13] [1,25.8] [-12.5,12.5]
#> # ... with 255 more rows
summary(skewed_tbl$counts)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.00 3.00 13.00 25.75 42.00 100.00
# Histogram showing skew
skewed_tbl %>%
ggplot(aes(counts)) +
geom_histogram(bins = 30)
# cut_number() evenly distributes observations into bins by quantile
skewed_tbl %>%
ggplot(aes(counts_cut_number)) +
geom_bar()
# cut_interval() evenly splits the interval across the range
skewed_tbl %>%
ggplot(aes(counts_cut_interval)) +
geom_bar()
# cut_width() uses the width = 25 to create bins that are 25 in width
skewed_tbl %>%
ggplot(aes(counts_cut_width)) +
geom_bar()
Created on 2018-11-01 by the reprex package (v0.2.1)
dplyr
) I'd saymutate( df , a = cut( a , breaks = quantile( a , probs = seq( 0 , 1 , 0.2 ) ) ) )
... – Cyrie