Getting observations corresponding to each quartile

> q <- quantile(faithful$eruptions)
> q
     0%     25%     50%     75%    100% 
1.60000 2.16275 4.00000 4.45425 5.10000 

This is the result I get. The faithful dataset ships with R.

> head(faithful)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55

I want a data frame containing the data plus an additional column indicating the quartile to which each observation belongs. For example, the final dataset should look like this:

     eruptions waiting Quartile
1     3.600      79      Q1
2     1.800      54      Q2
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55

How can this be done?

Coranto answered 25/2, 2014 at 9:33

Something along these lines? Use the values returned by quantile() as the break points for cut().

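# include.lowest = TRUE keeps the smallest eruption in Q1 instead of turning it into NA
faithful$kva <- cut(faithful$eruptions, q, include.lowest = TRUE)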
levels(faithful$kva) <- c("Q1", "Q2", "Q3", "Q4")
head(faithful, 5)

    eruptions waiting  kva
1       3.600      79   Q2
2       1.800      54   Q1
3       3.333      74   Q2
4       2.283      62   Q2
5       4.533      85   Q4
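
The same labelling can be done in a single cut() call by passing the labels directly. This is only a minimal sketch using base R: the copy `df` and the column name `Quartile` are names chosen here for illustration, and include.lowest = TRUE is added so that the minimum value is not left as NA.

```r
# Sketch: label quartiles in one cut() call (base R only).
df <- faithful                      # copy of the built-in dataset; the original stays untouched
q  <- quantile(df$eruptions)        # breaks at 0%, 25%, 50%, 75%, 100%

df$Quartile <- cut(df$eruptions,
                   breaks = q,
                   labels = c("Q1", "Q2", "Q3", "Q4"),
                   include.lowest = TRUE)  # the minimum goes into Q1 rather than NA

head(df)
table(df$Quartile, useNA = "ifany") # number of observations per quartile
```

Passing labels up front avoids the separate levels() assignment and makes the intended order of the labels explicit.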
Occupational answered 25/2, 2014 at 9:36

The cut function can also return integer codes for the quartiles directly, via labels = FALSE:

faithful$Quartile <- cut(faithful$eruptions,
                         quantile(faithful$eruptions),
                         labels = FALSE)

This will create an NA for the smallest eruption, because the intervals are open on the left by default. If you want to assign the lowest eruption to the first quartile, add include.lowest = TRUE when calling the cut function:

faithful$Quartile <- cut(faithful$eruptions,
                         quantile(faithful$eruptions),
                         labels = FALSE,
                         include.lowest = TRUE)
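
To see the difference, the two calls can be compared side by side. This is a small check rather than part of the solution; the names `without_lowest` and `with_lowest` are made up for the comparison.

```r
# Sketch: compare cut() with and without include.lowest on the same breaks.
x <- faithful$eruptions
q <- quantile(x)

without_lowest <- cut(x, q, labels = FALSE)
with_lowest    <- cut(x, q, labels = FALSE, include.lowest = TRUE)

sum(is.na(without_lowest))  # at least 1: min(x) equals the first break and (a, b] excludes a
sum(is.na(with_lowest))     # 0
x[is.na(without_lowest)]    # the observation(s) equal to min(x)
```

The NA appears because cut() builds right-closed intervals of the form (a, b] by default, so a value exactly equal to the first break falls outside every interval.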
Deflect answered 8/7, 2018 at 11:55

This can now be done more conveniently via a dplyr pipe and ggplot2::cut_number().

library(dplyr)
library(ggplot2)

faithful %>% 
   mutate(Quartile = cut_number(eruptions, n = 4, labels = c("Q1", "Q2", "Q3", "Q4")))

The lowest observation is included by default, unlike with base R's cut().
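
If pulling in ggplot2 just for this is undesirable, a similar result can be had with a small base R helper. The function below is hypothetical (the name `quantile_group` and its arguments are invented here, not part of any package), and the claim that it behaves like cut_number() is an assumption based on both using quantile breaks with include.lowest = TRUE.

```r
# Hypothetical helper (not from any package): split x into n groups at its quantiles.
quantile_group <- function(x, n = 4, labels = paste0("Q", seq_len(n))) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  if (anyDuplicated(breaks)) {
    # cut() needs unique breaks; heavily tied data can make quantiles coincide
    stop("quantile breaks are not unique; try fewer groups")
  }
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
}

# transform() evaluates 'eruptions' inside the data frame and returns a new one
faithful_q <- transform(faithful, Quartile = quantile_group(eruptions))
head(faithful_q)
```

Changing n gives other groupings, for example quantile_group(faithful$waiting, n = 10) for deciles, provided the resulting breaks are distinct.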

Success answered 5/9, 2019 at 9:40
