I want to group a data.table by ranges of a column's values. How can I do this with the dplyr library?
For example, my data table looks like this:
library(data.table)
library(dplyr)
DT <- data.table(A=1:100, B=runif(100), Amount=runif(100, 0, 100))
Now I want to split DT into 20 groups at intervals of 0.05 on column B and count how many rows fall into each group. For example, rows with a B value in [0, 0.05) form one group, rows with a B value in [0.05, 0.1) form another, and so on. Is there an efficient way to do this grouping?
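To make the goal concrete, here is a minimal sketch of the kind of binning described above, assuming `cut()` with left-closed intervals is acceptable. The names `breaks` and `counts` and the seed value are only illustrative and not part of the original data.

```r
library(data.table)
library(dplyr)

set.seed(42)  # illustrative seed so the random example is reproducible
DT <- data.table(A = 1:100, B = runif(100), Amount = runif(100, 0, 100))

# 21 break points 0, 0.05, ..., 1 define the 20 bins.
# (0:20) / 20 is used so each break is the double nearest to the intended decimal.
breaks <- (0:20) / 20

counts <- DT %>%
  group_by(gr = cut(B, breaks = breaks, right = FALSE)) %>%  # bins are [a, b)
  summarise(n = n())

print(counts)
```

Bins that contain no rows do not appear in the result, so fewer than 20 rows may be printed.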
Thank you very much.
-----------------------------
Follow-up question on akrun's answer. Thanks, akrun, for your answer. I have a new question about the "cut" function. Suppose my DT looks like this:
DT <- data.table(A=1:10, B=c(0.01, 0.04, 0.06, 0.09, 0.1, 0.13, 0.14, 0.15, 0.17, 0.71))
Using the following code:
DT %>%
  group_by(gr = cut(B, breaks = seq(0, 1, by = 0.05), right = FALSE)) %>%
  summarise(n = n()) %>%
  arrange(as.numeric(gr))
I expect to see results like this:
  gr             n
1 [0,0.05)       2
2 [0.05,0.1)     2
3 [0.1,0.15)     3
4 [0.15,0.2)     2
5 [0.7,0.75)     1
But the result I get is this:
  gr             n
1 [0,0.05)       2
2 [0.05,0.1)     2
3 [0.1,0.15)     4
4 [0.15,0.2)     1
5 [0.7,0.75)     1
It looks like the value 0.15 was put in [0.1,0.15) rather than [0.15,0.2). Any thoughts on this?
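A likely cause is floating-point representation: `seq(0, 1, by = 0.05)` computes its fourth break as 3 * 0.05, which on typical IEEE 754 platforms is slightly larger than the literal 0.15, so 0.15 falls below that break. The base R sketch below checks this and tries one possible workaround, building the breaks by integer division. The names `breaks_seq`, `breaks_exact`, and `counts` are illustrative only.

```r
B <- c(0.01, 0.04, 0.06, 0.09, 0.1, 0.13, 0.14, 0.15, 0.17, 0.71)

# The break produced by seq() is 3 * 0.05, not the double nearest to 0.15
breaks_seq <- seq(0, 1, by = 0.05)
print(breaks_seq[4] > 0.15)                 # is the break above the literal 0.15?
print(format(breaks_seq[4], digits = 17))   # show the stored value in full

# Dividing integers gives the double nearest to each intended break
breaks_exact <- (0:20) / 20
gr <- cut(B, breaks = breaks_exact, right = FALSE)

# Count rows per bin, dropping the empty bins
counts <- table(droplevels(gr))
print(counts)
```

With these breaks, 0.15 should land in [0.15,0.2), giving the counts 2, 2, 3, 2, 1 that were expected. Rounding the breaks, for example `round(seq(0, 1, by = 0.05), 2)`, is another option that may work.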
DT[,.N ,.(gr=cut(B, breaks=seq(0, max(B), by=0.05)))]
– Katha
Use set.seed when producing random example data, so that we're all looking at the same data. – Elan