I often find myself trying to create a categorical variable from a numerical variable + a user-provided set of ranges.
For instance, say that I have a data.frame with a numeric variable df$V and would like to create a new variable df$VCAT such that:
df$VCAT = 0 if df$V is equal to 0
df$VCAT = 1 if df$V is between 0 and 10 (i.e. (0,10))
df$VCAT = 2 if df$V is equal to 10 (i.e. [10,10])
df$VCAT = 3 if df$V is between 10 and 20 (i.e. (10,20))
df$VCAT = 4 if df$V is greater than or equal to 20 (i.e. [20,Inf))
I am currently doing this by hard coding the "scoring function" myself by doing something like:
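library(dplyr)  # provides %>% and mutate()
df = data.frame(V = seq(1,100))
# each logical term adds to the score as V crosses 0, 10 and 20
df = df %>% mutate(VCAT = (V>0) + (V==10) + 2*(V>10) + (V>=20))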
I am wondering if there is an easier (possibly hacky) way to do this in R, preferably using dplyr (so that I can chain commands). Ideally, I am looking for a short function that can be used in mutate that will take in the variable V and a vector describing the ranges such as buckets.
Note that buckets may not be described in the best way here, since it is not clear to me how it would allow users to customize the endpoints of the ranges.
cut()? Check out ?cut or perhaps even Hmisc::cut2(). – Orvieto
[...] buckets and return a data frame that looks like the result of the above? Or do you want a function that takes a vector and buckets that can be passed to mutate? – Perineum
[...] mutate. – Laccolith
[...] cut or cut2 but they seem to do the trick. That said, I'm not sure how to deal with points (e.g. an interval like [0,0]), and whether it can be incorporated with mutate. – Laccolith
findInterval seems to be a better suggestion here, thanks to @Henrik. EDIT: you can also pass it to mutate, so I would suggest it solves your problem. – Perineum
car::recode could be best here if you specify your "equal to" criteria first: recode(df$V, "0=0; 100=5; 50=3; 0:10=1; 10:50=2; 50:100=4") -- note: possible duplicate question -- see this post. – Orvieto
df$VCAT2 <- cut(df$V, c(0,9.999,10,20,Inf), labels=F) works just fine. @BerkU – Lamm
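For what it's worth, the findInterval suggestion from the comments can be wrapped into a small helper that works inside mutate. The sketch below is only one possible approach and was not posted in the thread: bucket_code is a hypothetical name, and it assumes that each breakpoint in buckets gets its own code, each open interval between two breakpoints gets the next code, and everything at or above the last breakpoint shares a single code (as in the 0/10/20 example above). It also assumes a version of R whose findInterval has the left.open argument.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the thread).
# Codes: breakpoint i -> 2*(i-1), open interval after it -> 2*i - 1,
# and everything >= the last breakpoint -> 2*(k-1), with k = length(buckets).
bucket_code <- function(v, buckets) {
  buckets <- sort(buckets)
  n_le <- findInterval(v, buckets)                    # number of breakpoints <= v
  n_lt <- findInterval(v, buckets, left.open = TRUE)  # number of breakpoints <  v
  code <- n_le + n_lt - 1L
  code[code < 0L] <- NA_integer_                      # v below the first breakpoint
  pmin(code, 2L * (length(buckets) - 1L))             # merge the top range into one code
}

df <- data.frame(V = seq(0, 100))
df <- df %>% mutate(VCAT = bucket_code(V, c(0, 10, 20)))
```

On this data the result should agree with the hard-coded scoring expression from the question. Values below the first breakpoint come back as NA, which is a choice made for this sketch, since the question does not say how they should be handled.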