here's some dummy data:
user_id date category
27 2016-01-01 apple
27 2016-01-03 apple
27 2016-01-05 pear
27 2016-01-07 plum
27 2016-01-10 apple
27 2016-01-14 pear
27 2016-01-16 plum
11 2016-01-01 apple
11 2016-01-03 pear
11 2016-01-05 pear
11 2016-01-07 pear
11 2016-01-10 apple
11 2016-01-14 apple
11 2016-01-16 apple
I'd like to calculate for each user_id
the number of distinct categories
in the specified time period (e.g. in the past 7, 14 days), including the current order
The solution would look like this:
user_id date category distinct_7 distinct_14
27 2016-01-01 apple 1 1
27 2016-01-03 apple 1 1
27 2016-01-05 pear 2 2
27 2016-01-07 plum 3 3
27 2016-01-10 apple 3 3
27 2016-01-14 pear 3 3
27 2016-01-16 plum 3 3
11 2016-01-01 apple 1 1
11 2016-01-03 pear 2 2
11 2016-01-05 pear 2 2
11 2016-01-07 pear 2 2
11 2016-01-10 apple 2 2
11 2016-01-14 apple 2 2
11 2016-01-16 apple 1 2
I posted similar questions here or here, however none of it referred to counting cumulative unique values for the specified time period. Thanks a lot for your help!
0
? – Limassoldistinct_7
are correct? If I look at 2016-01-10, should it start as a new group. Also, if you look at the value ofdistinct_7
foruser_id
11, it starts at 0. – Limassoldistinct_7
, between2016-01-10
and2016-01-03
there were in total 3 categories foruser 27
and 2 foruser 11
. Does it make sense now? – Imaginabledate
, i.e.library(tidyverse); df %>% group_by(user_id) %>% mutate(distinct_7 = map_int(date, ~n_distinct(category[date >= .x - 7 & date <= .x])), distinct_14 = map_int(date, ~n_distinct(category[date >= .x - 14 & date <= .x])))
, though I think there's probably a more elegant way to do this withzoo::rollapply
or the like. – Wily11 2016-01-01
not be1
for both output cases? – Kindlyorder_value_minus7 = map_int(order_date, ~cumsum(order_value[order_date >= .x - 7 & order_date <= .x]))
, but it returnsError: object 'order_value' not found
. Why can't the same approach work universally, or can it? Thanks again for your help – Imaginablen_distinct
(in alistaire's comment-answer) isuniqueN
. – Bossuetorder_value
in what you show above. I'm not posting an answer because the data doesn't match your desired result, I believe because the desired result is miscalculated, if I understand what you want. Also, I'm pretty sure there's a better way to do it, anyway. – Wilyorder_value
because your solution looked pretty universal and I thought it would require only a small tweak to use it for numerical variable. Anyway, why do you think that the desired solution is miscalculated? – Imaginable