Redshift doesn't support DISTINCT aggregates in its window functions. The AWS documentation for COUNT states this, and DISTINCT isn't supported for any of the window functions.
My use case: count customers over varying time intervals and traffic channels
I want monthly and YTD unique customer counts for the current year, both split by traffic channel and as totals across all channels. Since a customer can visit more than once, I need to count only distinct customers, so the Redshift window aggregates won't help.
- I can count distinct customers using count(distinct customer_id) ... group by, but this gives me only one of the four results I need.
- I don't want to get into the habit of running a full query for each desired count and stacking them with union all. I hope this is not the only solution.
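For illustration, the union all stack I would like to avoid looks roughly like the sketch below. It uses the orders_traffic_channels table and columns from this question. The customers alias and the use of NULL to mark "all months" or "all channels" are assumptions made only for this example.

```sql
-- Sketch only: one full aggregate query per grouping level, glued together with UNION ALL.
-- NULL in order_month means "year to date"; NULL in traffic_channel means "all channels".

-- 1) customers by channel and month
select order_month, traffic_channel, count(distinct customer_id) as customers
from orders_traffic_channels
where to_char(order_month, 'YYYY') = '2017'
group by order_month, traffic_channel

union all

-- 2) YTD customers by channel
select null, traffic_channel, count(distinct customer_id)
from orders_traffic_channels
where to_char(order_month, 'YYYY') = '2017'
group by traffic_channel

union all

-- 3) monthly customers across all channels
select order_month, null, count(distinct customer_id)
from orders_traffic_channels
where to_char(order_month, 'YYYY') = '2017'
group by order_month

union all

-- 4) YTD customers across all channels
select null, null, count(distinct customer_id)
from orders_traffic_channels
where to_char(order_month, 'YYYY') = '2017';
```

Each of the four counts needs its own pass over the table, and every extra dimension adds more branches, which is why I am looking for something better.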
This is what I would write in Oracle (PostgreSQL, like Redshift, rejects DISTINCT in window functions):
select order_month
, traffic_channel
, count(distinct customer_id) over(partition by order_month, traffic_channel) as customers_by_channel_and_month
, count(distinct customer_id) over(partition by traffic_channel) as ytd_customers_by_channel
, count(distinct customer_id) over(partition by order_month) as monthly_customers_all_channels
, count(distinct customer_id) over() as ytd_total_customers
from orders_traffic_channels
/* orders_traffic_channels is a table of dated customer transactions with traffic channel and order month */
where to_char(order_month, 'YYYY') = '2017'
How can I solve this in Redshift?
The result needs to work on a Redshift cluster. Furthermore, this is a simplified problem: the actual desired result also has product category and customer type, which multiplies the number of partitions needed. Therefore a stack of union all rollups is not a nice solution.