Generate rows with incrementing dates based on just a starting date in Redshift
Asked Answered
M

1

0

I'm dealing with a table of user subscriptions info, where each row is a specific user along with the start date of their subscription and how many months they have pre-paid. I'm trying to break this table out so that there's one row per month. I'm on Redshift, and the only other answers I've found suggest the generate_series which doesn't always work on Redshift.

Starting data:

userid  |  amount_paid  |  start_date  |  months
------------------------------------------------
asdf    |  20.00        | 2020-01-01   |  1
------------------------------------------------
qwer    |  10.00        | 2021-06-01   |  3

Desired results (months column value doesn't matter but I'd like amount_paid to be 0 or null for new rows):

userid  |  amount_paid  |  start_date  |  months
------------------------------------------------
asdf    |  20.00        | 2020-01-01   |  1
------------------------------------------------
qwer    |  10.00        | 2021-06-01   |  3
------------------------------------------------
qwer    |  0            | 2021-07-01   |  3
------------------------------------------------
qwer    |  0            | 2021-08-01   |  3
Maintopmast answered 9/11, 2021 at 6:13 Comment(3)
generate_series() is a Leader-only function, which means that it can't be used in a query that references any tables. You could obtain similar behaviour by joining to a table that has a series of dates, or even just a series of numbers.Hesperian
I'm not able to create such a table, don't have DBT or similar tools set up unfortunately. Have to run a single query in my BI tool that accomplishes what I put in the questionMaintopmast
You could cheat by creating a CTE table with just the numbers 1 to something big (whatever your max duration is), and then you could join to that table with something that does start_date + interval '1 month' * number_from_cte. It would effectively generate one row for each joined value, and you would use a WHERE to limit the number of rows joined.Hesperian
A
0

On redshift, as you have seen, generate_series is not supported as a means to make data for use against your table data. A simple replacement is a recursive CTE to generate the numbers you are looking for.

with recursive numbers(n) as
( select 1 as n
    union all
    select n + 1
    from numbers n
    where n.n <= 500
    )
select n from numbers;

The above produces the numbers between 1 and 500.

If your tables are large and the performance of the resulting query matters significantly you may want to think about the distribution of this set of data as it can impact the query plan. You can create a numbers table on Redshift with DISTSTYLE ALL so that the overall query plan can be better optimized especially when performing a cross join.

Agrology answered 9/11, 2021 at 16:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.