REDSHIFT: How can I generate a series of numbers without creating a table called "numbers" in redshift (Postgres 8.0.2)?
Asked Answered
S

4

4

I need to create an empty time table series for a report so I can left join activity from several tables to it. Every hour of the day does not necessarily have data, but I want it to show null or zero for inactivity instead of omitting that hour of the day.

In later versions of Postgres (post 8.0.2), this is easy in several ways:

SELECT unnest(array[0,1,2,3,4...]) as numbers

OR

CROSS JOIN (select generate_series as hours from generate_series(now()::timestamp, now()::timestamp + interval '1 day', '1 hour'::interval )) date_series

Redshift can run some of these commands, but throws an error when you attempt to run it in conjunction with any of the tables.

WHAT I NEED:

A reliable way to generate a series of numbers (e.g. 0-23) as a subquery that will run on redshift (uses postgres 8.0.2).

Supersensible answered 29/7, 2016 at 20:46 Comment(6)
https://mcmap.net/q/394340/-generate_series-method-fails-in-redshiftSuprasegmental
Because Redshift, is not versioned and thus the older version can not ever be used by anyone I can't see why this question should stay open. I'm voting to close. Redshift now supports this.Fortyish
Evan Carroll, Redshift does not support cross join into subquery with generate series. It shows error "Specified types or functions (one per INFO message) not supported on Redshift tables.;"Spanker
@EvanCarroll is the support for this documented somewhere? I still see generate_series listed as "unsupported" on the redshift docs. We migrated to bigquery so I can't test it myself.Supersensible
Does this answer your question? sequence number generation function in AWS redshiftGayn
no @SteveChambers the question is very similar, but less specific. The existing workarounds and proposed solutions are all painful to use, since none of them are reliable or versatile. I haven't used redshift in a year or two, so it's possible they've added functionality to address this.Supersensible
C
13

As long as you have a table that has more rows than your required series has numbers, this is what has worked for me in the past:

select
    (row_number() over (order by 1)) - 1 as hour
from
    large_table
limit 24
;

Which returns numbers 0-23.

Coracle answered 1/8, 2016 at 18:37 Comment(3)
This is a decent solution, and if I use something like pg_catalog.pg_operator, I don't have to worry about schema changes ruining the query.Supersensible
ugly but effectiveHuckaby
And cross join it again to get more values (pg_operator only had 647 objects).Sungkiang
D
5

Recursion was released for Redshift in April 2021. Now that recursion is possible in Redshift. You can generate series of numbers (or even table) with below code

with recursive numbers(NUMBER) as
(
select 1 UNION ALL
select NUMBER + 1 from numbers where NUMBER < 28
)
Dougherty answered 26/8, 2021 at 23:13 Comment(0)
I
2

Unfortunately, Amazon Redshift does not allow use of generate_series() for table functions. The workaround seems to be creating a table of numbers.

See also:

Incisive answered 30/7, 2016 at 6:5 Comment(2)
Creating a table of numbers irks me in a very terrible way. It feels like writing the numbers out one by one in a python script. It's just wrong.Supersensible
You could create a date table, with columns such as DayOfWeek, PublicHoliday, Month, Quarter, etc. This is a common practice to extract useful information from dates. Then, you could re-use this table as a Numbers table without feeling so irksome.Incisive
B
1

I'm not a big fan of querying a system table just to get a list of row numbers. If it's something constant and small enough like hours of a day, I would go with plain old UNION ALL:

WITH 
  hours_in_day AS (
    SELECT 0 AS hour
    UNION ALL SELECT 1
    UNION ALL SELECT 2
    UNION ALL SELECT 3
    UNION ALL SELECT 4
    ...
    UNION ALL SELECT 23
  )

And then joinhours_in_day to whatever you want to.

Broadleaf answered 28/6, 2017 at 19:43 Comment(2)
I've actually had better luck querying a very small table and selecting row_number() over (). Redshift doesn't play nice with repeated UNION ALL sub queries, and even for something as small as hours of the day, we've seen better performance with row_number.Supersensible
This answer also describes how this can be adapted with CROSS JOINs to generate far more numbers.Gayn

© 2022 - 2024 — McMap. All rights reserved.