REDSHIFT: How can I generate a series of numbers without creating a table called "numbers" in redshift (Postgres 8.0.2)?

Asked 29/7, 2016 at 20:46 Answered 26/8, 2021 at 23:13

Solved amazon-web-services amazon-redshift generate-series postgresql-8.0

I need to create an empty time table series for a report so I can left join activity from several tables to it. Every hour of the day does not necessarily have data, but I want it to show null or zero for inactivity instead of omitting that hour of the day.

In later versions of Postgres (post 8.0.2), this is easy in several ways:

SELECT unnest(array[0,1,2,3,4...]) as numbers

CROSS JOIN (select generate_series as hours from generate_series(now()::timestamp, now()::timestamp + interval '1 day', '1 hour'::interval )) date_series

Redshift can run some of these commands, but throws an error when you attempt to run it in conjunction with any of the tables.

WHAT I NEED:

A reliable way to generate a series of numbers (e.g. 0-23) as a subquery that will run on redshift (uses postgres 8.0.2).

Supersensible answered 29/7, 2016 at 20:46 Comment(6)

https://mcmap.net/q/394340/-generate_series-method-fails-in-redshift – Suprasegmental 2/8, 2016 at 20:40

Because Redshift, is not versioned and thus the older version can not ever be used by anyone I can't see why this question should stay open. I'm voting to close. Redshift now supports this. – Fortyish 5/7, 2018 at 20:48

Evan Carroll, Redshift does not support cross join into subquery with generate series. It shows error "Specified types or functions (one per INFO message) not supported on Redshift tables.;" – Spanker 24/7, 2018 at 9:12

@EvanCarroll is the support for this documented somewhere? I still see generate_series listed as "unsupported" on the redshift docs. We migrated to bigquery so I can't test it myself. – Supersensible 20/7, 2020 at 23:21

Does this answer your question? sequence number generation function in AWS redshift – Gayn 2/3, 2021 at 9:1

no @SteveChambers the question is very similar, but less specific. The existing workarounds and proposed solutions are all painful to use, since none of them are reliable or versatile. I haven't used redshift in a year or two, so it's possible they've added functionality to address this. – Supersensible 15/3, 2021 at 16:32

As long as you have a table that has more rows than your required series has numbers, this is what has worked for me in the past:

select
    (row_number() over (order by 1)) - 1 as hour
from
    large_table
limit 24
;

Which returns numbers 0-23.

Coracle answered 1/8, 2016 at 18:37 Comment(3)

This is a decent solution, and if I use something like pg_catalog.pg_operator, I don't have to worry about schema changes ruining the query. – Supersensible 3/8, 2016 at 23:5

ugly but effective – Huckaby 30/7, 2018 at 20:6

And cross join it again to get more values (pg_operator only had 647 objects). – Sungkiang 27/9, 2019 at 14:53

Recursion was released for Redshift in April 2021. Now that recursion is possible in Redshift. You can generate series of numbers (or even table) with below code

with recursive numbers(NUMBER) as
(
select 1 UNION ALL
select NUMBER + 1 from numbers where NUMBER < 28
)

Dougherty answered 26/8, 2021 at 23:13 Comment(0)

Unfortunately, Amazon Redshift does not allow use of generate_series() for table functions. The workaround seems to be creating a table of numbers.

See also:

Using sql function generate_series() in redshift
Generate Series in Redshift and MySQL, which does not seem correct but does introduce some interesting ideas

Incisive answered 30/7, 2016 at 6:5 Comment(2)

Creating a table of numbers irks me in a very terrible way. It feels like writing the numbers out one by one in a python script. It's just wrong. – Supersensible 3/8, 2016 at 23:9

You could create a date table, with columns such as DayOfWeek, PublicHoliday, Month, Quarter, etc. This is a common practice to extract useful information from dates. Then, you could re-use this table as a Numbers table without feeling so irksome. – Incisive 4/8, 2016 at 12:10

I'm not a big fan of querying a system table just to get a list of row numbers. If it's something constant and small enough like hours of a day, I would go with plain old UNION ALL:

WITH 
  hours_in_day AS (
    SELECT 0 AS hour
    UNION ALL SELECT 1
    UNION ALL SELECT 2
    UNION ALL SELECT 3
    UNION ALL SELECT 4
    ...
    UNION ALL SELECT 23
  )

And then joinhours_in_day to whatever you want to.

Broadleaf answered 28/6, 2017 at 19:43 Comment(2)

I've actually had better luck querying a very small table and selecting row_number() over (). Redshift doesn't play nice with repeated UNION ALL sub queries, and even for something as small as hours of the day, we've seen better performance with row_number. – Supersensible 12/2, 2018 at 20:42

This answer also describes how this can be adapted with CROSS JOINs to generate far more numbers. – Gayn 2/3, 2021 at 9:2

Recommended topics

Hot tags