generate_series() method fails in Redshift

When I run the SQL query:

 select generate_series(0, g)
 from (select date(date1) - date(date2) as g from mytable) t;

It returns an error:

 INFO:  Function "generate_series(integer,integer)" not supported.
 ERROR:  Specified types or functions (one per INFO message) not supported 
 on Redshift tables.

But when I run this query:

select  generate_series(0, g) from (select 5 as g)

It returns the below response:

 generate_series
-----------------
 0
 1
 2
 3
 4
 5
(6 rows)

Why does the second query work, while the first fails?

Angevin answered 31/3, 2014 at 11:5
Apparently the first subquery returns an interval, not an integer (because of: "generate_series(integer, interval) does not exist"). – Siriasis
See my edit to the error message. – Angevin
You are not using PostgreSQL. You are using Amazon Redshift. – Aldo
Because you report that this works with Redshift, and Redshift is not versioned (so no one can ever use the older version), I can't see why this question should stay open. I'm voting to close. – Gower
silota.com/docs/recipes/… – Kalimantan

The generate_series() function is not fully supported by Redshift. See the Unsupported PostgreSQL functions section of the developer guide.

In your specific examples, the second query is executed entirely on the leader node because it does not need to scan any actual table data, while the first selects data from a table and is therefore executed on the compute node(s), where generate_series() is not supported.
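A minimal sketch of that distinction, assuming `mytable` is any user table in your cluster (as in the question). The first statement should succeed; the second is expected to fail with the same "not supported on Redshift tables" error:

```sql
-- References no table, so the leader node can answer it on its own;
-- generate_series is available there and this returns the rows 1..5.
SELECT generate_series(1, 5) AS n;

-- References a user table, so the query has to run on the compute nodes,
-- which do not support generate_series; this raises the error from the question.
SELECT generate_series(1, 5) AS n
FROM mytable;
```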

UPDATE:

generate_series now works with Redshift, at least in queries that do not reference any table (so they run only on the leader node):

SELECT CURRENT_DATE::TIMESTAMP  - (i * interval '1 day') as date_datetime 
FROM generate_series(1,31) i 
ORDER BY 1

This generates one timestamp per day for the 31 days before today (from yesterday back to 31 days ago).

Angevin answered 1/4, 2014 at 6:41
This does not solve the problem of generate_series not being supported on compute nodes. If you try to join this select to a table, use it in a CTE, or create a temp table from it, the same ERROR will occur. – Existence
It's 2023 and generate_series is still supported only on the master (leader) node, not on the compute nodes. – Dari

You can use a window function to achieve a similar result. This requires an existing table to seed from (such as stv_blocklist) that has at least as many rows as you need, but not so many that it slows the query down.

with days as (
    select (dateadd(day, -row_number() over (order by true), sysdate::date)) as day 
    from [other_existing_table] limit 30
)
select day from days order by 1 asc

You can use this method to get other time ranges as well for bucketing purposes. This version generates all the minutes for the previous day so you could do a left join against it and bucket your data.

with buckets AS (
    select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute 
    from [other_table] limit 1440
)
select minute from buckets order by 1 asc
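A sketch of that left join, assuming a hypothetical `events` table with an `event_time` timestamp column and a hypothetical `some_large_table` with at least 1,440 rows to seed the buckets. It filters the row numbers with WHERE instead of LIMIT, so exactly the numbers 1..1440 are kept no matter which rows the engine returns first:

```sql
WITH seq AS (
    -- 1, 2, 3, ... one number per row of the (hypothetical) seed table
    SELECT row_number() OVER () AS n
    FROM some_large_table
),
buckets AS (
    -- One bucket per minute of the previous day, counted back from midnight today
    SELECT dateadd(minute, -CAST(n AS INT), sysdate::date) AS bucket_start
    FROM seq
    WHERE n <= 1440
)
SELECT b.bucket_start,
       count(e.event_time) AS event_count   -- 0 for minutes with no events
FROM buckets b
LEFT JOIN events e
       ON date_trunc('minute', e.event_time) = b.bucket_start
GROUP BY b.bucket_start
ORDER BY b.bucket_start;
```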

I may have first seen this here.

Delmadelmar answered 8/12, 2015 at 23:1

You need to use functions that are supported on the compute nodes. The trick is to apply the row_number() function to any table you like. Let's say we want to generate a date series covering the last 10 days, up to today:

   SELECT DATEADD('day', -n, (CURRENT_DATE+1)) AS generated_date
   FROM (SELECT ROW_NUMBER() OVER () AS n FROM my_table LIMIT 10) n
   ORDER BY generated_date DESC

And we get:

generated_date
2020-06-24 00:00:00
2020-06-23 00:00:00
2020-06-22 00:00:00
2020-06-21 00:00:00
2020-06-20 00:00:00
2020-06-19 00:00:00
2020-06-18 00:00:00
2020-06-17 00:00:00
2020-06-16 00:00:00
2020-06-15 00:00:00
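The same row_number() trick can emulate the question's per-row generate_series(0, g). A sketch, assuming `mytable` has the `date1`/`date2` columns from the question and a hypothetical `some_large_table` has at least as many rows as the largest gap in days (otherwise the series is silently cut short):

```sql
WITH numbers AS (
    -- 0, 1, 2, ... one number per row of the (hypothetical) seed table
    SELECT ROW_NUMBER() OVER () - 1 AS n
    FROM some_large_table
),
gaps AS (
    -- Whole days between the two dates: the g of the original query
    SELECT date1, date2, DATEDIFF(day, date2, date1) AS g
    FROM mytable
)
SELECT gaps.date1,
       gaps.date2,
       numbers.n,                                        -- 0 .. g, like generate_series(0, g)
       DATEADD(day, numbers.n::int, gaps.date2::date) AS series_date
FROM gaps
JOIN numbers
  ON numbers.n <= gaps.g                                 -- no rows when g < 0, as with generate_series
ORDER BY gaps.date1, numbers.n;
```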
Armour answered 24/6, 2020 at 16:6

You are correct that this does not work on Redshift. See here.

You could use something like this

with ten_numbers as (
    select 0 as num union all select 1 union all select 2 union all select 3 union all select 4
    union all select 5 union all select 6 union all select 7 union all select 8 union all select 9
)
, generated_numbers as
(
    -- four digits give 0..9999; subtracting 5000 shifts the range to -5000..4999
    SELECT (1000*t1.num) + (100*t2.num) + (10*t3.num) + t4.num - 5000 as gen_num
    FROM ten_numbers AS t1
      CROSS JOIN ten_numbers AS t2
      CROSS JOIN ten_numbers AS t3
      CROSS JOIN ten_numbers AS t4
)
select gen_num from generated_numbers
where gen_num between -10 and 0
order by 1;
Scottyscotus answered 18/10, 2017 at 10:3

You are not using PostgreSQL. You are using Amazon Redshift.

Amazon Redshift does not support generate_series when used with Redshift tables. It says it right there in the error message.

Either use real PostgreSQL, or if you need Redshift's features, you must also work within the limitations of Redshift.

Your second example works because it does not use any Redshift tables.

Aldo answered 31/3, 2014 at 12:36
Isn't this a leader-node-only function? – Kreit
@Kreit Maybe it is now, but not when I wrote this. – Aldo

Why it doesn't work has been explained above. Still, the question "what can we do about it?" remains open.

If you develop a BI system on any platform (with generators supported or not), it is very handy to have dimension tables with sequences of numbers and dates. How can you create one in Redshift?

  1. in PostgreSQL, produce the necessary sequence using generate_series
  2. export it to CSV
  3. create a table with the same schema in Redshift
  4. import the CSV from step 2 into Redshift (e.g. with COPY from S3)
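The four steps above can be sketched as follows. The date range matches the example calendar below; the S3 bucket, file name, and IAM role ARN are placeholders, and the sketch assumes the CSV has been uploaded to S3 between steps 2 and 4, since Redshift's COPY does not read local files:

```sql
-- Steps 1-2 (psql connected to PostgreSQL, not Redshift): generate the dates and
-- write them to a local CSV file. \copy is a psql meta-command and must stay on one line.
\copy (SELECT row_number() OVER (ORDER BY d) AS id, d::date AS date FROM generate_series(date '2017-01-01', date '2020-01-01', interval '1 day') AS d) TO 'calendar.csv' WITH (FORMAT csv)

-- Step 3 (Redshift): create a table with the same schema.
CREATE TABLE calendar (
    id   INTEGER,
    date DATE
);

-- Step 4 (Redshift): load the uploaded file. Bucket and role below are placeholders.
COPY calendar
FROM 's3://my-bucket/calendar.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV;
```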

Imagine you have created a very simple table called calendar:

 id, date
 1, 2017-01-01
 2, 2017-01-02
 ..., ...
 xxx, 2020-01-01

So your query will look like this:

SELECT t.id, t.date_1, t.date_2, c.id as date_id, c.date
FROM mytable t
JOIN calendar c
ON c.date BETWEEN t.date_1::date AND t.date_2::date
ORDER BY 1,4

In the calendar table you can also store the first day of the week, month, and quarter, the weekday name (Mon, Tue, etc.), and so on, which makes such a table very effective for time-based aggregations.
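A sketch of deriving those extra columns inside Redshift from the simple id/date table. The table name `calendar_ext` and the derived column names are hypothetical:

```sql
-- Derive a richer calendar from the simple (id, date) table.
CREATE TABLE calendar_ext AS
SELECT
    c.id,
    c.date,
    DATE_TRUNC('week', c.date)::date    AS week_start,     -- first day of the week
    DATE_TRUNC('month', c.date)::date   AS month_start,
    DATE_TRUNC('quarter', c.date)::date AS quarter_start,
    TO_CHAR(c.date, 'Dy')               AS weekday_name    -- e.g. Mon, Tue
FROM calendar c;
```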

Fruge answered 16/6, 2017 at 12:39

I've found in practice that counting rows off an existing table is too slow for my uses (and introduces a dependency).

Combining the "count rows" answer with Jon Scott's answer above we get this solution that we use at my company:

with n as (
    select 1 union select 2 union select 3 union select 4 union select 5 union
    select 6 union select 7 union select 8 union select 9 union select 10
  )
  select row_number() over () as n
  from n n1, n n2, n n3, n n4, n n5, n n6
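When the same numbers are needed repeatedly, one option is to materialize them once and join against the stored table afterwards. A sketch with hypothetical table names `digits` and `numbers`; it uses the digits 0..9 so that positional arithmetic yields each number from 0 to 9999 exactly once:

```sql
-- Ten rows, 0..9, built from constants only (no leader-node-only functions).
CREATE TEMP TABLE digits AS
SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
UNION ALL SELECT 8 UNION ALL SELECT 9;

-- Cross join four copies: ones + tens + hundreds + thousands = 0 .. 9999.
CREATE TABLE numbers AS
SELECT d1.d + 10 * d2.d + 100 * d3.d + 1000 * d4.d AS n
FROM digits d1
CROSS JOIN digits d2
CROSS JOIN digits d3
CROSS JOIN digits d4;
```

Later queries can then use something like `JOIN numbers ON numbers.n BETWEEN 0 AND g` wherever `generate_series(0, g)` would have been used.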
Scroop answered 25/8, 2021 at 17:43

This works here (PostgreSQL 9.3.3). Maybe your issue is just the result of a Redshift "feature"?

CREATE TABLE mytable
        ( date1 timestamp
        , date2 timestamp
        );
INSERT INTO mytable(date1,date2) VALUES
( '2014-03-30 12:00:00' , '2014-04-01 12:00:00' );

SELECT  generate_series(0, ss.g) FROM
   ( SELECT date(date2) - date(date1) AS g
     FROM mytable
   ) ss ;
Perspiration answered 31/3, 2014 at 12:53
The OP is asking about Redshift. – Existence
