sequence number generation function in AWS redshift
Asked Answered
D

4

6

Is there a sequence number generation function in redshift ? Or a function that takes combination of values and gives out a numerical hash key ?

Doubledealing answered 25/3, 2014 at 18:31 Comment(0)
R
7

There is no concept of sequences (as seen in Oracle) at the moment.

You have a few options:

  • Number tables
  • RANK() or ROW_NUMBER() window functions over the whole set. Note that this can have some negative performance implications if you have a multi-node cluster.
  • Columns defined as IDENTITY(seed, step). Note that IDENTITY sequence may be 'sparse' (e.g. have gaps in the sequence).
Repp answered 26/3, 2014 at 14:58 Comment(3)
We have multinode redshift cluster. IDENTITY - is expected to provide unique values without collision ?Doubledealing
Yes, IDENTITY values will not collide but they are not guaranteed to be contiguous, e.g., you'll get a run of values like 1,9,17,25 for an 8 node cluster.Repp
In my case, always i have to keep incrementing the value whenever i run the sql. So, i used Max(existing sequence from target) + row_number()... Hope it helps to someoneCorrectitude
P
9

Here is another way to generate 1 million numbers

with seq_0_9 as (
select 0 as num
union all select 1 as num
union all select 2 as num
union all select 3 as num
union all select 4 as num
union all select 5 as num
union all select 6 as num
union all select 7 as num
union all select 8 as num
union all select 9 as num
), seq_0_999 as (
select a.num + b.num * 10 + c.num * 100 as num
from seq_0_9 a, seq_0_9 b, seq_0_9 c
)
select a.num + b.num * 1000 as num
from seq_0_999 a, seq_0_999 b
order by num
Partisan answered 19/2, 2016 at 7:56 Comment(0)
R
7

There is no concept of sequences (as seen in Oracle) at the moment.

You have a few options:

  • Number tables
  • RANK() or ROW_NUMBER() window functions over the whole set. Note that this can have some negative performance implications if you have a multi-node cluster.
  • Columns defined as IDENTITY(seed, step). Note that IDENTITY sequence may be 'sparse' (e.g. have gaps in the sequence).
Repp answered 26/3, 2014 at 14:58 Comment(3)
We have multinode redshift cluster. IDENTITY - is expected to provide unique values without collision ?Doubledealing
Yes, IDENTITY values will not collide but they are not guaranteed to be contiguous, e.g., you'll get a run of values like 1,9,17,25 for an 8 node cluster.Repp
In my case, always i have to keep incrementing the value whenever i run the sql. So, i used Max(existing sequence from target) + row_number()... Hope it helps to someoneCorrectitude
B
2

I am new to Redshift, and I found this article looking for a common sequence, that is not supported on Amazon database. I found this solution I will report with a complete example using ROW_NUMBER.

I have schemas sta and dim. In sta I have staging tables, while in dim I have dimension tables I want to populate with ids. I have a source of information that has fields trk_key, name containing for instance some publishers.

CREATE TABLE sta.publisher (
        trk_key VARCHAR(20),
        name VARCHAR(225)
);
CREATE TABLE dim.publisher (
        id SMALLINT,
        trk_key VARCHAR(20),
        name VARCHAR(255),
        PRIMARY KEY (id)
);

First I truncate sta.publisher table and load there a csv file. Then I launch the following query

-- This query is idempotent:
-- it will insert a publisher found in sta.publisher table only if
-- it is not already in dim.publisher table.
INSERT INTO dim.publisher
SELECT
        -- Generate id using max id found in dim.publisher.
        -- Start with id=1 if dim.publisher is empty.
        (
                SELECT NVL(MAX(id), 0)
                FROM dim.publisher
        ) + ROW_NUMBER() OVER() AS id,
        trk_key,
        name
FROM sta.publisher
        -- Only insert record if trk_key is not found in dim.publisher table.
        WHERE trk_key NOT IN (
                SELECT trk_key
                FROM dim.publisher
        )
Batchelor answered 20/10, 2015 at 12:50 Comment(0)
P
2

You can still query the Posgres function generate_series(start, stop, step)

But it isn't a supported function and can only run on the leader node, which makes it fairly useless, but hey you can still select generate_series(1,100) and get a list of 100 numbers. You just can't do anything with that function not on the leader node.

http://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-functions.html

Pouncey answered 20/6, 2017 at 22:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.