Get identity of row inserted in Snowflake Datawarehouse

Asked 18/12, 2018 at 17:8 Answered 14/9, 2024 at 14:31

Solved sql snowflake-cloud-data-platform

If I have a table with an auto-incrementing ID column, I'd like to be able to insert a row into that table, and get the ID of the row I just created. I know that generally, StackOverflow questions need some sort of code that was attempted or research effort, but I'm not sure where to begin with Snowflake. I've dug through their documentation and I've found nothing for this.

The best I could do so far is try result_scan() and last_query_id(), but these don't give me any relevant information about the row that was inserted, just confirmation that a row was inserted.

I believe what I'm asking for is along the lines of MS SQL Server's SCOPE_IDENTITY() function.

Is there a Snowflake equivalent function for MS SQL Server's SCOPE_IDENTITY()?

EDIT: for the sake of having code in here:

CREATE TABLE my_db..my_table
(
    ROWID INT IDENTITY(1,1),
    some_number INT,
    a_time TIMESTAMP_LTZ(9),
    b_time TIMESTAMP_LTZ(9),
    more_data VARCHAR(10)
);
INSERT INTO my_db..my_table
(
    some_number,
    a_time,
    more_data
)
VALUES
(1, my_time_value, some_data);

I want to get to that auto-increment ROWID for this row I just inserted.

Clementeclementi answered 18/12, 2018 at 17:8 Comment(4)

Forgive my ignorance...don't you just want the MAX(ROWID)? So... SELECT MAX(ROWID) FROM my_db.my_table; – Unthankful 18/12, 2018 at 17:38

@JonJaussi Pretend that 5-20 other processes are inserting records into this table at the same time. MAX(ROWID) might grab a row that was created by another process in that same time period. – Clementeclementi 18/12, 2018 at 17:39

Thanks for this clarification. Is there a "natural key" in your data? Some concatenated combination of other fields in the record that you are inserting that would define the record as unique? The idea is you could lookup the ROWID using the "natural key" (Example: some_number | a_time | more_data)? – Unthankful 18/12, 2018 at 17:52

@JonJaussi I'm doing something to that effect now, but I feel it's a sloppy way to go about this. I was really hoping for a more proper way to get that ROWID. – Clementeclementi 18/12, 2018 at 19:52

NOTE: The answer below can be not 100% correct in some very rare cases, see the UPDATE section below

Original answer

Snowflake does not provide the equivalent of SCOPE_IDENTITY today.

However, you can exploit Snowflake's time travel to retrieve the maximum value of a column right after a given statement is executed.

Here's an example:

create or replace table x(rid int identity, num int);
insert into x(num) values(7);
insert into x(num) values(9);
-- you can insert rows in a separate transaction now to test it
select max(rid) from x AT(statement=>last_query_id());
----------+
 MAX(RID) |
----------+
 2        |
----------+

You can also save the last_query_id() into a variable if you want to access it later, e.g.

insert into x(num) values(5);
set qid = last_query_id();
...
select max(rid) from x AT(statement=>$qid);

Note - it will be usually correct, but if the user e.g. inserts a large value into rid manually, it might influence the result of this query.

UPDATE

Note, I realized the code above might rarely generate incorrect answer.

Since the execution order of various phases of a query in a distributed system like Snowflake can be non-deterministic, and Snowflake allows concurrent INSERT statements, the following might happen

Two queries, Q1 and Q2, do a simple single row INSERT, start at roughly the same time
Q1 starts, is a bit ahead
Q2 starts
Q1 creates a row with value 1 from the IDENTITY column
Q2 creates a row with value 2 from the IDENTITY column
Q2 gets ahead of Q1 - this is the key part
Q2 commits, is marked as finished at time T2
Q1 commits, is marked as finished at time T1

Note that T1 is later than T2. Now, when we try to do SELECT ... AT(statement=>Q1), we will see the state as-of T1, including all changes from statements before, hence including the value 2 from Q2. Which is not what we want.

The way around it could be to add a unique identifier to each INSERT (e.g. from a separate SEQUENCE object), and then use a MAX.

Sorry. Distributed transactions are hard :)

Heads answered 23/12, 2018 at 12:42 Comment(6)

This looks promising. I'll give this a try shortly and report back! – Clementeclementi 23/12, 2018 at 13:16

Sorry, I realized my answer might generate wrong results, updated it. – Heads 26/12, 2018 at 0:24

@MarcinZukowski is Snowflake working on fixing this? I am tasked with investigating Snowflake for our company and currently the one and only showstopper I see is this issue. From Snowflake Community Forums, it appears no progress is made. Any color would be greatly appreciated. – Flightless 16/9, 2019 at 17:43

@JohnZabroski It's on our todo list, but no active development as of now. If this is blocking you, please raise it through your sales/support channel, this might help prioritizing it. – Heads 17/9, 2019 at 22:37

Just reporting back after having used this for over a year - @MarcinZukowski is correct with his update, we experience concurrency issues with this method. With that said, I haven't had time to update the feature that implements this with a unique identifier in each INSERT. To be fair to snowflake, in my use case, it's supporting some back end features to a website, which isn't really what data warehouses exist for in the first place. – Clementeclementi 6/3, 2020 at 15:59

@MarcinZukowski What about using CHANGE TRACKING? I was playing around with this:

sql CREATE OR REPLACE TABLE MYTEST ( ID INT IDENTITY(1,1), VAL1 VARCHAR);  ALTER TABLE MYTEST SET CHANGE_TRACKING = TRUE;  set ts1 = current_timestamp(); INSERT INTO MYTEST(VAL1) VALUES('Salut!'); set ts2 = current_timestamp(); select *  from mytest    changes(information => append_only)   at(timestamp => $ts1)   end(timestamp => $ts2);

Do you think this could be a valid approach? – Thomsen 19/8, 2021 at 21:13

New answer to an old question here, as @MarcinZukowski good answer relies on Time Travel it may be difficult to get right in a distributed environment.

Instead of using AUTOINCREMENT directly in the definition of the table column, we can use a SEQUENCE separate from the table definition, and use it for the DEFAULT value of the identity column.

CREATE OR REPLACE SEQUENCE my_table_seq;

CREATE TABLE my_table
(
    ROWID INT DEFAULT my_table_seq.nextval,
    some_number INT,
    a_time TIMESTAMP_LTZ(9),
    b_time TIMESTAMP_LTZ(9),
    more_data VARCHAR(10)
);

When inserting a new item, we can now use the next id in the sequence to insert it explicitly:

BEGIN TRANSACTION;
SET row_id = (SELECT my_table_seq.nextval id);
INSERT INTO my_table
(
    rowid,
    some_number,
    a_time,
    more_data
)
VALUES
($row_id, 1, current_timestamp(), 'some_data');
SELECT $row_id;
COMMIT;

Inside that transaction you can grab that $row_id value and do something else, like inserting related data into another table. Above we simply do a SELECT $row_id; to output it.

It also works as a single line statement where you don't need to capture the identity of the new item:

INSERT INTO my_db..my_table
(
    some_number,
    a_time,
    more_data
)
VALUES
(1, current_timestamp(), 'some_data');

Trophy answered 5/3, 2024 at 15:32 Comment(0)

If I have a table with an auto-incrementing ID column, I'd like to be able to insert a row into that table, and get the ID of the row I just created.

FWIW, here's a slight variation of the current accepted answer (using Snowflake's 'Time Travel' feature) that gives any column values "of the row I just created." It applies to auto-incrementing sequences and more generally to any column configured with a default (e.g. CURRENT_TIMESTAMP() or UUID_STRING()). Further, I believe it avoids any inconsistencies associated with a second query utilizing MAX().

Assuming this table setup:

CREATE TABLE my_db.my_table
(
    ROWID INT IDENTITY(1,1),
    some_number INT,
    a_time TIMESTAMP_LTZ(9),
    b_time TIMESTAMP_LTZ(9),
    more_data VARCHAR(10)
);

Make sure the 'Time Travel' feature (change_tracking) is enabled for this table with:

ALTER TABLE my_db.my_table SET change_tracking = true;

Perform the INSERT per usual:

INSERT INTO my_db.my_table
(
    some_number,
    a_time,
    more_data
)
VALUES
(1, my_time_value, some_data);

Use the CHANGES clause with BEFORE(statement... and END(statement... specified as LAST_QUERY_ID() to SELECT the row(s) added to my_table which are the precise result of the previous INSERT statement (with column values that existed the moment the row(s) was(were) added, including any defaults):

SET insertQueryId=LAST_QUERY_ID();
SELECT
    ROWID,
    some_number,
    a_time,
    b_time,
    more_data
FROM my_db.my_table
    CHANGES(information => default)
    BEFORE(statement => $insertQueryId)
    END(statement => $insertQueryId);

For more information on the CHANGES, BEFORE, END clauses see the Snowflake documentation here.

Poetess answered 9/6, 2022 at 20:23 Comment(0)

There's a simple solution here that I use in cases like using a url parameter to get my last insert ID when creating a new row of data on a table.

Select your table with a simple get all ids, order by descending and set a limit of 1. Here's an example of how you'd implement that statement with an extremely simple query:

SELECT id FROM your_database.table
ORDER BY id DESC
LIMIT 1;

Which will select all id's from said table, reverse the order; meaning the rows on the table will now be ordered from last inserted to first inserted, then setting the limit to 1. This will return only the first row being selected, which will be the last id that you inserted.

Hope this helps!

Impressionist answered 14/9, 2024 at 14:31 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Original answer

UPDATE

Recommended topics

Hot tags