Google Spanner - How do you copy data to another table?
Asked Answered
V

3

6

Since spanner does not have ddl feature like

insert into dest as (select * from source_table)

How do we select subset of a table and copy that rows into another table ?

I am trying to write data to temporary table and then move data to archive table at the end of day. But only solution i could find so far is, select rows from source table and write them to new table. Which is done using java api, and it does not have a ResultSet to Mutation converter, so i need to map every column of table to new table, even they are exactly same.


Another thing is updating just one column data, like there is no way of doing "update table_name set column= column-1 "

Again to do that, i need to read that row and map every field to update Mutation, but this is not useful if have many tables, i need to code for all of them, a ResultSet -> Mutation converted would be nice too.

Is there any generic Mutation cloner and/or any other way to copy data between tables?

Verniavernice answered 8/9, 2017 at 14:13 Comment(2)
Also there is a 20000 mutation limit per transaction. Which is very low i think, if i have a 20 columns table and want to copy it to another table, i can only copy 1000 rows in one transaction, which leads to an useless transaction scenario for this case (i have at least 10k rows per key to copy at the end of day). cloud.google.com/spanner/docs/limitsVerniavernice
We are building Apache Beam connectors, so it will be possible to run large scale batch operations with Cloud Dataflow. You can follow the status here issues.apache.org/jira/browse/BEAM-1542Moradabad
F
5

As of version 0.15 this open source JDBC Driver supports bulk INSERT-statements that can be used to copy data from one table to another. The INSERT-syntax can also be used to perform bulk UPDATEs on data.

Bulk insert example:

INSERT INTO TABLE  
(COL1, COL2, COL3)  
SELECT C1, C2, C3  
FROM OTHER_TABLE  
WHERE C1>1000  

Bulk update is done using an INSERT-statement with the addition of ON DUPLICATE KEY UPDATE. You have to include the value of the primary key in your insert statement in order to 'force' a key violation which in turn will ensure that the existing rows will be updated:

INSERT INTO TABLE  
(COL1, COL2, COL3)  
SELECT COL1, COL2+1, COL3+COL2  
FROM TABLE  
WHERE COL2<1000  
ON DUPLICATE KEY UPDATE  

You can use the JDBC driver with for example SQuirreL to test it, or to do ad-hoc data manipulation.

Please note that the underlying limitations of Cloud Spanner still apply, meaning a maximum of 20,000 mutations in one transaction. The JDBC Driver can work around this limit by specifying the value AllowExtendedMode=true in your connection string or in the connection properties. When this mode is allowed, and you issue a bulk INSERT- or UPDATE-statement that will exceed the limits of one transaction, the driver will automatically open an extra connection and perform the bulk operation in batches on the new connection. This means that the bulk operation will NOT be performed atomically, and will be committed automatically after each successful batch, but at least it will be done automatically for you.

Have a look here for some more examples: http://www.googlecloudspanner.com/2018/02/data-manipulation-language-with-google.html

Footlights answered 25/9, 2017 at 21:0 Comment(2)
Thank you very much for the driver link and explanation. I implemented a similar thing with java api, to move data between tables.Verniavernice
This doesn't work for me in Jan 2023. ON DUPLICATE KEY UPDATE is not supported by Spanner in general.University
T
2

Another approach to perform Bulk update can be using LIMIT & OFFSET


insert into dest(c1,c2,c3) (select c1,c2,c3 from source_table LIMIT 1000);

insert into dest(c1,c2,c3) (select c1,c2,c3 from source_table LIMIT 1000 OFFSET 1001);

insert into dest(c1,c2,c3) (select c1,c2,c3 from source_table LIMIT 1000 OFFSET 2001);

.

.

.

reach till where required.

PS: This is more of a trick. But will definitely save you time.

Trutko answered 16/1, 2020 at 11:3 Comment(0)
U
0

Spanner supports expression in the SET section of an UPDATE statement which can be used to supply a subquery fetching data from another table like this:

UPDATE target_table
SET target_field = (
    -- use subquery as an expression (must return a single row)
    SELECT source_table.source_field
    FROM source_table
    WHERE my_condition IS TRUE
) WHERE my_other_condition IS TRUE;

The generic syntax is:

UPDATE table SET column_name = { expression | DEFAULT } WHERE condition
University answered 30/1, 2023 at 17:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.