How to select data from a table and insert into another table?
Asked Answered
P

4

19

I want to select specific fields of a table in cassandra and insert them into another table. I do this in sql server like this:

INSERT INTO Users(name,family)
SELECT name,family FROM Users

How to to this in cassandra-cli or cqlsh?

Patrinapatriot answered 26/1, 2014 at 11:59 Comment(2)
in your table is their any column with UUID datatype?Hakenkreuz
Possible duplicate of cassandra copy data from one columnfamily to another columnfamilyKyungkyushu
H
29
COPY keyspace.columnfamily1 (column1, column2,...) TO 'temp.csv';
COPY keyspace.columnfamily2 (column1, column2,...) FROM 'temp.csv';

here give your keyspace(schema-name) and instead of columnfamilyname1 use the table to which you want to copy and in columnfamily2 give the tablename in which you want to copy..

And yes this is solution for CQL,however I have never tried in with CLI.

Hakenkreuz answered 27/1, 2014 at 10:24 Comment(1)
This works for cqlsh, but not for CQL in general. It is supported in DataStax though.Bianchi
L
5

For tables that are not very large, save yourself the file and use an anonymous pipe:

cqlsh -e "COPY keyspace.src_table (col1, col2, ...,ColN ) TO STDOUT WITH HEADER=false" | cqlsh -e "COPY keyspace.target_table (col1, col2, ...,ColN ) FROM STDIN" 

For very large datasets, this won't work. A strategy of per token range should be explored

Lamori answered 5/2, 2017 at 16:5 Comment(0)
L
2

For very large tables, CQLSH will have a hard time handling the COPY TO/FROM. Here is how to sweep the table by local token ranges and copy desired columns from one table to another. Because it uses the local copy, this is to be executed on every node of the Datacenter:

    #!/bin/bash
    #
    # Script to COPY from a SOURCE table's select columns over a TARGET table
    # Assumes the following:
    #
    # * The SOURCE table is very large - otherwise just try:
    #   cqlsh  -e "COPY keyspace.src_table (col1, col2, ...,ColN ) TO STDOUT WITH HEADER=false" \
    #   |cqlsh -e "COPY keyspace.tgt_table (col1, col2, ...,ColN ) FROM STDIN"
    # * SOURCE AND TARGET TABLES are in the SAME KEYSPACE
    # * TARGET columns are named the SAME as SOURCE
    #
    # The script sweeps thru the local tokens to copy only the local data over to the new table
    # Therefore, this script needs to run on every node on the datacenter
    #
    # Set these variables before executing
    #
    USR=cassandra
    PWD=password
    KSP=my_keyspace
    SRC=src_table
    COL="col1, col2, col3"
    PKY="col1"
    TGT=tgt_table

    CQLSH="cqlsh -u ${USR} -p ${PWD} -k ${KSP}"

    function getTokens(){
       for i in $($CQLSH -e "select tokens from system.local;" | awk -F, '/{/{print $0}' | tr -d '{' | tr -d '}' | tr -d ','); do
           echo ${i//\'/}
       done | sort -n
    }

    function getDataByTokenRange(){
       i=0
       tokens=($(getTokens))
       while [ ${i} -lt ${#tokens[@]} ]; do 
             [ ${i} -eq 0 ]         && echo "SELECT ${COL} FROM ${SRC} WHERE token(${PKY}) <= ${tokens[i]};" 
             [ "${tokens[i+1]}" ]   && echo "SELECT ${COL} FROM ${SRC} WHERE token(${PKY}) >  ${tokens[i]} AND token(${PKY}) <= ${tokens[i+1]};"
             [ ! "${tokens[i+1]}" ] && echo "SELECT ${COL} FROM ${SRC} WHERE token(${PKY}) >  ${tokens[i]};"
             ((i++))
       done
    } 

function cqlExec(){ 
  while IFS='' read -r cql || [[ -n "$line" ]]; do
    $CQLSH -e "CONSISTENCY LOCAL_ONE; $cql"                                                                \
    |awk -F\| '( !/LOCAL_ONE/ && !/'"${COL/, /|}"'/ && !/^\-+/ && !/^\([0-9]+ rows)/ && !/^$/ ){print $0}' \
    |sed -e 's/^[ ]*//g' -e 's/[ ]*|[ ]*/|/g'                                                              \
    |$CQLSH -e "COPY ${TGT} (${COL}) FROM STDIN WITH DELIMITER = '|' and HEADER=false;"
    [ "$?" -gt 0 ] && echo "ERROR: Failed to import data from command: ${command}"
  done < "$1"
}

    main(){
       echo "Begin processing ..."
       getDataByTokenRange > getDataByTokenRange.ddl
       cqlExec getDataByTokenRange.ddl
       echo "End procesing"
    }

    main
Lamori answered 5/2, 2017 at 22:10 Comment(0)
R
0

If you (like me) have data that has complex types or object types it's getting tricky to copy data with cqlsh COPY. I solve my problems with DataGrip. you can export data as insert statements, later you can load .cql script directly into cqlsh.

Ransome answered 7/1, 2022 at 19:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.