Synchronize data from MySQL to Amazon Redshift
We do some aggregation on huge datasets in Amazon Redshift, and we have a relatively small amount of data in MySQL. Some of the joins in Redshift need that MySQL data. What is the best way to synchronize the MySQL data to Redshift? Does Redshift have anything like the remote view in Oracle? Or should I programmatically query MySQL and insert/update in Redshift?

Gheber answered 16/10, 2013 at 8:21 Comment(0)
When MySQL data is required for joins in Redshift, we usually just ship it from one database to the other.

This involves:

  1. Redshift: Creating an analogous table schema (bearing in mind Redshift/PSQL's particularities)
  2. MySQL: Dumping the data table (in CSV format)
  3. Zipping the export, and sending it to S3
  4. Redshift: Truncating the table, and importing all data using COPY

Steps 2 to 4 can be scripted, allowing you to push fresh data to Redshift on demand or on a schedule.
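The steps above can be scripted, for example, along these lines in Python. Table, file, bucket, and credential names (`dim_users`, `my-bucket`, etc.) are hypothetical placeholders, not values from the question; this is a sketch, not a production loader.

```python
# Sketch of steps 2-4 above. Requires the mysql client on the path; the
# S3 upload and the Redshift connection are left to your S3/psycopg2 client.
import gzip
import shutil
import subprocess

def dump_table_csv(db, table, out_path, user, password):
    # Step 2: dump one MySQL table to CSV via SELECT ... INTO OUTFILE.
    query = ("SELECT * FROM %s INTO OUTFILE '%s' "
             "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
             "LINES TERMINATED BY '\\n'" % (table, out_path))
    subprocess.run(["mysql", "-u", user, "-p%s" % password, db, "-e", query],
                   check=True)

def gzip_file(src, dst):
    # Step 3: compress the export before shipping it to S3.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

def build_refresh_sql(table, s3_path, access_key, secret_key):
    # Step 4: truncate the Redshift copy and reload it with COPY.
    return ("TRUNCATE %s; "
            "COPY %s FROM '%s' "
            "CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s' "
            "CSV GZIP;" % (table, table, s3_path, access_key, secret_key))
```

The S3 upload itself and running the output of `build_refresh_sql` against the cluster are done with whatever S3 and Redshift clients you already use (e.g. boto and psycopg2, as in the answer below).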

Intertype answered 16/10, 2013 at 23:16 Comment(0)
Redshift now supports loading data from remote hosts via SSH. This technique involves:

  1. Adding the public key from the cluster to the authorized_keys file on the remote host(s)
  2. Allowing SSH access to the remote host(s) from the IP addresses of the cluster nodes
  3. Uploading a JSON manifest to S3 specifying the remote host(s), the public key(s), and the command(s) to execute on the remote host
  4. Running the COPY command with a specified manifest file and AWS credentials

Each command specified in the manifest can be arbitrary, as long as it prints text output in a format suitable for ingest by the Redshift COPY command.
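Concretely, a manifest of this shape might be uploaded to S3. The endpoint, username, public key, table, and bucket names below are hypothetical placeholders:

```python
# Minimal sketch of a Redshift SSH manifest. All values are placeholders.
import json

manifest = {
    "entries": [
        {
            "endpoint": "mysql-host.example.com",   # remote host to SSH into
            "username": "etl",
            "mandatory": True,                      # fail the COPY if this host fails
            "publickey": "ssh-rsa AAAA... Amazon-Redshift",
            # Arbitrary command; it just has to print delimited rows on stdout.
            "command": "mysql --batch --raw -N -e 'SELECT * FROM dim_users' mydb",
        }
    ]
}
manifest_json = json.dumps(manifest, indent=2)

# After uploading manifest_json to e.g. s3://my-bucket/ssh_manifest.json:
copy_sql = ("COPY dim_users FROM 's3://my-bucket/ssh_manifest.json' "
            "CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' "
            "DELIMITER '\\t' SSH;")
```

The `SSH` keyword on the COPY tells Redshift to treat the S3 object as a manifest of hosts to pull from rather than as the data itself.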

Lublin answered 11/11, 2013 at 23:25 Comment(0)
What is a "remote view" in Oracle?

Anyway, if you can extract data from a table to a CSV file, you have one more scripting option: a Python/boto/psycopg2 combo to script the CSV load into Amazon Redshift.

In my MySQL_To_Redshift_Loader I do the following:

  1. Extract data from MySQL into temp file.

    loadConf = [db_client_dbshell, '-u', opt.mysql_user, '-p%s' % opt.mysql_pwd,
                '-D', opt.mysql_db_name, '-h', opt.mysql_db_server]
    ...
    q = """
    %s %s
    INTO OUTFILE '%s'
    FIELDS TERMINATED BY '%s'
    ENCLOSED BY '%s'
    LINES TERMINATED BY '\r\n';
    """ % (in_qry, limit, out_file, opt.mysql_col_delim, opt.mysql_quote)
    p1 = Popen(['echo', q], stdout=PIPE, stderr=PIPE, env=env)
    p2 = Popen(loadConf, stdin=p1.stdout, stdout=PIPE, stderr=PIPE)
    ...
    
  2. Compress and upload the data to S3 using the boto Python module (multipart upload).

    conn = boto.connect_s3(AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY)
    bucket = conn.get_bucket(bucket_name)
    k = Key(bucket)
    k.key = s3_key_name
    k.set_contents_from_file(file_handle, cb=progress, num_cb=20,
                             reduced_redundancy=use_rr)
    
  3. Use psycopg2 to run the COPY command and append the data to the Redshift table.

    sql = """
    copy %s from '%s'
    CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
    DELIMITER '%s'
    FORMAT CSV %s
    %s
    %s
    %s;""" % (opt.to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
              opt.delim, quote, gzip, timeformat, ignoreheader)
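The psycopg2 execution side is not shown above; a minimal sketch might look like the following. The table, bucket, and the `FORMAT CSV GZIP` options are placeholders assumed for illustration:

```python
# Sketch of building and executing the COPY with psycopg2. The connection
# is passed in, e.g. from psycopg2.connect(host=..., port=5439, ...).
def build_copy(to_table, s3_path, access_key, secret_key, delim=","):
    # Same shape as the statement in step 3, with a gzipped CSV assumed.
    return ("COPY %s FROM '%s' "
            "CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s' "
            "DELIMITER '%s' FORMAT CSV GZIP;"
            % (to_table, s3_path, access_key, secret_key, delim))

def run_copy(conn, sql):
    # COPY appends rows; commit so the load is visible to other sessions.
    with conn.cursor() as cur:
        cur.execute(sql)
    conn.commit()
```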
    
Garlaand answered 27/4, 2016 at 21:19 Comment(0)
Check this simple way to load MySQL data to Redshift. If all you need is to load initial data snapshots into Redshift, try that free solution. You also get schema migration, a side-by-side query console, and a statistical report (with charts) of the entire loading process.

Planimetry answered 17/11, 2015 at 14:41 Comment(1)
Please do not post just to promote your own content; this answer offers no solution, only a reference. This is not a lead-generation portal. – Whosoever

© 2022 - 2024 — McMap. All rights reserved.