Is it possible to dump a RDS database to S3 using AWS Data Pipeline?
Asked Answered
A

2

6

Basically I want to pg_dump my RDS database to S3 using AWS Data Pipeline,

I am not 100% sure if this is possible I got up to the stage where the SqlDataNode wants a selectQuery at which point i am wondering what to do.

Below is my template so far:

AWSTemplateFormatVersion: "2010-05-15"

Description: RDS to S3 Dump

Parameters:
  RDSInstanceID:
    Description: "Instance ID of RDS to Dump from"
  DatabaseName:
    Description: "Name of the Database to Dump"
    Type: String
  Username:
    Description: "Database Username"
    Type: String
  Password:
    Description: "Database password"
    Type: String
    NoEcho: true

RDSToS3Dump:
  Type: "AWS::DataPipeline::Pipeline"
  Properties:
    Name: "RDSToS3Dump"
    Description: "Pipeline to backup RDS data to S3"
    Activate: true
    ParameterObjects:
      -
        name: "SourceRDSTable"
        type: "SqlDataNode"
        Database: !Ref DatabaseName
      -
        name: !Ref DatabaseName
        type: "RdsDatabase"
        databaseName: !Ref DatabaseName
        username: !Ref Username
        password: !Ref Password
        rdsInstanceId: !Ref RDSInstanceID
      -
        name: "S3OutputLocation"
        type: "S3DataNode"
        filePath: #TODO: S3 Bucket here parameterized? Will actually need to create one.
      -
        name: "RDStoS3CopyActivity"
        type: "CopyActivity"
        input: "SourceRDSTable"
        output: "S3OutputLocation"
        #TODO: do we need a runsOn?
Awn answered 15/5, 2017 at 23:30 Comment(0)
K
3

As mentioned in another answer, AWS Data Pipeline only allows you to dump tables and not the entire DB. If you really want to use pg_dump to dump the entire contents of your DB to S3 using AWS CloudFormation, you can use Lambda-backed custom resources. Going down that route, you'll have to write a Lambda function that:

  • Connects to the DB
  • Takes the dump of your DB using pg_dump
  • Uploads it to S3
Kalvin answered 19/5, 2017 at 16:41 Comment(0)
U
1

Using Data Pipeline I believe you can only dump tables rather than the whole db as with pg_dump.

Have you looked at the docs as selectQuery just requires a SQL statement for what you want to dump, i.e. "select * from mytable"? Maybe this helps. http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-sqldatanode.html

  -
    name: "SourceRDSTable"
    type: "SqlDataNode"
    Database: !Ref DatabaseName
    table: "mytable"
    selectQuery: "select * from #{table}"
Ultrared answered 19/5, 2017 at 12:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.