Here is an answer using only the command line.
Prerequisites:
- you must have the Postgres client tools installed (the server itself is not needed)
- the client version must be the same as or newer than your Postgres server version
- network access to the RDS instance
- credentials for the relevant database accounts
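Several of these commands will prompt for the connecting user's password; if you want to script the procedure, libpq's standard password file avoids the prompts. A minimal sketch (the `PGPASSFILE` environment variable and the `host:port:db:user:password` file format are standard libpq features; the hostnames, users, and passwords below are placeholders from the example context):

```shell
# Point libpq at a password file (defaults to ~/.pgpass if unset).
PGPASSFILE="$(mktemp)"
export PGPASSFILE

# One line per account; '*' matches any database.
cat > "$PGPASSFILE" <<'EOF'
rds.example.com:5432:*:rds_master:master-password-here
rds.example.com:5432:*:db_dev_user:dev-password-here
EOF

# libpq silently ignores the file unless its mode is 0600.
chmod 600 "$PGPASSFILE"
```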
Example context:
- I have an RDS instance at rds.example.com which has a master user named rds_master.
- I have an "application user" named db_dev_user, and a database named dev_db that contains the schema app_schema.
- Note that "user" and "role" in Postgres are synonymous.
Note: this guide was written in 2017, for postgres version 9.6.
If you find that some steps are no longer working on a recent version of postgres - please do post any fixes to this post as comments or alternative answers.
pg_dump prints out the schema and data of the original database, and it works even while there are active connections to the database. Performance for those connections is likely to suffer, but the resulting copy of the DB is a consistent snapshot.
pg_dump --host=rds.example.com --port=5432 \
--format=custom \
--username=db_dev_user --dbname=dev_db \
> pgdumped
The createuser command creates the user that your test application/processes should connect with (for better isolation). Note that the created user is not a superuser, and it cannot create databases or roles.
createuser --host=rds.example.com --port=5432 \
--username=rds_master \
--no-createdb --no-createrole --no-superuser \
--login --pwprompt \
db_test_user
Without this next grant command, the createdb below will fail: the RDS master user is not a superuser, so to create a database owned by another role it must first be made a member of that role:
psql --host=rds.example.com --port=5432 \
--username=rds_master --dbname=postgres \
--command="grant db_test_user TO rds_master"
createdb does what it says on the tin; note that the db_test_user role "owns" the DB.
createdb --host=rds.example.com --port=5432 \
--username=rds_master --owner=db_test_user test_db
The create schema command is next. The db_test_user role cannot create the schema itself, but it must be authorized for the schema, or the pg_restore would fail because it would end up trying to restore into the pg_catalog schema (so note that user=rds_master, but dbname=test_db).
psql --host=rds.example.com --port=5432 \
--username=rds_master --dbname=test_db \
--command="create schema app_schema authorization db_test_user"
Finally, we issue the pg_restore command, to actually create the schema objects (tables, etc.) and load the data into them:
pg_restore --host=rds.example.com --port=5432 \
--verbose --exit-on-error --single-transaction \
--username=db_test_user --schema=app_schema \
--dbname=test_db --no-owner \
./pgdumped
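After the restore, a quick sanity check is to compare table counts between the two databases; the numbers should match. A sketch using the same example host and users as above (information_schema.tables is a standard catalog view):

```shell
# Table count in app_schema on the source database...
psql --host=rds.example.com --port=5432 \
    --username=db_dev_user --dbname=dev_db --tuples-only \
    --command="select count(*) from information_schema.tables
               where table_schema = 'app_schema'"

# ...and on the freshly restored copy.
psql --host=rds.example.com --port=5432 \
    --username=db_test_user --dbname=test_db --tuples-only \
    --command="select count(*) from information_schema.tables
               where table_schema = 'app_schema'"
```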
- exit-on-error - because otherwise finding out what went wrong involves too much scrolling and scanning (it's also implied by single-transaction anyway)
- single-transaction - avoids having to drop or recreate the DB if things go pear-shaped
- schema - only restore the schema we care about (this can also be supplied to the original pg_dump command)
- dbname - to ensure use of the DB we created
- no-owner - we're connecting as db_test_user anyway, so everything should be owned by the right user
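For convenience, the steps above can be collected into one script. This is a sketch using the same example host and names; the `run` helper is my own wrapper (not a Postgres tool), and by default it only prints each command for review (set DRY_RUN=0 to actually execute them):

```shell
#!/bin/sh
set -eu

HOST=rds.example.com
PORT=5432

# Print the command when DRY_RUN is 1 (the default), otherwise run it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }

run pg_dump --host="$HOST" --port="$PORT" --format=custom \
    --username=db_dev_user --dbname=dev_db --file=pgdumped
run createuser --host="$HOST" --port="$PORT" --username=rds_master \
    --no-createdb --no-createrole --no-superuser --login --pwprompt \
    db_test_user
run psql --host="$HOST" --port="$PORT" --username=rds_master \
    --dbname=postgres --command="grant db_test_user to rds_master"
run createdb --host="$HOST" --port="$PORT" --username=rds_master \
    --owner=db_test_user test_db
run psql --host="$HOST" --port="$PORT" --username=rds_master \
    --dbname=test_db \
    --command="create schema app_schema authorization db_test_user"
run pg_restore --host="$HOST" --port="$PORT" --verbose --exit-on-error \
    --single-transaction --username=db_test_user --schema=app_schema \
    --dbname=test_db --no-owner ./pgdumped
```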
pg_dump works, but dumping directly from RDS seems to be excruciatingly slow. We're talking ~500KB/s-2MB/s for tables that are 50GB in size, on gigabit connections. I can only imagine there is some throttling or limitation on the RDS side? – Stoddart