I currently have the task of improving a database structure. As part of this, we want to efficiently dump and restore a single giant database (approx. 1 TB and growing).
To test things with this database, we wanted to transfer it to another server node via pg_dump and pg_restore.
We are running a v10 server (https://www.postgresql.org/docs/10/app-pgdump.html), so we are limited to its available parameters. It is also required to dump the full database, not only parts of it.
For this I tried a couple of approaches; several sources helped a lot.
The problem is that you can almost always improve only one of these tasks (dump or restore), but not both simultaneously.
Case 1
Dumping in directory format is extremely fast (~1 hour), but restoring is not.
pg_dump --blobs --dbname="$DBNAME" --file=$DUMPDIR --format=directory --host=$SERVERHOSTNAME --jobs=$THREADS --port=$SERVERPORT --username="$SERVERUSERNAME"
pg_restore --clean --create --format=directory --jobs=$THREADS --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME" "./"
The problem with this restore method is that, even though I assigned multiple cores to it, it only uses one, with barely 4% CPU load on that server core.
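For what it's worth, pg_restore can only run parallel jobs when it connects directly to a server; without a connection target it just emits an SQL script. The invocation above has no --dbname, so one thing I considered (a sketch, not something I have verified against our setup; "postgres" as the maintenance database to connect through is an assumption) is a dry run of a restore command that names an explicit connection database alongside --create:

```shell
#!/bin/sh
# Dry-run sketch: print (not run) a parallel restore command. With --create,
# pg_restore first connects to the database given by --dbname, then creates
# and reconnects to the restored database. "postgres" is an assumed
# maintenance DB; the other values are placeholders for our environment.
THREADS=6
SERVERHOSTNAME=dbhost.example
SERVERPORT=5432
SERVERUSERNAME=postgres
DUMPDIR=/mnt/dumps/db
echo pg_restore --clean --create --format=directory --jobs="$THREADS" \
  --host="$SERVERHOSTNAME" --port="$SERVERPORT" \
  --username="$SERVERUSERNAME" --dbname=postgres "$DUMPDIR"
```

Dropping the leading echo would run the real command; the dry run just shows the flags that would be used.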
Case 2
Dumping in custom format is so slow that the server couldn't even complete it overnight (session timeout).
pg_dump --blobs --compress=9 --dbname="$DBNAME" --file="$DUMPDIR/db.dump" --format=custom --host=$SERVERHOSTNAME --port=$SERVERPORT --username=$SERVERUSERNAME
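Since --compress=9 puts maximum zlib work on pg_dump's single process, one variant I sketched (again as a dry run; the lower compression level and the nohup wrapper are assumptions I have not benchmarked, and the host/path values are placeholders) lowers the compression level and detaches the dump from the interactive session so a timeout cannot kill it:

```shell
#!/bin/sh
# Dry-run sketch: print (not run) a custom-format dump with light
# compression (--compress=1, assumed to ease the single-process zlib
# bottleneck) wrapped in nohup so an SSH session timeout cannot kill it.
DBNAME=mydb
SERVERHOSTNAME=dbhost.example
SERVERPORT=5432
SERVERUSERNAME=postgres
DUMPDIR=/mnt/dumps
echo nohup pg_dump --blobs --compress=1 --format=custom \
  --dbname="$DBNAME" --file="$DUMPDIR/db.dump" \
  --host="$SERVERHOSTNAME" --port="$SERVERPORT" \
  --username="$SERVERUSERNAME"
```

In a real run, the command would additionally be backgrounded with a trailing & and its output redirected to a log file.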
So I had different approaches in mind:
- dump it with approach #1, convert it afterwards (how?), and use a faster restore method (variant #2?)
- create multiple dumps simultaneously on different cores, but for different schemas (we have six in total), and then merge them back (how?)
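The second idea can be sketched like this (dry run only: the loop prints one per-schema pg_dump command per line instead of executing it; the schema names stand in for our six real schemas, and host/port/user flags are omitted for brevity):

```shell
#!/bin/sh
# Dry-run sketch of the per-schema approach: one directory-format dump per
# schema. Schema names are placeholders for the six schemas in our database.
DBNAME=mydb
DUMPDIR=/mnt/dumps
for s in schema_a schema_b schema_c schema_d schema_e schema_f; do
  echo pg_dump --format=directory --schema="$s" \
    --file="$DUMPDIR/$s" --dbname="$DBNAME"
done
# In a real run, each command would be backgrounded with '&' and followed by
# a single 'wait', then the per-schema dumps restored one after another.
```

This still leaves open how (and whether) the resulting per-schema dumps can safely be merged back, which is exactly my question.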
Piping seems to be an inefficient way of dumping, according to the author cited above.
Does anyone have more experience with this? Are my ideas useful, or do you have a completely different solution in mind?
Oh, before I forget: we are currently limited to 5 TB on our external server, and the internal server that runs the database should not get bloated with data fragments, even temporarily.