Disabling WAL archiving during pg_restore?
Asked Answered
P

1

6

Let's say I made a quick backup dump of my Postgres 9.3 DB through pg_dump before doing a large destructive migration and I discovered I want to undo it. No writes were performed against the DB in the meantime.

Say I run pg_restore -c -d mydb < foo.dump to load the dump back into the db. Assuming I have WAL-E set up to archive every 16mb of WAL, do I need to turn off archive_mode before performing the restore? It would not be super useful for me to archive the xlog as I'm writing the dump back into the DB, since I already have perfectly valid base backups and WAL segments archived for before the dump. Also there are serious consequences to performing xlog shipping as I'm restoring the dump, which get worse with the size of the dump.

Do you end up disabling archiving before a restore? Do you do anything else to speed things up? There's a discussion of restore performance in this post, but it doesn't cover archiving at all, unless I missed something.

Presentational answered 3/6, 2014 at 4:30 Comment(0)
B
8

You can't really turn WAL archiving on and off like that. WAL replay requires continuity.

If you turned WAL archiving off, then made changes, then turned it back on, the new WAL generated after turning it off and on again would be useless. They could not be applied to the DB, and you'll have to make a new base backup before you can resume WAL replay / PITR.

If you turn xlog shipping off during a restore, you'll want to purge your old base backup and WAL archives, then create a new base backup before resuming WAL shipping.

Bangup answered 3/6, 2014 at 4:59 Comment(7)
Yes, I should have clarified that I'd have had to make a new base backup right afterwards and lost continuity with the previous chunk of history. My question still stands though: what's the best practice for reading with restores in this kind of situation?Presentational
If there's only one DB you care about in the cluster, you can speed up restores massively by stopping Pg, setting fsync=off, starting Pg, doing the restore, stopping Pg, setting fsync=on and starting Pg back up. However, if Pg crashes with fsync=off you may have severe or total data loss. You can also turn off archiving and set wal_level = minimal for the period. There are existing questions on dba.stackexchange.com about speeding up Pg restores...Bangup
Is there really any way I can have severe data loss if I'm actually restoring from a dump through pg_restore? I still have the dump on the disk, right? I can just try again, no?Presentational
If that's the only DB in your PostgreSQL install, sure. But if you have other DBs in the same install that have other data of value, that might be a problem as the write-ahead log, transaction log, etc is global across all databases.Bangup
@CraigRinger What, really? As-in, every database on the cluster ('instance' in SQL Server world, i.e. all the databases served by a single PostgreSQL program) shares the same WAL and transaction logs?Mcclung
@KennyEvitt Correct. There's no way to separate them. It's one of the numerous reasons why I tend to recommend that people stick to one PostgreSQL database per instance for scalable deployments. It's a significant hassle. If the WAL was split out, we'd have issues with flush ordering, persistence of transaction commit logs etc vs data logs, and more. But really it still needs to be done...Bangup
@CraigRinger "I tend to recommend that people stick to one PostgreSQL database per instance for scalable deployments." – that's really good to know! Instead of splitting the WAL out by database, they could double down on the per-cluster focus and allow cross-database queries and referential integrity in the same cluster.Mcclung

© 2022 - 2024 — McMap. All rights reserved.