restore from pg_basebackup
Asked Answered
F

2

5

I made daily backups of a postgresql DB using the command

/usr/bin/pg_basebackup -D $outdir -Ft -x -z -w -R -v

Now I want to restore this DB on another server. I used the description on https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-PITR-RECOVERY.

The recovery.conf file included in the backup has the following contents:

standby_mode = 'on'
primary_conninfo = 'user=postgres port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'

The next step (8.) in the documentation says to start postgresql. This results in a failure due to a timeout:

3783 postgres: startup process   waiting for 0000000100000024000000B 

On the original server I don't have this file. Is it possible to restore only the state of the pg_basebackup without using any WAL files? What should then be in the recovery.conf file?

Following the suggestion by @JosMac I moved the recovery.conf with this result:

shaun2:/var/lib/pgsql/data # service postgresql start
Job for postgresql.service failed because the control process exited with error code. See "systemctl status postgresql.service" and "journalctl -xe" for details.
shaun2:/var/lib/pgsql/data # service postgresql status
â postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2018-06-18 12:02:53 CEST; 12s ago
  Process: 1340 ExecStop=/usr/lib/postgresql-init stop (code=exited, status=0/SUCCESS)
  Process: 9355 ExecStart=/usr/lib/postgresql-init start (code=exited, status=1/FAILURE)
 Main PID: 1060 (code=exited, status=0/SUCCESS)

Jun 18 12:02:52 shaun2 postgres[9369]: [3-1] 2018-06-18 12:02:52 CEST   LOG:  invalid checkpoint record
Jun 18 12:02:52 shaun2 postgres[9369]: [4-1] 2018-06-18 12:02:52 CEST   FATAL:  could not locate required checkpoint record
Jun 18 12:02:52 shaun2 postgres[9369]: [4-2] 2018-06-18 12:02:52 CEST   HINT:  If you are not restoring from a backup, try removing the file "/var/lib/pgsql/data/backup_label".
Jun 18 12:02:52 shaun2 postgres[9367]: [2-1] 2018-06-18 12:02:52 CEST   LOG:  startup process (PID 9369) exited with exit code 1
Jun 18 12:02:52 shaun2 postgres[9367]: [3-1] 2018-06-18 12:02:52 CEST   LOG:  aborting startup due to startup process failure
Jun 18 12:02:53 shaun2 postgresql-init[9355]: pg_ctl: could not start server
Jun 18 12:02:53 shaun2 systemd[1]: postgresql.service: Control process exited, code=exited status=1
Jun 18 12:02:53 shaun2 systemd[1]: Failed to start PostgreSQL database server.
Jun 18 12:02:53 shaun2 systemd[1]: postgresql.service: Unit entered failed state.
Jun 18 12:02:53 shaun2 systemd[1]: postgresql.service: Failed with result 'exit-code'.

I suppose that PostgreSQL is still looking for the missing WAL file because of the contents of backup_label:

shaun2:/var/lib/pgsql/data # cat backup_label
START WAL LOCATION: 24/B0000028 (file 0000000100000024000000B0)
CHECKPOINT LOCATION: 24/B0000028
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2018-06-14 02:55:08 CEST
LABEL: pg_basebackup base backup

Result after moving backup_label away:

shaun2:/var/lib/pgsql/data # service postgresql status
â postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2018-06-18 12:17:54 CEST; 4s ago
  Process: 1340 ExecStop=/usr/lib/postgresql-init stop (code=exited, status=0/SUCCESS)
  Process: 10401 ExecStart=/usr/lib/postgresql-init start (code=exited, status=1/FAILURE)
 Main PID: 1060 (code=exited, status=0/SUCCESS)

Jun 18 12:17:53 shaun2 postgres[10414]: [4-1] 2018-06-18 12:17:53 CEST   LOG:  invalid secondary checkpoint record
Jun 18 12:17:53 shaun2 postgres[10414]: [5-1] 2018-06-18 12:17:53 CEST   PANIC:  could not locate a valid checkpoint record
Jun 18 12:17:54 shaun2 postgres[10412]: [2-1] 2018-06-18 12:17:54 CEST   LOG:  startup process (PID 10414) was terminated by signal 6: Aborted
Feltie answered 15/6, 2018 at 13:4 Comment(0)
D
8

We use pg_basebackup for backups and also did several restorations so generally it works very well without problems.

But I would recommend you to use parameter -X stream instead of -x (meaning "-X fetch"). With this parameter pg_basebackup will catch and store WAL log segments created during the time of backup together with data files. These WAL logs will be stored in separate pg_xlog.tar or pg_wal.tar files (depending on PG version).

Full description of restoration can be find here - pg_basebackup / pg-barman – restore tar backup

Dowel answered 15/6, 2018 at 13:17 Comment(9)
Unfortunately I want to restore a backup file from 2 days ago, that was made with the parameter -x and not -X stream. I don't have neither a pg_xlog.tar nor a pg_wal.tar. The description in your link does presume the existence of one of these files in the recovery.conf file.Feltie
Well theoretically - if there were no transactions during basebackup run you could try to start restored backup without recovery file. But I doubt that... If you have archive_mode ON or Always on your DB and archive_command set or wal_keep_segments really high you can copy WAL logs from your DB. If not I would make another basebackup with new parameters...Dowel
I need the data from last Wednesday, so a new backup does not help. The backup is made 3am, and there is no activity at this time. So the data in the base backup would be accurate enough - if I could restore them. The current WAL files begin today in the morning, so they are not of any help.Feltie
OK, if you are sure there was no activity stop postgresql service (if it is running), rename recovery.conf file and try to start postgresql service again and watch messages in /var/log/postgresql/postgresql-xxx-main.logDowel
I see - look at the message HINT: If you are not restoring from a backup, try removing the file "/var/lib/pgsql/data/backup_label" - try to rename this file and try it again.Dowel
I edited once adding the result without backup_label file.Feltie
I see - there is a possibility to reset checkpoint - see here - #8799974Dowel
Actually we have got a step further. We replayed the tar decompression and had the one WAL file that was needed. Unfortunately we still can't start PostgreSQL, because of 'FATAL: could not connect to the primary server'. I suppose that this is due to the fact that we installed the backup on another server. I'll ask another question for this part.Feltie
We managed now to restore the data on the second server by returning to your first suggestion: delete the recovery.conf (but this time with the requested WAL file - and it worked. Thank you very much for your help and patience!Feltie
V
1

The -R option generates a recovery.conf file that is useful if the backup will be used in replica servers, because it sets the server in standby_mode and it also has the primary_conninfo to pull data from the primary.

So, if you just want to make/restore backups, I wouldn't use -R. Just in case it helps, I used these options: -v -P -x -F tar -z.

To restore the backup, unzip it to the proper directory (e.g. /var/lib/postgresql/$VERSION/main), create an empty recovery.conf file there (or clear the one you have, but better don't use -R), and start the server.

Vedavedalia answered 1/11, 2022 at 11:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.