Repair Corrupt database postgresql
Asked Answered
P

4

19

I have multiple errors with my postgresql db, which resulted after a power surge:

I cannot access most tables from my database. When I try for example select * from ac_cash_collection, I get the foolowing error:

ERROR: missing chunk number 0 for toast value 118486855 in pg_toast_2619

when I try pg_dump I get the following error:

Error message from server: ERROR:  relation "public.st_stock_item_newlist" does not exist
pg_dump: The command was: LOCK TABLE public.st_stock_item_newlist IN ACCESS SHARE MODE

I went ahead and tried to run reindex of the whole database, I actually I left it runnng, went to sleep, and I found it had not done anything in the morning, so I had to cancel it.

I need some help to fix this as soon as possible, Please help.

Pyaemia answered 14/6, 2012 at 9:23 Comment(4)
I guess the power failure corrupted the harddisk somehow, you will most probably need to restore your backupRies
Have you tried to Google it?Latent
The power surge corrupted the the back ups too...Pyaemia
Maybe this can be helpful.Virgy
M
9

Before you do anything else, http://wiki.postgresql.org/wiki/Corruption and act on the instructions. Failure to do so risks making the problem worse.


There are two configuration parameters listed in the Fine Manual that might be of use: ignore_system_indexes and zero_damaged_pages. I have never used them, but I would if I were desparate ...

I don't know if they help against toast-tables. In any case, if setting them causes your database(s) to become usable again, I would {backup + drop + restore} to get all tables and catalogs into newborn shape again. Success!

Mannino answered 14/6, 2012 at 19:34 Comment(5)
Added warning about taking a copy first; it's seriously important. zero_damaged_pages does affect TOAST tables (and indexes and everything else). It works by erasing data, not recovering it. Think very, very carefully before you use it, or pg_resetxlog, or any of the other "big hammer" recovery tools.Oliguria
Excellent advice, IMHO. Rereading the OQ, it appears there are also damaged catalogs involved.Mannino
Backup may not be possible here. Expect pg_dump to fail with errors like compressed data is corrupt if your tables use extended storage.Transpose
That's quite possibly one of the least helpful wiki pages in existence. Your answer here gives more specifics than anything on that page.Doubleedged
For me the only way to fix the issue was to remove data from tables (that was not critical for the application).Disseminule
L
8

If you have backups, just restore from them.

If not - you've just learned why you need regular backups. There's nothing PostgreSQL can do if hardware misbehaves.

In addition, if you ever find yourself in this situation again, first stop PostgreSQL and take a complete file-level backup of everything - all tablespaces, WAL etc. That way you have a known starting point.

So - if you still want to recover some data.

  1. Try dumping individual tables. Get what you can this way.
  2. Drop indexes if they cause problems
  3. Dump sections of tables (id=0..9999, 1000..19999 etc) - that way you can identify where some rows may be corrupted and dump ever-smaller sections to recover what's still good.
  4. Try dumping just certain columns - large text values are stored out-of-line (in toast tables) so avoiding them might get the rest of your data out.
  5. If you've got corrupted system tables then you're getting into a lot of work.

That's a lot of work, and then you'll need to go through and audit what you've recovered and try to figure out what's missing/incorrect.

There are more things you can do (creating empty blocks in some cases can let you dump partial data) but they're all more complicated and fiddly and unless the data is particularly valuable not worth the effort.

Key message to take away from this - make sure you take regular backups, and make sure they work.

Latonya answered 14/6, 2012 at 15:17 Comment(1)
+1 Today I'm happy to have learned about HOW IMPORTANT IS TO DO BACKUPS and I have successfully recovered a database.Iconostasis
O
1

Before you do ANYTHING ELSE, take a complete file-system-level copy of the damaged database.

http://wiki.postgresql.org/wiki/Corruption

Failure to do so destroys evidence about what caused the corruption, and means that if your repair efforts go badly and make things worse you can't undo them.

Copy it now!

Oliguria answered 15/6, 2012 at 4:9 Comment(0)
P
0

If few/specific files are corrupted, following tricks might help.

  • Restore Older dump in different node or a second installation.
  • Copy required files from RESTORED/second installation to FAILED node.
  • Stop & Start PSQL.

From Today's experience!

Error message from server: ERROR:  could not read block 226448 in file "base/12345/12345.1": Input/output error
  • try (probably it will fail) cp base/12345/12345.1 /root/backup/12345.1-orig

  • try mv base/12345/12345.1 /root/backup/12345.1-orig #expecting this to finish. Else do rm -rf base/12345/12345.1 /root/backup/12345.1-orig

Finally,

Magic of tar. (if below tar completes, you have luck!)

  • tar -zcvf my_backup.tar.gz /var/lib/postgresql/xx/main/xx
  • Extract the corrupted file from TAR.
  • Replace it in original locationbase/12345/12345.1.
  • Stop & Start PGSQL

IMPORTANT: Please try googling and do try vaccum, reindex and disk checks like fsck etc before getting to this stage.

Also, always take a Filesystem backup before doing any TRIAL and ERROR method :)

Plata answered 17/10, 2022 at 17:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.