PostgreSQL: invalid page header in block

I'm getting an error:

ERROR:  invalid page header in block 411 of relation "t_value_time"

in my PostgreSQL database. This keeps happening on different machines. Is there a way to prevent it from happening, or at least to tell PostgreSQL to ignore the data in the invalid block and move on?

I'd rather lose the data from that block and have it skip over it, reading the rest of the data. Is there a way to tell PostgreSQL to skip this block?

Adequacy answered 7/3, 2011 at 13:36 Comment(0)

WARNING: You will lose some data!

We managed to get past it (on a crashed DEV VM) by issuing:

database=# SET zero_damaged_pages = on;
SET
database=# VACUUM FULL damaged_table;
WARNING: invalid page header in block xxx of relation base/yyy/zzz; zeroing out page
[..]
REINDEX TABLE damaged_table;

Fix via pwkg.ork.
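
If you'd rather copy the surviving rows into a new table instead of running VACUUM FULL, a minimal sketch of the same idea (damaged_table_copy is a placeholder name; zero_damaged_pages needs superuser rights and only affects the current session):

database=# SET zero_damaged_pages = on;
SET
database=# CREATE TABLE damaged_table_copy AS SELECT * FROM damaged_table;
WARNING: invalid page header in block xxx of relation base/yyy/zzz; zeroing out page
[..]
database=# SET zero_damaged_pages = off;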

Shoup answered 18/2, 2013 at 15:31 Comment(2)
This is the right solution. The VACUUM FULL and REINDEX aren't strictly necessary to clear the warning. Any operation that reads the page from disk and triggers the "invalid page header" error will fix the damaged page (by re-writing it with all zeroes) if zero_damaged_pages is on, so you could just do a query that hits the bad page.Arguseyed
@Arguseyed this is not quite true. You can read in the PostgreSQL docs that "Zeroed-out pages are not forced to disk so it is recommended to recreate the table or the index before turning this parameter off again". I believe a VACUUM FULL would help though, and REINDEX may be required.Coucal

Same block every time?

From what I've read, the most common cause of invalid blocks is hardware. Red Hat has a utility, pg_filedump, that formats "PostgreSQL heap, index, and control files into a human-readable form". I don't think it supports any PostgreSQL version newer than 8.4.0, but I could be wrong.
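
As a rough sketch of how you might point it at the suspect block (pg_relation_filepath needs 9.0 or later, and pg_filedump's flags vary between versions, so treat the exact options as an assumption):

-- in psql: find the file backing the table
SELECT pg_relation_filepath('t_value_time');

# in the shell: dump only block 411 from that file
pg_filedump -R 411 411 "$PGDATA/<path returned above>"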

You'll want to prove the hardware is good by running thorough disk, RAM, and NIC diagnostics.
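
For example, with common Linux utilities (device names here are placeholders for whatever your system uses):

smartctl -a /dev/sda        # SMART health report for the disk (smartmontools)
badblocks -sv /dev/sda      # read-only surface scan
memtester 1024 3            # test 1024 MB of RAM for 3 passes (or boot memtest86+)
ethtool -t eth0             # NIC self-test, where the driver supports it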

Gyp answered 7/3, 2011 at 14:44 Comment(0)

There's no simple way to do it, but it's reasonably easy to do just by editing the data file directly (relfilenode of the pg_class entry gives the filename).

Just copy a block from elsewhere in the file over the bad block. Ideally, synthesise an empty block or update the one you're overwriting to have no valid tuples in it.

Once you've got something that doesn't produce that error, dump the table and reload it for safety.
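
A minimal sketch of that approach, assuming the default 8 kB block size and that the server is stopped before the file is touched (411 is the block from the error above):

-- in psql, before shutting down: locate the file on disk
SELECT pg_relation_filepath('t_value_time');

# with the server stopped, overwrite just that block with zeroes
dd if=/dev/zero of="$PGDATA/<path from above>" bs=8192 seek=411 count=1 conv=notrunc

PostgreSQL treats an all-zero page as empty, so the table becomes readable again and can then be dumped and reloaded.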

Lunneta answered 7/3, 2011 at 18:18 Comment(0)

These are almost always hardware problems, by the way. Verify and test the RAM, disk, and CPU, and make sure the environment is sound (bad power input can cause problems, as can overheating). That's the best way to prevent it. The best way to address it is point-in-time recovery from a base backup.
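
A sketch of what that recovery looks like (paths and the target timestamp are placeholders; on releases before 12 these settings live in recovery.conf rather than postgresql.conf):

# restore the base backup into a fresh data directory, then in postgresql.conf:
restore_command = 'cp /path/to/wal_archive/%f "%p"'
recovery_target_time = '2012-09-28 12:00:00'

# on PostgreSQL 12+: request targeted recovery and start the server
touch $PGDATA/recovery.signal
pg_ctl start -D $PGDATA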

Incrustation answered 28/9, 2012 at 12:53 Comment(0)

If you have a slave, set hot_standby_feedback to 'on' on it if it isn't already. Then run pg_dump against the slave and write the output to /dev/null so it doesn't consume any space:

nohup pg_dump db_name -v -Fc -f /dev/null &

If the dump succeeds, the slave is fine; you can fail over to it with no data loss.

Another way to validate the slave is to run explain select count(*) from table_name; on it. If it succeeds and the plan uses a sequential scan, the slave is good. If it uses an index scan instead, this check isn't worth much, since it won't read the table itself.

Note: this works only if the master is affected by storage-level corruption.
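
If the checks above pass and you decide to fail over, promotion is roughly (the data directory path is a placeholder):

pg_ctl promote -D /var/lib/postgresql/data
# or, on PostgreSQL 12 and later, from a superuser session on the standby:
SELECT pg_promote();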

I happened to face the same issue just today and was able to fix it.

Telethon answered 11/7, 2019 at 12:50 Comment(0)
