Why is pg_restore so slow while PostgreSQL barely uses the CPU?

I just had to use pg_restore with a small 30 MB dump, and it took five minutes on average! On my colleagues' computers it is ultra fast, on the order of a dozen seconds. The difference between the two is CPU usage: while for the others the database uses quite a lot of CPU (60-70%) during the restore operation, on my machine it stays at only a few percent (0-3%), as if it were not active at all.

The exact command was: pg_restore -h 127.0.0.1 --username XXX --dbname test --no-comments test_dump.sql

The command that produced the dump was: pg_dump --dbname=XXX --user=XXX --no-owner --no-privileges --verbose --format=custom --file=/sql/test_dump.sql
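
Since the dump was produced with --format=custom, pg_restore can also replay it with several parallel jobs via -j; a minimal sketch of the round trip (database names are placeholders, and -j mostly helps when the server is not I/O-bound):

    # Dump in custom format (as in the question)
    pg_dump --dbname=XXX --username=XXX --no-owner --no-privileges \
            --format=custom --file=/sql/test_dump.sql
    # Restore with 4 parallel jobs; only possible with custom/directory formats
    pg_restore -h 127.0.0.1 --username XXX --dbname test \
               --no-comments -j 4 /sql/test_dump.sql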

Look at the screenshot taken in the middle of the restore operation:

Screenshot during the restore

Here is the corresponding vmstat 1 result running the command:

Screenshot of vmstat

I've searched the web for a solution for a few hours, but this under-use of the CPU remains quite mysterious. Any ideas would be appreciated.

For the stack, I am on Ubuntu 20.04, and PostgreSQL 13.6 is running in a Docker container. The hardware is decent, neither bad nor great.

EDIT: This very same command used to work on my machine with the same ordinary HDD, but now it is terribly slow. From my point of view, the only difference from the others (for whom it is blazing fast) is really the CPU usage (even if they have SSDs, which shouldn't be the limiting factor at all, especially with a 30 MB dump).

EDIT 2: For those who proposed that the problem is I/O-boundness and maybe a slow disk: without much conviction, I just tried running my command on an SSD partition I had just created, and nothing changed.

Hemianopsia answered 10/3, 2022 at 15:29 Comment(5)
A dump isn't a backup, it's a batch file full of INSERT statements. Executing 30 MB worth of INSERT statements will be affected by I/O throughput and delays. Since you use a container, I/O will be slow because the actual database files are inside the container. Increasing the size of the database will require increasing the size of the container image, which will be at least twice as slow as using a physical database file. – Impresario
@PanagiotisKanavos Thanks, but I don't get your point; maybe look at the EDIT. For me it is about the computation running too slowly and the CPU not being used enough compared to the others. – Hemianopsia
@PanagiotisKanavos. 1) By default a Postgres dump is not composed of INSERTs; it uses COPY to import data. You can force it to use INSERTs with --inserts or --column-inserts, but that is not what eytienne did (see the sketch after this thread). 2) Depending on how you specify the command, a dump can include more than just data. 3) A dump can be a point-in-time backup of an entire database or, using pg_dumpall, of an entire cluster. – Amphimacer
That's still not a backup, it's a script. It may be called a "logical" backup, but it's actually a script, which is slow and causes the database file to grow, thus causing the image to grow. An actual physical backup taken with pg_basebackup would be a lot faster to take and restore. – Impresario
@PanagiotisKanavos. Huh? You restore the pg_dump file into an empty database and you end up with the same thing in $PGDATA as you would with pg_basebackup. I have no idea what you are talking about with the database file increasing. – Amphimacer
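
The COPY-versus-INSERT distinction discussed above is easy to verify yourself; a small sketch, assuming a hypothetical database named mydb:

    # Default plain-format dump: data travels as COPY blocks (fast bulk path)
    pg_dump --dbname=mydb --format=plain --file=copy_style.sql
    # Same dump with one INSERT statement per row (much slower to restore)
    pg_dump --dbname=mydb --format=plain --inserts --file=insert_style.sql
    # Count the data-loading statements in each file
    grep -c '^COPY ' copy_style.sql
    grep -c '^INSERT' insert_style.sql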

The vmstat output shows that you are I/O bound. Get faster storage, and performance will improve.
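
In vmstat 1 output, being I/O-bound shows up as a large wa (I/O wait) column while us and sy stay small; you can also measure the disk's commit latency directly with PostgreSQL's pg_test_fsync utility. A sketch (the sample numbers are illustrative, not taken from the question's machine):

    vmstat 1
    # procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
    #  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
    #  0  3      0 812344 120032 904512    0    0    12  5804  900 1800  2  1 15 82  0
    #                                  (illustrative: high wa plus low us/sy
    #                                   means the CPU is waiting on the disk)

    # Measure raw fsync rates on the disk holding the data directory
    pg_test_fsync -f /var/lib/postgresql/data/pg_test_fsync.tmp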

Bayberry answered 10/3, 2022 at 16:4 Comment(7)
Thank you, but I am not really convinced, so I added an EDIT to explain why. – Hemianopsia
@Hemianopsia But your edit doesn't make any sense. The same script did something different when applied to something different? So what? – Mercado
What then is your interpretation of the high iowait% value? Perhaps the disk is kept busy by something else. – Bayberry
@LaurenzAlbe Maybe you should drop your answer in light of my second EDIT... (unless you have some other idea?) – Hemianopsia
What does vmstat 1 show on your fast disk? – Bayberry
I finally changed my machine and it is OK now, but the other answer about tuning may help. – Hemianopsia
The other answer is dangerous, bad advice. – Bayberry

PostgreSQL is tuned for data durability by default. Transactions are flushed to disk at each and every commit, forcing a write-through of any disk write cache, which makes the workload very I/O-bound.

When restoring a database from a dump file, it can make sense to lower these durability settings, especially if the restore is done while your application is offline, and above all in non-production environments.

I temporarily run postgres with these options: -c fsync=off -c synchronous_commit=off -c full_page_writes=off -c checkpoint_flush_after=256 -c autovacuum=off -c max_wal_senders=0
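
Since the question runs PostgreSQL 13 in Docker, one way to apply these options is to append them to the container command; the official postgres image passes extra arguments through to the server binary. A minimal sketch, with a throwaway container and placeholder password:

    docker run --rm -e POSTGRES_PASSWORD=secret -p 5432:5432 postgres:13 \
        -c fsync=off \
        -c synchronous_commit=off \
        -c full_page_writes=off \
        -c checkpoint_flush_after=256 \
        -c autovacuum=off \
        -c max_wal_senders=0

As the comment below warns, fsync=off and full_page_writes=off can corrupt the cluster after a crash, so reserve them for a database you can rebuild from the dump.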

Refer to the PostgreSQL documentation on non-durable settings and WAL configuration for more information.

Ungual answered 17/1, 2023 at 14:0 Comment(2)
I would warn everybody not to follow this advice. Setting fsync = off is not only a sacrifice of durability: it will break your database and cause data corruption if you have a crash. The same holds for full_page_writes = off. You can only change these two parameters if you are prepared to restore from backup. The other parameters are safe, but max_wal_senders will offer no benefit, as far as I know. – Bayberry
@LaurenzAlbe " You can only change these two parameters if you are ready to restore from backup." Well, yeah... that's why you're doing pg_restore in the first place (especially if you have the luxury of restarting the instance).Ailment
