Postgres pg_dump dump file size bigger than pg_database_size

I have a client who made a backup using pg_dump, and its size is almost 700 GB, while pg_database_size reports only about 195 GB. I have already run VACUUM FULL, but the result is still the same.

How could this be possible? As far as I know, a dump file should be smaller because of compression.

I am using PostgreSQL 9.5. At first I thought it was a bug, so I tried the PostgreSQL 10 binaries, but the result was the same.

We use the regular command for backups:

pg_dump -U username -d dbname > dbname.sql

What could be the problem?

Walliw answered 17/4, 2020 at 9:49 Comment(2)
Do you have a lot of huge text (or json or jsonb) values in the database? Those are stored compressed in the database, but obviously not in the SQL dump. The same goes for bytea columns, which will use twice as much space in the SQL dump due to the encoding needed for binary values. – Burtis
Also, VACUUM FULL will have no effect on the logical dumps from pg_dump. It could reduce the size of a file-level backup from pg_basebackup, if your tables are bloated. – Rebeca
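One way to check whether this is what is happening is to compare a column's stored (possibly compressed/TOASTed) size with the textual size a plain dump has to write out. This is only a sketch: big_table and payload are hypothetical names, substitute your own table and column.

psql -U username -d dbname <<'SQL'
-- pg_column_size: bytes actually stored on disk (after any TOAST compression)
-- octet_length(...::text): bytes the plain-format dump would have to write
SELECT pg_size_pretty(sum(pg_column_size(payload)))     AS stored_in_database,
       pg_size_pretty(sum(octet_length(payload::text))) AS written_to_plain_dump
FROM big_table;  -- big_table and payload are placeholders
SQL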

A pg_dump file can be smaller than the database size for the following reasons:

  1. pg_dump doesn't dump index blocks; it only stores the definitions of the indexes.
  2. If your database has bloat, the on-disk database size will be higher than what actually ends up in the dump.

If your SQL dump is bigger than the database, it is likely that you have large values in TEXT/JSONB/JSON/BYTEA etc. columns, which get TOASTed and compressed inside PostgreSQL.
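To see how much of pg_database_size is taken up by indexes (which pg_dump does not copy) versus table and TOAST data, a per-table breakdown along these lines can help; big_table is again just a placeholder name:

psql -U username -d dbname <<'SQL'
-- main heap, TOAST (+ free-space/visibility maps), indexes, and total on-disk size
SELECT pg_size_pretty(pg_relation_size('big_table'))                              AS heap,
       pg_size_pretty(pg_table_size('big_table') - pg_relation_size('big_table')) AS toast_and_maps,
       pg_size_pretty(pg_indexes_size('big_table'))                               AS indexes,
       pg_size_pretty(pg_total_relation_size('big_table'))                        AS total;
SQL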

Henrie answered 17/4, 2020 at 11:24 Comment(2)
The question is why the dump is bigger than the database. – Burtis
Ah! Got it! The only possible reason is JSON/JSONB/TEXT/BYTEA. These get TOASTed and compressed in the database. – Henrie

"As far as I know dump file should be smaller because of compression."

Your pg_dump command line shows that it uses the plain format and does not request any compression, nor is the output piped through a compression program, so no compression is applied at all. Compression is used by default for the custom format, but not for the plain format.
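For reference, either of the following would produce a compressed dump with the same connection options the question uses (adjust output file names as needed):

pg_dump -U username -d dbname -Fc -f dbname.dump      # custom format, compressed by default
pg_dump -U username -d dbname | gzip > dbname.sql.gz  # plain format piped through gzip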

The database, on the other hand, automatically compresses some data in some circumstances. That compression is usually not very good, but in your case it is apparently good enough to be better than nothing (as well as to overcome the extra space used by indexes). The database also stores numbers in binary form, which is more compact than the textual form used by pg_dump.

Arlina answered 17/4, 2020 at 14:33 Comment(0)
