How to reduce PostgreSQL database size?
Asked Answered
M

3

7

I was planning to move from MySQL to PostgreSQL because I wanted to make use of TimescaleDB.

Everything looked good until I did a check on the storage size used by PostgreSQL (v11.2) compared to MySQL (v5.6). For exactly the same number of rows (1,440,000) and content:

  • MySQL: 156 MB
  • PostgreSQL: 246 MB
  • PostgreSQL + TimescaleDB (partitioned/chunked data): 324 MB

The MySQL and PostgreSQL numbers are like for like (i.e. including indexes and other constraints), the PostgreSQL + TimescaleDB has overhead of adding a timestamp to the table. The table concerned looks like this:

create table cell(
    cell_id            serial not null
   ,ts                 timestamp not null
   ,parent_id          int references parent( parent_id )
   ,instance_id        smallint
   ,v                  float
   ,a                  float
   ,t                  float
   ,s1                 float
   ,s2                 float
   ,s3                 float
   ,s4                 float
   ,s5                 float
   ,primary key( cell_id )
);
create index ix_cell_pid on cell( parent_id );
create index ix_cell_inst on cell( instance_id );

Why does PostgreSQL occupy so much more storage than MySQL?
And is there some way of significantly reducing it near to the MySQL level?

Manion answered 12/4, 2019 at 15:44 Comment(0)
R
12

Adding a timestamp column should add no more than 11 MB in your case (1440000 * 8 bytes, no added padding).

Have you run VACUUM FULL in Postgres before you measured the size, for a fair comparison? I suspect table and index bloat.

Related:

In MySQL the data type float is the single-precision floating point type occupying 4 bytes.

In Postgres the same float is the double-precision floating point type, occupying 8 bytes (alias: float8 or double precision).

That should explain another 44 MB of the difference. To compare apples with apples, create the Postgres table with 4-byte real columns (alias float4). Note the difference to MySQL, where real is used for 8-byte floating point numbers! Unfortunate disagreements.

The MySQL manual: https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html
The Postgres manual: https://www.postgresql.org/docs/current/datatype-numeric.html

Related:

You show two indexes. Depending on what those are for, one multicolumn index might be able to replace both in Postgres - occupying just as much disk space as one of the two it replaces in this particular case (saving ~ 50 MB with the given specs).

create index ix_cell_pid on cell( parent_id, instance_id );

Consider:

Repeated answered 12/4, 2019 at 15:53 Comment(2)
Thx. Just did a vacuum full on the table for the PostgreSQL + TimescaleDB case. It reduced it to 297 MB, which is still quite a bit more than MySQL's 156 MB. Thanks for the references. Will read through them.Manion
@RayCh: There is another elephant in the room: in MySQL float <> Postgres float. See added explanation.Repeated
W
3

In addition to Erwin Brandstetter's answer:

There are also the usually hidden system columns to take account of (used to implement MVCC) along with other tuple information such as hint bits (used to cache row visibility).

You can use the pageinspect extension to look at this information, e.g.:

create extension pageinspect;
select * from heap_page_items(get_raw_page('cell', 0));

See here for a description of the system columns.

Your indexes, too, will contain empty space to allow further tuples to be inserted more quickly. Look for "fillfactor" in the CREATE INDEX docs.

Waites answered 15/4, 2019 at 9:35 Comment(0)
L
1
  1. connect to the database ( in psql, type \c your-db-name)

  2. checkout how many size each table cost. refer to: https://wiki.postgresql.org/wiki/Disk_Usage

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 100;

  1. delete or truncte the table.
delete ...
truncte <table_name>  cascade; 
  1. (if disk space not released) close other client which connected to postgres
Ludovico answered 28/2, 2022 at 0:52 Comment(2)
worthy of mention: You have to be connected to the database before issuing the query in #1. Saved a dev.Dominy
worthy of mention: if you are considering option 2 in production, rethink, take a break for 3 days, and rethink and talk to your shrink, peers, middle managements, and then think again before execution.Madiemadigan

© 2022 - 2024 — McMap. All rights reserved.