Why PostgreSQL(timescaledb) costs more storage in table?
Asked Answered
C

1

7

I'm new to database. Recently I start using timescaledb, which is an extension in PostgreSQL, so I guess this is also PostgreSQL related.

I observed a strange behavior. I calculated my table structure, 1 timestamp, 2 double, so totally 24bytes per row. And I imported (by psycopg2 copy_from) 2,750,182 rows from csv file. I manually calculated the size should be 63MB, but I query timescaledb, it tells me the table size is 137MB, index size is 100MB and total 237MB. I was expecting that the table size should equal my calculation, but it doesn't. Any idea?

Cripple answered 23/11, 2017 at 21:53 Comment(4)
(1) postgresql.org/docs/10/static/storage-file-layout.html (2) postgresql.org/docs/10/static/storage-page-layout.htmlChelsea
Postgres has a row header of 23 bytes. So every row is 47 bytes, not just 24 bytes as you calculated. And then you need to take padding into account as wellSpoilsman
If you are concerned about storage space, then try storing data into specialized time series databases with on-disk data compression such as InfluxDB or VictoriaMetrics. They usually require much less storage space comparing to TimescaleDB. For instance, VictoriaMetrics uses 70x less storage space in TSBS benchmark comparing to TimescaleDB.Clippard
TimescaleDB now has native compression support (the comparisons that valyala left no longer apply, for example): blog.timescale.com/blog/…Bradney
V
13

There are two basic reasons your table is bigger than you expect: 1. Per tuple overhead in Postgres 2. Index size

  1. Per tuple overhead: An answer to a related question goes into detail that I won't repeat here but basically Postgres uses 23 (+padding) bytes per row for various internal things, mostly multi-version concurrency control (MVCC) management (Bruce Momjian has some good intros if you want more info). Which gets you pretty darn close to the 137 MB you are seeing. The rest might be because of either the fill factor setting of the table or if there are any dead rows still included in the table from say a previous insert and subsequent delete.
  2. Index Size: Unlike some other DBMSs Postgres does not organize its tables on disk around an index, unless you manually cluster the table on an index, and even then it will not maintain the clustering over time (see https://www.postgresql.org/docs/10/static/sql-cluster.html). Rather it keeps its indices separately, which is why there is extra space for your index. If on-disk size is really important to you and you aren't using your index for, say, uniqueness constraint enforcement, you might consider a BRIN index, especially if your data is going in with some ordering (see https://www.postgresql.org/docs/10/static/brin-intro.html).
Volcanic answered 9/1, 2018 at 18:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.