Calculating and saving space in PostgreSQL
Asked Answered
C

5

97

I have a table in pg like so:

CREATE TABLE t (
    a BIGSERIAL NOT NULL,               -- 8 b
    b SMALLINT,                         -- 2 b
    c SMALLINT,                         -- 2 b
    d REAL,                             -- 4 b
    e REAL,                             -- 4 b
    f REAL,                             -- 4 b
    g INTEGER,                          -- 4 b
    h REAL,                             -- 4 b
    i REAL,                             -- 4 b
    j SMALLINT,                         -- 2 b
    k INTEGER,                          -- 4 b
    l INTEGER,                          -- 4 b
    m REAL,                             -- 4 b
    CONSTRAINT a_pkey PRIMARY KEY (a)
);

The above adds up to 50 bytes per row. My experience is that I need another 40% to 50% for system overhead, without even any user-created indexes to the above. So, about 75 bytes per row. I will have many, many rows in the table, potentially upward of 145 billion rows, so the table is going to be pushing 13-14 terabytes. What tricks, if any, could I use to compact this table? My possible ideas below ...

Convert the real values to integer. If they can stored as smallint, that is a saving of 2 bytes per field.

Convert the columns b .. m into an array. I don't need to search on those columns, but I do need to be able to return one column's value at a time. So, if I need column g, I could do something like

SELECT a, arr[5] FROM t;

Would I save space with the array option? Would there be a speed penalty?

Any other ideas?

Cheston answered 3/6, 2010 at 13:44 Comment(1)
The PostgreSQL documentation advises to avoid the use of real and double precision types for exact floating point number representation (for example to store accounting values). Use instead the numeric data type. Just a tip in case those real values were to be used for accounting or exact result operations.Scrope
S
13

I see nothing to gain (and something to lose) in storing several numeric fields in an array.

The size of each numerical type is clearly documented, you should simply use the smallest sized type compatible with your desired range-resolution; and that's about all you can do.

I don't think (but I'm not sure) if there is some byte alignment requirement for the columns along a row, in that case a reordering of the columns could alter the space used - but I don't think so.

BTW, there is a fix overhead per row, about 23 bytes.

Spinel answered 3/6, 2010 at 14:1 Comment(3)
As of 9.2, it's 24 bytes per row for the row header and 4 bytes for the page offset (stored in the page header), or 28 bytes per row. There are other items that can come in to play, for example 1 byte per 8 columns that support NULL values (NULL values are stored as a bitmask).Amalea
@Sean: This is not quite correct. The row header (HeapTupleHeader) has 23 Bytes, not 24, according to the manual here: There is a fixed-size header (occupying 23 bytes on most machines), followed by an optional null bitmap, an optional object ID field. The difference is relevant, the NULL bitmask for tables with up to 8 columns fits into this one spare byte making NULL storage effectively free for these tables.Theretofore
Correct, however due to alignment of data types, between bytes 23 and 24 there is almost certainly a hole, and starting on byte 25 an INT begins. So the header is only 23 bytes, but the space consumed is 24 bytes.Amalea
T
267

"Column Tetris"

Actually, you can do something, but this needs deeper understanding. The keyword is alignment padding. Every data type has specific alignment requirements.

You can minimize space lost to padding between columns by ordering them favorably. The following (extreme) example would waste a lot of physical disk space:

CREATE TABLE t (
    e int2
    -- 6 bytes of padding after int2
  , a int8
  , f int2
    -- 6 bytes of padding after int2
  , b int8
  , g int2
    -- 6 bytes of padding after int2
  , c int8
  , h int2
    -- 6 bytes of padding after int2
  , d int8)

To save 24 bytes per row, use instead:

CREATE TABLE t (
    a int8
  , b int8
  , c int8
  , d int8
  , e int2
  , f int2
  , g int2
  , h int2)   -- 4 int2 occupy 8 byte (MAXALIGN), no padding at the end

db<>fiddle here
Old sqlfiddle

As a rule of thumb, if you put 8-byte columns first, then 4-bytes, 2-bytes and 1-byte columns last you can't go wrong.

boolean, uuid (!) and a few other types need no alignment padding. text, varchar and other "varlena" (variable length) types nominally require "int" alignment (4 bytes on most machines). But I observed no alignment padding in disk format (unlike in RAM). Eventually, I found the explanation in a note in the source code:

Note also that we allow the nominal alignment to be violated when storing "packed" varlenas; the TOAST mechanism takes care of hiding that from most code.

So "int" alignment is only enforced when the (possibly compressed) datum including a single leading length-byte exceeds 127 bytes. Then varlena storage switches to four leading bytes and requires "int" alignment.

Normally, you may save a couple of bytes per row at best playing "column tetris". None of this is necessary in most cases. But with billions of rows it can mean a couple of gigabytes easily.

You can test the actual column / row size with the function pg_column_size().
Some types occupy more space in RAM than on disk (compressed or "packed" format). You can get bigger results for constants (RAM format) than for table columns when testing the same value (or row of values vs. table row) with pg_column_size().

Finally, some types can be compressed or "toasted" (stored out of line) or both.

Where possible, move NOT NULL columns to the front, and columns with many NULL values to the rear. NULL values are served from the null bitmap directly, so their position in the row has no bearing on the access cost of NULL values, but they add a tiny cost for computing the offset of columns positioned to the right (further back in the row).

Overhead per tuple (row)

4 bytes per row for the item identifier - not subject to above considerations.
And at least 24 bytes (23 + padding) for the tuple header. The manual on Database Page Layout:

There is a fixed-size header (occupying 23 bytes on most machines), followed by an optional null bitmap, an optional object ID field, and the user data.

For the padding between header and user data, you need to know MAXALIGN on your server - typically 8 bytes on a 64-bit OS (or 4 bytes on a 32-bit OS). If you are not sure, check out pg_controldata.

Run the following in your Postgres binary dir to get a definitive answer:

./pg_controldata /path/to/my/dbcluster

The manual:

The actual user data (columns of the row) begins at the offset indicated by t_hoff, which must always be a multiple of the MAXALIGN distance for the platform.

So you typically get the storage optimum by packing data in multiples of 8 bytes.

There is nothing to gain in the example you posted. It's already packed tightly. 2 bytes of padding after the last int2, 4 bytes at the end. You could consolidate the padding to 6 bytes at the end, which wouldn't change anything.

Overhead per data page

Data page size is typically 8 KB. Some overhead / bloat at this level, too: Remainders not big enough to fit another tuple, and more importantly dead rows or a percentage reserved with the FILLFACTOR setting.

There are a couple of other factors for size on disk to take into account:

Array types?

With an array type like you were evaluating, you would add 24 bytes of overhead for the type. Plus, array elements occupy space as usual. Nothing to gain there.

Theretofore answered 15/9, 2011 at 13:14 Comment(16)
As a rule of thumb, if you put 8-byte columns first, then 4-bytes, 2-bytes and 1-byte columns last you can't go wrong. some legit cool advice.Sympathin
This is a great/interesting answer, but one thing I don't get is why the order of column names in the CREATE TABLE is important at all. I didn't think it mattered. Why doesn't/can't postgres do this tetris optimization for you? Why is column order considered significant enough to keep as-defined?Matriarch
@Russ: Because nobody implemented the division between logical and physical column order. It's an open TODO item, but not trivial since it's messing with system catalogs all over the place. Still true almost 6 years later. Since Postgres 9.3 a VIEW can serve to present a different column order for simple cases (writes are propagated automatically, too.)Theretofore
Thanks, @Erwin. It still seems strange to leave to the user, though. I get that messing with system catalogs after table creation would be tricky, but it doesn't seem like it would be an issue if the reorder was done up front. What would be the issue with something like a storage parameter to achieve this? Something along the lines of CREATE TABLE WITH column_reorder_ok that implies "I don't care about logical column order, so fiddle with it however you like to optimize the table". Or, with your terminology, WITH column_tetris_ok. :)Matriarch
@Russ: Sounds like a useful feature. A tool to reorder columns before the table gets created, which avoids complications from messing with system catalogs. Might also be implemented in any client software, a tool to optimize CREATE TABLE statement for minimal storage ...Theretofore
FYI: your answer made it into the top 15 on news.ycombinator.com (2018-02-26). congratulations! comments are here: news.ycombinator.com/item?id=16471242Silesia
@hiroprotagonist: Thanks, this explains all the sudden love.Theretofore
@Russ, I agree that something can be done, but it's complex. When you add and remove columns, the existing rows remain compatible because the ordering does not change. Newly added columns go onto the end of the row, while deleted columns never actually disappear, but are just disabled and effectively NULL'd. (And, as a benefit, open transactions can still see that column.) Certainly some optimization can be done with a logical-physical map on the table, or even on the disk-segment (or whatever its called).Vraisemblance
@ErwinBrandstetter Would it make difference in ordering of columns, if I know some of the columns will probably have null values in most rows or one still use the simple rule of thumb as suggested?Orbit
@DebasishMitra: Columns that are (almost) always NULL don't mess with alignment, as NULL is only represented as a single bit in the NULL bitmap. Their placement is unimportant. Might go to the end of the table for homeopathic gains.Theretofore
I came to this post confused why changing VARCHAR(1) to CHAR(1) increased storage on Netezza Release 7.2.1.11-P1 [Build 4], but I confirmed column Tetris affects storage size also on Netezza.Bethune
+1 it may be worth it to say put nullable columns at the end as well because having nullable columns appear before non-null columns can make it slower to access the non-null columns that come after the nullable column.Naker
"boolean, uuid (!) and a few other types need no alignment padding" - unfortunately, booleans are subject to alignment too, as seen in the following test: dbfiddle.uk/_z-tgQYbLong
@Meglio: You are misinterpreting the test. The bigint columns cause the bloat, not the booleans.Theretofore
Hi @ErwinBrandstetter , here is my suggestion: in the first example, instead of adding a comment 6 bytes of padding after int2 to the first column (e int2), add a comment 6 bytes of padding before int8 to the second column (a int8). This would make it clearer that it is the 2nd column that caueses the alignment, not the first one. I had the wrong impression that it was the data type of the first column that caused the alignment until I read it was not the case somewhere else.Long
@Meglio: I see, this can be misleading. I moved the comment between int2 & int8, as the combination materializes alignment padding: A type that does not align at the and, followed by a type that requires alignment.Theretofore
V
17

From this great documentation: https://www.2ndquadrant.com/en/blog/on-rocks-and-sand/

For a table you already have, or perhaps one you're making in development, named my_table, this query will give the optimal order left to right.

SELECT a.attname, t.typname, t.typalign, t.typlen
FROM pg_class c
JOIN pg_attribute a ON (a.attrelid = c.oid)
JOIN pg_type t ON (t.oid = a.atttypid)
WHERE c.relname = 'my_table'
 AND a.attnum >= 0
ORDER BY t.typlen DESC
Villosity answered 1/1, 2020 at 15:43 Comment(2)
This query is really versatile. Does anybody know about a tool that automates the realignment of columns according to this query? wiki.postgresql.org/wiki/Alter_column_positionDevora
4 @Devora github.com/NikolayS/postgres_dba, see my answer for an exapleEstray
S
13

I see nothing to gain (and something to lose) in storing several numeric fields in an array.

The size of each numerical type is clearly documented, you should simply use the smallest sized type compatible with your desired range-resolution; and that's about all you can do.

I don't think (but I'm not sure) if there is some byte alignment requirement for the columns along a row, in that case a reordering of the columns could alter the space used - but I don't think so.

BTW, there is a fix overhead per row, about 23 bytes.

Spinel answered 3/6, 2010 at 14:1 Comment(3)
As of 9.2, it's 24 bytes per row for the row header and 4 bytes for the page offset (stored in the page header), or 28 bytes per row. There are other items that can come in to play, for example 1 byte per 8 columns that support NULL values (NULL values are stored as a bitmask).Amalea
@Sean: This is not quite correct. The row header (HeapTupleHeader) has 23 Bytes, not 24, according to the manual here: There is a fixed-size header (occupying 23 bytes on most machines), followed by an optional null bitmap, an optional object ID field. The difference is relevant, the NULL bitmask for tables with up to 8 columns fits into this one spare byte making NULL storage effectively free for these tables.Theretofore
Correct, however due to alignment of data types, between bytes 23 and 24 there is almost certainly a hole, and starting on byte 25 an INT begins. So the header is only 23 bytes, but the space consumed is 24 bytes.Amalea
R
6

Here is a cool tool regarding the Erwin's columns reordering suggestion: https://github.com/NikolayS/postgres_dba

It has the exact command for that -- p1:

enter image description here

Then it automatically shows you the real potential for columns reordering on all of your tables:

enter image description here

Reunionist answered 29/1, 2022 at 12:25 Comment(0)
M
2

After reading the answers from Erwin Brandstetter and jboxxx and the document linked in the latter, I improved slightly on the query to make it more versatile:

-- https://www.postgresql.org/docs/current/catalog-pg-type.html
CREATE OR REPLACE VIEW tabletetris
    AS SELECT n.nspname, c.relname,
        a.attname, t.typname, t.typstorage, t.typalign, t.typlen
    FROM pg_class c
    JOIN pg_namespace n ON (n.oid = c.relnamespace)
    JOIN pg_attribute a ON (a.attrelid = c.oid)
    JOIN pg_type t ON (t.oid = a.atttypid)
    WHERE a.attnum >= 0
    ORDER BY n.nspname ASC, c.relname ASC,
        t.typlen DESC, t.typalign DESC, a.attnum ASC;

Use like this:

SELECT * FROM tabletetris WHERE relname='mytablename';

But you can add a filter on nspname (the schema the table is in).

I also added the storage kind, which is useful information in figuring out which of the -1 ones to inline and/or oder where, and keep the relative order of existing columns with otherwise the same sort key.

Milton answered 4/1, 2023 at 17:25 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.