Calculate tuple size
Asked Answered
T

2

8

I am trying to understand how the column order minimize the table size in PostgreSQL.

Example:

CREATE TABLE test (
 column_1 int
,column_2 int
,column_3 bigint
,column_4 bigint
,column_5 text
,column_6 text
,column_7 numeric(5,2)
,column_8 numeric(5,2)
,column_9 timestamp
,column_10 boolean
,column_11 boolean
);

INSERT INTO test
  VALUES(1,1,1,1,'test','test_1',12,12,current_timestamp,true,false);

SELECT pg_column_size(test.*) FROM test;

 pg_column_size 
----------------
       82
    (1 row)

The tuple size:

23 byte overhead of tuple header + 1 byte for NULL bitmap, so:

24+4+4+8+8+5+7+5+5+8+1+1=80 but the actual tuple size is 82.

Is there any extra overhead of 2 byte?

I understood the example given on below link:
Calculating and saving space in PostgreSQL

If we remove the column_8 numeric(5,2) then also the tuple size remains the same, i.e.: 82.

I have reorder the table to minimize the tuple size and gives 80.

CREATE TABLE test (
 column_3 bigint
,column_4 bigint
,column_9 timestamp
,column_1 int
,column_2 int
,column_10 boolean
,column_11 boolean
,column_7 numeric(5,2)
,column_8 numeric(5,2)
,column_5 text
,column_6 text);

INSERT INTO test
  VALUES(1,1,current_timestamp,1,1,true,false,12,12,'test','test_1');

SELECT pg_column_size(test) FROM test;

 pg_column_size 
----------------
      80

Is there any suggestion for column order in PostgreSQL?

Tolson answered 10/7, 2015 at 5:8 Comment(0)
A
8

You missed another 2 bytes of padding before column_9 timestamp, which needs to start at a multiple of 8 bytes.

24+4+4+8+8+5+7+5+5+8+1+1=80 but the actual tuple size is 82.  
------------------^ <----- 2 bytes of padding here

That's also the reason for this:

If we remove the column_8 numeric(5,2) then also the tuple size remains the same, i.e.: 82.

After taking out that column which occupies 5 bytes you get 7 bytes of padding at the same spot - the worst case.

Also note that this row actually occupies 88 bytes on disk, because the tuple header of the next tuple is left aligned (starts at a multiple of MAXALIGN, typically 8 bytes).

Your modified row ends at a multiple of 8 bytes does not incur extra padding and needs just the 80 bytes.

Both need another 4 bytes for the tuple pointer in the page header, though.

It's a game of "column tetris", the basics of which you seem to have understood already. Typically you won't gain much, don't think too hard about it. There are extreme corner cases, though. Null values change the game per row.

You need to know size, alignment and padding requirements for each data type and the special rules for the NULL bitmap.

Related answer with detailed calculation on dba.SE:

Appositive answered 10/7, 2015 at 7:49 Comment(4)
The fixed size types should be placed like 8-bytes first then 4-byte then 2 byte and then 1 byte. What about variable size type like char(n), varchar(n), text,numeric(p,s). I thinks its numeric(p,s),char(n),varchar(n) and then text. Is my understanding correct?Tolson
@user3756488: You can mix variable size types that don't need to be aligned freely (chose a meaningful order without regard to storage). Only types that need to be aligned make a difference in that alignment may require padding. It also helps performance (not storage) a tiny bit to have fixed-length, not-null columns first.Appositive
Thanks Erwin Brandstetter.Tolson
I would like to know the data types which doesn't require padding?Tolson
W
-2

This is very simple in MS-SQL to calculate using sys.objects and sys.columns, I am sure there must some similar solution in PostgreSQL:

/* Total Size of Tuple */ select sum(c.max_length) [Total Tuple Size] from sys.columns c inner join sys.objects o on c.object_id = o.object_id inner join sys.types t on c.user_type_id = t.user_type_id where o.name = 'your-table-name'

/* Column wise byte size */ select c.name [Column Name] ,t.name [Type] ,c.max_length [Maximum Length in Bytes] from sys.columns c inner join sys.objects o on c.object_id = o.object_id inner join sys.types t on c.user_type_id = t.user_type_id where o.name = 'your-table-name' order by c.column_id

Weird answered 18/8, 2021 at 14:22 Comment(1)
This is invalid for PostgresInartificial

© 2022 - 2024 — McMap. All rights reserved.