Do nullable columns occupy additional space in PostgreSQL?
Asked Answered
R

3

33

I have a table with 7 columns and 5 of them will be null. I will have a null columns on int, text, date, boolean, and money data types. This table will contain millions of rows with many many nulls. I am afraid that the null values will occupy space.

Also, do you know if Postgres indexes null values? I would like to prevent it from indexing nulls.

Reticent answered 27/8, 2012 at 16:23 Comment(0)
K
59

Basically, NULL values occupy 1 bit in the NULL bitmap. But it's not that simple.

The null bitmap (per row) is only allocated if at least one column in that row holds a NULL value. This can lead to a seemingly paradoxic effect in tables with 9 or more columns: assigning the first NULL value to a column can take up more space on disk than writing a value to it. Conversely, removing the last NULL value from the row also removes the NULL bitmap.

Physically, the initial null bitmap occupies 1 byte between the HeapTupleHeader (23 bytes) and actual column data or the row OID (if you should still be using that) - which always start at a multiple of MAXALIGN (typically 8 bytes). This leaves 1 byte of padding that is utilized by the initial null bitmap.

In effect, NULL storage is absolutely free for tables of 8 columns or less (including dropped, but not yet purged columns).
After that, another MAXALIGN bytes (typically 8) are allocated for the next MAXALIGN * 8 columns (typically 64). Etc.

More details in the manual and under these related questions:

Once you understand alignment padding of data types, you can further optimize storage:

But the cases are rare where you can save substantial amounts of space. Normally it's not worth the effort.

@Daniel already covers effects on index size.

Note that dropped columns (though now invisible) are kept in the system catalogs until the table is recreated. Those zombis can force the allocation of an (enlarged) NULL bitmap. See:

Koehler answered 27/8, 2012 at 18:4 Comment(6)
"NULL storage is absolutely free for tables of 8 columns or less." - what if only 1 column holds null? this means that the null bitmap with 1 byte will be created just for holding 1 bit?Electronarcosis
The null bitmap is there if there are any null values, with enough bytes to cover all columns, plus padding to the next multiple of MAXALIGN.Koehler
Thanks Erwin. Still not sure - what if I don't have any null values? would it still be there? (many to many table)Electronarcosis
@Dejell: No null value in the row, then no null bitmap either. That's in the answer above. Follow the links for more details.Koehler
Why does the documentation say that the header takes up 23 bytes (on most machines), but the sum of the lengths in the table there equals 27?Cerberus
Yes, confusing. I suppose t_cid ... (overlays with t_xvac) there means that those two items share the same physical field. Comments in the source code also hint as much: doxygen.postgresql.org/htup__details_8h_source.html line 72.Koehler
W
15

Whether NULL values get to the index or not depend at least on the type of the index. Basically, this would be YES for btree and gist index types, NO for hash, and it seems YES or NO for gin index types depending on PostgreSQL version.

There used to be a boolean column amindexnulls in the pg_catalog.pg_am table that carried that information, but it's gone in 9.1. Probably because indexes have got even more sophisticated among PG improvements.

In the specific case of your data, the best way to know would be to measure the size difference of indexes, using the pg_relation_size('index_name') function, between contents entirely NULL and entirely NOT NULL, with your exact PG version, exact datatype, exact index type and definition. And know that possibly, a future change in any of these parameters may change the outcome.

But in any case, if you "just" want to avoid indexing NULLs, it's always possible to create a partial index:

CREATE INDEX partial_idx(col) ON table WHERE (col is not null)

This is going to take less space, but whether this is going to help or not with the performance of queries depend on these queries.

Whitish answered 27/8, 2012 at 17:16 Comment(2)
+1 insightful. With partial indexes it is important to note that they can only be utilized by the query planner if it can verify that the condition is covered in the query. The query planner is smart, but fails to resolve complex logic in this respect. Match the WHERE clause more or less verbatim to make sure the index can be used (possible AND-ed with more conditions). Details in the manual.Koehler
Cool..Thanks for your insightful answerReticent
F
2

I believe each would use a single bit in the bitmap for the row. See here: http://www.postgresql.org/docs/9.0/static/storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE

Floatation answered 27/8, 2012 at 16:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.