How can I find out the size of each column in a Redshift table?
Asked Answered
P

3

4

While trying out different compression settings in Redshift it would be very useful to know the size of each column. I know how to get the size of a table, but I want to know the size of each individual column in that table.

Photocompose answered 28/10, 2015 at 10:28 Comment(0)
T
8

This query will give you the size (MB) of each column. What it does is that it counts the number of data blocks, where each block uses 1 MB, grouped by table and column.

SELECT
  TRIM(name) as table_name,
  TRIM(pg_attribute.attname) AS column_name,
  COUNT(1) AS size
FROM
  svv_diskusage JOIN pg_attribute ON
    svv_diskusage.col = pg_attribute.attnum-1 AND
    svv_diskusage.tbl = pg_attribute.attrelid
GROUP BY 1, 2

You can read more about the two tables involved in the query here: SVV_DISKUSAGE & pg_attribute.

Taraxacum answered 28/10, 2015 at 10:41 Comment(0)
B
3

A more accurate size of the table would include the hidden system columns deletexid, insertxid, oid (ROW ID), as well. One of my tables was using 752 blocks without including the hidden columns. When i added the hidden columns, it went upto 1063 blocks.

SELECT col, attname, COUNT(*) AS "mbs"
FROM stv_blocklist bl
JOIN stv_tbl_perm perm
  ON bl.tbl = perm.id AND bl.slice = perm.slice    
LEFT JOIN pg_attribute attr ON
  attr.attrelid = bl.tbl
  AND attr.attnum-1 = bl.col  
WHERE perm.name = '<TABLE-NAME>'  
GROUP BY col, attname
ORDER BY col;
Bimestrial answered 14/6, 2017 at 18:20 Comment(1)
Useful, but the premise of the original question is testing the effect of different encodings on user-columns, not the full table size, so hidden columns not that important. For Redshift, it might be better to query the total table info from SVV_TABLE_INFOIndefectible
H
0

I've come up with a view that contains both the table size and the column size - I believe one would usually want to optimize the largest columns of the largest tables:

    with db_stats as (SELECT database, schema, "table", size, tbl_rows, table_id
                  FROM SVV_TABLE_INFO
                  order by size desc),
table_stats as (
SELECT
    tbl,
  TRIM(name) as table_name,
  TRIM(pg_attribute.attname) AS column_name,
  COUNT(1) AS size
FROM
  svv_diskusage JOIN pg_attribute ON
    svv_diskusage.col = pg_attribute.attnum-1 AND
    svv_diskusage.tbl = pg_attribute.attrelid
GROUP BY 1, 2, 3 )

select database, schema, "table", tbl_rows, column_name, db_stats.size as table_size, table_stats.size as column_size
from db_stats left join table_stats on db_stats.table_id = table_stats.tbl
order by db_stats.size desc, table_stats.size desc;
Hales answered 13/11, 2023 at 18:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.