PostgreSQL: Create index on length of all table fields

Asked 7/12, 2015 at 21:0 Answered 7/12, 2015 at 23:1

performance postgresql database-performance database-indexes

I have a table called profile, and I want to order them by which ones are the most filled out. Each of the columns is either a JSONB column or a TEXT column. I don't need this to a great degree of certainty, so typically I've ordered as follow:

SELECT * FROM profile ORDER BY LENGTH(CONCAT(profile.*)) DESC;

However, this is slow, and so I want to create an index. However, this does not work:

CREATE INDEX index_name ON profile (LENGTH(CONCAT(*))

Nor does

CREATE INDEX index_name ON profile (LENGTH(CONCAT(CAST(* AS TEXT))))

Can't say I'm surprised. What is the right way to declare this index?

Emilyemina answered 7/12, 2015 at 21:0 Comment(1)

"does not work" is a poor description of the problem. – Huntsville 7/12, 2015 at 21:10

To measure the size of the row in text representation you can just cast the whole row to text, which is much faster than concatenating individual columns:

SELECT length(profile::text) FROM profile;

But there are 3 (or 4) issues with this expression in an index:

The syntax shorthand profile::text is not accepted in CREATE INDEX, you need to add extra parentheses or default to the standard syntax cast(profile AS text)
Still the same problem that @jjanes already discussed: only IMMUTABLE functions are allowed in index expressions and casting a row type to text does not pass this requirement. You could build a fake IMMUTABLE wrapper function, like Jeff outlined.
There is an inherent ambiguity (that applies to Jeff's answer as well!): if you have a column name that's the same as the table name (which is a common case) you cannot reference the row type in CREATE INDEX since the identifier always resolves to the column name first.
Minor difference to your original: This adds column separators, row decorators and possibly escape characters to the text representation. Shouldn't matter much to your use case.

However, I would suggest a more radical alternative as crude indicator for the size of a row: pg_column_size(). Even shorter and faster and avoids issues 1, 3 and 4:

SELECT pg_column_size(profile) FROM profile;

Issue 2 remains, though: pg_column_size() is also only STABLE. You can create a simple and cheap SQL wrapper function:

CREATE OR REPLACE FUNCTION pg_column_size(profile)
  RETURNS int LANGUAGE sql IMMUTABLE AS
'SELECT pg_catalog.pg_column_size($1)';

and then proceed like @jjanes outlined. More details:

Does PostgreSQL support "accent insensitive" collations?

Note that I created the function with the row type profile as parameter. Postgres allows function overloading, which is why we can use the same function name. Now, when we feed the matching row type to pg_column_size() our custom function matches more closely according to function type resolution rules and is picked instead of the polymorphic system function. Alternatively, use a separate name and possibly make the function polymorphic as well ...

Is there a way to disable function overloading in Postgres

Spellbound answered 7/12, 2015 at 23:1 Comment(0)

You can declare a function which is falsely marked "immutable" and build an index on that.

CREATE OR REPLACE FUNCTION len_immut(record)
 RETURNS int
 LANGUAGE plperl
 IMMUTABLE
AS $function$
  ## This function lies about its immutability.
  ## Use it with care.  It is useful for indexing
  ## entire table rows.
  return length(join ",", values %{$_[0]});
$function$

and then

create index on profile (len_immut(profile));

SELECT * FROM profile ORDER BY len_immut(profile) DESC;

Since the function is falsely labelled as immutable, the index may become out of date if you do things like add or drop columns on the table, or change the types of columns.

Huntsville answered 7/12, 2015 at 21:29 Comment(0)

Recommended topics

Hot tags