I'm (re)confused about tsvector
fields. It's very common to see examples of text being concatenated into a tsvector field, and then indexed. If you look up PG 12 generated columns, you'll find examples showing tsvector
field population quickly.
Since the only purpose of this field is to support the index, why not build the index with an expression and skip the vector column? I asked about this in passing previous (can't find it), and remember hearing back "the column is pointless." Which makes sense to me. Recently, I've been following some comments on this article:
Fine Tuning Full Text Search with PostgreSQL 12
The piece is quite good, BTW. There's a comments like so:
One reason to materialize tsvector is not to evaluate expression at recheck, which could be quite expensive itself.
Check ‘Recheck Cond’ in explain analyze, the condition could be really checked, hence the condition (to_tsvector) will be calculated.
\d t1
Table “public.t1”
Column | Type | Collation | Nullable | Default
--–----+---------+----------–+----------+---
id | integer | | |
t | text | | |
Indexes:
"t1_to_tsvector_idx" gin (to_tsvector('english'::regconfig, t))
explain analyze select * from t1 where to_tsvector(‘english’, t) @@ to_tsquery(‘english’,’foo’);
QUERY PLAN
Bitmap Heap Scan on t1 (cost=8.05..19.02 rows=6 width=36) (actual time=0.019..0.019 rows=0 loops=1)
Recheck Cond: (to_tsvector(‘english’::regconfig, t) @@ ”’foo”’::tsquery)
-> Bitmap Index Scan on t1_to_tsvector_idx (cost=0.00..8.05 rows=6 width=0) (actual time=0.017..0.017 rows=0 loops=1)
Index Cond: (to_tsvector(‘english’::regconfig, t) @@ ”’foo”’::tsquery)
Planning Time: 0.076 ms
Execution Time: 0.051 ms
(6 rows)
And I'm confused. Is there a good case for storing the vector in the row? I'm not fluent enough in this part of Postgres to build a satisfactory test. I'm still researching FTS and fuzzy comparison options in Postgres, but will be needing to build something substantial in the next quarter, which is why I'm keen to get the details right in advance. 35M rows now, increasing rapidly.