Let's add new column of tsvector
type:
alter table sites add column tsvector tsvector;
Now let's create a trigger that will collect lexems, organize them and put to our tsvector. We will use 4 groups (A, B, C, D) -- this a special tsvector's feature that allows you to distinguish lexems later, at search time (see examples in manual https://www.postgresql.org/docs/current/static/textsearch-controls.html; unfortunately, this feature support only up to 4 groups becase developers reserved only 2 bits for that, but we are lucky here, we need only 4 groups):
create or replace function t_sites_tsvector() returns trigger as $$
declare
dic regconfig;
part_a text;
part_b text;
part_c text;
part_d text;
begin
dic := 'simple'; -- change if you need more advanced word processing (stemming, etc)
part_a := coalesce(new.doc->>'identification', '') || ' ' || coalesce(new.doc->>'title', '') || ' ' || coalesce(new.doc->>'address', '');
select into part_b string_agg(coalesce(a, ''), ' ') || ' ' || string_agg(coalesce(b, ''), ' ')
from (
select
jsonb_array_elements((new.doc->'buildings'))->>'identification',
jsonb_array_elements((new.doc->'buildings'))->>'name'
) _(a, b);
select into part_c string_agg(coalesce(c, ''), ' ')
from (
select jsonb_array_elements(b)->>'identification' from (
select jsonb_array_elements((new.doc->'buildings'))->'deposits'
) _(b)
) __(c);
select into part_d string_agg(coalesce(d, ''), ' ')
from (
select jsonb_array_elements(c)->>'sample_id'
from (
select jsonb_array_elements(b)->'audits' from (
select jsonb_array_elements((new.doc->'buildings'))->'deposits'
) _(b)
) __(c)
) ___(d);
new.tsvector := setweight(to_tsvector(dic, part_a), 'A')
|| setweight(to_tsvector(dic, part_b), 'B')
|| setweight(to_tsvector(dic, part_c), 'C')
|| setweight(to_tsvector(dic, part_d), 'D')
;
return new;
end;
$$ language plpgsql immutable;
create trigger t_sites_tsvector
before insert or update on sites for each row execute procedure t_sites_tsvector();
^^ -- scroll it, this snippet is bigger than it looks (especially of you have MacOS w/o scrollbars...)
Now let's create GIN index to speedup search queries (makes sense if you have many rows -- say, more than hundreds or thousands):
create index i_sites_fulltext on sites using gin(tsvector);
And now we insert something to check:
insert into sites select 1, '{
"_id": "123",
"type": "Site",
"identification": "Custom ID",
"title": "SITE 1",
"address": "UK, London, Mr Tom''s street, 2",
"buildings": [
{
"uuid": "12312",
"identification": "Custom ID",
"name": "BUILDING 1",
"deposits": [
{
"uuid": "12312",
"identification": "Custom ID",
"audits": [
{
"uuid": "12312",
"sample_id": "SAMPLE ID"
}
]
}
]
}
]
}'::jsonb;
Check with select * from sites;
– you must see that tsvector
column is filled with some data.
Now let's query it:
select * from sites where tsvector @@ to_tsquery('simple', 'sample');
-- it must return our record. In this case, we search for 'sample'
word and we don't care in which group it will be found.
Let's change it and try to search only in group A ("SITE (identification, title, address)" as you described it):
select * from sites where tsvector @@ to_tsquery('simple', 'sample:A');
-- this must return nothing because word 'sample'
sits only in group D ("AUDIT (sample_id)"). Indeed:
select * from sites where tsvector @@ to_tsquery('simple', 'sample:D');
-- will again return us our record.
Notice, that you need to use to_tsquery(..)
, not plainto_tsquery(..)
to be able to address 4 groups. So you need to sanitize your input yourself (avoid using or remove special characters like &
and |
because they have special meaning in tsquery
values).
And the good news is that you can combine different groups in a single query, like this:
select * from sites where tsvector @@ to_tsquery('simple', 'sample:D & london:A');
The other way to go (e.g. if you have to work with more than 4 groups) is having multiple tsvectors, each one sitting in a separate column, build them using single query, create index (you can create single index on multiple tsvector
columns) and query addressing separate columns. It's similar to what I explained above, but perhaps less efficient.
Hope this helps.
building.identification ||';'||building.title||';'||building.address
etc. (this can be done using postgres' functions as default values, or trigger-based, if your data is modified). Then create GIN indexes on those fields -> and then construct your corresponding full text queries on those fields – Keelson