tsvector only supports English?
Asked Answered
C

2

3

I did the following:

ALTER TABLE blog_entry ADD COLUMN body_tsv tsvector;

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON blog_entry 
    FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(body_tsv, 'pg_catalog.english', body);

CREATE INDEX blog_entry_tsv ON blog_entry USING gin(body_tsv);

UPDATE blog_entry SET body_tsv=to_tsvector(body);

Now this is working:

SELECT title FROM blog_entry WHERE body_tsv @@ plainto_tsquery('hello world');

But when trying to search for non-English text, it's not working at all (no results).

I am using v9.2.2

Please help.

Corollary answered 13/1, 2013 at 5:10 Comment(0)
D
10

It's been a while since I played with this, but you need to create the ts_vector in the correct language, not the ts_query.

So when you update your table, use:

UPDATE blog_entry SET body_tsv=to_tsvector('german', body);

You can also extend the functionality and use an ispell dictionary to make stemming better to the text search engine (although it still won't be as sophisticated as e.g. Solr)

To do that, download the ISPELL dictionary that is e.g. contained in the OpenOffice German dictionary

The .oxt file is actually a .zip file, so you can simply extract its content.

Then copy the file de_DE_frami.dic to the PostgreSQL "share/tsearch_data" directory while changing the extension to .dict (which is what PostgreSQL expects.

Then copy the file de_DE_frami.aff to the same directory, changing the extension to .affix.

You need to convert both (text) files to UTF-8 in order for them to work with PostgreSQL

Then register that dictionary using:

CREATE TEXT SEARCH CONFIGURATION de_config (copy=german);

CREATE TEXT SEARCH DICTIONARY german_stem (
    TEMPLATE = snowball,
    Language = german
);

CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE = ispell,
    dictfile = de_DE_frami,
    afffile = de_de_frami
);

alter text search configuration de_config 
     alter mapping for asciiword WITH german_ispell, german_stem;

Once that is done, you can create your ts_vector using:

UPDATE blog_entry SET body_tsv=to_tsvector('de_config', body);

This is also described in the manual: http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

Dannadannel answered 13/1, 2013 at 8:48 Comment(1)
horse, I can't find anything for tsvector support on Chinese. Do you know of anything?Alaynaalayne
I
0

I know it's been a while for this question, but I was searching about changing the FTS language and found an other solution. (and better than download a dictionary)

on Postgres CLI you can use the command to get a List of text search configurations: \dF

  • Check your current configuration:

show default_text_search_config;

  • Change your text search configuration to another language:

set default_text_search_config = 'pg_catalog.[language]';

Inquiry answered 13/5, 2019 at 17:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.