Postgres - Full Text Search to accept emojis

I want to create a full-text search that accepts emojis in the query, or some other type of index for searching text. For example, I have this text: 'Playa 🌊🌞🌴 @CobolIquique h', and PostgreSQL parses the emojis strangely.

Debugging with SELECT * FROM ts_debug('english', 'Playa 🌊🌞🌴 @CobolIquique h'); gives the following result:

[Screenshot of the ts_debug output; image not preserved]

I don't know why the emoji token is considered a space symbol. If I debug the parser with SELECT * FROM ts_parse('default', 'Playa 🌊🌞🌴 @CobolIquique h'); I get the same tokens, and the token types listed by ts_token_type('default') include no emoji type (or anything similar). So, how can I create a parser that splits the string on spaces and doesn't treat emojis as blank space? Or how can I create a text index that supports emojis in queries?

Discourtesy answered 27/9, 2016 at 15:3 Comment(7)
I'm not used to full-text search, but have you tried different dictionaries (like Snowball)? See postgresql.org/docs/current/static/textsearch-dictionaries.html Maybe you have to customize a dictionary (see the examples in that link). – Sublunary
Yes, I have tried different dictionaries (I have already built one for my needs), but the problem is in the step before that, the parser ): – Discourtesy
Have you tried CREATE TEXT SEARCH PARSER and ALTER TEXT SEARCH PARSER? postgresql.org/docs/9.6/static/sql-createtsparser.html – Sublunary
I have read that page, but I didn't understand how to customize a parser (or its tokens). I will play with it for a while. – Discourtesy
I believe the parser tokenizes your string correctly and just can't convert the emoji tokens to lexemes (which is not surprising). I think your own solution (building a dictionary for your needs) is the right one? – Pero
Did you find a solution? I'm facing the same problem right now. – Cindacindee
Sadly no... I never created a parser. On another site I was told that I need to write the parser in C dba.stackexchange.com/questions/156149/… , but I haven't tried it. – Discourtesy

To create a parser that differs from the default one, you need to write your own PostgreSQL extension in C. The extension should define the following functions:

start_function();
gettoken_function();
end_function();
lextypes_function();
headline_function(); // optional

As an example, you can examine the pg_tsparser module.
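
Once the C functions are compiled into a shared library, they still have to be registered on the SQL side. A rough sketch of that registration follows, modelled on how the test_parser contrib module does it. Every name here is hypothetical: the library 'emoji_parser', the emoji_prs_* functions, the emoji_cfg configuration, and the 'word' token type are placeholders for whatever your own extension defines. Creating a text search parser requires superuser privileges.

```sql
-- Hypothetical names throughout: 'emoji_parser' is an assumed shared library
-- built from your own C code; the emoji_prs_* functions are assumed to exist in it.
CREATE FUNCTION emoji_prs_start(internal, int4)
    RETURNS internal
    AS 'emoji_parser', 'emoji_prs_start'
    LANGUAGE C STRICT;

CREATE FUNCTION emoji_prs_gettoken(internal, internal, internal)
    RETURNS internal
    AS 'emoji_parser', 'emoji_prs_gettoken'
    LANGUAGE C STRICT;

CREATE FUNCTION emoji_prs_end(internal)
    RETURNS void
    AS 'emoji_parser', 'emoji_prs_end'
    LANGUAGE C STRICT;

CREATE FUNCTION emoji_prs_lextypes(internal)
    RETURNS internal
    AS 'emoji_parser', 'emoji_prs_lextypes'
    LANGUAGE C STRICT;

-- Register the parser. HEADLINE is optional; here it reuses the built-in
-- headline function of the default parser.
CREATE TEXT SEARCH PARSER emoji_parser (
    START    = emoji_prs_start,
    GETTOKEN = emoji_prs_gettoken,
    END      = emoji_prs_end,
    LEXTYPES = emoji_prs_lextypes,
    HEADLINE = pg_catalog.prsd_headline
);

-- A configuration that uses the new parser.
CREATE TEXT SEARCH CONFIGURATION emoji_cfg (PARSER = emoji_parser);

-- Assumes the lextypes function reports a token type named 'word'.
-- The built-in 'simple' dictionary keeps tokens as-is (lowercased).
ALTER TEXT SEARCH CONFIGURATION emoji_cfg
    ADD MAPPING FOR word WITH simple;

-- Inspect how the new parser splits the example text from the question.
SELECT * FROM ts_parse('emoji_parser', 'Playa 🌊🌞🌴 @CobolIquique h');
```

Whether the emojis come out as searchable tokens depends entirely on what the gettoken function does with them; the SQL above only wires the parser into a configuration.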

Edp answered 25/7, 2017 at 10:53 Comment(0)
