mysql boolean mode fulltext search with wildcards and literals
Asked Answered
N

2

7

I'm pretty new to MySQL full-text searches and I ran into this problem today:

My company table has a record with "e-magazine AG" in the name column. I have a full-text index on the name column.

When I execute this query the record is not found:

SELECT id, name FROM company WHERE MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);

I need to work with quotes because of the dash and to use the wildcard because I implement a "search as you type" functionality.

When I search for the whole term "e-magazine AG", the record is found.

Any ideas what I'm doing wrong here? I read about adding the dash to the list of word characters (config update needed) but I'm searching for a way to do this programmatically.

Numerable answered 19/8, 2011 at 10:39 Comment(3)
Interesting question. I can replicate it here in a table with a uuid like field. SELECT id, bootid FROM socket WHERE MATCH(bootid) AGAINST('+"18bda775"' IN BOOLEAN MODE); will work and so will if i search for "18bda775-711c", but searching for "18bda" or "18bda775-711" won't work. the bootid field in question contains: 18bda775-711c-4329-9de2-a3d81dc13d06Futility
We also tried changing the boolean control characters to use # instead of - but it had no effect (yes, i repaired the table index after the change): ft_boolean_syntax=+ #><()~*:""&|Numerable
In MySql 8, '+"e-magazi"*' produces a syntax error but moving the * char inside the quotes does not. v8 seems to not like the wildcard after the quotes like the OP has hereNitrometer
G
5

This clause

MATCH(name) AGAINST('+"e-magazi"*' IN BOOLEAN MODE);

Will search for a AND "e" AND NOT "magazi"; i.e. the - inside "e-magazi" will be interpreted as a not even though it is inside quotation marks.
For this reason it will not work as expected.
A solution is to apply an extra having clause with a LIKE.

I know this having is slow, but it will only be applied to the results of the match, so not too many rows should be involved.

I suggest something like:

SELECT id, name 
FROM company 
WHERE MATCH(name) AGAINST('magazine' IN BOOLEAN MODE)
HAVING name LIKE '%e-magazi%';
Gaulish answered 26/9, 2011 at 8:3 Comment(3)
thanks for the info but it doesn't help me because the search string is entered by the user who wants to search for a company. in other words, the backend which executes the query cannot know that "e-magazi" is part of the string "e-magazine". the search field should accept both word parts and full words.Numerable
Where do you see white spaces in front or after the search phrase?! The + is a regular fulltext operator and not a white space!Hotheaded
Note that in the suggestion above IN BOOLEAN MODE isnt even necessary. If no operators are used, MySql docs say it's equivalent to not using boolean mode at all. No harm in having it there either, just can shorten the SQL query for youNitrometer
H
3

MySQL fulltext treats the word e-magazine in a text as a phrase and not as a word. Because of that it results the two words e and magazine. And while it builds the search index it does not add the e to the index because of the ft_min_word_len (default is 4 chars).

The same length limitation is used for the search query. That is the reason why a search for e-magazine returns exactly the same results as a-magazine because a and - is fully ignored.

But now you want to find the exact phrase e-magazine. By that you use the quotes and that is the complete correct way to find phrases, but MySQL does not support operators for phrases, only for words:
https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

With this modifier, certain characters have special meaning at the beginning or end of words in the search string

Some people would suggest to use the following query:

SELECT id, name 
FROM company 
WHERE MATCH(name) AGAINST('e-magazi*' IN BOOLEAN MODE)
HAVING name LIKE 'e-magazi%';

As I said MySQL ignores the e- and searches for the wildcard word magazi*. After those results are optained it uses HAVING to aditionally filter the results for e-magazi* including the e-. By that you will find the phrase e-magazine AG. Of course HAVING is only needed if the search phrase contains the wildcard operator and you should never use quotes. This operator is used by your user and not you!

Note: As long you do not surround the search phrase with % it will find only fields that start with that word. And you do not want to surround it, because it would find bee-magazine as well. So maybe you need an additional OR HAVING name LIKE ' %e-magazi%' OR HAVING NAME LIKE '\\n%e-magazi%' to make it usable inside of texts.

Trick

But finally I prefer a trick so HAVING isn't needed at all:

  1. If you add texts to your database table, add them additionally to a separate fulltext indexed column and replace words like up-to-date with up-to-date uptodate.
  2. If a user searches for up-to-date replace it in the query with uptodate.

By that you can still find specific in user-specific but up-to-date as well (and not only date).

Bonus

If a user searches for -well-known huge ports MySQL treats that as not include *well*, could include *known* and *huge*. Of course you could solve that with an other extra query variant as well, but with the trick above you remove the hyphen so the search query looks simply like that:

SELECT id
FROM texts
WHERE MATCH(text) AGAINST('-wellknown huge ports' IN BOOLEAN MODE)
Hotheaded answered 29/1, 2017 at 20:23 Comment(1)
I even made a custom build of MariaDB with custom word delimiter/collation. This does not help, hence a dash in the search term is been always interpreted as logical command in the boolean search. So your trick with removing the dashes/dots for the index column is just great - now I can use the indexed search instead of looping through millions of lines with like %foo-bar% - which dramatically decreased my search time. Thanks man!Enrica

© 2022 - 2024 — McMap. All rights reserved.