How to allow fulltext searching with hyphens in the search query
Asked Answered
G

4

16

I have keywords like "some-or-other" where the hyphens matter in the search through my mysql database. I'm currently using the fulltext function.

Is there a way to escape the hyphen character? I know that one option is to comment out #define HYPHEN_IS_DELIM in the myisam/ftdefs.h file, but unfortunately my host does not allow this. Is there another option out there?

Here's the code I have right now:

$search_input = $_GET['search_input'];
$keyword_safe = mysql_real_escape_string($search_input);
$keyword_safe_fix = "*'\"" . $keyword_safe . "\"'*";


$sql = "
    SELECT *,
        MATCH(coln1, coln2, coln3) AGAINST('$keyword_safe_fix') AS score
        FROM table_name
    WHERE MATCH(coln1, coln2, coln3) AGAINST('$keyword_safe_fix')
    ORDER BY score DESC
";
Grubby answered 4/3, 2011 at 10:25 Comment(0)
S
19

From here http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

One solution to find a word with a dashes or hyphens in is to use FULL TEXT SEARCH IN BOOLEAN MODE, and to enclose the word with the hyphen / dash in double quotes.

Or from here http://bugs.mysql.com/bug.php?id=2095

There is another workaround. It was recently added to the manual: " Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. . You can edit the contents in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes. "

Have not tried it on my own.

Edit: Here is some more additional info from here http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html

A phrase that is enclosed within double quote (“"”) characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Prior to MySQL 5.0.3, the engine then performed a substring search for the phrase in the records that were found, so the match must include nonword characters in the phrase. As of MySQL 5.0.3, nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase" in MySQL 5.0.3, but not before.

If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty.

Spellbinder answered 4/3, 2011 at 10:54 Comment(9)
I actually noticed the first solution you mentioned. However I was unable to replicate it. Can you give me an example of such a query?Grubby
Sure, try that SELECT * FROM your_table_name WHERE MATCH (your_table_column_name) AGAINST ('"SQL-MySQL"' IN BOOLEAN MODE); Check here for moore info dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.htmlSpellbinder
FANTASTIC!! Worked like a charm! Thank you so much :)Grubby
@Yasen Zhelev Actually...I just did some testing, and noticed that searches for "blah- blah-" or "blah-blah-" (generally 2 hyphens) turns up 0 results. Any idea how to bypass this?Grubby
Could you provide a sample query? It should work no matter how many hyphens are there. Do you have rows with such strings in them?Spellbinder
@Yasen Zhelev: I just copied+pasted the code I have in the post. Right when I did it, I realized that "IN BOOLEAN MODE" doesn't allow scoring...is there a way around this? Because it defeats the purpose of using fulltext as a search engine.Grubby
I am not sure what you mean by scoring, but anyway my knowladge ends here.Spellbinder
@Yasen Zhelev: The "score" portion refers to the fulltext function's ability to rate the relevance of each row in the table against the keywords being searched. According to the documentation for fulltext searching, IN BOOLEAN MODE doesn't seem to allow this...I'm wondering if there's a way to code a score function IN BOOLEAN MODE, while allowing hyphens in the search query?Grubby
i did this and it worked until apt upgrade overwrote my charset modifications. posted a question about this here if anyone has a good idea about how to make changes stick: askubuntu.com/questions/874190/…Lianeliang
T
4

Some people would suggest to use the following query:

SELECT id 
FROM texts
WHERE MATCH(text) AGAINST('well-known' IN BOOLEAN MODE)
HAVING text LIKE '%well-known%';

But by that you need many variants depending on the used fulltext operators. Task: Realize a query like +well-known +(>35-hour <39-hour) working week*. Too complex!

And do not forget the default len of ft_min_word_len so a search for up-to-date returns only date in your results.

Trick

Because of that I prefer a trick so constructions with HAVING etc aren't needed at all:

  1. Instead of adding the following text to your database table:

    "The Up-to-Date Sorcerer" is a well-known science fiction short story.
    copy the hyphen words without hypens to the end of the text inside a comment:
    "The Up-to-Date Sorcerer" is a well-known science fiction short story.<!-- UptoDate wellknown -->
  2. If the users searches for up-to-date remove the hyphen in the sql query:
    MATCH(text) AGAINST('uptodate ' IN BOOLEAN MODE)

By that you're user can find up-to-date as one word instead of getting all results that contain only date (because ft_min_word_len kills up and to).

Of course before you echo the texts you should remove the <!-- ... --> comments.

Advantages

  • the query is simpler
  • the user is able to use all fulltext operators as usual
  • the query is faster.
  • If a user searches for -well-known +science MySQL treats that as not include *well*, could include *known* and must include *science*. This isn't what the user expected. The trick solves that, too (as the sql query searches for -wellknown +science)
Trefoil answered 29/1, 2017 at 20:53 Comment(0)
I
2

Maybe simpler to use the Binary operator.

SELECT * 
FROM your_table_name 
WHERE BINARY your_column = BINARY "Foo-Bar%AFK+LOL"

http://dev.mysql.com/doc/refman/5.0/en/cast-functions.html#operator_binary

The BINARY operator casts the string following it to a binary string. This is an easy way to force a column comparison to be done byte by byte rather than character by character. This causes the comparison to be case sensitive even if the column is not defined as BINARY or BLOB. BINARY also causes trailing spaces to be significant.

Inarch answered 29/1, 2012 at 4:32 Comment(2)
This caused MySQL Workbench to crash for me.Heaps
This turned a 2 second query into a 12 second queryBarth
U
0

My preferred solution to this is to remove the hyphen from the search term and from the data being searched. I keep two columns in my full-text table - search and return. search contains sanitised data with various characters removed, and is what the users' search terms are compared to, after my code has sanitised those as well.

Then I display the return column.

It does mean I have two copies of the data in my database, but for me that trade-off is well worth it. My FT table is only ~500k rows, so it's not a big deal in my use case.

Unicef answered 18/6, 2019 at 17:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.