Not a direct answer, but if anybody is looking for a PHP code to handle tokenizing of user-input search string for Full Text Searching, can use the following code:
/**
* Method to take an input string and tokenize it into an array of words for Full Text Searching (FTS).
*
* This method is used when an input string can be made up of multiple words (let's say, separated by space characters),
* and we need to use different Boolean operators on each of the words. The tokenizing process is similar to extraction
* of words by FTS parser in MySQL. The operators used for matching in Boolean condition are removed from the input $phrase.
* These characters as of latest version of MySQL (8+) are: +-><()~*:""&|@ (@ is specific for InnoDB)
* We can also execute the following query to get updated list: show variables like 'ft_boolean_syntax';
* Afterwards, the modified string is split into individual words considering either space, comma, and, period (.) characters.
* Details at: https://dev.mysql.com/doc/refman/8.0/en/fulltext-natural-language.html
*
* @param string $phrase Input statement/phrase consisting of words
* @return array Tokenized words
* @author Madhur, 2019
*/
function tokenizeStringIntoFTSWords(string $phrase) : array {
$phrase_mod = trim(preg_replace('/[><()~*:"&|@+-]/', ' ', trim($phrase)));
$words_arr = preg_split('/[\s,.]/', $phrase_mod, null, PREG_SPLIT_NO_EMPTY);
// filter out the fulltext stop words and words whose length is less than 3.
$fts_words = array();
$fulltext_stop_words = array(
'about','are','com','for','from','how','that','this','was','what',
'when','where','who','will','with','und','the','www'
);
foreach($words_arr as $word) {
// By default MySQL FULLTEXT index does not store words whose length is less than 3.
// Check innodb_ft_min_token_size Ref: https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_ft_min_token_size
// So we need to ignore words whose length is less than 3.
if(strlen($word) < 3) continue;
// Ignore the fulltext stop words, whose length is greater than 3 or equal to 3.
// Ref: https://dev.mysql.com/doc/refman/8.0/en/fulltext-stopwords.html
if (in_array($word, $fulltext_stop_words)) continue;
$fts_words[] = $word;
}
return $fts_words;
}
Above code will handle Stopwords, minimum word length limit, and Boolean mode operators as well. So, for instance, if user inputs: Search@bar with in Javascript
, it will return an array of (Search, bar, Javascript)
. Afterwards, a Full text query can be written using this array.
'
somewhere already where....'test
closes that string. – Whitehall'
into the mix in the background somewhere. like I said, the sql you've posted above is syntactically perfect. it will NOT screw up on any else's system if they had the proper tables/fields on their system. – Whitehall