We have been using MySQL full-text search for several years, but our requirements have changed. We now want to translate the AND/OR/NOT operators in a search string into the boolean-mode syntax that MySQL understands. I wrote a unit test for this, and it quickly became clear that the translation is quite complicated.
I'm sure other people have run into this problem, so I assume there must be a library that can do this for me. I searched Google but could not find one. Does anybody know a good one?
The library should be able to handle quotes, parentheses, and AND/OR/NOT operators, and in our case it should default to AND instead of OR when no operator is given. Here are some of my expected results:
- 'ict' becomes '+ict'
- 'ict it' becomes '+ict +it'
- 'ict OR it' becomes 'ict it'
- 'NOT ict' becomes '-ict'
- 'it NOT ict' becomes '+it -ict'
- 'web AND (ict OR it)' becomes '+web +(ict it)'
- 'ict OR (it AND web)' becomes 'ict (+it +web)'
- 'ict NOT (ict AND it AND web)' becomes '+ict -(+ict +it +web)'
- 'php OR (NOT web NOT embedded ict OR it)' becomes 'php (-web -embedded ict it)'
- '(web OR embedded) (ict OR it)' becomes '+(web embedded) +(ict it)'
- 'develop AND (web OR (ict AND php))' becomes '+develop +(web (+ict +php))'
- '"ict' becomes '+"ict"'
- '"ict OR it"' becomes '+"ict OR it"'
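To make the rule behind these examples concrete, here is a minimal sketch of one way to get them with a small tokenizer and a recursive group renderer. It is not an existing library, and the function names (`tokenizeSearch`, `renderGroup`, `toBooleanMode`) are made up for illustration. The rule is my own reading of the examples: a term gets `-` after NOT, no prefix when it sits directly next to an OR, and `+` otherwise. An unterminated quote is assumed to run to the end of the input.

```php
<?php
// Sketch only: hypothetical helper names, rule inferred from the examples above.

/** Split the input into quoted phrases, parentheses, and bare words. */
function tokenizeSearch(string $s): array
{
    // A phrase ends at the closing quote, or at end of input if unterminated.
    preg_match_all('/"[^"]*"?|[()]|[^\s()"]+/', $s, $m);
    return $m[0];
}

/** Render one group, consuming tokens up to the matching ')' or end of input. */
function renderGroup(array &$tokens): string
{
    $items = [];        // each item: [negated, text, next to an OR]
    $negate = false;    // a NOT was seen before the upcoming term
    $pendingOr = false; // an OR was seen before the upcoming term
    while ($tokens) {
        $t = array_shift($tokens);
        $u = strtoupper($t);
        if ($t === ')') {
            break;
        }
        if ($u === 'AND' || $t === '&') {
            continue;   // AND is the default, so it adds nothing
        }
        if ($u === 'OR' || $t === '|') {
            if ($items) {
                $items[count($items) - 1][2] = true; // term before the OR
            }
            $pendingOr = true;                       // term after the OR
            continue;
        }
        if ($u === 'NOT') {
            $negate = true;
            continue;
        }
        if ($t === '(') {
            $text = '(' . renderGroup($tokens) . ')';
        } elseif ($t[0] === '"') {
            $text = '"' . trim($t, '"') . '"';       // closes an open quote
        } else {
            $text = $t;
        }
        $items[] = [$negate, $text, $pendingOr];
        $negate = false;
        $pendingOr = false;
    }
    $out = [];
    foreach ($items as [$neg, $text, $or]) {
        $out[] = ($neg ? '-' : ($or ? '' : '+')) . $text;
    }
    return implode(' ', $out);
}

/** Convert an AND/OR/NOT search string to MySQL boolean-mode syntax. */
function toBooleanMode(string $s): string
{
    $tokens = tokenizeSearch($s);
    return renderGroup($tokens);
}
```

With this sketch, `toBooleanMode('web AND (ict OR it)')` returns `'+web +(ict it)'`. Mixed input without parentheses, such as `a b OR c`, comes out as `+a b c`, which is only an approximation of either possible reading, and a stray `)` at the top level ends parsing early, so a real implementation would need to decide how to handle both.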
This is the function we have used for the last few years (it does not work properly):
/**
 * Parses search string.
 * @param string $s The unparsed search string.
 * @return string The parsed search string.
 */
public function parseSearchString( $s )
{
    // Place a space at the beginning.
    $s = ' ' . $s;
    // AND - Remove multiple spaces, AND, &.
    $s = preg_replace( '/\s\s+/', ' ', $s );
    $s = preg_replace( '/\sAND\s/i', ' ', $s );
    $s = preg_replace( '/\s&\s/', ' ', $s );
    // OR - Make replacements. Run each one twice so that all occurrences are replaced.
    $s = preg_replace( '/(\w+)\s(?:OR|\|)\s(\|?\w+)/i', '|\\1|\\2', $s );
    $s = preg_replace( '/(\w+)\s(?:OR|\|)\s(\|?\w+)/i', '|\\1|\\2', $s );
    $s = preg_replace( '/(\w+)\s*(?:\\\|\\/)\s*(\|?\w+)/i', '|\\1|\\2', $s );
    $s = preg_replace( '/(\w+)\s*(?:\\\|\\/)\s*(\|?\w+)/i', '|\\1|\\2', $s );
    // NOT
    $s = preg_replace( '/\bNOT\s(\w+)/i', '|-\\1', $s );
    // Quoted strings.
    $s = preg_replace( '/\s"/', ' +"', $s );
    // Place + in front of words.
    $s = preg_replace( '/\s(\w+)/', ' +\\1', $s );
    // Replace | with spaces.
    $s = preg_replace( '/\|/', ' ', $s );
    return trim( $s );
}