Is there a good PHP library available to parse boolean search operators to MySQL fulltext operators? [closed]
Asked Answered
P

1

6

We have been using MySQL Fulltext search for several years now, but the requirements have changed. We want to parse the AND/OR/NOT parameters to the form that MySQL does understand. I've written a unit test and it became clear that this is quite complicated.

I'm sure more people run into this problem, so I suppose there must be some kind of library that is able to do this for me. I've tried Google, but unfortunately, I was not able to find such a library. Does anybody know a good one?

The library should be able to handle quotes, parenthesis, AND/OR/NOT operators, and in our case it should default to AND instead of OR (if no operator is set). Here are some of my expected results:

  • 'ict' becomes '+ict'
  • 'ict it' becomes '+ict +it'
  • 'ict OR it' becomes 'ict it'
  • 'NOT ict' becomes '-ict'
  • 'it NOT ict' becomes '+it -ict'
  • 'web AND (ict OR it)' becomes '+web +(ict it)'
  • 'ict OR (it AND web)' becomes 'ict (+it +web)'
  • 'ict NOT (ict AND it AND web)' becomes '+ict -(+ict +it +web)'
  • 'php OR (NOT web NOT embedded ict OR it)' becomes 'php (-web -embedded ict it)'
  • '(web OR embedded) (ict OR it)' becomes '+(web embedded) +(ict it)'
  • develop AND (web OR (ict AND php))' becomes '+develop +(web (+ict +php))'
  • '"ict' becomes '+"ict"'
  • '"ict OR it"' stays '+"ict OR it"'

This is the function we used in the last years (which does not work properly):

/**
 * Parses search string.
 * @param string $s The unparsed search string.
 * @return string $s The parsed search string.
 */
public function parseSearchString( $s )
{
    // Place a space at the beginning.
    $s = ' ' . $s;

    // AND - Remove multiple spaces, AND, &.
    $s = preg_replace( '/\s\s+/', ' ', $s );
    $s = preg_replace( '/\sAND\s/i', ' ', $s );
    $s = preg_replace( '/\s&\s/', ' ', $s );

    // OR - Make replacements. Execute double, so we replace all occurences.
    $s = preg_replace( '/(\w+)\s(?:OR|\|)\s(\|?\w+)/i', '|\\1|\\2', $s );
    $s = preg_replace( '/(\w+)\s(?:OR|\|)\s(\|?\w+)/i', '|\\1|\\2', $s );
    $s = preg_replace( '/(\w+)\s*(?:\\\|\\/)\s*(\|?\w+)/i', '|\\1|\\2', $s );
    $s = preg_replace( '/(\w+)\s*(?:\\\|\\/)\s*(\|?\w+)/i', '|\\1|\\2', $s );

    // NOT
    $s = preg_replace( '/\bNOT\s(\w+)/i', '|-\\1', $s );

    // Quoted strings.
    $s = preg_replace( '/\s"/', ' +"', $s );

    // Place + in front of words.
    $s = preg_replace( '/\s(\w+)/', ' +\\1', $s );

    // Replace | to spaces.
    $s = preg_replace( '/\|/', ' ', $s );

    return trim( $s );
}
Preference answered 15/4, 2013 at 13:41 Comment(3)
Personally, I think this is a MARVELOUS question, and am curious to know if you ever found a solutionVermillion
Back then, I wrote a custom keyword tokenizer and parser in php, which did all the magic. It worked quite well for us, but it wasn't that pretty. We never published the code. We've been using it for a few years, before switching to Elasticsearch.Preference
Interestingly enough, someone else saw your question and wrote a class to answer it - and quite nicely I might add. They even used the same sample data. github.com/DuncanOgle/BooleanSearchParserVermillion
L
0

This probably doesn't answer your question but if you need more advanced full-text search capabilities I'd recommend using something like sphinx or apache solr. https://mcmap.net/q/56391/-choosing-a-stand-alone-full-text-search-server-sphinx-or-solr-closed

Lipography answered 15/4, 2013 at 14:5 Comment(2)
Thank you for your answer. We are aware of solutions like Sfinx, Xapian and Solr. But for now, we would like to stay with the full-text search.Preference
Someone made a couple classes to parse strings for BOOLEAN mysql queries. github.com/DuncanOgle/BooleanSearchParserKayne

© 2022 - 2024 — McMap. All rights reserved.