Search Engine Keywords Parser

Here is what I want to do:

I need to create a search engine parser that uses the following operators:

  • Apples AND Oranges (AND operator)
  • Apples OR Oranges (OR operator)
  • Apples AND NOT Oranges (AND NOT operator)
  • " Apples " (Quotes operator)
  • Apples AND ( Oranges OR Pears ) (Parentheses operator)
  • Appl* (Star operator)

With some preg_replace calls, I managed to convert the string into an array, and then I parsed that array to build a MySQL query. But I don't like that approach, and it's very unstable!

I searched the web for some script that does that and I didn't have any luck!

Can someone please help me implement this??

Thanks

Box answered 29/7, 2011 at 11:26 Comment(1)
Normally you first tokenize the input and then you run a parser on the tokenized data. My print_r converter does something similar; however, it has a different grammar. - Lait
F
3

Ok, this is going to be a large answer.

I think what you need is a parser generator: a piece of software that generates code to parse text according to a given grammar. These parsers usually have two main components: a lexer and a parser. The lexer identifies TOKENS (words); the parser checks whether the token order is valid according to your grammar.

In the lexer, you should declare the following tokens:

TOKENS ::= (AND, OR, NOT, WORD, WORDSTAR, LPAREN, RPAREN, QUOTE)
WORD ::= '/\w+/'
WORDSTAR ::= '/\w+\*/'
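
As a rough illustration of the lexer step, here is a minimal hand-written sketch in PHP rather than generated code. The function name tokenize() is made up for this example, the operators are assumed to be upper-case as in the question, and any character that matches no token is silently skipped.

```php
<?php
// Hypothetical helper: split a search string into array(type, text) pairs
// using the token names declared above.
function tokenize(string $input): array
{
    // Parentheses, quotes, and words with an optional trailing star.
    preg_match_all('/\(|\)|"|\w+\*?/u', $input, $matches);

    $tokens = array();
    foreach ($matches[0] as $text) {
        if ($text === '(') {
            $type = 'LPAREN';
        } elseif ($text === ')') {
            $type = 'RPAREN';
        } elseif ($text === '"') {
            $type = 'QUOTE';
        } elseif (in_array($text, array('AND', 'OR', 'NOT'), true)) {
            $type = $text; // operators are assumed to be upper-case
        } elseif (substr($text, -1) === '*') {
            $type = 'WORDSTAR';
        } else {
            $type = 'WORD';
        }
        $tokens[] = array($type, $text);
    }
    return $tokens;
}
```

Note that this sketch also classifies AND, OR and NOT as operators when they appear inside quotes; a real lexer would need a separate state for quoted text.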

The grammar should be defined like this:

QUERY ::= word
QUERY ::= wordstar
QUERY ::= lparen QUERY rparen
QUERY ::= QUERY and QUERY
QUERY ::= QUERY or QUERY
QUERY ::= QUERY and not QUERY
QUERY ::= quote MQUERY quote
MQUERY ::= word MQUERY
MQUERY ::= word

This grammar defines a language with all the features you need. Depending on the software you use, you can define functions to handle each rule. That way, you can transform your text query into a SQL WHERE clause.
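
To make the "one function per rule" idea concrete, here is a small hand-written recursive-descent sketch in PHP, not the output of a parser generator. The class name QueryParser and the column name `field` are placeholders. The grammar above does not fix operator precedence, so this sketch assumes OR binds more loosely than AND. It returns a WHERE fragment with ? placeholders plus the values to bind, so user input never ends up inside the SQL string.

```php
<?php
// Hypothetical sketch: search string -> array(WHERE fragment, parameters).
final class QueryParser
{
    private array $tokens = array();
    private int $pos = 0;
    private array $params = array();

    public function parse(string $input): array
    {
        preg_match_all('/\(|\)|"|\w+\*?/u', $input, $m);
        $this->tokens = $m[0];
        $this->pos = 0;
        $this->params = array();
        $sql = $this->parseOr();
        if ($this->peek() !== null) {
            throw new InvalidArgumentException('Unexpected token: ' . $this->peek());
        }
        return array($sql, $this->params);
    }

    private function peek(): ?string
    {
        return $this->tokens[$this->pos] ?? null;
    }

    private function next(): ?string
    {
        $token = $this->peek();
        $this->pos++;
        return $token;
    }

    // QUERY ::= QUERY or QUERY (assumed to have the lowest precedence)
    private function parseOr(): string
    {
        $left = $this->parseAnd();
        while ($this->peek() === 'OR') {
            $this->next();
            $left = '(' . $left . ' OR ' . $this->parseAnd() . ')';
        }
        return $left;
    }

    // QUERY ::= QUERY and QUERY | QUERY and not QUERY
    private function parseAnd(): string
    {
        $left = $this->parseTerm();
        while ($this->peek() === 'AND') {
            $this->next();
            $negate = ($this->peek() === 'NOT');
            if ($negate) {
                $this->next();
            }
            $right = $this->parseTerm();
            $left = '(' . $left . ' AND ' . ($negate ? 'NOT (' . $right . ')' : $right) . ')';
        }
        return $left;
    }

    // QUERY ::= word | wordstar | lparen QUERY rparen | quote MQUERY quote
    private function parseTerm(): string
    {
        $token = $this->next();
        if ($token === null || in_array($token, array('AND', 'OR', 'NOT', ')'), true)) {
            throw new InvalidArgumentException('Expected a search term');
        }
        if ($token === '(') {
            $inner = $this->parseOr();
            if ($this->next() !== ')') {
                throw new InvalidArgumentException('Missing closing parenthesis');
            }
            return $inner;
        }
        if ($token === '"') {
            $words = array();
            while ($this->peek() !== null && $this->peek() !== '"') {
                $words[] = $this->next();
            }
            if ($this->next() !== '"' || count($words) === 0) {
                throw new InvalidArgumentException('Bad quoted phrase');
            }
            $this->params[] = implode(' ', $words);
            return '`field` = ?';
        }
        if (substr($token, -1) === '*') {
            $this->params[] = substr($token, 0, -1) . '%'; // Appl* -> Appl%
            return '`field` LIKE ?';
        }
        $this->params[] = $token;
        return '`field` = ?';
    }
}
```

The returned fragment can be appended to a SELECT statement and executed as a prepared statement with the returned parameters. Whether a plain word should map to = or to LIKE is a design choice this sketch leaves open.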

I'm not really into PHP, but I searched the web for a parser generator and PHP_ParserGenerator came up.

Keep in mind that as your database grows, these queries may become a problem for a structured storage system.

You may want to try a full-text search engine, which supports these operators and many other text-search features. This is how IndexTank works:

First, you add (or 'index' in search dialect) all your db records (or documents) to IndexTank.

$api = new ApiClient(...);
$index = $api->get_index('my_index');
foreach ($dbRows as $row) {
  $index->add_document($row->id, array('text' => $row->text));
}

After that, you can search the index with all the operators you want:

$index = $api->get_index('my_index');
$search_result = $index->search('Apples AND Oranges');
$search_result = $index->search('Apples OR Oranges');
$search_result = $index->search('Apples AND NOT Oranges');
$search_result = $index->search('"apples oranges"');
$search_result = $index->search('Apples AND ( Oranges OR Pears )');
$search_result = $index->search('Appl*');

I hope I answered your question.

Foregather answered 1/8, 2011 at 19:31 Comment(0)
A
1

Also, this is not exactly what you're looking for, but maybe close: MySQL Full-text searching.

Ai answered 29/7, 2011 at 12:3 Comment(2)
MySQL's built-in text searching is nice for basic search of natural language text. But if you want to query other kinds of text, allow advanced options, or search for words shorter than four letters, then you usually have to get more creative. - Xanthate
IN BOOLEAN MODE does almost exactly what the OP asks for, and setting the minimum word length lower is quite easy. It would be quite a simple replacement of operators. +1 to this. - Nebulize
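
The operator replacement mentioned in the comment above could look roughly like the following PHP sketch. The function name toBooleanMode() is invented for this example, and it only handles flat queries: parentheses and mixed AND/OR expressions are not translated faithfully. It relies on the boolean-mode conventions that a leading + marks a required word, a leading - marks an excluded word, and a word with no prefix is optional.

```php
<?php
// Hypothetical helper: rewrite a flat "A AND B", "A OR B" or "A AND NOT B"
// query into MySQL boolean-mode syntax. Parentheses are not handled.
function toBooleanMode(string $search): string
{
    $tokens = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    $out = array();
    foreach ($tokens as $i => $token) {
        if (in_array($token, array('AND', 'OR', 'NOT'), true)) {
            continue; // operators become prefixes on their neighbours
        }
        $prev = $tokens[$i - 1] ?? null;
        $next = $tokens[$i + 1] ?? null;
        if ($prev === 'NOT') {
            $out[] = '-' . $token;      // excluded word
        } elseif ($prev === 'AND' || $next === 'AND') {
            $out[] = '+' . $token;      // required word
        } else {
            $out[] = $token;            // optional word (OR)
        }
    }
    return implode(' ', $out);
}
```

The result would then be bound as a parameter in something like MATCH (`field`) AGAINST (? IN BOOLEAN MODE), where `field` is again a placeholder column name.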
N
0

Did you look at ANTLR?

Nanon answered 29/7, 2011 at 11:41 Comment(0)
A
0

You could homebrew something like the following (IMPORTANT: the $search string must first be sanitized or you will get hacked) ...

if ($search[0]=='*' and substr($search,-1)=='*') {
    // *ppl*
    $query = "SELECT * FROM `table` WHERE `field` LIKE ('%". str_replace('*','',$search) ."%')";
} elseif (substr($search,-1)=='*') {
    // Appl*
    $query = "SELECT * FROM `table` WHERE `field` LIKE ('". str_replace('*','',$search) ."%')";
} elseif ($search[0]=='*') {
    // *Appl
    $query = "SELECT * FROM `table` WHERE `field` LIKE ('%". str_replace('*','',$search) ."')";
} elseif (substr_count($search,'"')==2) {
    // " Apples " ... just remove the "
    $query = 'SELECT * FROM `table` WHERE `field` = "'. str_replace('"','',$search) .'"';
} elseif (strpos($search,')') !== false or strpos($search,'(') !== false) {
    // strpos() returns 0 for a match at the start, so compare against false
    // uh ... something more complex here
    $query = '#idunno';
} else {
    // the rest
    // strtr() tries the longest key first and never rescans replaced text,
    // so ' AND NOT ' is not mistaken for ' AND '
    $operators = array(
        ' AND NOT ' => '" AND `field` != "',
        ' AND '     => '" AND `field` = "',
        ' OR '      => '" OR `field` = "'
        );
    $query = 'SELECT * FROM `table` WHERE `field` = "'. strtr($search,$operators) .'"';
}
Ai answered 29/7, 2011 at 11:53 Comment(0)
H
-1

Try this: http://www.isearchthenet.com/isearch/index.php

From readme:

  • Searches are normally performed with "may contain" words. A match requires any of the words entered to be present on the page.
  • You can search for pages which contain a specific word by prefixing it with a plus (+) sign. Only pages which contain that word will be shown.
  • You can ignore all pages which contain a specific word by prefixing it with a minus (-) sign. Any page that contains that word will not be displayed in the search results.
  • You can search for a specific phrase by enclosing it in double quotes ("). Only pages that contain that exact phrase will be shown.

It's easy to install and use. Also take a look at http://sphinxsearch.com/ - the most powerful engine, but not for newbies.

Herophilus answered 29/7, 2011 at 11:43 Comment(0)
