Getting error on a specific query
P

3

10

Novice on Lucene here. I'm using it with Hibernate in a Java client, and have been getting this error on a particular query:

HSEARCH000146: The query string 'a' applied on field 'name' has no meaningfull tokens to  
be matched. Validate the query input against the Analyzer applied on this field.

Search works fine for all other queries, even those that return an empty result set. My test DB does have a record with 'a'. What could be wrong here?

Porky answered 7/12, 2012 at 15:6 Comment(1)
I think you need to wrap the code that can throw this exception in a try-catch block and catch the EmptyQueryException. That works for me when using stopwords.Palliate
J
10

'a' is a stopword, and will be filtered out of your query by the StandardAnalyzer. Stopwords are words that are so common in the language you're searching in that they are not deemed meaningful for generating search results. It's a short list, but 'a' is one of them in English.

Since the Analyzer has got rid of that term, and it was the only term present, you now are sending an empty query, which is not acceptable, and searching fails.

For the curious, these are the standard Lucene English stopwords:

"a", "an", "and", "are", "as", "at", "be", "but", "by",
"for", "if", "in", "into", "is", "it",
"no", "not", "of", "on", "or", "such",
"that", "the", "their", "then", "there", "these",
"they", "this", "to", "was", "will", "with"
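
To make the failure mode concrete, here is a minimal sketch in plain Java of what a stop filter does to a query string. It is a simplified stand-in, not the Lucene implementation: the class name StopwordSketch and the method analyze are made up for illustration, and the splitting rule is much cruder than StandardAnalyzer's tokenizer. It only assumes the stop word list shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified model of an analyzer with a stop filter.
// This is NOT Lucene code; it only mimics lowercase + stop word removal.
final class StopwordSketch
{
    // The English stop words listed above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"));

    // Lowercase the input, split on anything that is not a letter or digit,
    // and keep only the tokens that are not stop words.
    static List<String> analyze(String input)
    {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
        {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw))
            {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args)
    {
        System.out.println(analyze("a"));    // prints [] : nothing left to search for
        System.out.println(analyze("Ford")); // prints [ford]
        System.out.println(analyze("For"));  // prints [] : "for" is a stop word
    }
}
```

An empty token list is the situation in which the search reports "no meaningfull tokens to be matched".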

If you don't want stop words to be removed, then you should set up your Analyzer without a StopFilter, or with an empty stop word set. In the case of StandardAnalyzer, you are able to pass in a custom stop set to the constructor:

Analyzer analyzer = new StandardAnalyzer(CharArraySet.EMPTY_SET);
Jodhpur answered 7/12, 2012 at 17:26 Comment(4)
I too would like a workaround. I am attempting to search for vehicles by make in a type-ahead field. So when the user types For, it errors, but should return Fords...Affable
@CodeJunkie You can use .ignoreAnalyzer().Smitherman
@Smitherman - Seems a bit like throwing the baby out with the bathwater to me. I think just using an analyzer without stop words is probably what is really needed here. I've added a bit to the answer on how to accomplish that.Jodhpur
If your department is IT, you won't get any results when you enter IT as a keyword.Ikeda
S
1

You can put

@Analyzer(impl=KeywordAnalyzer.class)

on your field to avoid this issue. KeywordAnalyzer treats the whole field value as a single token and does not filter stop words.

Swacked answered 30/6, 2016 at 3:40 Comment(0)
S
1

Proposed Workaround

The reason for this error was already explained by @femtoRgon. The problem also occurs when you tokenize the user input into a list of strings and then feed each string into a Hibernate Search query. If one of those strings is a stop word, Hibernate Search does not know what to do with it.

However, you can parse and validate the input with the same analyzer before you send it to the Hibernate Search query. This way the stop words are already stripped from the input, and you avoid the error without switching to a different Analyzer class.

Retrieve the current analyzer from your entity class MyModelClass.class

FullTextEntityManager fullTextEntityManager = org.hibernate.search.jpa.Search
    .getFullTextEntityManager(entityManager);

QueryBuilder builder = fullTextEntityManager.getSearchFactory()
    .buildQueryBuilder().forEntity(MyModelClass.class).get();

Analyzer customAnalyzer = fullTextEntityManager.getSearchFactory()
    .getAnalyzer(MyModelClass.class);

Input Tokenizer

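import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Run the input through the analyzer and return the terms that survive.
 * @param analyzer the analyzer that is applied to the searched field
 * @param string the raw user input
 * @return the terms emitted by the analyzer; empty if nothing meaningful is left
 */
public static List<String> tokenizeString(Analyzer analyzer, String string)
{
    List<String> result = new ArrayList<String>();
    // try-with-resources closes the stream even if tokenizing fails
    try (TokenStream stream = analyzer.tokenStream(null, new StringReader(string)))
    {
        // Fetch the term attribute once; it is updated on every incrementToken()
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken())
        {
            result.add(term.toString());
        }
        stream.end();
    } catch (IOException e)
    {
        throw new RuntimeException(e);
    }
    return result;
}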

Validate the Input

Now you can simply run your input string through the same Analyzer and receive a properly tokenized list of Strings, like this:

List<String> keywordsList = tokenizeString(customAnalyzer, "This is a sentence full of the evil stopwords");

With Lucene's standard English stop words you would receive this list ("this" is itself a stop word, so it is dropped as well):

[sentence, full, evil, stopwords]
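
One possible way to use the tokenized list as a guard is sketched below: run the query only when at least one term survives, and return an empty result otherwise. The class QueryGuardSketch, the method searchIfMeaningful, and both function parameters are hypothetical names introduced for this illustration. In a real setup the tokenizer argument would presumably be something like s -> tokenizeString(customAnalyzer, s), and the search argument would build and execute the Hibernate Search query; here both are replaced by stand-in lambdas so the sketch runs on its own.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical guard around a search call; not part of Hibernate Search.
final class QueryGuardSketch
{
    // Tokenize the input first and only call the search when terms remain.
    static <R> List<R> searchIfMeaningful(
        String input,
        Function<String, List<String>> tokenizer,
        Function<List<String>, List<R>> search)
    {
        List<String> terms = tokenizer.apply(input);
        if (terms.isEmpty())
        {
            // Only stop words (or nothing) were entered: skip the query
            return Collections.emptyList();
        }
        return search.apply(terms);
    }

    public static void main(String[] args)
    {
        // Stand-in tokenizer: pretends that "a" is the only stop word
        Function<String, List<String>> tokenizer =
            s -> s.equals("a") ? Collections.<String>emptyList() : Arrays.asList(s);
        // Stand-in search: simply echoes the terms it was asked to look up
        Function<List<String>, List<String>> search = terms -> terms;

        System.out.println(searchIfMeaningful("a", tokenizer, search));    // prints []
        System.out.println(searchIfMeaningful("ford", tokenizer, search)); // prints [ford]
    }
}
```

Whether an empty result or a message to the user is the better response to a stop-word-only input depends on the application.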

My answer is based on this and this SO posts.

Struble answered 13/4, 2017 at 12:59 Comment(0)
