How to search fields with wildcard and spaces in Hibernate Search

I have a search box that searches the title field based on the given input, so that the user is offered all available titles starting with the entered text. It is based on Lucene and Hibernate Search. It works fine until a space is entered; then the results disappear. For example, I want "Learning H" to give me "Learning Hibernate" as the result, but this doesn't happen. Could you please advise me what I should use here instead?

Query Builder:

QueryBuilder qBuilder = fullTextSession.getSearchFactory()
        .buildQueryBuilder().forEntity(LearningGoal.class).get();
Query query = qBuilder.keyword().wildcard().onField("title")
        .matching(searchString + "*").createQuery();

BooleanQuery bQuery = new BooleanQuery();
bQuery.add(query, BooleanClause.Occur.MUST);
for (LearningGoal exGoal : existingGoals) {
    Term omittedTerm = new Term("id", String.valueOf(exGoal.getId()));
    bQuery.add(new TermQuery(omittedTerm), BooleanClause.Occur.MUST_NOT);
}
@SuppressWarnings("unused")
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(
        query, LearningGoal.class);

Hibernate class:

@AnalyzerDef(name = "searchtokenanalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "ignoreCase", value = "true") }) })
@Analyzer(definition = "searchtokenanalyzer")
public class LearningGoal extends Node {
Hiero answered 8/3, 2013 at 1:20 Comment(3)
Printing the query to output will definitely help you. - Lucius
It is useful indeed, but it didn't help me understand why I get no results. For example, I have a learning goal whose title is "Learning Probability Theory". For the input string "learning p", the two queries print as bQuery: +title:learning p* and hibQuery: FullTextQueryImpl(title:learning p*). It finds the value if the input string is "learning". - Hiero
I also tried replacing the space with ?, but that gave no results either. - Hiero
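A likely explanation for the empty result, sketched in plain Java without Lucene on the classpath: the title is tokenized at index time, so the index holds the separate terms "learning", "probability" and "theory", while the wildcard pattern is not run through the analyzer and is compared against single terms only. The class and method names below are hypothetical, and the tokenizer is a crude stand-in for the real analyzer (it lower-cases and splits on non-alphanumeric characters, with no stop-word handling).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WildcardOnTokensDemo {

    // Rough stand-in for the analyzer (assumption): lower-case, then split
    // on anything that is not a letter or digit.
    public static List<String> indexTerms(String title) {
        List<String> terms = new ArrayList<>();
        for (String t : title.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // A pattern ending in '*' matches only if ONE indexed term starts with
    // the text before the '*'. The pattern itself is not tokenized.
    public static boolean prefixWildcardMatches(List<String> terms, String pattern) {
        if (!pattern.endsWith("*")) {
            throw new IllegalArgumentException("pattern must end with *");
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = indexTerms("Learning Probability Theory");
        System.out.println(terms);                                       // [learning, probability, theory]
        System.out.println(prefixWildcardMatches(terms, "learning*"));   // true
        System.out.println(prefixWildcardMatches(terms, "learning p*")); // false
    }
}
```

No single term contains a space, so a pattern such as "learning p*" has nothing to match, whereas "learning*" matches the first term.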
H
9

I found a workaround for this problem. The idea is to tokenize the input string and remove stop words. For the last token I create a query using the keyword wildcard, and for all the preceding words I create a TermQuery. Here is the full code:

    BooleanQuery bQuery = new BooleanQuery();
    Session session = persistence.currentManager();
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("searchtokenanalyzer");
    QueryParser parser = new QueryParser(Version.LUCENE_35, "title", analyzer);

    // Run the input through the analyzer (lower-casing, stop-word removal)
    // and split the cleaned text into individual tokens.
    // Start with an empty array so a parse failure cannot cause a NullPointerException.
    String[] tokenized = new String[0];
    try {
        Query parsedQuery = parser.parse(searchString);
        // toString("title") prints the analyzed terms without the "title:" prefix.
        String cleanedText = parsedQuery.toString("title");
        tokenized = cleanedText.split("\\s");
    } catch (ParseException e) {
        // Unparseable input: no title clauses are added below.
        e.printStackTrace();
    }

    QueryBuilder qBuilder = fullTextSession.getSearchFactory()
            .buildQueryBuilder().forEntity(LearningGoal.class).get();
    for (int i = 0; i < tokenized.length; i++) {
        if (i == tokenized.length - 1) {
            // Last token: the user may still be typing it, so match it as a prefix.
            Query prefixQuery = qBuilder.keyword().wildcard().onField("title")
                    .matching(tokenized[i] + "*").createQuery();
            bQuery.add(prefixQuery, BooleanClause.Occur.MUST);
        } else {
            // Earlier tokens are complete words, so require an exact term match.
            Term exactTerm = new Term("title", tokenized[i]);
            bQuery.add(new TermQuery(exactTerm), BooleanClause.Occur.MUST);
        }
    }
    // Exclude goals the user has already selected.
    for (LearningGoal exGoal : existingGoals) {
        Term omittedTerm = new Term("id", String.valueOf(exGoal.getId()));
        bQuery.add(new TermQuery(omittedTerm), BooleanClause.Occur.MUST_NOT);
    }
    org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(
            bQuery, LearningGoal.class);
Hiero answered 13/3, 2013 at 5:52 Comment(5)
Can you please add more explanation? I do not get it so far. Why are you using a different query for the last token? And please modify your example so that it is clear enough. Why are existingGoals necessary at all? - Pointer
Let's say we have the title "Hibernate Search". When the user enters "Hibernate Se", the first token is "Hibernate", and we treat it as an exact term: since the user has already started typing another word, we know the first word is complete. The second token, "se", may be unfinished, so we use a wildcard to cover the case where the user is still in the middle of the word, which is exactly the case here. So the query for the last word covers everything starting with "se", and all words entered before it are used as exact terms. - Hiero
As for the second question (existingGoals), this is specific to my use case. I wanted to exclude from the search results those titles that the user has already added to his list of selected items, so existingGoals are the titles that should be ignored. You might not need it in your case. - Hiero
That does make a lot of sense here. I just used your loop for my use case. :) Thank you! - Pointer
Thanks, bro. Helped a lot. - Salaam
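The last-token strategy described in the comments above can be tried out without Lucene. The following is a minimal sketch in plain Java, assuming a simplified tokenizer (lower-casing and splitting on non-alphanumeric characters, no stop-word removal); the class and method names are hypothetical. It lists the clauses in Lucene-like notation and evaluates them against a title in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LastTokenPrefixDemo {

    // Simplified analyzer stand-in (assumption): lower-case and split on
    // non-alphanumeric characters.
    public static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Clauses in Lucene-like notation: every token is required ("+"),
    // and only the last one gets a trailing "*".
    public static List<String> clauses(String input) {
        List<String> in = tokens(input);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            boolean last = (i == in.size() - 1);
            out.add("+title:" + in.get(i) + (last ? "*" : ""));
        }
        return out;
    }

    // In-memory evaluation of those clauses against one title.
    public static boolean matches(String title, String input) {
        List<String> titleTerms = tokens(title);
        List<String> in = tokens(input);
        if (in.isEmpty()) {
            return false;
        }
        // All tokens except the last must appear as exact terms.
        for (int i = 0; i < in.size() - 1; i++) {
            if (!titleTerms.contains(in.get(i))) {
                return false;
            }
        }
        // The last token only needs to be a prefix of some term.
        String prefix = in.get(in.size() - 1);
        for (String term : titleTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(clauses("Hibernate Se"));                     // [+title:hibernate, +title:se*]
        System.out.println(matches("Hibernate Search", "Hibernate Se")); // true
        System.out.println(matches("Hibernate Search", "Hib Se"));       // false
    }
}
```

Note that required clauses of this kind do not check word order, so the input "Search Hib" would also match "Hibernate Search". If order matters for the suggestions, a phrase-style query would be needed instead.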
C
-2

SQL uses different wildcards than any terminal. In SQL '%' replaces zero or more occurrences of any character (in the terminal you use '*' instead), and the underscore '_' replaces exactly one character (in the terminal you use '?' instead). Hibernate doesn't translate the wildcard characters.

So in the second line you have to replace matching(searchString + "*") with

  matching(searchString + "%")
Cellular answered 8/3, 2013 at 9:38 Comment(3)
Are you sure about this? After this change it doesn't give me any results, even without spaces in searchString. Previously (with *) I got results until a space appeared in searchString. I don't see how Hibernate Search is related to SQL. It searches Lucene indexes, which are not stored in the database, so I'm not sure it uses SQL syntax. - Hiero
For Hibernate + SQL I'm sure, but I don't use Lucene, and I don't know what the Lucene engine does with the input. - Cellular
I see. You thought this was a regular database query. However, Hibernate Search uses Lucene queries to search over Lucene indexes, and their syntax is not the same as SQL: lucenetutorial.com/lucene-query-syntax.html - Hiero
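To illustrate the difference discussed in these comments: in Lucene wildcard syntax, * stands for any run of characters and ? for exactly one character, while % has no special meaning and is treated as an ordinary character. The sketch below mimics those semantics with java.util.regex; it is a hypothetical helper for illustration, not Lucene's own implementation, and it ignores escaping.

```java
import java.util.regex.Pattern;

public class LuceneWildcardDemo {

    // Translate a Lucene-style wildcard pattern into a regular expression:
    // '*' becomes ".*", '?' becomes ".", everything else is matched literally.
    public static boolean wildcardMatches(String term, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        System.out.println(wildcardMatches("learning", "learn*"));   // true
        System.out.println(wildcardMatches("learning", "learnin?")); // true
        System.out.println(wildcardMatches("learning", "learn%"));   // false: '%' is literal here
    }
}
```

This is consistent with the observation above that replacing * with % returned no results at all.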
