Autocomplete using Hibernate Search
B

3

11

I am trying to build a better autocomplete feature for my website. I want to use Hibernate Search for this, but in my experiments so far it only matches full words.

So, my question: is it possible to search on just a few characters?

E.g. the user types 3 letters, and Hibernate Search shows all the words from my DB objects that contain those 3 letters.

PS. Right now I am using a "like" query for this, but my DB has grown a lot and I also want to extend the search functionality to other tables.

Bamako answered 19/3, 2011 at 12:4 Comment(0)
I
7

You could index the field using an NGramFilter as suggested here. For best results you should use the EdgeNGramFilter from Apache Solr, which creates n-grams from the beginning edge of a term and can be used in Hibernate Search as well.
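To make the difference between the two filters concrete, here is a minimal plain-Java sketch of which terms each one conceptually produces for a single token. It is an illustration only and not the Lucene/Solr implementation; the class and method names (NGramSketch, nGrams, edgeNGrams) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    /** All substrings of the token with a length between min and max (plain n-grams). */
    static List<String> nGrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = min; len <= max && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    /** Only the prefixes of the token with a length between min and max (edge n-grams). */
    static List<String> edgeNGrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int len = min; len <= max && len <= token.length(); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("sprain", 3, 6)); // [spr, spra, sprai, sprain]
        System.out.println(nGrams("sprain", 3, 3));     // [spr, pra, rai, ain]
    }
}
```

Edge n-grams keep the index smaller and only match what the user types from the start of a word; plain n-grams also match in the middle of a word, which is closer to the "contains" behaviour asked for in the question.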

Impatiens answered 19/3, 2011 at 13:17 Comment(0)
A
12

Major edit: one year on, I was able to improve on the original code I posted and produce this:

My indexed entity:

@Entity
@Indexed
@AnalyzerDef(name = "myanalyzer",
    // Split input into tokens according to tokenizer
    tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
    filters = {
        // Normalize token text to lowercase, as the user is unlikely to care about casing when searching for matches
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        // Index partial words (n-grams of each token), so we can provide autocomplete functionality
        @TokenFilterDef(factory = NGramFilterFactory.class, params = { @Parameter(name = "maxGramSize", value = "1024") })
    })
@Analyzer(definition = "myanalyzer")
public class Compound extends DomainObject {
    public static String[] getSearchFields(){...}
    ...
}

Every @Field must be tokenized and stored in the index for this to work:
@Field(index = Index.TOKENIZED, store = Store.YES)

@Transactional(readOnly = true)
public synchronized List<String> getSuggestions(final String searchTerm) {
    // Compose query for term over all fields in Compound
    String lowerCasedSearchTerm = searchTerm.toLowerCase();

    // Create a fullTextSession for the sessionFactory.getCurrentSession()
    FullTextSession fullTextSession = Search.getFullTextSession(getSession());

    // New DSL based query composition
    SearchFactory searchFactory = fullTextSession.getSearchFactory();
    QueryBuilder buildQuery = searchFactory.buildQueryBuilder().forEntity(Compound.class).get();
    TermContext keyword = buildQuery.keyword();
    WildcardContext wildcard = keyword.wildcard();
    String[] searchfields = Compound.getSearchFields();
    TermMatchingContext onFields = wildcard.onField(searchfields[0]);
    for (int i = 1; i < searchfields.length; i++)
        onFields.andField(searchfields[i]);
    TermTermination matching = onFields.matching(lowerCasedSearchTerm);
    Query query = matching.createQuery();

    // Convert the Search Query into something that provides results: Specify Compound again to be future proof
    FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(query, Compound.class);
    fullTextQuery.setMaxResults(20);

    // Projection does not work on collections or maps which are indexed via @IndexedEmbedded
    List<String> projectedFields = new ArrayList<String>();
    projectedFields.add(ProjectionConstants.DOCUMENT);
    List<String> embeddedFields = new ArrayList<String>();
    for (String fieldName : searchfields)
        if (fieldName.contains("."))
            embeddedFields.add(fieldName);
        else
            projectedFields.add(fieldName);

    @SuppressWarnings("unchecked")
    List<Object[]> results = fullTextQuery.setProjection(projectedFields.toArray(new String[projectedFields.size()])).list();

    // Keep a list of suggestions retrieved by search over all fields
    List<String> suggestions = new ArrayList<String>();
    for (Object[] projectedObjects : results) {
        // Retrieve the search suggestions for the simple projected field values
        for (int i = 1; i < projectedObjects.length; i++) {
            String fieldValue = projectedObjects[i].toString();
            if (fieldValue.toLowerCase().contains(lowerCasedSearchTerm))
                suggestions.add(fieldValue);
        }

        // Extract the search suggestions for the embedded fields from the document
        Document document = (Document) projectedObjects[0];
        for (String fieldName : embeddedFields)
            for (Field field : document.getFields(fieldName))
                if (field.stringValue().toLowerCase().contains(lowerCasedSearchTerm))
                    suggestions.add(field.stringValue());
    }

    // Return the composed list of suggestions, which might be empty
    return suggestions;
}

There's some wrangling I'm doing at the end to handle @IndexedEmbedded fields. If you don't have those, you can simplify the code a great deal by merely projecting the searchFields and leaving out the document and embeddedFields handling.
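For the simplified case without embedded fields, the post-processing of the projected rows can be sketched in plain Java as follows. This is an assumption-laden sketch rather than the code from the project: the SuggestionFilter class and collectSuggestions method are hypothetical names, every column of a row is assumed to be a projected field value (no Document at index 0), and, unlike the method above, it skips null values and removes duplicate suggestions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SuggestionFilter {

    /**
     * Keeps every projected value that contains the search term (case-insensitive).
     * Rows are assumed to hold projected field values only; nulls are skipped and
     * duplicates are dropped while preserving the order of first appearance.
     */
    static List<String> collectSuggestions(List<Object[]> rows, String searchTerm) {
        String needle = searchTerm.toLowerCase(Locale.ROOT);
        Set<String> suggestions = new LinkedHashSet<>();
        for (Object[] row : rows) {
            for (Object value : row) {
                if (value == null) {
                    continue; // a field without a value cannot be suggested
                }
                String text = value.toString();
                if (text.toLowerCase(Locale.ROOT).contains(needle)) {
                    suggestions.add(text);
                }
            }
        }
        return new ArrayList<>(suggestions);
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[] { "Aspirin", null },
                new Object[] { "Paracetamol", "aspirin" },
                new Object[] { "Aspirin", "Ibuprofen" });
        System.out.println(collectSuggestions(rows, "SPIR")); // [Aspirin, aspirin]
    }
}
```

The rows would come from fullTextQuery.setProjection(...).list() as in the method above; the filtering step is still needed because a hit only tells you that some field of the entity matched, not which one.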

As before: Hopefully this is useful to the next person to encounter this question. Should anyone have any critique or improvements to the above posted code, feel free to edit and do please let me know.


Edit3: The project this code was taken from has since been open sourced. Here are the relevant classes:

https://trac.nbic.nl/metidb/browser/trunk/metidb/metidb-core/src/main/java/org/metidb/domain/Compound.java
https://trac.nbic.nl/metidb/browser/trunk/metidb/metidb-core/src/main/java/org/metidb/dao/CompoundDAOImpl.java
https://trac.nbic.nl/metidb/browser/trunk/metidb/metidb-search/src/main/java/org/metidb/search/text/Autocompleter.java

Adjournment answered 29/6, 2011 at 9:37 Comment(4)
Another issue is that results appear only for a single word: if I search for e.g. "sprain" there is a result, but for "sprain of" there is none. Is there a way to handle this? - Expander
TermTermination matching = onFields.matching(input.toLowerCase()); I also think the 'input.toLowerCase()' should've been 'lowerCasedSearchTerm'. - Expander
@Expander I'm a bit fuzzy because it's been a while since I've worked on this, but regarding the phrase search "sprain of" you might have to adjust the WhitespaceTokenizerFactory to something else, as I think that chops up individual words. - Adjournment
@Expander Regarding your second comment about the lowerCasedSearchTerm: That's probably valid, feel free to update or post your own code as well! - Adjournment
E
2

Tim's answer is brilliant and helped me get over the difficult part. For me it worked only for single-word queries. In case anybody wants to make it work for phrase searches: just replace all the 'Term' instances with their corresponding 'Phrase' classes. Here are the replacement lines for Tim's code that did the trick for me.

// New DSL based query composition
// org.hibernate.search.query.dsl
SearchFactory searchFactory = fullTextSession.getSearchFactory();
QueryBuilder buildQuery = searchFactory.buildQueryBuilder().forEntity(MasterDiagnosis.class).get();
PhraseContext keyword = buildQuery.phrase();
keyword.withSlop(3);
//WildcardContext wildcard = keyword.wildcard();
String[] searchfields = MasterDiagnosis.getSearchfields();
PhraseMatchingContext onFields = keyword.onField(searchfields[0]);
for (int i = 1; i < searchfields.length; i++)
    onFields.andField(searchfields[i]);
PhraseTermination matching = onFields.sentence(lowerCasedSearchTerm);
Query query = matching.createQuery();
// Convert the Search Query into something that provides results: Specify Compound again to be future proof
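One likely reason the single-term query finds nothing for "sprain of" is that a whitespace tokenizer followed by an n-gram filter never produces an index term containing a space, while a wildcard term is looked up as-is. The plain-Java sketch below simulates that analysis chain to show the effect; it is a simplified model and not Lucene itself, and TokenizedIndexSketch and indexTerms are names invented for this illustration.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class TokenizedIndexSketch {

    /**
     * Simulates the analyzer chain: split on whitespace, lowercase each token,
     * then emit every substring (n-gram) of each token as an index term.
     */
    static Set<String> indexTerms(String fieldValue) {
        Set<String> terms = new TreeSet<>();
        for (String token : fieldValue.trim().split("\\s+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            for (int start = 0; start < lower.length(); start++) {
                for (int end = start + 1; end <= lower.length(); end++) {
                    terms.add(lower.substring(start, end));
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> terms = indexTerms("Sprain of ankle");
        System.out.println(terms.contains("sprain"));    // true
        System.out.println(terms.contains("sprain of")); // false: no term spans two tokens
    }
}
```

A phrase query avoids the problem because the sentence is analyzed into separate tokens that are then matched at adjacent (or, with slop, nearby) positions.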
Expander answered 8/5, 2013 at 11:42 Comment(0)
