Hibernate Search 5.0 Numeric Lucene Query HSEARCH000233 issue

Issue: How do we provide Hibernate Search with a raw Lucene query string that includes both numeric and non-numeric fields?

Background: we recently upgraded to HibernateSearch 5.0 and many of our queries are now failing because of a change in the HibernateSearch Query Parser (pre-lucene) with the following error:

The specified query contains a string based sub query which targets the numeric encoded field(s)

In most cases, we use lucene's text syntax along with a MultiFieldQueryParser to pass queries into HibernateSearch due to the complexity of the queries that we're running. Up until HibernateSearch 5.0, these worked quite well. In upgrading, we've encountered exceptions thrown from HibernateSearch that prevent our app from running queries that used to work. We don't understand why the exceptions are being thrown or the best way to move forward.

In trying to track down the issue, I've tried to reduce what works and what doesn't to the rawest possible form (this is built off Hibernate Search's QueryValidationTest).

Examples:

Given the following Entity class:

@Entity
@Indexed
public static class B {
    @Id
    @GeneratedValue
    private long id;

    @Field
    private long value;

    @Field
    private String text;
}

Test 1 (how we write queries for hibernate search: FAILURE):

        QueryParser parser = new MultiFieldQueryParser(new String[]{"id","value","text"},new StandardAnalyzer());
        Query query = parser.parse("+(value:1 text:test)");
        FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( query, B.class );
        fullTextQuery.list();

results in:

org.hibernate.search.exception.SearchException: HSEARCH000233: The specified query '+(value:1 text:test)' contains a string based sub query which targets the numeric encoded field(s) 'value'. Check your query or try limiting the targeted entities.
at org.hibernate.search.query.engine.impl.LazyQueryState.validateQuery(LazyQueryState.java:163)
at org.hibernate.search.query.engine.impl.LazyQueryState.search(LazyQueryState.java:102)
at org.hibernate.search.query.engine.impl.QueryHits.updateTopDocs(QueryHits.java:227)
at org.hibernate.search.query.engine.impl.QueryHits.<init>(QueryHits.java:122)
at org.hibernate.search.query.engine.impl.QueryHits.<init>(QueryHits.java:94)
at org.hibernate.search.query.engine.impl.HSQueryImpl.getQueryHits(HSQueryImpl.java:436)
at org.hibernate.search.query.engine.impl.HSQueryImpl.queryEntityInfos(HSQueryImpl.java:257)
at org.hibernate.search.query.hibernate.impl.FullTextQueryImpl.list(FullTextQueryImpl.java:200)
at org.hibernate.search.test.query.validation.QueryValidationTest.testRawLuceneWithNumericValue(QueryValidationTest.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.hibernate.testing.junit4.ExtendedFrameworkMethod.invokeExplosively(ExtendedFrameworkMethod.java:62)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.hibernate.testing.junit4.FailureExpectedHandler.evaluate(FailureExpectedHandler.java:58)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.hibernate.testing.junit4.BeforeClassCallbackHandler.evaluate(BeforeClassCallbackHandler.java:43)
at org.hibernate.testing.junit4.AfterClassCallbackHandler.evaluate(AfterClassCallbackHandler.java:42)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

Test 2 (a numeric range variation fails the same way: FAILURE):

        QueryParser parser = new MultiFieldQueryParser(new String[]{"id","value","text"},new StandardAnalyzer());
        Query query = parser.parse("+(value:[1 TO 1] text:test)");
        FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( query, B.class );
        fullTextQuery.list();

Test 3 (using Lucene Terms: SUCCESS):

        TermQuery query = new TermQuery( new Term("text", "bar") );
        TermQuery nq = new TermQuery( new Term("value", "1") );

        BooleanQuery bq = new BooleanQuery();
        bq.add(query, Occur.SHOULD);
        bq.add(nq, Occur.SHOULD);

        FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( bq, B.class );

Note: A full version of the test cases, with tests that illustrate what we're seeing, is here: https://github.com/abrin/hibernate-search/blob/3fdcc8229f0bfa00329b9d977172fd218d82cac2/orm/src/test/java/org/hibernate/search/test/query/validation/QueryValidationTest.java

Thanks

Dietz answered 25/1, 2015 at 15:54 Comment(0)

First off, the reason for your problem is that, as of Search 5, numeric types are indexed as Lucene numeric fields (as opposed to string-based fields). Apart from performance gains, this also makes it possible, for example, to sort on numeric fields without the need for padding. The Search 5 documentation says the following:

Prior to Search 5, numeric field encoding was only chosen if explicitly requested via @NumericField. As of Search 5 this encoding is automatically chosen for numeric types. To avoid numeric encoding you can explicitly specify a non numeric field bridge via @Field.bridge or @FieldBridge. The package org.hibernate.search.bridge.builtin contains a set of bridges which encode numbers as strings, for example org.hibernate.search.bridge.builtin.IntegerBridge.

So, if you want to stick to your old behaviour you need to make sure that your numeric values are still indexed as strings. In your example value needs to be indexed with org.hibernate.search.bridge.builtin.LongBridge. You can achieve this with the @FieldBridge annotation (you can ignore the id case, since document ids are indexed as strings anyway):

@Field
@FieldBridge(impl = LongBridge.class)
private long value;
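
If you do fall back to string encoding, keep in mind the padding point mentioned earlier. The sketch below uses only the JDK (no Lucene or Hibernate Search) and assumes the bridge writes the plain decimal string of the number. The class and method names are made up for illustration. It shows why string-encoded numbers do not compare in numeric order unless they are zero-padded to a fixed width:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: compares numbers the way a string-based field would.
class StringEncodedOrder {

    // Sorts the decimal string forms of the values lexicographically.
    static List<String> sortAsStrings(long... values) {
        List<String> terms = new ArrayList<>();
        for (long v : values) {
            terms.add(Long.toString(v));
        }
        Collections.sort(terms);
        return terms;
    }

    // Same, but zero-pads each non-negative value to a fixed width first.
    static List<String> sortPadded(int width, long... values) {
        List<String> terms = new ArrayList<>();
        for (long v : values) {
            terms.add(String.format("%0" + width + "d", v));
        }
        Collections.sort(terms);
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(sortAsStrings(9, 10, 100)); // [10, 100, 9]
        System.out.println(sortPadded(3, 9, 10, 100)); // [009, 010, 100]
    }
}
```

The same lexicographic comparison applies to string range queries, so a range like [1 TO 2] on unpadded strings would also match "10" and "100".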

Some comments regarding your test scenarios:

  • Test 1: The query parser only creates string-based queries. At this level, Lucene has no knowledge of which fields are indexed numerically. A numeric field can only be targeted/searched using the appropriate NumericRangeQuery. If you still want to use a query parser, you need to provide your own subclass and handle numeric fields yourself. See also: How do I make the QueryParser in Lucene handle numeric ranges?
  • Test 2: Same problem. Even though you are using the range syntax value:[1 TO 1], the parser just creates a text/string range query.
  • Test 3: I don't think this actually works. It might not throw an exception, but I am pretty sure that if you look at several search outcomes, you will notice that the value term is ignored. A TermQuery is string-based and won't be able to find matches in a numerically encoded field. See also: Lucene 3.0.3 Numeric term query
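
To make the "handle numeric fields yourself" idea a bit more concrete, here is a rough, JDK-only model of the decision such a parser subclass would have to make. It is not Lucene code: NumericAwareRouting and its methods are hypothetical, and it returns descriptive strings instead of real queries. In an actual subclass, I would expect the same branching to live in overrides of the parser's getFieldQuery and getRangeQuery hooks, with the numeric branch returning something like NumericRangeQuery.newLongRange(...). Check the exact signatures against your Lucene version.

```java
import java.util.Collections;
import java.util.Set;

// Stand-alone model (no Lucene on the classpath): decides, per field, whether
// a parsed clause should become a numeric range or a plain string query.
class NumericAwareRouting {

    // Assumption: the caller knows which fields are numerically encoded.
    private final Set<String> numericFields;

    NumericAwareRouting(Set<String> numericFields) {
        this.numericFields = numericFields;
    }

    // Models a single-term clause such as value:1 or text:test.
    String fieldClause(String field, String text) {
        if (numericFields.contains(field)) {
            // An exact match on a numeric field is a range with equal, inclusive bounds.
            return rangeClause(field, text, text);
        }
        return "term(" + field + ":" + text + ")";
    }

    // Models a range clause such as value:[1 TO 5]; a null bound means "open".
    String rangeClause(String field, String lower, String upper) {
        if (numericFields.contains(field)) {
            // Long.valueOf throws NumberFormatException for non-numeric input.
            Long min = lower == null ? null : Long.valueOf(lower);
            Long max = upper == null ? null : Long.valueOf(upper);
            return "numericRange(" + field + ":[" + min + " TO " + max + "])";
        }
        return "stringRange(" + field + ":[" + lower + " TO " + upper + "])";
    }

    public static void main(String[] args) {
        NumericAwareRouting routing =
                new NumericAwareRouting(Collections.singleton("value"));
        System.out.println(routing.fieldClause("value", "1"));   // numericRange(value:[1 TO 1])
        System.out.println(routing.fieldClause("text", "test")); // term(text:test)
    }
}
```

The point of the model is that the parser itself cannot know which fields are numeric, so that knowledge (here the numericFields set) has to be supplied from outside.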
Coefficient answered 26/1, 2015 at 10:31 Comment(1)
Hardy, this helps quite a bit. I think the critical aspect of what you've helped me to understand is what you mention in your description of Test 1: that the query parser only creates string-based queries. It looks like we will likely have to refactor or "revert." Thanks for helping to explain. - Dietz
