How do I make the QueryParser in Lucene handle numeric ranges?

5

7
new QueryParser(....).parse(somequery);

This works only for fields indexed as strings. Say I have a field called count, indexed as an integer field (I took the data type into account when indexing):

new QueryParser(....).parse("count:[1 TO 10]");

This query does not work. If I use NumericRangeQuery.newIntRange instead, it works, but I need the QueryParser form.

Sinistrodextral answered 17/2, 2011 at 7:27 Comment(0)
9

I had the same issue and solved it, so here is my solution.

To create a custom query parser that parses queries such as "INTFIELD_NAME:1203" or "INTFIELD_NAME:[1 TO 10]" and treats the field INTFIELD_NAME as an IntField, I overrode newRangeQuery and newTermQuery as follows:

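import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.BytesRefBuilder;
import org.apache.lucene.util.NumericUtils;

public class CustomQueryParser extends QueryParser {

    // Name of the field indexed as an IntField; "count" is only a placeholder.
    private static final String INTFIELD_NAME = "count";

    public CustomQueryParser(String f, Analyzer a) {
        super(f, a);
    }

    @Override
    protected Query newRangeQuery(String field, String part1, String part2, boolean startInclusive,
            boolean endInclusive) {
        if (INTFIELD_NAME.equals(field)) {
            // A null bound is an open end such as [* TO 10]; newIntRange accepts null for it.
            Integer min = (part1 == null) ? null : Integer.valueOf(part1);
            Integer max = (part2 == null) ? null : Integer.valueOf(part2);
            return NumericRangeQuery.newIntRange(field, min, max, startInclusive, endInclusive);
        }
        // Any other field keeps the default term range behaviour.
        return super.newRangeQuery(field, part1, part2, startInclusive, endInclusive);
    }

    @Override
    protected Query newTermQuery(Term term) {
        if (INTFIELD_NAME.equals(term.field())) {
            // Encode the value at full precision (shift 0), the form an exact match needs.
            BytesRefBuilder byteRefBuilder = new BytesRefBuilder();
            NumericUtils.intToPrefixCoded(Integer.parseInt(term.text()), 0, byteRefBuilder);
            return new TermQuery(new Term(term.field(), byteRefBuilder.get()));
        }
        return super.newTermQuery(term);
    }
}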

I took the code from the thread at http://www.mail-archive.com/[email protected]&q=subject:%22Re%3A+How+do+you+properly+use+NumericField%22&o=newest&f=1 and made three modifications:

  • Rewrote newRangeQuery a little more cleanly.

  • In newTermQuery, replaced NumericUtils.intToPrefixCoded(Integer.parseInt(term.text()), NumericUtils.PRECISION_STEP_DEFAULT) with NumericUtils.intToPrefixCoded(Integer.parseInt(term.text()), 0, byteRefBuilder). When I first used this method in a filter on the same numeric field, I passed 0 because I found it used as a default value in a Lucene class, and it just worked.

  • In newTermQuery, replaced new Term(field, ... with new Term(term.field(), ... Using "field" is wrong: if the query has several clauses (FIELD:text OR INTFIELD:100), it picks up the field of the first or previous clause.

Oconnor answered 4/3, 2015 at 17:4 Comment(2)
I know it's been a while since you answered this, but I'm still having issues even after overriding the newTermQuery method. I still get: the specified query 'longField:bytes' contains a string based sub query which targets numeric encoded field(s). - Lennalennard
Sorry, I haven't used Lucene since then and won't be able to help. Please raise a new issue. - Oconnor
2

You need to inherit from QueryParser and override GetRangeQuery(string field, ...). If field is one of your numeric field names, return an instance of NumericRangeQuery, otherwise return base.GetRangeQuery(...).

There is an example of such an implementation in this thread: http://www.mail-archive.com/[email protected]/msg29062.html

Clayborne answered 16/8, 2011 at 12:18 Comment(3)
It is not working! It seems getRangeQuery is not being overridden. - Mongo
Worked for me, though this question is over three years old and regrettably I don't remember any specifics anymore... - Clayborne
That's okay, I have managed to solve my problem using BooleanQuery. But this method is not working for me in Lucene 4. - Mongo
1

QueryParser won't create a NumericRangeQuery as it has no way to know whether a field was indexed with NumericField. Just extend the QueryParser to handle this case.
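
To see why a string-based range cannot simply stand in for a numeric one, it can help to compare the two orderings directly. The sketch below is plain Java with no Lucene dependency, and RangeOrderDemo and its methods are made-up names for illustration. It only shows that lexicographic order (the order a range over string terms relies on) disagrees with numeric order, which is why the parser has to be told which fields are numeric.

```java
public class RangeOrderDemo {

    // Inclusive range check using string (lexicographic) comparison.
    public static boolean inStringRange(String value, String low, String high) {
        return value.compareTo(low) >= 0 && value.compareTo(high) <= 0;
    }

    // Inclusive range check using numeric comparison.
    public static boolean inIntRange(int value, int low, int high) {
        return value >= low && value <= high;
    }

    public static void main(String[] args) {
        // "5" sorts after "10" as a string, so it falls outside ["1", "10"].
        System.out.println(inStringRange("5", "1", "10")); // false
        // As integers, 5 lies between 1 and 10.
        System.out.println(inIntRange(5, 1, 10));          // true
    }
}
```

Numeric fields add a second obstacle on top of this: their terms are stored in an encoded form, so the plain text bounds from the query string would not match them even if the ordering agreed.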

Cide answered 18/2, 2011 at 6:58 Comment(0)
1

In Lucene 6, the protected method QueryParser#getRangeQuery still exists with the argument list (String fieldName, String low, String high, boolean startInclusive, boolean endInclusive), and overriding it to interpret the range as a numeric range is indeed possible, as long as that information is indexed using one of the new Point fields.

When indexing your field:

document.add(new FloatPoint("_point_count", value)); // index for efficient range based retrieval
document.add(new StoredField("count", value)); // if you need to store the value itself

In your custom query parser (extending org.apache.lucene.queryparser.classic.QueryParser), override the method with something like this:

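@Override
protected Query getRangeQuery(String field, String low, String high, boolean startInclusive, boolean endInclusive) throws ParseException
{
    if («isNumericField»(field)) // context dependent
    {
        final String pointField = "_point_" + field;
        // A null bound is an open end such as [* TO 10]; use infinity for it.
        float lower = (low == null) ? Float.NEGATIVE_INFINITY : Float.parseFloat(low);
        float upper = (high == null) ? Float.POSITIVE_INFINITY : Float.parseFloat(high);
        // FloatPoint ranges are always inclusive, so step exclusive bounds inward.
        if (low != null && !startInclusive) lower = Math.nextUp(lower);
        if (high != null && !endInclusive) upper = Math.nextDown(upper);
        return FloatPoint.newRangeQuery(pointField, lower, upper);
    }

    return super.getRangeQuery(field, low, high, startInclusive, endInclusive);
}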
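
The «isNumericField» placeholder above is left to the application. One possible way to fill it in, sketched here in plain Java, is a small registry shared by the indexing code and the query parser. NumericFieldRegistry and its methods are hypothetical names, not part of Lucene, and the "_point_" prefix simply mirrors the naming used in the indexing snippet.

```java
import java.util.HashSet;
import java.util.Set;

public class NumericFieldRegistry {

    // Logical field names that were indexed with a Point type.
    private final Set<String> numericFields = new HashSet<>();

    // Call this wherever a field is indexed as a Point field.
    public void register(String field) {
        numericFields.add(field);
    }

    // What the parser would call in place of the isNumericField placeholder.
    public boolean isNumericField(String field) {
        return field != null && numericFields.contains(field);
    }

    // Maps the logical field name to the name of its Point field.
    public String pointFieldFor(String field) {
        return "_point_" + field;
    }
}
```

The parser would then hold a reference to the registry and call isNumericField(field) before deciding whether to build a point range or fall back to super.getRangeQuery.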
Hindquarter answered 22/9, 2016 at 13:44 Comment(0)
0

I adapted Jeremies answer for C# and Lucene.Net 3.0.3. I also needed type double instead of int. This is my code:

using System.Globalization;
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Util;
using Version = Lucene.Net.Util.Version;

namespace SearchServer.SearchEngine
{
    internal class SearchQueryParser : QueryParser
    {
        public SearchQueryParser(Analyzer analyzer)
            : base(Version.LUCENE_30, null, analyzer)
        {
        }

        private const NumberStyles DblNumberStyles = NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite | NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint;

        protected override Query NewRangeQuery(string field, string part1, string part2, bool inclusive)
        {
            if (field == "p")
            {
                double part1Dbl;
                if (!double.TryParse(part1, DblNumberStyles, CultureInfo.InvariantCulture, out part1Dbl))
                    throw new ParseException($"Error parsing value {part1} for field {field} as double.");
                double part2Dbl;
                if (!double.TryParse(part2, DblNumberStyles, CultureInfo.InvariantCulture, out part2Dbl))
                    throw new ParseException($"Error parsing value {part2} for field {field} as double.");
                return NumericRangeQuery.NewDoubleRange(field, part1Dbl, part2Dbl, inclusive, inclusive);
            }
            return base.NewRangeQuery(field, part1, part2, inclusive);
        }

        protected override Query NewTermQuery(Term term)
        {
            if (term.Field == "p")
            {
                double dblParsed;
                if (!double.TryParse(term.Text, DblNumberStyles, CultureInfo.InvariantCulture, out dblParsed))
                    throw new ParseException($"Error parsing value {term.Text} for field {term.Field} as double.");
                return new TermQuery(new Term(term.Field, NumericUtils.DoubleToPrefixCoded(dblParsed)));
            }
            return base.NewTermQuery(term);
        }
    }
}

I later extended the code to support open-ended ranges (greater than or less than) when an asterisk is passed as a bound, e.g. p:[* TO 5]. The body of the field == "p" branch in NewRangeQuery becomes:

...
    double? part1Dbl = null;
    double tmpDbl;
    if (part1 != "*")
    {
        if (!double.TryParse(part1, DblNumberStyles, CultureInfo.InvariantCulture, out tmpDbl))
            throw new ParseException($"Error parsing value {part1} for field {field} as double.");
        part1Dbl = tmpDbl;
    }
    double? part2Dbl = null;
    if (part2 != "*")
    {
        if (!double.TryParse(part2, DblNumberStyles, CultureInfo.InvariantCulture, out tmpDbl))
            throw new ParseException($"Error parsing value {part2} for field {field} as double.");
        part2Dbl = tmpDbl;
    }
    return NumericRangeQuery.NewDoubleRange(field, part1Dbl, part2Dbl, inclusive, inclusive);
...
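
For readers using the Java parsers from the earlier answers, the same open-ended handling might look like the sketch below. It is plain Java, and RangeBounds and parseBound are hypothetical names. It treats both "*" and null as an open bound, on the assumption that the parser version in use passes one or the other, and returns null so the result can be handed to a factory that accepts nullable bounds, such as NumericRangeQuery.newDoubleRange.

```java
public class RangeBounds {

    // Returns null for an open bound, otherwise the parsed value.
    // Throws IllegalArgumentException with a readable message on bad input.
    public static Double parseBound(String field, String part) {
        if (part == null || "*".equals(part)) {
            return null; // open-ended side of the range
        }
        try {
            return Double.valueOf(part);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Error parsing value " + part + " for field " + field + " as double.", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBound("p", "*")); // null
        System.out.println(parseBound("p", "5")); // 5.0
    }
}
```

Note that Double.valueOf is more lenient than the NumberStyles mask in the C# code (it also accepts forms such as "NaN" or "5d"), so stricter validation may be wanted in practice.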
Daglock answered 28/6, 2017 at 13:10 Comment(0)
