Why is this Lucene query a "contains" instead of a "startsWith"?

string q = "a";
Query query = new QueryParser("company", new StandardAnalyzer()).Parse(q+"*");

results in query being a PrefixQuery: company:a*

Still, I get results like "Fleet Africa", where the A is clearly not at the start of the value, so the results are not what I want.

Query query = new TermQuery(new Term("company", q+"*"));

results in query being a TermQuery: company:a*, and returns no results, probably because it treats the query as an exact match and none of my values is the literal "a*".

Query query = new WildcardQuery(new Term("company", q+"*"));

returns the same results as the PrefixQuery.

What am I doing wrong?

Nasalize answered 3/3, 2009 at 10:37 Comment(0)

StandardAnalyzer tokenizes "Fleet Africa" into the terms "fleet" and "africa". Your a* search matches the latter term.
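To make the effect concrete, here is a minimal plain-Java sketch of the behaviour described above. It does not use Lucene: analyze is a rough, hypothetical stand-in for StandardAnalyzer (lowercase, then split on non-alphanumeric characters), and prefixMatches models a prefix query as "any indexed term starts with the prefix".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class TokenPrefixDemo {
    // Rough stand-in for StandardAnalyzer (an assumption, not the real analyzer):
    // lowercase the value, then split it on anything that is not a letter or digit.
    static List<String> analyze(String fieldValue) {
        return Arrays.stream(fieldValue.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Models a prefix query: the document matches if ANY of its terms starts with the prefix.
    static boolean prefixMatches(List<String> terms, String prefix) {
        return terms.stream().anyMatch(t -> t.startsWith(prefix));
    }

    public static void main(String[] args) {
        List<String> terms = analyze("Fleet Africa");
        System.out.println(terms);                     // prints [fleet, africa]
        System.out.println(prefixMatches(terms, "a")); // prints true, because of "africa"
    }
}
```

Under this model the prefix is tested against each term separately, so the position of the word inside the original field value plays no role.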

If you want "Fleet Africa" to be treated as one single term, use an analyzer that does not split your string on whitespace. KeywordAnalyzer is one example, but you may still want to lowercase your data so that queries are case-insensitive.
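The single-term approach can be sketched in plain Java as follows. This is only a model under stated assumptions, not Lucene code: analyzeAsSingleTerm is a hypothetical helper that stands in for a keyword-style analyzer combined with lowercasing, and the prefix check is applied to that one term.

```java
import java.util.Locale;

public class WholeFieldPrefixDemo {
    // Stand-in for "keyword analyzer + lowercase" (an assumption):
    // the whole field value becomes exactly one lowercased term.
    static String analyzeAsSingleTerm(String fieldValue) {
        return fieldValue.toLowerCase(Locale.ROOT);
    }

    // With a single term per field, a prefix match is a true "starts with" on the field.
    static boolean fieldStartsWith(String fieldValue, String userInput) {
        return analyzeAsSingleTerm(fieldValue).startsWith(userInput.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(fieldStartsWith("Fleet Africa", "a")); // prints false
        System.out.println(fieldStartsWith("Fleet Africa", "F")); // prints true
    }
}
```

Lowercasing both the stored value and the user input is what makes "F" and "f" behave the same here.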

Tugboat answered 30/3, 2011 at 11:20 Comment(0)

The short answer: none of your queries constrains the search to the start of the field. You need an EdgeNGramTokenFilter or something like it. See this question for an implementation of autocomplete in Lucene.
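As a rough illustration of the edge n-gram idea, the plain-Java sketch below generates the leading substrings of a value and then answers a "starts with" question by exact lookup. It is not Lucene's EdgeNGramTokenFilter: edgeNGrams and fieldStartsWith are hypothetical helpers, and applying the grams to the whole lowercased field value (rather than to each token) is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EdgeNGramDemo {
    // Produces the leading substrings of term with lengths minGram..maxGram.
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, term.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    // Index-time: store the edge n-grams of the whole lowercased value.
    // Query-time: an exact lookup of the typed text then acts as "starts with".
    static boolean fieldStartsWith(String fieldValue, String typed, int maxGram) {
        List<String> indexed = edgeNGrams(fieldValue.toLowerCase(Locale.ROOT), 1, maxGram);
        return indexed.contains(typed.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("fleet africa", 1, 5));        // prints [f, fl, fle, flee, fleet]
        System.out.println(fieldStartsWith("Fleet Africa", "fl", 5)); // prints true
        System.out.println(fieldStartsWith("Fleet Africa", "a", 5));  // prints false
    }
}
```

The trade-off in this model is index size: every stored value contributes up to maxGram extra terms, in exchange for prefix lookups becoming plain term matches.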

Lyre answered 3/3, 2009 at 10:53 Comment(4)
Surely the example is too far-fetched, right? Isn't it possible to create a startsWith-like query without all the fuss? - Nasalize
Not that I know of. startsWith is tricky. If you manage to do this, please let me know. From what I see, PrefixQuery looks for the start of any term, not just the first. - Lyre
This surprises me, actually. Surely startsWith must be the easiest query to do? - Nasalize
I have exactly the opposite problem: for me Lucene performs StartsWith by default, but I want a Contains and I don't know how to achieve this. What version and analyzer are you using? I'm using 2.9 with StandardAnalyzer. My question is located at: #5485465 - Wordless

Another solution could be to store the data (for example, "Fleet Africa") in a StringField and then use a WildcardQuery. Now f* or F* would give results, but A* or a* won't.

StringField is indexed but not tokenized.

Jackqueline answered 16/11, 2021 at 12:17 Comment(0)