TermQuery not returning on a known search term, but WildcardQuery does
I'm hoping someone with enough insight into the inner workings of Lucene can point me in the right direction =)

I'll skip most of the surrounding irrelevant code and cut right to the chase. I have a Lucene index to which I am adding the following field (variables replaced by their literal values):

document.Add( new Field("Typenummer", "E5CEB501A244410EB1FFC4761F79E7B7", 
                        Field.Store.YES , Field.Index.UN_TOKENIZED));

Later, when I search my index (using other types of queries), I can verify that this field does indeed appear in my index, for example when looping through all Fields returned by Document.GetFields():

Field: Typenummer, Value: E5CEB501A244410EB1FFC4761F79E7B7

So far so good :-)

Now the real problem: why can I not use a TermQuery to search against this value and actually get a result?

This code produces 0 hits:

// Returns 0 hits
bq.Add( new TermQuery( new Term( "Typenummer", 
        "E5CEB501A244410EB1FFC4761F79E7B7" ) ), BooleanClause.Occur.MUST );

But if I switch this to a WildcardQuery (with no wildcards), I get the 1 hit I expect.

// returns the 1 hit I expect
bq.Add( new WildcardQuery( new Term( "Typenummer", 
        "E5CEB501A244410EB1FFC4761F79E7B7" ) ), BooleanClause.Occur.MUST );

I've checked field lengths, I've checked that I am using the same Analyzer, and so on, and I am still at square one as to why this is.

Can anyone point me in a direction I should be looking?

Fixing answered 24/2, 2012 at 12:53 Comment(1)
I am not very familiar with Lucene.net, but in case it uses the same index structure as the Java version, you could use Luke (code.google.com/p/luke) to check that your index structure meets your expectations. Alternatively, could you check whether TermsIndex#seekExact manages to find your term? - Occasionalism

I finally figured out what was going on. I'm expanding the tags for this question as it, much to my surprise, actually turned out to be an issue with the CMS this particular problem exists in. In summary, the problem came down to this:

  1. The field is indexed UN_TOKENIZED, meaning Lucene will index it exactly "as-is", without running it through an analyzer
  2. The BooleanQuery I pasted snippets from gets sent to the Sitecore SearchManager inside a PreparedQuery wrapper
  3. The behaviour I expected from this was, that my query (having already been prepared) would go - unaltered - to the Lucene API
  4. Turns out I was wrong. It passes through a RewriteQuery method that copies my entire set of nested queries as-is, with one exception - all the Term arguments are passed through a LowercaseStrategy()
  5. As I indexed an UPPERCASE Term (UN_TOKENIZED), and Sitecore changes my PreparedQuery to lowercase - 0 results are returned
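
The steps above can be reproduced with a small toy model. This is only a sketch in Python, not Lucene.NET or Sitecore code: the `index` dict stands in for the term dictionary, and `lowercase_strategy` is a hypothetical stand-in for whatever lowercasing the rewrite applies to each Term.

```python
# Toy model of an exact-match term dictionary.
# NOT the Lucene.NET or Sitecore API; all names here are illustrative.
index = {}  # maps (field, term) -> list of document ids


def add_untokenized(doc_id, field, value):
    """Index the value verbatim, as UN_TOKENIZED does (no analyzer, no case folding)."""
    index.setdefault((field, value), []).append(doc_id)


def term_query(field, text):
    """Exact lookup: the query term must equal the indexed term, including case."""
    return index.get((field, text), [])


def lowercase_strategy(text):
    """Hypothetical stand-in for the lowercasing applied during the query rewrite."""
    return text.lower()


add_untokenized(1, "Typenummer", "E5CEB501A244410EB1FFC4761F79E7B7")

term = "E5CEB501A244410EB1FFC4761F79E7B7"
hits_unaltered = term_query("Typenummer", term)                      # [1]
hits_rewritten = term_query("Typenummer", lowercase_strategy(term))  # []
```

The unaltered term finds the document, while the lowercased one misses because the uppercase term is the only key in the dictionary.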

I'm not going to start an argument over whether this is a "by design" or "by design flaw" implementation of the Lucene wrapper API. I'll just note that rewriting my query when using the PreparedQuery overload is... to me... unexpected ;-)

A further lesson from this: indexing the field as TOKENIZED eliminates the problem too, because the StandardAnalyzer lowercases all tokens by default.
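
To illustrate why lowercasing at index time helps, here is the same kind of toy model in Python. Again, this is a sketch and not the Lucene.NET API: `add_lowercased` is a hypothetical helper that mimics only the case folding an analyzer such as StandardAnalyzer performs, and ignores tokenization entirely.

```python
# Toy model, NOT the Lucene.NET API: fold case at index time so that a
# lowercased query term hits the same dictionary entry.
index = {}  # maps (field, term) -> list of document ids


def add_lowercased(doc_id, field, value):
    """Mimic only the lowercasing step of an analyzer (assumption: single token)."""
    index.setdefault((field, value.lower()), []).append(doc_id)


def term_query(field, text):
    """Exact lookup against the term dictionary, case-sensitive."""
    return index.get((field, text), [])


value = "E5CEB501A244410EB1FFC4761F79E7B7"
add_lowercased(1, "Typenummer", value)

hits_lowercased_query = term_query("Typenummer", value.lower())  # [1]
hits_uppercase_query = term_query("Typenummer", value)           # []
```

The flip side is visible in the last line: once the index holds lowercase terms, a query term that reaches the index in uppercase no longer matches, so both sides have to be normalized the same way.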

Fixing answered 1/3, 2012 at 8:25 Comment(3)
Just ran into a similar issue with Lucene.NET. This post saved me a lot of time, thanks. - Shoshana
@mark Cassidy Wow. You're the only one discussing this problem. I have a similar issue. I am trying to filter the value before passing it to Search: Filter filter = new QueryWrapperFilter(new WildcardQuery(new Term(field1, "005NWVOXDYN3U0V6")));. Text values do not work in that filter, but numbers do. I even tried TermQuery, with the same result. Can you help me with this problem, please? - Angola
I implemented a custom keyword analyzer. It does the same as the built-in keyword analyzer, except I also pass the "tokens" through the built-in lowercase filter. That means I pass nearly all my fields through a tokenizer (even if most are just passed through "as-is"), but it allows case-insensitive matching on all fields. - Sandiesandifer
