Exact Meaning of "Slop" in Lucene SpanNearQuery (or slop in ElasticSearch span_near)

Question 1: In Lucene's SpanNearQuery (or span_near in ElasticSearch), what is the exact meaning of slop? Is it the number of words separating the two matching words, or is it the separating number of words plus 1?

For example, suppose your indexed text is: foo bar biz

Which of these queries would match this text: "foo biz"~0, "foo biz"~1, "foo biz"~2?

I would expect that the first wouldn't match and the last would. But what about the middle?

Question 2: A second, more complex follow-up question: how is slop handled if there are more than two search clauses? Is it applied to each pair of clauses, or to any pair of clauses?

For example, suppose you construct a SpanNearQuery with three clauses: foo, bar, biz. What slop is needed to match the same indexed text above? I would expect a slop of 2 definitely would, but what about 0 or 1?

Similarly, with the same three-clause query, what slop is needed to match the text: foo bar ble biz

Rollins answered 14/2, 2014 at 13:54 Comment(2)
You have a question where you can get the exact answer by trying it out. - Perri
Ya, I kind of figured that... but sometimes writing it down on SO crystallizes the problem in your head. - Rollins

Question 1: Slop is the number of words separating the span clauses, so a slop of 0 means the clauses are adjacent. In the example I gave, a slop of 0 would not match, while a slop of 1 or 2 would.

Question 2: When there are more than two span near clauses, each clause must be connected to at least one other clause with no more than slop words separating them, AND all of the clauses must be connected to each other through a chain. However, a clause does not need to be within slop words of every other clause.

For the first example in question 2: slop of 0, 1, and 2 would all match. Slop of 0 matches even though foo and biz are separated by one word, because there is a chain through all clauses.

For the second example in question 2: slop of 0 would not match because biz is separated from every other clause by more than 0 words. Slop of 1 would match because foo and bar are separated by 0 words and bar and biz are separated by 1 word. It matches even though foo and biz are separated by more than one word, because there is a chain through all clauses. Slop of 2 would obviously match.
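
The examples above can be checked with a small model. This is only a sketch, not Lucene's implementation: it assumes single-term clauses matched in order on whitespace-separated tokens, and it counts slop as the total number of unmatched positions between consecutive clauses (following the Elasticsearch documentation's wording, "the maximum number of intervening unmatched positions"). The helper names `min_slop_in_order` and `matches` are made up for illustration.

```python
def min_slop_in_order(text, terms):
    """Smallest slop for which `terms`, in order, match `text` under this model.

    Returns None when the terms do not occur in order at all.
    """
    tokens = text.split()
    best = None

    def search(term_index, start, gaps):
        nonlocal best
        if term_index == len(terms):
            # Every clause was placed; keep the tightest placement seen so far.
            best = gaps if best is None else min(best, gaps)
            return
        for pos in range(start, len(tokens)):
            if tokens[pos] == terms[term_index]:
                # Positions skipped between the previous clause and this one.
                skipped = 0 if term_index == 0 else pos - start
                search(term_index + 1, pos + 1, gaps + skipped)

    search(0, 0, 0)
    return best


def matches(text, terms, slop):
    """True if the model considers `terms` a match for `text` at this slop."""
    needed = min_slop_in_order(text, terms)
    return needed is not None and needed <= slop


if __name__ == "__main__":
    for text, terms in [
        ("foo bar biz", ["foo", "biz"]),
        ("foo bar biz", ["foo", "bar", "biz"]),
        ("foo bar ble biz", ["foo", "bar", "biz"]),
    ]:
        print(text, terms, "needs slop", min_slop_in_order(text, terms))
```

Under these assumptions the model gives the same answers as listed above for all three example texts; an actual index is still the authoritative test.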

Rollins answered 14/2, 2014 at 15:19 Comment(2)
Thanks for your explanation. I'm using match_phrase with slop=0 and it works as you describe. Suppose I need to boost the score only for adjacent terms (slop=0), so that matches with more unmatched positions are less relevant: foo biz should score higher and foo biz dev lower. How can I achieve that? - Irreverent
@Irreverent I guess you simply need to boost the part of your query with the lower slop by wrapping it in a BoostQuery. - Johannessen
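
One way to read the suggestion in the last comment, sketched as an Elasticsearch request body rather than a Lucene BoostQuery: combine a boosted exact phrase (slop 0) with a looser phrase in a bool query. This is an assumption about how one might do it, not something confirmed in the thread; the function name `boosted_phrase_query`, the field name `text`, and the boost value are all hypothetical.

```python
import json


def boosted_phrase_query(field, phrase, loose_slop, exact_boost=2.0):
    """Build a request body that scores adjacent matches above sloppy ones.

    Both clauses are optional ("should"); a document matching the exact
    phrase satisfies both, so it collects the extra boosted score.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact phrase: terms adjacent, score multiplied by the boost.
                    {"match_phrase": {field: {"query": phrase, "slop": 0, "boost": exact_boost}}},
                    # Looser phrase: tolerates up to `loose_slop` position moves.
                    {"match_phrase": {field: {"query": phrase, "slop": loose_slop}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    # "text" is a placeholder field name; substitute the field from your mapping.
    print(json.dumps(boosted_phrase_query("text", "foo biz", loose_slop=2), indent=2))
```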

It's explained in the Elasticsearch documentation for the span near query:

Matches spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order. The span near query maps to Lucene SpanNearQuery.

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-span-near-query.html
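
For reference, a span_near request body for the three-clause case from the question could be assembled as in the sketch below. The `span_near`, `span_term`, `clauses`, `slop`, and `in_order` keys come from the Elasticsearch query DSL; the helper name `span_near_query` and the field name `body` are placeholders, not part of the original question.

```python
import json


def span_near_query(field, terms, slop, in_order=True):
    """Build a span_near request body with one span_term clause per term."""
    return {
        "query": {
            "span_near": {
                "clauses": [{"span_term": {field: term}} for term in terms],
                "slop": slop,
                "in_order": in_order,
            }
        }
    }


if __name__ == "__main__":
    # "body" is a hypothetical field name; use the analyzed text field of your index.
    request = span_near_query("body", ["foo", "bar", "biz"], slop=1)
    print(json.dumps(request, indent=2))
```

Note that span_term clauses are not analyzed, so the terms have to be given as they appear in the index (typically lowercased).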

Example: you want to match "Mr. Bush" and get the details of the matching records. Since two other words sit between "Mr." and "Bush" in each of the names below, the slop value is 2.

Mr. Jeorge Willam Bush, Mr Sean Willam Bush, Mr James Kane Bush

Sample DSL request:

    GET school/_search
    {
      "query": {
        "match_phrase": {
          "EmpName": {
            "query": "Mr. Bush",
            "slop": 2
          }
        }
      }
    }
Dingess answered 11/10, 2020 at 15:39 Comment(0)
