Question 1: In Lucene's SpanNearQuery (or span_near in Elasticsearch), what is the exact meaning of slop? Is it the number of words separating the two matching words, or is it the number of separating words plus 1?
For example, suppose your indexed text is: foo bar biz
Which queries would match this text: "foo biz"~0, "foo biz"~1, or "foo biz"~2?
I would expect that the first wouldn't match and the last would. But what about the middle?
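To make the two candidate readings concrete, here is a toy Python model (not Lucene code) that evaluates the example queries under each one: reading A counts only the separating words, reading B counts the separating words plus 1. It assumes whitespace tokenization and in-order matching; the function names are hypothetical.

```python
def term_positions(text: str) -> dict[str, list[int]]:
    """Map each whitespace-separated token to every position where it occurs."""
    index: dict[str, list[int]] = {}
    for pos, token in enumerate(text.split()):
        index.setdefault(token, []).append(pos)
    return index


def matches(text: str, first: str, second: str, slop: int, plus_one: bool) -> bool:
    """True if `first` precedes `second` within `slop` under the chosen reading.

    plus_one=False: reading A, cost = number of words strictly between the terms.
    plus_one=True:  reading B, cost = number of separating words + 1.
    """
    index = term_positions(text)
    for p1 in index.get(first, []):
        for p2 in index.get(second, []):
            if p2 <= p1:
                continue  # assumed in-order matching: second term must follow the first
            separating = p2 - p1 - 1  # words strictly between the two terms
            cost = separating + 1 if plus_one else separating
            if cost <= slop:
                return True
    return False


if __name__ == "__main__":
    text = "foo bar biz"
    for slop in (0, 1, 2):
        a = matches(text, "foo", "biz", slop, plus_one=False)
        b = matches(text, "foo", "biz", slop, plus_one=True)
        print(f'"foo biz"~{slop}: reading A={a}, reading B={b}')
```

Under these assumptions both readings agree on ~0 (no match) and ~2 (match) and disagree only on "foo biz"~1, so that is the query worth running against a real index to settle the question.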
Question 2: Now a second, more complex corollary question: how is slop handled when there are more than two search clauses? Is it applied to each pair of clauses, or to any pair of clauses?
For example, suppose you construct a SpanNearQuery with three clauses: foo, bar, biz. What slop is needed to match the same indexed text above? I would expect that a slop of 2 definitely would match, but what about 0 or 1?
Similarly, with the same three-clause query, what slop is needed to match the text foo bar ble biz?
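The candidate readings for three or more clauses can be modeled the same way. In this sketch (hypothetical names, not Lucene's implementation), mode="each" checks every adjacent pair of clauses against the slop on its own, while mode="sum" makes all gaps share a single slop budget; plus_one carries over the ambiguity from Question 1. It assumes whitespace tokenization and in-order matching, as with SpanNearQuery's inOrder flag set to true.

```python
from itertools import product
from typing import Optional


def term_positions(text: str) -> dict[str, list[int]]:
    """Map each whitespace-separated token to every position where it occurs."""
    index: dict[str, list[int]] = {}
    for pos, token in enumerate(text.split()):
        index.setdefault(token, []).append(pos)
    return index


def min_slop(text: str, clauses: list[str], mode: str, plus_one: bool = False) -> Optional[int]:
    """Smallest slop at which the clauses match in order, under a hypothetical reading.

    mode="each": each adjacent pair is checked on its own (cost = largest gap).
    mode="sum":  all gaps share one slop budget (cost = sum of gaps).
    plus_one:    count each gap as separating words + 1 instead of separating words.
    Returns None if the clauses never appear in the given order.
    """
    index = term_positions(text)
    best: Optional[int] = None
    # Try every combination of one position per clause.
    for combo in product(*(index.get(c, []) for c in clauses)):
        pairs = list(zip(combo, combo[1:]))
        if any(right <= left for left, right in pairs):
            continue  # positions are not in clause order
        gaps = [right - left - 1 + (1 if plus_one else 0) for left, right in pairs]
        cost = max(gaps, default=0) if mode == "each" else sum(gaps)
        if best is None or cost < best:
            best = cost
    return best


if __name__ == "__main__":
    clauses = ["foo", "bar", "biz"]
    for text in ("foo bar biz", "foo bar ble biz"):
        for mode in ("each", "sum"):
            for plus_one in (False, True):
                print(text, mode, plus_one, min_slop(text, clauses, mode, plus_one))
```

With the texts from the question, the two modes agree when separating words are counted directly (minimum slop 0 for foo bar biz, 1 for foo bar ble biz) and diverge only under the plus-one reading (1 vs. 2, and 2 vs. 3). Without plus_one, a text with words in both gaps, such as the hypothetical foo x bar y biz, is what separates "each" (1) from "sum" (2).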