ElasticSearch partial phrase matching

2

8

I'm an ElasticNoob, but I've been playing around with some simple phrase matching as follows:

query: {
  match_phrase: {
    my_field: {
      query: "silly dogs playing about",
      slop:  100
    }
  }
}

But this only matches entries that have all 4 terms (silly, dogs, playing, about). Ideally it'd still match something like "silly dogs that are playing" which doesn't have the "about" keyword (it would get a lower score because of this).

This seems like a very common use case for a text search engine, so I figure my Google-fu must be weak: I can't find anything about partial phrase matching in Elasticsearch.

Can someone point me in the right direction here? Just to be clear:

  • The order of keywords matters (match_phrase and slop allow us to do this)
  • The number of keywords matched matters (match_phrase simply excludes items if any keywords are missing - this is not ideal for my situation)

Thanks!

Despairing answered 9/6, 2016 at 17:3 Comment(0)
11

The recommended solution is:

Instead of using proximity matching as an absolute requirement, we can use it as a signal—as one of potentially many queries, each of which contributes to the overall score for each document (see Most Fields).

This article describes it: https://www.elastic.co/guide/en/elasticsearch/guide/current/proximity-relevance.html

So your query would look like:

  query: {
    bool: {
      must: {
        match: {
          my_field: {
            query: "silly dogs playing about",
            minimum_should_match: "30%"
          }
        }
      },
      should: {
        match_phrase: {
          my_field: {
            query: "silly dogs playing about",
            slop:  50
          }
        }
      }
    }
  }
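
For anyone building this request from code, here is a minimal sketch in Python that produces the same body as a plain dict, so the threshold and slop are easy to tune. The helper name `build_partial_phrase_query` and its defaults are hypothetical, chosen only for illustration; the sketch does not call any Elasticsearch client and only prints the JSON you would send to the search endpoint.

```python
import json


def build_partial_phrase_query(field, text, min_match="30%", slop=50):
    """Return a bool query body: loose term match required, phrase match as a boost."""
    return {
        "query": {
            "bool": {
                # Required clause: enough of the terms must appear, in any order.
                "must": {
                    "match": {
                        field: {"query": text, "minimum_should_match": min_match}
                    }
                },
                # Optional clause: documents whose terms form the phrase
                # (within `slop` position moves) get a higher score.
                "should": {
                    "match_phrase": {field: {"query": text, "slop": slop}}
                },
            }
        }
    }


body = build_partial_phrase_query("my_field", "silly dogs playing about")
print(json.dumps(body, indent=2))
```

Because the phrase clause sits under `should`, a document missing "about" can still match through the `must` clause; it simply gets no boost from the phrase match.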
Cohby answered 9/6, 2016 at 17:17 Comment(1)
Ah, exactly what I'm looking for - thanks! Works a charm. - Despairing
-3

You can use the minimum_should_match parameter to specify either the percentage of the words that need to match or how many words should match.

query: {
  match_phrase: {
    my_field: {
      query: "silly dogs playing about",
      slop:  100,
      minimum_should_match: "75%"
    }
  }
}

This would mean at least 3 of the 4 words would need to match for it to be a hit.
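
The arithmetic behind that figure can be sketched as follows, for queries that accept minimum_should_match (such as match). The Elasticsearch documentation states that the count derived from a percentage is rounded down. `required_terms` is a hypothetical helper, not part of any Elasticsearch client, and this sketch assumes the analyzer keeps all four terms and handles only positive percentages.

```python
def required_terms(term_count, percent):
    """Number of terms that must match for a positive percentage, rounded down."""
    if not 0 < percent <= 100:
        # Negative percentages and combination syntax are out of scope here.
        raise ValueError("this sketch only handles percentages in (0, 100]")
    return (term_count * percent) // 100


print(required_terms(4, 75))  # 3
print(required_terms(4, 30))  # 1
```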

Issykkul answered 9/6, 2016 at 18:33 Comment(1)
This won't work. match_phrase does not support the minimum_should_match parameter. - Hemichordate
