Elasticsearch match phrase prefix not matching all terms

I am having an issue where the match_phrase_prefix query in Elasticsearch does not return all the results I expect, particularly when the query is one word followed by a single letter.

Take this index mapping (this is a contrived example to protect sensitive data):

GET http://localhost:9200/test/drinks/_mapping

returns:

{
  "test": {
    "mappings": {
      "drinks": {
        "properties": {
          "name": {
            "type": "text"
          }
        }
      }
    }
  }
}

Among millions of other records, the index contains these two:

{
    "_index": "test",
    "_type": "drinks",
    "_id": "2",
    "_score": 1,
    "_source": {
        "name": "Johnnie Walker Black Label"
    }
},
{
    "_index": "test",
    "_type": "drinks",
    "_id": "1",
    "_score": 1,
    "_source": {
        "name": "Johnnie Walker Blue Label"
    }
}

The following query, which is one word followed by two letters:

POST http://localhost:9200/test/drinks/_search
{
    "query": {
        "match_phrase_prefix" : {
            "name" : "Walker Bl"
        }
    }
}

returns this:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "test",
                "_type": "drinks",
                "_id": "2",
                "_score": 0.5753642,
                "_source": {
                    "name": "Johnnie Walker Black Label"
                }
            },
            {
                "_index": "test",
                "_type": "drinks",
                "_id": "1",
                "_score": 0.5753642,
                "_source": {
                    "name": "Johnnie Walker Blue Label"
                }
            }
        ]
    }
}

Whereas this query with one word and one letter:

POST http://localhost:9200/test/drinks/_search
{
    "query": {
        "match_phrase_prefix" : {
            "name" : "Walker B"
        }
    }
}

returns no results. What could be happening here?

Parnassian answered 8/11, 2017 at 14:20 Comment(0)

I will assume that you are working with Elasticsearch 5.0 or above. I think this is caused by the default value of max_expansions.

As described in the documentation, the max_expansions parameter controls how many terms the last term of the query will be expanded to. The default value is 50, which might explain why you find "black" and "blue" when you type the first two letters "bl", but not when you type only "b".

The documentation is pretty clear about it:

The match_phrase_prefix query is a poor-man’s autocomplete. It is very easy to use, which lets you get started quickly with search-as-you-type, but its results, which usually are good enough, can sometimes be confusing.

Consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.

The problem is that the first 50 terms may not include the term fox, so the phrase quick brown fox will not be found. This usually isn’t a problem, as the user will continue to type more letters until the word they are looking for appears.
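To make the quoted behaviour concrete, here is a toy model in Python. It is only a sketch of the idea, not how Elasticsearch is implemented: `expand_prefix` is a hypothetical helper, and the 60 terms `ba00` to `ba59` are made up to stand in for the many other terms starting with "b" that a large index would contain.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Return at most max_expansions terms that start with prefix,
    taken in dictionary order (toy model of the prefix expansion)."""
    start = bisect_left(sorted_terms, prefix)
    expansions = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expansions) >= max_expansions:
            break
        expansions.append(term)
    return expansions


# Hypothetical term dictionary: 60 invented terms that sort before "black",
# plus the terms from the two documents in the question.
terms = sorted(
    [f"ba{i:02d}" for i in range(60)]
    + ["black", "blue", "johnnie", "label", "walker"]
)

one_letter = expand_prefix(terms, "b")    # the 50 slots are used up by ba00..ba49
two_letters = expand_prefix(terms, "bl")  # only two candidates, both fit

print("black" in one_letter, "blue" in one_letter)
print(two_letters)
```

In this model "b" never reaches "black" or "blue" because 50 earlier terms fill the expansion limit, while "bl" has only two candidates. That matches the symptom in the question, assuming the real index has more than 50 terms beginning with "b".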

I can't say whether it is acceptable to raise this parameter above 50 if you need good performance, since I have never tried it myself.
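If you want to experiment with a larger limit, max_expansions can be set in the long form of the query, where the field maps to an object with "query" and "max_expansions" keys. The sketch below only builds the request body; `phrase_prefix_query` is a hypothetical helper, and the value 1000 is an arbitrary number chosen for illustration, not a recommendation.

```python
import json


def phrase_prefix_query(field, text, max_expansions=50):
    """Build a match_phrase_prefix request body with an explicit
    max_expansions (helper name is illustrative only)."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {
                    "query": text,
                    "max_expansions": max_expansions,
                }
            }
        }
    }


# 1000 is an arbitrary value for experimentation; measure before relying on it.
body = phrase_prefix_query("name", "Walker B", max_expansions=1000)
print(json.dumps(body, indent=4))
```

The printed JSON can be sent as the body of the same POST http://localhost:9200/test/drinks/_search request used in the question. A higher limit means more terms are added to the phrase query, so it is worth checking query latency on a realistic index.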

Peneus answered 8/11, 2017 at 15:33 Comment(4)
One question: the documentation says that it looks for "quick" followed by "brown". Does Elasticsearch then look for the first 50 (by default) terms that begin with "f" and are preceded by "quick" and "brown" in that order, or just any terms beginning with "f"? And in either case, why doesn't it return a result when there are at least two terms beginning with "b" (in this question, "blue" and "black")? I was expecting that among the first 50 terms at least those two, or some others, would be shown. Or am I getting it all wrong? - Dustheap
Maybe the 50 terms come from a built-in dictionary based on the language used by your cluster (since ES supports custom languages), and not from the documents you have in your index. That would explain why it is a separate parameter of the match_phrase_prefix query, different from just using the "size" parameter. - Peneus
In my case none of my fields have a language property in the mapping, and as soon as I add another letter the results become more accurate. I'm using a multi_match query with "type": "phrase_prefix", so the max_expansions parameter can't be used. - Dustheap
This does appear to be the issue. Index-time search-as-you-type fixes the problem for me: elastic.co/guide/en/elasticsearch/guide/current/… - Parnassian
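The link in the last comment is truncated, so the following rests on an assumption: that "index-time search-as-you-type" refers to indexing edge n-grams of each token. Under that assumption, this toy Python sketch shows why the approach avoids the expansion limit. The helpers `edge_ngrams` and `index_terms` are hypothetical, and the model ignores term positions, so it does not check phrase order.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the leading substrings of token, from min_gram to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text):
    """Lowercase, split on whitespace, and collect the edge n-grams of every token."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms


indexed = index_terms("Johnnie Walker Black Label")

# At search time the query tokens are looked up as exact terms,
# so no prefix expansion (and no max_expansions limit) is involved.
query_tokens = "Walker B".lower().split()
print(all(token in indexed for token in query_tokens))
```

Because every prefix is already stored as its own term at index time, a one-letter query token such as "b" is an exact match. The trade-off is a larger index.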
