Elasticsearch multi_match cross_fields prefix

I have a multi_match query of type cross_fields, which I want to improve with prefix matching.

{
  "index": "companies",
  "size": 25,
  "from": 0,
  "body": {
    "_source": {
      "include": [
        "name",
        "address"
      ]
    },
    "query": {
      "filtered": {
        "query": {
          "multi_match": {
            "type": "cross_fields",
            "query": "Google",
            "operator": "and",
            "fields": [
              "name",
              "address"
            ]
          }
        }
      }
    }
  }
}

It matches documents like the one below perfectly on queries such as google mountain view. The filtered query is there because I need to add geo filters dynamically.

{
  "id": 1,
  "name": "Google",
  "address": "Mountain View"
} 

Now I want to allow prefix matching, without breaking cross_fields.

Queries such as these should match:

  • goog
  • google mount
  • google mountain vi
  • mountain view goo

If I change multi_match.type to phrase_prefix, the whole query is matched as a phrase prefix against each single field, so mountain vi matches but google mountain vi does not, because no single field contains that phrase.

How do I solve this?

Atwekk asked 21/2/2015 at 22:57

Since there are no answers yet and someone else might run into this, here is the solution I used when I had the same problem:

Use an edge n-gram token filter (edgeNGram).

You need to change the index settings and the mappings.

Here's an example for the settings:

"settings" : {
  "index" : {
    "analysis" : {
      "analyzer" : {
        "ngram_analyzer" : {
          "type" : "custom",
          "stopwords" : "_none_",
          "filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop", "ngram_filter" ],
          "tokenizer" : "standard"
        },
        "default" : {
          "type" : "custom",
          "stopwords" : "_none_",
          "filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop" ],
          "tokenizer" : "standard"
        }
      },
      "filter" : {
        "no_stop" : {
          "type" : "stop",
          "stopwords" : "_none_"
        },
        "ngram_filter" : {
          "type" : "edgeNGram",
          "min_gram" : "2",
          "max_gram" : "20"
        }
      }
    }
  }
}

Of course, you should adapt the analyzers to your own use case. You might want to leave the default analyzer untouched, or add the ngram filter to it so you don't have to change the mappings; the latter means every analyzed field in your index gets the ngram filter.
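
To sanity-check the analyzer, the _analyze API shows which tokens it actually emits. This is only a sketch in console style; on 1.x/2.x you would pass analyzer and text as query-string parameters instead of a request body:

GET /companies/_analyze
{
  "analyzer": "ngram_analyzer",
  "text": "Google Mountain View"
}

For google this should return roughly go, goo, goog, googl, google (and similar edge n-grams for mountain and view), which is exactly what lets the prefix queries match later.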

And for the mapping:

"mappings" : {
  "patient" : {
    "properties" : {
      "name" : {
        "type" : "string",
        "analyzer" : "ngram_analyzer"
      },
      "address" : {
        "type" : "string",
        "analyzer" : "ngram_analyzer"
      }
    }
  }
}
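
For completeness, here is roughly how the two pieces fit together when the index is created from scratch. This is only a sketch using the same 1.x-style string mapping as above; keep in mind that the analyzer of an existing field cannot be changed in place, so existing data has to be reindexed:

PUT /companies
{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "ngram_analyzer" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop", "ngram_filter" ]
          }
        },
        "filter" : {
          "no_stop" : { "type" : "stop", "stopwords" : "_none_" },
          "ngram_filter" : { "type" : "edgeNGram", "min_gram" : "2", "max_gram" : "20" }
        }
      }
    }
  },
  "mappings" : {
    "company" : {
      "properties" : {
        "name" : { "type" : "string", "analyzer" : "ngram_analyzer" },
        "address" : { "type" : "string", "analyzer" : "ngram_analyzer" }
      }
    }
  }
}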

Declare every field you want to autocomplete with the ngram_analyzer. Then the queries in your question should work. If you used something else, I'd be happy to hear about it.
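
One refinement you may want to consider (the setup above already works as-is): because the mapping sets analyzer, the ngram_analyzer also runs on the query string at search time. If you prefer to keep the query terms whole and only ngram what is indexed, you can give the fields a separate search_analyzer, for example pointing it at the default analyzer from the settings:

"name" : {
  "type" : "string",
  "analyzer" : "ngram_analyzer",
  "search_analyzer" : "default"
}

The prefix queries still match, because a term like goog is found among the indexed edge n-grams of google, while the query itself is no longer expanded into n-grams.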

Heckle answered 13/7/2015 at 14:22
