Elasticsearch: Find substring match

I want to perform both exact word match and partial word/substring match. For example, if I search for "men's shaver", I should be able to find "men's shaver" in the results. But if I search for "en's shaver", I should also be able to find "men's shaver" in the results. I am using the following settings and mappings:

Index settings:

PUT /my_index
{
    "settings": {
        "number_of_shards": 1, 
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter" 
                    ]
                }
            }
        }
    }
}
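To see why substring queries fail with this setup, it helps to look at the terms the "autocomplete" analyzer actually indexes. The Python sketch below is only an approximation, not Elasticsearch itself. It assumes a regex stand-in for the standard tokenizer (the hypothetical helper `standard_tokenize`) and mimics the lowercase and edge_ngram steps with min_gram 1 and max_gram 20.

```python
import re

def standard_tokenize(text):
    # Rough stand-in for the "standard" tokenizer: split on spaces/punctuation,
    # but keep an apostrophe inside a word ("men's" stays one token).
    return re.findall(r"\w+(?:'\w+)*", text)

def edge_ngrams(token, min_gram=1, max_gram=20):
    # Mimics the edge_ngram filter: every prefix of length min_gram..max_gram.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def autocomplete_analyze(text):
    # "autocomplete" analyzer: standard tokenizer -> lowercase -> edge_ngram.
    terms = []
    for tok in standard_tokenize(text):
        terms.extend(edge_ngrams(tok.lower()))
    return terms

print(autocomplete_analyze("men's shaver"))
# ['m', 'me', 'men', "men'", "men's", 's', 'sh', 'sha', 'shav', 'shave', 'shaver']
```

Only prefixes of each word are indexed, so "men's" is a term but "en's" is not.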

Mappings:

PUT /my_index/my_type/_mapping
{
    "my_type": {
        "properties": {
            "name": {
                "type":            "string",
                "index_analyzer":  "autocomplete", 
                "search_analyzer": "standard" 
            }
        }
    }
}

Insert records:

POST /my_index/my_type/_bulk
{ "index": { "_id": 1            }}
{ "name": "men's shaver" }
{ "index": { "_id": 2            }}
{ "name": "women's shaver" }

Query:

1. To search by exact word match --> "men's"

POST /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": "men's"
        }
    }
}

The above query returns "men's shaver" in the results.

2. To search by partial word match --> "en's"

POST /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": "en's"
        }
    }
}

The above query DOES NOT return anything.
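A rough Python simulation of the two queries above reproduces both results. It rests on a few assumptions: a regex approximation of the standard analyzer, the two documents from the bulk request, and a match query with the default "or" operator. `standard_analyze`, `autocomplete_analyze` and `match_query` are hypothetical helpers for illustration.

```python
import re

def standard_analyze(text):
    # Approximation of the "standard" search analyzer: tokenize
    # (keeping in-word apostrophes) and lowercase, with no n-grams.
    return [t.lower() for t in re.findall(r"\w+(?:'\w+)*", text)]

def autocomplete_analyze(text, min_gram=1, max_gram=20):
    # Index-time analyzer from the question: standard tokens -> lowercase -> edge_ngram.
    terms = set()
    for tok in standard_analyze(text):
        for n in range(min_gram, min(max_gram, len(tok)) + 1):
            terms.add(tok[:n])
    return terms

docs = {1: "men's shaver", 2: "women's shaver"}
index = {doc_id: autocomplete_analyze(name) for doc_id, name in docs.items()}

def match_query(text):
    # A match query with the default "or" operator hits a document
    # if ANY analyzed query term exists among its indexed terms.
    terms = standard_analyze(text)
    return sorted(doc_id for doc_id, indexed in index.items()
                  if any(t in indexed for t in terms))

print(match_query("men's"))  # [1]
print(match_query("en's"))   # []
```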

I have also tried the following query:

POST /my_index/my_type/_search
{
    "query": {
        "wildcard": {
           "name": {
              "value": "%en's%"
           }
        }
    }
}

Still not getting anything. I figured it is because of the "edge_ngram" filter on the index, which cannot find partial word/substring matches. I tried the "ngram" filter as well, but it slows down the search a lot.
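The trade-off between the two filters can be illustrated with a small Python sketch (hypothetical helpers, not Elasticsearch code). It counts the grams produced for the single token "women's" with min_gram 1 and max_gram 20.

```python
def edge_ngrams(token, min_gram, max_gram):
    # Prefixes only, as the edge_ngram filter emits them.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def ngrams(token, min_gram, max_gram):
    # Every substring of length min_gram..max_gram, as the ngram filter emits them
    # (the order differs from Elasticsearch's, which does not matter here).
    return [token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)]

token = "women's"
edge = edge_ngrams(token, 1, 20)
full = ngrams(token, 1, 20)
print(len(edge), len(full))            # 7 28
print("en's" in edge, "en's" in full)  # False True
```

The ngram filter does index "en's", which is why it supports substring search. However, it emits every substring rather than only the prefixes, so the number of indexed terms grows roughly with the square of the word length instead of linearly.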

Please suggest how to achieve both exact phrase match and partial phrase match using the same index settings.

Consecration answered 23/4, 2014 at 12:11 Comment(0)

To search for partial field matches and exact matches, it will work better if you define the fields as "not_analyzed" or as keywords (rather than text), then use a wildcard query.

See also this.

To use a wildcard query, append * on both ends of the string you are searching for:

POST /my_index/my_type/_search
{
    "query": {
        "wildcard": {
            "name": {
                "value": "*en's*"
            }
        }
    }
}
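With a not_analyzed/keyword field, the whole value is stored as a single term, and the wildcard pattern is matched against that term. The Python sketch below approximates this with `fnmatch.fnmatchcase`, which behaves like an Elasticsearch wildcard query only for patterns made of `*` and `?`. It also understands `[...]`, which Elasticsearch does not. The document values come from the question, and `wildcard_query` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# With a not_analyzed/keyword field, each document's whole value is ONE term.
terms = {1: "men's shaver", 2: "women's shaver"}

def wildcard_query(pattern):
    # Case-sensitive match of the pattern against each stored term.
    # This simulation scans every term; a leading "*" forces Elasticsearch
    # to examine every term in the index in much the same way.
    return sorted(doc_id for doc_id, term in terms.items()
                  if fnmatchcase(term, pattern))

print(wildcard_query("*en's*"))  # [1, 2]
print(wildcard_query("men's*"))  # [1]
print(wildcard_query("%en's%"))  # []
```

The last call shows why the question's "%en's%" attempt returned nothing: `%` is not a wildcard character, so it is matched literally.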

For case-insensitive matching, use a custom analyzer with the keyword tokenizer and a lowercase filter.

Custom analyzer (defined inside the index "settings"):

"analysis": {
    "analyzer": {
        "custom_analyzer": {
            "tokenizer": "keyword",
            "filter": ["lowercase"]
        }
    }
}

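Make the search string lowercase as well, since the wildcard query does not run its value through the field's analyzer.

For example, if you get the search string AsD, change it to *asd*.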
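Here is a minimal Python sketch of this approach. It assumes the keyword tokenizer plus lowercase filter shown above and mixed-case sample values. The helpers `keyword_lowercase` and `case_insensitive_wildcard` are hypothetical and for illustration only.

```python
from fnmatch import fnmatchcase

def keyword_lowercase(text):
    # keyword tokenizer (whole value = one token) + lowercase filter
    return [text.lower()]

index = {1: keyword_lowercase("Men's Shaver"), 2: keyword_lowercase("Women's Shaver")}

def case_insensitive_wildcard(user_input):
    # The wildcard value is not analyzed, so lowercase it yourself
    # and wrap it in "*...*" for a substring match.
    pattern = "*" + user_input.lower() + "*"
    return sorted(doc_id for doc_id, terms in index.items()
                  if any(fnmatchcase(t, pattern) for t in terms))

print(case_insensitive_wildcard("EN'S"))  # [1, 2]
print(case_insensitive_wildcard("WoM"))   # [2]
```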

Rochelrochell answered 23/4, 2014 at 12:29 Comment(8)
Thanks. I'm able to search now. - Consecration
Just to quote ElasticSearch's documentation: "Warning: Allowing a wildcard at the beginning of a word (eg "*ing") is particularly heavy, because all terms in the index need to be examined" elastic.co/guide/en/elasticsearch/reference/1.x/… - Chaparajos
@Chaparajos's link is broken, but as he says, ElasticSearch recommends "Avoid using a pattern that starts with a wildcard (for example, *foo or, as a regexp, .*foo)". elastic.co/guide/en/elasticsearch/guide/current/… - Holomorphic
It does not work case-insensitively. How can we make it case-insensitive? - Gathard
@Rochelrochell - I am facing issues with case sensitivity. POST newindswe { "settings":{ "custom_analyzer": { "tokenizer": "keyword", "filter": ["lowercase"] } } } PUT /newindswe/newtyp/1 { "name":"Soundarya Thyagu", "street":"kutchery road" } POST /newindswe/newtyp/_search { "query": { "match":{ "value": "SoU" } } } But I am not getting any results. Can you help me here? - Pectize
@Rochelrochell - I don't get you. How do I lowercase the query? Even if I put the value SoU, it should be lowercased as per the filter, right? POST /newindswe/newtyp/_search { "query": { "match":{ "name": "SoU" } } } - Pectize
If I have a name field with the value "Soundarya Thyagu", then searching with SoU should also lowercase it and return the document, right? - Pectize
@SoundaryaThiagarajan you either lowercase the search string or use a search-time analyzer to lowercase it. - Rochelrochell

The answer given by @BlackPOP will work, but it relies on wildcard queries, which are not recommended. A leading wildcard forces Elasticsearch to examine every term in the index, and if abused this can have a domino effect on the performance of the whole Elastic cluster.

I have written a detailed blog post on partial search/autocomplete covering the latest options available in Elasticsearch as of today (Dec 2020), with performance in mind. For more information on the trade-offs, please refer to this answer.

IMHO a better approach is to use an n-gram tokenizer customized to your use case. The index will already contain the tokens needed for the search term, so searches are faster. The index will be bigger, but storage is not that costly, and you get better speed plus more control over exactly how substring search works.

Index size can also be kept in check if you are conservative with the min_gram and max_gram values in the tokenizer settings.
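To get a feel for how min_gram and max_gram drive index size, this Python sketch (a hypothetical helper, not an Elasticsearch API) counts the grams emitted for one token. Recent Elasticsearch versions limit max_gram - min_gram through the index.max_ngram_diff setting (default 1), so a wide range like 1 to 20 has to be allowed explicitly.

```python
def ngram_count(token, min_gram, max_gram):
    # Terms an ngram tokenizer/filter emits for one token: for each length n
    # in [min_gram, max_gram] there are len(token) - n + 1 substrings (or none).
    return sum(max(len(token) - n + 1, 0) for n in range(min_gram, max_gram + 1))

word = "shaver"
for lo, hi in [(1, 20), (2, 4), (3, 3)]:
    print(lo, hi, ngram_count(word, lo, hi))
# 1 20 21
# 2 4 12
# 3 3 4
```

The catch is that with min_gram 3, a one- or two-character query has no matching gram. The range should therefore follow the shortest substring you actually want to support.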

Greenlee answered 23/12, 2020 at 10:53 Comment(0)

To search with any string or substring, use:

query: {
    // bool "should" matches documents where either clause matches
    bool: {
        should: [
            { match_phrase_prefix: { name: str } },
            { match_phrase_prefix: { surname: str } }
        ]
    }
}

Happy coding with Elastic Search....
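As the comment below points out, match_phrase_prefix only treats the last query term as a prefix. It does not match substrings inside a word. A simplified Python sketch of those semantics shows the difference. It ignores max_expansions and slop, uses a regex approximation of the standard analyzer, and `phrase_prefix_matches` is a hypothetical helper.

```python
import re

def analyze(text):
    # Rough standard-analyzer stand-in: keep in-word apostrophes, lowercase.
    return [t.lower() for t in re.findall(r"\w+(?:'\w+)*", text)]

def phrase_prefix_matches(field_value, query):
    # All query terms except the last must appear consecutively in the field;
    # the last query term only has to be a PREFIX of the following field term.
    doc, q = analyze(field_value), analyze(query)
    if not q:
        return False
    *head, last = q
    for i in range(len(doc) - len(q) + 1):
        if doc[i:i + len(head)] == head and doc[i + len(head)].startswith(last):
            return True
    return False

print(phrase_prefix_matches("men's shaver", "men's sha"))  # True
print(phrase_prefix_matches("men's shaver", "me"))         # True
print(phrase_prefix_matches("men's shaver", "en's"))       # False
```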

Detailed answered 11/4, 2016 at 8:58 Comment(1)
He's not looking for matching a prefix though. - Fortyfour
