How to deal with compound words in Elasticsearch

I know Elasticsearch has a good Compound Word Token Filter, but my problem is a bit different. I am wondering how search engines like Google deal with open-form compound words such as "post office" or "living room". If you type "postoffice" instead of "post office", you still get the same results. I want this feature in my Elasticsearch-based search engine. What is the solution to this problem? Should I tokenize "post office" as a single token? If so, how?

Gallium answered 18/11, 2017 at 8:43

You should set an analyzer on the search query so that the query text is also run through the decompounder.

See the mapping and documents in my answer.
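
The mapping itself is not reproduced in this answer. As a rough sketch only, an index along the following lines should behave similarly to the queries shown here, assuming a `dictionary_decompounder` token filter with a small hand-written word list. The filter name `english_decompounder`, the word list, and the exact filter chain are assumptions made for illustration and may differ from the original mapping; only the index name, the `name` field, and the analyzer name are taken from the queries in this answer.

```json
# Hypothetical index definition: filter name, word list and filter chain are assumptions
PUT /decompounder
{
    "settings": {
        "analysis": {
            "filter": {
                "english_decompounder": {
                    "type": "dictionary_decompounder",
                    "word_list": ["some", "thing", "post", "office", "living", "room"]
                }
            },
            "analyzer": {
                "lowercase_english_decompounder_standard_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_decompounder"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "lowercase_english_decompounder_standard_analyzer"
            }
        }
    }
}
```

With a setup like this, a token such as "postoffice" is kept and the subwords "post" and "office" are added next to it, so it can match text that contains "post office" as two words. The word list has to contain every part you want to be recognised.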

Query with the compound word "something":

GET /decompounder/_search?filter_path=hits.hits
{
    "query": {
        "multi_match": {
            "query": "something",
            "analyzer": "lowercase_english_decompounder_standard_analyzer",
            "fields": ["name"]
        }
    }
}

Response:

{
    "hits" : {
        "hits" : [
            {
                "_index" : "decompounder",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 0.23911434,
                "_source" : {
                    "name" : "something sea"
                }
            },
            {
                "_index" : "decompounder",
                "_type" : "_doc",
                "_id" : "2",
                "_score" : 0.23911434,
                "_source" : {
                    "name" : "something tea"
                }
            },
            {
                "_index" : "decompounder",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 0.23911434,
                "_source" : {
                    "name" : "something seaside"
                }
            }
        ]
    }
}

Query with the two words "some thing":

GET /decompounder/_search?filter_path=hits.hits
{
    "query": {
        "multi_match": {
            "query": "some thing",
            "analyzer": "lowercase_english_decompounder_standard_analyzer",
            "fields": ["name"]
        }
    }
}

The response is the same as for the compound query.
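
To check why both queries match, the `_analyze` API can be run against the index with the same analyzer. This is only a sketch: it assumes the `decompounder` index and the analyzer named in the queries already exist, and the tokens returned depend on the word list in your own mapping.

```json
# Compare the tokens produced for the compound and for the two-word form
GET /decompounder/_analyze
{
    "analyzer": "lowercase_english_decompounder_standard_analyzer",
    "text": "something"
}

GET /decompounder/_analyze
{
    "analyzer": "lowercase_english_decompounder_standard_analyzer",
    "text": "some thing"
}
```

If the two token lists share terms such as "some" and "thing", both query strings can match the same documents.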

Shotten answered 14/4 at 13:30
