I know Elasticsearch has a good Compound Word Token Filter, but my problem is a bit different. I am wondering how search engines like Google deal with open-form compound words such as "post office" or "living room". If you type "postoffice" instead of "post office", you still get the same results. I want this behaviour in my Elasticsearch-based search engine. What is the solution? Should I tokenize "post office" as a single token? If so, how?
How to deal with compound words in Elasticsearch
You should specify a decompounding analyzer in the search query itself.
See the mapping and documents in my answer.
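To see what the analyzer does to a given query string, the `_analyze` API can be run against the index. This is only a minimal sketch; it assumes the `decompounder` index already exists and defines the analyzer named in the queries below.

```json
# Sketch: inspect the tokens the analyzer produces for a compound term
GET /decompounder/_analyze
{
  "analyzer": "lowercase_english_decompounder_standard_analyzer",
  "text": "something"
}
```

If the returned tokens include the subwords as well as the original term, a query for the compound and a query for the separate words should match the same documents.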
Query with the compound word "something":
GET /decompounder/_search?filter_path=hits.hits
{
  "query": {
    "multi_match": {
      "query": "something",
      "analyzer": "lowercase_english_decompounder_standard_analyzer",
      "fields": ["name"]
    }
  }
}
Response
{
  "hits" : {
    "hits" : [
      {
        "_index" : "decompounder",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.23911434,
        "_source" : {
          "name" : "something sea"
        }
      },
      {
        "_index" : "decompounder",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.23911434,
        "_source" : {
          "name" : "something tea"
        }
      },
      {
        "_index" : "decompounder",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.23911434,
        "_source" : {
          "name" : "something seaside"
        }
      }
    ]
  }
}
Query with the two separate words "some thing":
GET /decompounder/_search?filter_path=hits.hits
{
  "query": {
    "multi_match": {
      "query": "some thing",
      "analyzer": "lowercase_english_decompounder_standard_analyzer",
      "fields": ["name"]
    }
  }
}
The response is the same as for the first query.
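The definition of `lowercase_english_decompounder_standard_analyzer` is not reproduced here. As a rough sketch only, an index with an analyzer of that name could be created along the following lines. The filter name `english_decompounder`, the word list, and the choice of the built-in `dictionary_decompounder` filter are assumptions made for illustration, not the original mapping. The typeless `mappings` block assumes Elasticsearch 7 or later.

```json
# Hypothetical sketch: the filter name and word list are illustrative assumptions
PUT /decompounder
{
  "settings": {
    "analysis": {
      "filter": {
        "english_decompounder": {
          "type": "dictionary_decompounder",
          "word_list": ["some", "thing", "post", "office", "living", "room"]
        }
      },
      "analyzer": {
        "lowercase_english_decompounder_standard_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "english_decompounder"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "lowercase_english_decompounder_standard_analyzer"
      }
    }
  }
}
```

With a setup like this, a closed form such as "postoffice" should be split into the dictionary subwords `post` and `office` (the original token is kept as well), so it can match a document containing "post office". The word list has to cover the compounds you care about, so a larger dictionary file supplied via `word_list_path` may be more practical than an inline list.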