I am trying to build a suggester based on arrays of strings in my documents. It is similar to this one, but with several differences: the completion suggester
from Elasticsearch does not do exactly what I want (in terms of filtering and prefix matching), because I need an accent-insensitive edge n-gram that works on any word of the sentence. Let me clarify with an example.
Assume I have the following indexed documents. I want to suggest "tags" based on a query q
(I don't care about the documents themselves, only the tags that match my query):
[
{ "tags": [ "société générale", "consulting" ] },
{ "tags": [ "big data", "big", "data"] },
{ "tags": [ "data" ] },
{ "tags": [ "data engineering" ] },
{ "tags": [ "consulting and management of IT" ] }
]
I want to match prefixes with accent tolerance, and the following queries/responses highlight what I need:
- (1) q = "societe" or q = "societe generale" should return [ "société générale" ]
  --> accent insensitive
- (2) q = "big data" should return [ "big data" ]
  --> both prefixes "big" and "data" must be in the string
- (3) q = "data" should return [ "big data", "data", "data engineering" ]
  --> anywhere in the sentence (but as a prefix)
- (4) q = "ata" should not return anything (not a prefix)
- (5) q = "IT consulting" should return [ "consulting and management of IT" ]
  --> both prefixes of q should match regardless of order
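To pin down the expected behavior independently of Elasticsearch, here is a minimal sketch of the matching rule in plain Python. It only illustrates the semantics of the five cases above and is not an Elasticsearch implementation. The names `fold`, `matches`, `suggest` and `DOCS` are made up for this illustration, and splitting words on whitespace is an assumption.

```python
import unicodedata

# The sample documents from the question.
DOCS = [
    {"tags": ["société générale", "consulting"]},
    {"tags": ["big data", "big", "data"]},
    {"tags": ["data"]},
    {"tags": ["data engineering"]},
    {"tags": ["consulting and management of IT"]},
]


def fold(text):
    """Lowercase the text and strip accents (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches(query, tag):
    """True if every query word is a prefix of some word of the tag."""
    query_words = fold(query).split()
    tag_words = fold(tag).split()
    if not query_words:
        return False  # assumption: an empty query suggests nothing
    return all(
        any(word.startswith(prefix) for word in tag_words)
        for prefix in query_words
    )


def suggest(query, documents):
    """Return the distinct matching tags, in order of first appearance."""
    found = []
    for doc in documents:
        for tag in doc["tags"]:
            if tag not in found and matches(query, tag):
                found.append(tag)
    return found
```

With this rule, `suggest("IT consulting", DOCS)` returns only the tag "consulting and management of IT", and `suggest("ata", DOCS)` returns an empty list, which is the behavior I would like to get out of Elasticsearch.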
If I use a regular completion mapper + suggester,
# assuming "tags" is mapped with type "completion" in my ES index
{
  "suggest": {
    "text": "big data",
    "tags": {
      "completion": {
        "field": "tags"
      }
    }
  }
}
almost none of these cases work: only (2) and (4) behave as expected, and (3) returns just one of the three expected results.
Can I build a custom suggester or a custom search query that would satisfy my requirements and the examples given above?