Returning partial matches in Azure Search

A while ago I set up a search index for a web application. One of the requirements was to return partial matches of the search terms. For instance, searching for Joh should find John Doe. The most straightforward way to implement this was to append a * to each search term before posting the query to Azure Search. So if a user types Joh, we actually ask Azure Search to search for Joh*.
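For illustration, the term rewriting described above could look roughly like the following Python sketch. The helper name `to_prefix_query` is hypothetical and not part of any Azure SDK; it only builds the string that goes into the `search` field of the request body, and it assumes the user input contains no characters that are special in the query syntax.

```python
def to_prefix_query(user_input: str) -> str:
    """Append '*' to every whitespace-separated term (hypothetical helper)."""
    return " ".join(f"{term}*" for term in user_input.split())


# Request body as it would be posted to the search endpoint.
payload = {"search": to_prefix_query("Joh")}
print(payload)
```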

One limitation of this approach is that all the matches of Joh* have the same search score. Because of this, sometimes a partial match appears higher in the results than an exact match. This is documented behavior, so I guess there is not much I can do about it. Or can I?

While my current way of returning partial matches seems like a hack, it has worked well enough in practice that I didn't bother finding out how to solve the problem properly. Now I have the time to look into it, and my instinct says there must be a "proper" way to do this. I have read the word "ngrams" here and there, and it seems to be part of the solution. I could probably find a passable solution after some hours of hacking on it, but if there is a "standard way" to achieve what I want, I would rather follow that path than use a home-grown hack. Hence this question.

So my question is: is there a standard way to retrieve partial matches in Azure Search, while giving exact matches a higher score? How should I change the code below to make Azure Search return the search results I need?

The code

Index definition, as returned by the Azure API:

{
    "name": "test-index",
    "defaultScoringProfile": null,
    "fields": [
        {
            "name": "id",
            "type": "Edm.String",
            "searchable": false,
            "filterable": true,
            "retrievable": true,
            "sortable": false,
            "facetable": false,
            "key": true,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null,
            "synonymMaps": []
        },
        {
            "name": "name",
            "type": "Edm.String",
            "searchable": true,
            "filterable": false,
            "retrievable": true,
            "sortable": true,
            "facetable": false,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null,
            "synonymMaps": []
        }
    ],
    "scoringProfiles": [],
    "corsOptions": null,
    "suggesters": [],
    "analyzers": [],
    "tokenizers": [],
    "tokenFilters": [],
    "charFilters": []
}

The documents, as posted to the Azure API:

{
    "value": [
        {
            "@search.action": "mergeOrUpload",
            "id": "1",
            "name": "Joh Doe"
        },
        {
            "@search.action": "mergeOrUpload",
            "id": "2",
            "name": "John Doe"
        }
    ]
}

Search query, as posted to the Azure API:

{
    "search": "Joh*"
}

Results, where the exact match (Joh Doe) appears second, while we would like it to appear first:

{
    "value": [
        {
            "@search.score": 1,
            "id": "2",
            "name": "John Doe"
        },
        {
            "@search.score": 1,
            "id": "1",
            "name": "Joh Doe"
        }
    ]
}
Dial answered 5/6, 2019 at 8:47

This is a very good question, and thanks for providing a detailed explanation. The easiest way to achieve this is to use term boosting on the exact term and combine it with a wildcard query. You can change the query in your post to:

search=Joh^10 OR Joh*&queryType=full

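This scores documents that match Joh exactly higher than documents that only match the prefix. Note that queryType=full switches the query to the full Lucene syntax, which term boosting requires. If you have more complicated requirements, you can look at building a custom analyzer that uses n-grams to support partial matching.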
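If the query is built from user input, the same pattern can be generated per term. The following is a minimal Python sketch, not an official recipe: the helper names `escape_term` and `boosted_prefix_query` are hypothetical, the boost value of 10 is just the one used above, and the escaping assumes the usual list of special characters in the full Lucene syntax. Terms are joined with a space, so how multiple terms combine is left to the service's `searchMode` setting.

```python
import re

# Characters that have a special meaning in the full Lucene query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term: str) -> str:
    """Backslash-escape Lucene special characters (hypothetical helper)."""
    return _SPECIAL.sub(r"\\\1", term)


def boosted_prefix_query(user_input: str, boost: int = 10) -> str:
    """Build '(term^boost OR term*)' for every whitespace-separated term."""
    clauses = []
    for term in user_input.split():
        safe = escape_term(term)
        clauses.append(f"({safe}^{boost} OR {safe}*)")
    return " ".join(clauses)


# Request body for a POST query; queryType must be "full" for boosting.
payload = {"search": boosted_prefix_query("Joh"), "queryType": "full"}
print(payload)
```

For a single term this produces the same query as above, only wrapped in parentheses so that several terms can be combined safely.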

Clemmie answered 5/6, 2019 at 18:5
