How to do a case-insensitive search on a keyword field in Elasticsearch without reindexing?

I have a keyword field like this:

"address": {
   "type": "keyword"
}

The value is stored in mixed case, as it's intended for display, e.g. "1/10 Somewhere Rd, Somewhere AAA 3333".

Now, I want to do a case-insensitive search on that field, ideally without reindexing the entire index.

What I've found so far is that match can be used for case-insensitive search, but it only works on text fields.

Is my only option reindexing, e.g. via a text field, or using a lowercase/custom normalizer/analyser? Is there any way I can do it without reindexing?

Acidhead answered 6/8, 2020 at 22:20 Comment(0)

You are right that match queries can be used for case-insensitive search, because they apply the same analyzer at query time that was used at index time, but this only works for text fields.

The problem is that you indexed the field as keyword, so the tokens in Elasticsearch's inverted index, which the tokens of the search query are matched against, are not lowercased. An exact-match lookup against those tokens therefore cannot be case-insensitive.

Let's understand the above statement using an example:

Suppose you have Foo BAR in your document, indexed using a keyword field. Note the case of each character: the inverted index will contain the single token below.

Foo BAR

At query time you could, by hook or by crook, convert the search term to all uppercase or all lowercase, but neither FOO BAR nor foo bar matches that token, so you will still have a lot of issues in the search results.
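The example above can be reproduced with a few console requests. This is only a minimal sketch: the index name case-demo is made up for illustration, and ?refresh=true is there just so the document is searchable straight away.

```
# "case-demo" is a hypothetical index name used only for this illustration
PUT /case-demo
{
  "mappings": {
    "properties": {
      "address": { "type": "keyword" }
    }
  }
}

# refresh=true makes the document visible to the searches that follow
PUT /case-demo/_doc/1?refresh=true
{
  "address": "Foo BAR"
}

# lowercased search term: no hits, because the only indexed token is "Foo BAR"
GET /case-demo/_search
{
  "query": { "term": { "address": "foo bar" } }
}

# exact case: one hit
GET /case-demo/_search
{
  "query": { "term": { "address": "Foo BAR" } }
}
```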

I would suggest adding a new field of type text and using the reindex API to create a fresh index, so that this is implemented in a clean manner. The reindex API builds the new index from the old one, which is much faster than rebuilding it from the source of truth (SQL in most cases).
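A rough sketch of that reindex route, assuming the existing index is called addresses_v1 and the new one addresses_v2 (both names are hypothetical), and that the text version of the value lives in a sub-field I've called address.text so the original keyword field stays available for display and exact matching:

```
# hypothetical index names: addresses_v1 (existing), addresses_v2 (new)
PUT /addresses_v2
{
  "mappings": {
    "properties": {
      "address": {
        "type": "keyword",
        "fields": {
          "text": { "type": "text" }
        }
      }
    }
  }
}

# copy every document from the old index into the new one
POST /_reindex
{
  "source": { "index": "addresses_v1" },
  "dest": { "index": "addresses_v2" }
}

# match analyzes the query string, so letter case no longer matters
GET /addresses_v2/_search
{
  "query": {
    "match": {
      "address.text": {
        "query": "somewhere rd",
        "operator": "and"
      }
    }
  }
}
```

Keep in mind that a text field is tokenized, so this matches on individual words rather than on the whole string; "operator": "and" only requires that every word of the query is present.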

Overstride answered 7/8, 2020 at 0:47 Comment(1)
It's possible to add a new field, let's say "lowercase", to your keyword field with a custom analyzer which takes the lowercase token filter. You would then base your queries on this lowercase field rather than the main keyword field. - Milo
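For what it's worth, here is roughly how that sub-field idea could look without creating a new index. This is a sketch under assumptions: my-index is a placeholder name, and instead of a custom analyzer it uses a keyword sub-field with the built-in lowercase normalizer, which avoids having to change the index's analysis settings. Existing documents only get the sub-field once they are re-processed in place, which is what the _update_by_query call is for.

```
# "my-index" is a placeholder; the sub-field name "lowercase" follows the comment above
PUT /my-index/_mapping
{
  "properties": {
    "address": {
      "type": "keyword",
      "fields": {
        "lowercase": {
          "type": "keyword",
          "normalizer": "lowercase"
        }
      }
    }
  }
}

# re-process existing documents in place so they pick up the new sub-field
POST /my-index/_update_by_query?conflicts=proceed

# the normalizer is also applied to the search term, so any casing should match
GET /my-index/_search
{
  "query": {
    "term": { "address.lowercase": "1/10 SOMEWHERE rd, somewhere aaa 3333" }
  }
}
```

This still touches every document, so it is not free, but it avoids building and switching over to a second index.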

As of version 7.10, the Term query supports a case_insensitive parameter:

GET /_search
{
  "query": {
    "term": {
      "user.id": {
        "value": "kimchy",
        "case_insensitive": true
      }
    }
  }
}

However, as of v8.10 the Terms query does not support case_insensitive.
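One way around that limitation, sketched here with made-up values, is to wrap several case-insensitive term clauses in a bool query so that a match on any one of them is enough:

```
# "kimchy" and "elkbee" are example values only
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "user.id": { "value": "kimchy", "case_insensitive": true } } },
        { "term": { "user.id": { "value": "elkbee", "case_insensitive": true } } }
      ],
      "minimum_should_match": 1
    }
  }
}
```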

The best approach I've found is to use a normalizer that lowercases the string at index time. That way you can use any query that supports normalization and get case-insensitive matches.
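A minimal sketch of such a mapping on a freshly created index. The index name my-index-v2 and the normalizer name lowercase_normalizer are placeholders I picked, not anything from the question:

```
# "my-index-v2" and "lowercase_normalizer" are placeholder names
PUT /my-index-v2
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "address": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      }
    }
  }
}

# terms has no case_insensitive flag, but its values should be normalized like the indexed ones
GET /my-index-v2/_search
{
  "query": {
    "terms": {
      "address": ["1/10 SOMEWHERE RD, SOMEWHERE AAA 3333"]
    }
  }
}
```

The _source of each document still holds the original mixed-case value for display; only the indexed term is lowercased, which also means aggregations on the field return the lowercased form.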

Oxysalt answered 19/10, 2023 at 16:56 Comment(0)
