How to ignore apostrophes in Elasticsearch?

Let's say I'm looking for the string Paul's. I want to match it when the search term is either pauls or paul's.

This is what the config for my index looks like (I've tried to do this with a custom analyzer, but it doesn't work):

{
    settings: {
        analysis: {
            analyzer: {
                my_analyzer: {
                    tokenizer: 'standard',
                    filter: ['standard', 'lowercase', 'my_stemmer'],
                },
            },
            filter: {
                my_stemmer: {
                    type: 'stemmer',
                    name: 'possessive_english',
                },
            },
        },
    },
    mappings: {
        my_type: {
            properties: {
                description: { type: 'text' },
                title: { type: 'text', analyzer: 'my_analyzer' },
            },
        },
    },
}
Pedaiah answered 15/3, 2018 at 11:13 Comment(0)

The stemmer doesn't help when the search term is pauls: possessive_english only strips a trailing 's, so Paul's is indexed as paul, which pauls never matches. For that you truly need to ignore the apostrophe ('). Below I added a sub-field to your title field that uses a char_filter to remove the apostrophe, so Paul's is also indexed as pauls. In the search itself you need to query both the main field (title) and the sub-field (title.no_stemmer):

DELETE test
PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "my_stemmer"
          ]
        },
        "no_stemmer_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase"
          ],
          "char_filter": "my_char_filter"
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "possessive_english"
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "'=>"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "description": {
          "type": "text"
        },
        "title": {
          "type": "text",
          "analyzer": "my_analyzer",
          "fields": {
            "no_stemmer": {
              "type": "text",
              "analyzer": "no_stemmer_analyzer"
            }
          }
        }
      }
    }
  }
}

POST test/my_type/_bulk
{"index":{}}
{"title":"Paul's"}
{"index":{}}
{"title":"Paul"}
{"index":{}}
{"title":"Pauls"}

GET test/_search
{
  "query": {
    "multi_match": {
      "fields": ["title", "title.no_stemmer"],
      "query": "Paul's"
    }
  }
}
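
To see why both fields are needed, here is a rough Python sketch of what the two analyzers do to a title and to a search term. It is only a simplified stand-in, not Elasticsearch's actual implementation: whitespace splitting approximates the standard tokenizer, and the function names (including the hypothetical matches helper) merely mirror the analyzer names above.

```python
# Simplified stand-ins for the two analyzers; NOT Elasticsearch's real code.

def my_analyzer(text: str) -> list:
    """Approximates: tokenize, lowercase, then strip a trailing possessive 's."""
    tokens = []
    for token in text.lower().split():
        if token.endswith("'s"):
            token = token[:-2]
        tokens.append(token)
    return tokens


def no_stemmer_analyzer(text: str) -> list:
    """Approximates: drop apostrophes first (char filter), then tokenize and lowercase."""
    return text.replace("'", "").lower().split()


def matches(query: str, title: str) -> bool:
    """Hypothetical helper: a title matches if either field shares a token with the query."""
    main = set(my_analyzer(query)) & set(my_analyzer(title))
    sub = set(no_stemmer_analyzer(query)) & set(no_stemmer_analyzer(title))
    return bool(main or sub)


titles = ["Paul's", "Paul", "Pauls"]
for query in ["Paul's", "pauls", "paul"]:
    print(query, "->", [t for t in titles if matches(query, t)])
```

Under these assumptions, Paul's matches all three titles, pauls matches Paul's and Pauls, and paul matches Paul's and Paul. The title Paul's is found by every variant, which is what the question asks for.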
Vote answered 15/3, 2018 at 15:11 Comment(3)
Would a query_string also be a solution to this requirement, by searching for "paul's" OR "pauls"? - Nationalism
@EiriniGraonidou Yes, something like the following, but still using the analyzers I mentioned and the sub-field: "query": { "query_string": { "default_field": "title*", "query": "paul's" } } - Vote
Thanks, that works like a charm. Very clever with the additional field! - Pedaiah
