Elasticsearch accent-insensitive search

I'm using Elasticsearch with Python. I can't find a way to make searches accent-insensitive.

For example, I have two words: "Camión" and "Camion". When a user searches for "camion", I'd like both results to show up.

Creating index:

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])

es.indices.create(index='name', ignore=400)

es.index(
    index="name",
    doc_type="producto",
    id=p.pk,
    body={
        'title': p.titulo,
        'slug': p.slug,
        'summary': p.summary,
        'description': p.description,
        'image': foto,
        'price': p.price,
        'wholesale_price': p.wholesale_price,
        'reference': p.reference,
        'ean13': p.ean13,
        'rating': p.rating,
        'quantity': p.quantity,
        'discount': p.discount,
        'sales': p.sales,
        'active': p.active,
        'encilleria': p.encilleria,
        'brand': marca,
        'brand_title': marca_titulo,
        'sellos': sellos_str,
        'certificados': certificados_str,
        'attr_naturales': attr_naturales_str,
        'soluciones': soluciones_str,
        'categories': categories_str,
        'delivery': p.delivery,
        'stock': p.stock,
        'consejos': p.consejos,
        'ingredientes': p.ingredientes,
        'es_pack': p.es_pack,
        'temp': p.temp,
        'relevancia': p.relevancia,
        'descontinuado': p.descontinuado,
    }
)

Search:

    from elasticsearch import Elasticsearch
    es = Elasticsearch([{'host': '127.0.0.1', 'port': '9200'}])

    resul = es.search(
        index="name",
        body={
            "query": {
                "query_string": {
                    "query": "(title:" + search + " OR description:" + search + " OR summary:" + search + ") AND (active:true)",
                    "analyze_wildcard": False
                }
            },
            "size": "9999",
        }
    )
    print(resul)

I've searched Google, Stack Overflow and elastic.co, but I didn't find anything that works.

Surprise answered 19/7, 2016 at 8:8 Comment(4)
What's the mapping for those fields you use in your query? - Cutright
Do you mean in the database? All strings. Do I have to declare anything on the query? - Surprise
What database? :-) - Cutright
I'm sorry, I'm new to Elasticsearch. I mean the index. I've updated the question with all my code. =) - Surprise

You need to change the mapping of those fields you have in the query. Changing the mapping requires re-indexing so that the fields will be analyzed differently and the query will work.

Basically, you need something like the following. The field called text is just an example; you need to apply the same settings to the other fields as well. Note that I used fields (a multi-field) so that the root field keeps the original text, analyzed by the default analyzer, while text.folded has the accented characters removed, which is what makes your query work. I have also changed the query a bit so that it searches the folded version of the fields (camion will match, and so will camión).

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "fields": {
            "folded": {
              "type": "string",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  }
}
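
For the Python client used in the question, the same settings and mapping could be passed as a dictionary when creating the index. The following is a minimal sketch rather than a tested recipe, written for Python 3. The helper names build_index_body and recreate_index are hypothetical, the field list is taken from the query in the question, and it assumes a version of elasticsearch-py whose indices.create and indices.delete accept body= and ignore= (as the question's ignore=400 call suggests). The "string" type and the producto mapping type follow the answer and the question; on Elasticsearch 5 and later the field type is "text", and on 7 and later mappings are normally not nested under a type name, so field_type="text" and doc_type=None would be the values to try there.

```python
def build_index_body(fields, doc_type="producto", field_type="string"):
    """Return index settings and mappings with a `.folded` sub-field per field."""
    properties = {
        name: {
            "type": field_type,
            # Root field keeps the default analysis; the sub-field folds accents.
            "fields": {"folded": {"type": field_type, "analyzer": "folding"}},
        }
        for name in fields
    }
    mappings = {"properties": properties}
    if doc_type is not None:
        # Older Elasticsearch versions nest properties under a mapping type.
        mappings = {doc_type: mappings}
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "folding": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        },
        "mappings": mappings,
    }


def recreate_index(es, index_name, body):
    """Drop the old index (if any) and create it again with the new mapping.

    `es` is assumed to be an elasticsearch.Elasticsearch client. All documents
    must be indexed again afterwards, for example with the es.index(...) call
    from the question.
    """
    es.indices.delete(index=index_name, ignore=404)
    es.indices.create(index=index_name, body=body)


index_body = build_index_body(["title", "description", "summary"])
```

With a client created as in the question, recreate_index(es, "name", index_body) would replace the es.indices.create(index='name', ignore=400) call, and the products would then be indexed again.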

And the query:

GET /my_index/_search
{
  "query": {
    "query_string": {
      "query": "\\*.folded:camion"
    }
  }
}
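
From Python, the search in the question could be pointed at the folded sub-fields along these lines. This is only a sketch: build_search_body is a hypothetical helper, the field names come from the question, and it assumes each of them has a folded sub-field like the one in the mapping above.

```python
def build_search_body(search, fields=("title", "description", "summary")):
    """Build a query_string search against the accent-folded sub-fields."""
    # Same shape as the question's query, with ".folded" appended to each field.
    clauses = " OR ".join("{0}.folded:{1}".format(f, search) for f in fields)
    return {
        "query": {
            "query_string": {
                # Keep the question's restriction to active products.
                "query": "(" + clauses + ") AND (active:true)",
                "analyze_wildcard": False,
            }
        },
        "size": 9999,
    }


body = build_search_body("camion")
# With a client as in the question: resul = es.search(index="name", body=body)
```

Since the folded sub-fields use the folding analyzer at search time as well, a search for camión should be reduced to camion before matching. As in the question, the user's text is pasted straight into the query string, so input containing query_string operators or reserved characters would need escaping first.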

Also, I strongly suggest reading this section of the documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html
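
To get a feel for what the folding step does to a term, its effect on accented Latin characters can be approximated in plain Python 3. This is an illustration only, not Elasticsearch's implementation: the hypothetical fold helper below lowercases the text and strips combining marks with the standard unicodedata module, whereas the real asciifolding filter uses its own character table and covers more cases.

```python
import unicodedata


def fold(text):
    """Roughly mimic lowercase + asciifolding for accented Latin letters."""
    # NFKD splits "ó" into "o" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()


# Both spellings reduce to the same term, which is why one query matches both.
print(fold("Camión") == fold("Camion"))  # prints True
```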

Cutright answered 19/7, 2016 at 8:24 Comment(3)
I've seen something like this. But where do I put this code? Before the body{} in es.index()? - Surprise
I don't know Python, sorry. The code I provided creates the index with those settings and that mapping. So the existing index needs to be deleted, the code I provided used to create a new index, and the data reindexed. - Cutright
@AndreiStefan Thanks, this still seems to be valid in ES 7.14, but the doc is outdated. - Curtis
