ElasticSearch query_string fails to parse query with some characters
M

5

8

I'm using Elasticsearch (2.4) and the official Python client to perform simple queries. My code:

from elasticsearch import Elasticsearch

es_client = Elasticsearch("localhost:9200")
index = "indexName"
doc_type = "docType"

def search(query, search_size):
    body = {
        "fields": ["title"],
        "size": search_size,
        "query": {
            "query_string": {
                "fields": ["file.content"],
                "query": query
            }
        }
    }
    response = es_client.search(index=index, doc_type=doc_type, body=body)
    return response["hits"]["hits"]

search("python", 10) # Works fine.

The problem arises when my query contains unbalanced parentheses or brackets. For example, with search("python {programming", 10), ES throws:

elasticsearch.exceptions.RequestError: TransportError(400, u'search_phase_execution_exception', u'Failed to parse query [python {programming}]')

Is that the expected behavior of ES? Doesn't it use a tokenizer to remove all those characters?

Note: This happens to me using Java too.

Myiasis answered 2/11, 2016 at 16:25 Comment(0)
M
9

I was reading the documentation and the query_string is more strict. The following are reserved characters: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

So, like jhilden said, I would have to escape them or use simple_query_string instead.

Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
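Applied to the search function from the question, the simple_query_string option might look like the sketch below. It only builds the request body; `build_search_body` is a hypothetical helper name, and the field names are simply the ones used in the question.

```python
def build_search_body(query, search_size):
    """Build a search body that uses simple_query_string instead of query_string."""
    return {
        "fields": ["title"],  # ES 2.x syntax, copied from the question
        "size": search_size,
        "query": {
            "simple_query_string": {
                "fields": ["file.content"],
                "query": query,  # passed through unchanged, no escaping
            }
        },
    }


body = build_search_body("python {programming", 10)
```

The body can then be passed to es_client.search(index=index, doc_type=doc_type, body=body) as before. According to the documentation, simple_query_string discards invalid parts of the query instead of raising an error, so the unbalanced brace should no longer produce a 400.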

Myiasis answered 2/11, 2016 at 21:42 Comment(0)
C
11

I know I'm late, but I'm posting this in the hope that it helps others. According to the Elasticsearch documentation, query_string has some reserved characters.

The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

So there are two possible fixes. Both worked for me when I ran into this special-character issue.

Solution 1: Escape each special character with \\ (a single backslash for the query parser, doubled because the request body is JSON)

"query": {
    "bool": {
      "must": [
        {
          "match": {
            "country_code.keyword": "IT"
          }
        },
        {
          "query_string": {
            "default_field": "display",
            "query": "Magomadas \\(OR\\), Italy"
          }
        }
      ]
    }
  }

Solution 2: Use simple_query_string and leave the query text unchanged. It doesn't support default_field, so use fields instead.

  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "country_code.keyword": "IT"
          }
        },
        {
          "simple_query_string": {
            "fields": ["display"], 
            "query": "Magomadas (OR), Italy"
          }
        }
      ]
    }
  }
Calvin answered 15/8, 2018 at 6:44 Comment(0)
H
6

As mentioned in previous answers, some characters need to be escaped:

+ - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

💡 "query": "my:name*&&" should be "query": "my\\:name\\*\\&&"


Regex to the rescue ✨

With the help of a simple regex, we can easily escape these characters

Python

import re

def escape_elasticsearch_query(query):
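    # Raw strings keep every backslash intact, so the \\ alternative really matches a literal backslash.
    return re.sub(r'(\+|\-|\=|&&|\|\||\>|\<|\!|\(|\)|\{|\}|\[|\]|\^|"|~|\*|\?|\:|\\|\/)', r'\\\1', query)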


query = 'my:name*&&'
escaped_query = escape_elasticsearch_query(query)
print(escaped_query)

output:

my\:name\*\&&
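One caveat, as far as I can tell from the query_string documentation: < and > cannot be escaped at all, and the only way to stop them from starting a range query is to remove them from the query text. A minimal sketch of that variant follows; `sanitize_query` and `_ESCAPABLE` are hypothetical names, not part of any library.

```python
import re

# Reserved characters that can be escaped; "&&" and "||" are matched as pairs.
_ESCAPABLE = re.compile(r'(&&|\|\||[+\-=!(){}\[\]^"~*?:\\/])')


def sanitize_query(query):
    # Drop "<" and ">" first, on the assumption that escaping does not help for them.
    without_ranges = query.replace("<", "").replace(">", "")
    # Prefix every remaining reserved character (or pair) with one backslash.
    return _ESCAPABLE.sub(r"\\\1", without_ranges)


print(sanitize_query("price>10 (approx)"))  # prints: price10 \(approx\)
```

Whether silently dropping < and > is acceptable depends on the data being searched; if those characters matter, simple_query_string or a match query may be the safer choice.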

Javascript

function escapeElasticsearchQuery(query) {
    return query.replace(/(\+|\-|\=|&&|\|\||\>|\<|\!|\(|\)|\{|\}|\[|\]|\^|"|~|\*|\?|\:|\\|\/)/g, '\\$&');
}


let query = 'my:name*&&';
let escapedQuery = escapeElasticsearchQuery(query);
console.log(escapedQuery);

output:

my\:name\*\&&
Herbalist answered 21/12, 2019 at 13:41 Comment(1)
Thanks for the regex. I think I found a slight bug, though. I needed to add an additional backslash to get backslashes properly escaped (only did this in Python): re.sub('(\+|\-|\=|&&|\|\||\>|\<|\!|\(|\)|\{|\}|\[|\]|\^|"|~|\*|\?|\:|\\\|\/)', '\\\\\\1', query) - Roughandtumble
G
2

query_string in ES is a bit unusual: reserved characters must be escaped with a backslash, and because the request body is JSON, that backslash has to be written as a double backslash.

The following fails:

GET index1/job/_search
{
  "query": {
    "query_string": {
      "fields": ["jobNumber"],
      "query": "827950 { foo"
    }
  }
}

The following works:

GET index1/job/_search
{
  "query": {
    "query_string": {
      "fields": ["jobNumber"],
      "query": "827950 \\{ foo"
    }
  }
}

Note: if you were using a terms query or something else like that you would not need to escape that {
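The double backslash comes from JSON encoding, not from the query syntax itself. A short sketch of this from the Python side, using only the standard json module (the index and field names are the ones from this answer):

```python
import json

# "\\" is Python's literal for ONE backslash, so the query text is: 827950 \{ foo
query_text = "827950 \\{ foo"

body = {"query": {"query_string": {"fields": ["jobNumber"], "query": query_text}}}

# Serializing to JSON doubles the backslash, giving the form shown in the request above.
encoded = json.dumps(body)
print(encoded)
```

So when the body is passed to a client as a dict, the query text needs only one real backslash per reserved character; the serializer produces the doubled form on the wire.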

Georgia answered 2/11, 2016 at 16:54 Comment(0)
U
0

As of the current Elasticsearch version (8.10), there is an undocumented flag "escape" that escapes the query string for you (https://github.com/elastic/elasticsearch/issues/77604)

So you can write this type of request without escaping special symbols yourself:

{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "escape": true,
          "query": "elf bar/eb design"
        }
      }
    }
  }
}
Unavailing answered 28/9, 2023 at 17:36 Comment(0)
