Elasticsearch: query indexes whose names match a certain pattern
I have several indexes in my Elasticsearch cluster, named as follows:

Index_2019_01
Index_2019_02
Index_2019_03
Index_2019_04
...
Index_2019_12

Suppose I want to search only the first three indexes. What I have in mind is a regular-expression-style pattern, like this:

select count(*) from Index_2019_0[1-3] where LanguageId="English"

What is the correct way to do that in Elasticsearch?

Stormproof answered 5/2, 2019 at 13:10 Comment(5)
Could you please describe your use case? The shard names look like timestamps. Do you need to rotate this data somehow? Maybe it is similar to the index rollover case? – Boehmer
@NikolayVasiliev I think it is clear! I want to select some documents from some shards whose names are dynamic! – Stormproof
It seems you mean something different from what Elasticsearch calls a shard. One can't control the names of ES shards. As far as I understand, you want to query several indexes whose names match a certain pattern, correct? – Boehmer
@NikolayVasiliev, yes, that is it. You are right, I changed my question body. Thanks. – Stormproof
Thanks for the clarifications, I've posted my answer. – Boehmer

How can I query several indexes with certain names?

This can be achieved with multi-index search, a built-in capability of Elasticsearch. List the target indexes, separated by commas, in the request path:

POST /index_2019_01,index_2019_02/_search
{
  "query": {
    "match": {
      "LanguageID": "English"
    }
  }
}

Or, using URI search:

curl 'http://<host>:<port>/index_2019_01,index_2019_02/_search?q=LanguageID:English'

More details are available here. Note that Elasticsearch requires index names to be lowercase.
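When the index names follow a fixed naming scheme, the comma-separated list can be generated rather than typed by hand. The following is only a minimal sketch: the helper names `monthly_indices` and `search_path` are hypothetical, and the `index_YYYY_MM` scheme is assumed from the question. It builds the request path only and does not contact a cluster.

```python
def monthly_indices(prefix, year, months):
    """Return lowercase index names such as index_2019_01 for the given months."""
    return [f"{prefix}_{year}_{month:02d}" for month in months]


def search_path(indices):
    """Join index names into the path of a multi-index search request."""
    return "/" + ",".join(indices) + "/_search"


# The first three months of 2019, as asked in the question.
first_three = monthly_indices("index", 2019, range(1, 4))
print(search_path(first_three))
```

The resulting path can be used with the same request body as in the example above.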

Can I use a regex to specify index name pattern?

In short, no. An index name can be used inside a query through the special "virtual" field _index, but its use is limited. For instance, one cannot run a regexp against the index name, as the documentation states:

The _index is exposed as a virtual field — it is not added to the Lucene index as a real field. This means that you can use the _index field in a term or terms query (or any query that is rewritten to a term query, such as the match, query_string or simple_query_string query), but it does not support prefix, wildcard, regexp, or fuzzy queries.

For instance, the query from above can be rewritten as:

POST /_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "_index": [
              "index_2019_01",
              "index_2019_02"
            ]
          }
        },
        {
          "match": {
            "LanguageID": "English"
          }
        }
      ]
    }
  }
}

This combines a bool query with a terms query.
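If the list of indexes is computed at run time, the same request body can be assembled programmatically. This is a rough sketch using only the standard library; the function name `index_filtered_query` is made up for illustration, and the field name `LanguageID` is taken from the example above.

```python
import json


def index_filtered_query(indices, field, value):
    """Build a bool query that restricts a match query to the given indexes."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Exact index names only, since _index is matched as a term.
                    {"terms": {"_index": list(indices)}},
                    {"match": {field: value}},
                ]
            }
        }
    }


body = index_filtered_query(["index_2019_01", "index_2019_02"], "LanguageID", "English")
print(json.dumps(body, indent=2))
```

The printed JSON is meant to be sent as the body of POST /_search.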

Hope that helps!

Boehmer answered 7/2, 2019 at 19:13 Comment(2)
Plus 1 for the provided links. – Stormproof
We can use wildcards like index_2019_0* – Accrual

Why use POST when you are not adding any data? I advise using GET in your case. Also, if the indexes have similar names, as yours do, you should use an index pattern, as in the query below:

GET /index_2019_*/_search
{
  "query": {
    "match": {
      "LanguageID": "English"
    }
  }
}

Or with curl:

curl -XGET "http://<host>:<port>/index_2019_*/_search" -H 'Content-Type: application/json' -d'{"query": {"match":{"LanguageID": "English"}}}'
Bremser answered 8/2, 2019 at 11:43 Comment(0)

While searching for indices with a regex is not possible, date math in index names might take you a bit further.

You can look at the docs here

As an example, say you want the last 3 months from those indices. If we have
index_2019_01
index_2019_02
index_2019_03
index_2019_04
and today is 2019/04/20, we could use the following query to get 04, 03, and 02:

GET /<index_{now/M-0M{yyyy_MM}}>,<index_{now/M-1M{yyyy_MM}}>,<index_{now/M-2M{yyyy_MM}}>/_search

I used M-0M for the first one so that the loop constructing the query doesn't need a special case for the first index.
See the docs on URL-encoding this query and on using literal braces in the index name. If you use a client, the URL encoding is done for you (at least in the Python client).
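The query construction loop and the URL encoding mentioned above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the `index_` prefix with an underscore follows the index names in the question, and `resolved_names` is only a local approximation of what the server would resolve (Elasticsearch evaluates `now` itself, in UTC by default).

```python
from datetime import date
from urllib.parse import quote


def date_math_names(prefix, months_back):
    """Build one date math expression per month, newest first."""
    return [f"<{prefix}{{now/M-{k}M{{yyyy_MM}}}}>" for k in range(months_back)]


def encoded_search_path(expressions):
    """Percent-encode the whole index list, including the separating commas."""
    return "/" + quote(",".join(expressions), safe="") + "/_search"


def resolved_names(prefix, today, months_back):
    """Approximate locally which index names the expressions stand for."""
    names = []
    for k in range(months_back):
        # Count months from year 0 so that stepping back across January works.
        year, month0 = divmod(today.year * 12 + today.month - 1 - k, 12)
        names.append(f"{prefix}{year}_{month0 + 1:02d}")
    return names


expressions = date_math_names("index_", 3)
print(encoded_search_path(expressions))
print(resolved_names("index_", date(2019, 4, 20), 3))
```

Starting the loop at k = 0 is what produces the M-0M term, so every month goes through the same code path.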

Inhabit answered 12/1, 2021 at 12:28 Comment(0)

Searching for index names with a regex may not be possible, as noted in another answer to this question, but it is possible to use a wildcard to look up indexes by name with an indices.get request, such as GET /index-prefix*.

See docs here
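One way to combine the two ideas is to fetch the index names with a wildcard and then apply the regex on the client side. The sketch below covers only the filtering step: the list `names` stands in for whatever the wildcard lookup returns, and `select_indices` is a hypothetical helper, not part of any Elasticsearch client.

```python
import re


def select_indices(names, pattern):
    """Keep only the index names that match the regex in full, sorted."""
    regex = re.compile(pattern)
    return sorted(name for name in names if regex.fullmatch(name))


# Stand-in for the names returned by a wildcard lookup such as GET /index_2019_*
names = [f"index_2019_{month:02d}" for month in range(1, 13)]

selected = select_indices(names, r"index_2019_0[1-3]")
print(",".join(selected))
```

The comma-joined result can then be used as the index list of a multi-index search.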

Upland answered 11/7, 2023 at 1:45 Comment(0)
