Elasticsearch 2.1: Result window is too large (index.max_result_window)

We retrieve information from Elasticsearch 2.1 and allow the user to page through the results. When the user requests a high page number, we get the following error message:

Result window is too large, from + size must be less than or equal to: [10000] but was [10020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter

The Elasticsearch documentation says that this is because of high memory consumption and recommends using the scrolling API:

Values higher than that can consume significant chunks of heap memory per search and per shard executing the search. It’s safest to leave this value as it is and use the scroll api for any deep scrolling https://www.elastic.co/guide/en/elasticsearch/reference/2.x/breaking_21_search_changes.html#_from_size_limits

The thing is that I do not want to retrieve large data sets. I only want to retrieve a slice of the data set which is very high up in the result set. The scrolling documentation also says:

Scrolling is not intended for real time user requests https://www.elastic.co/guide/en/elasticsearch/reference/2.2/search-request-scroll.html

This leaves me with some questions:

1) Would the memory consumption really be lower (and if so, why) if I use the scrolling API to scroll up to result 10020 (and disregard everything below 10000) instead of doing a "normal" search request for results 10000-10020?

2) It seems that the scrolling API is not an option for me; instead I have to increase "index.max_result_window". Does anyone have any experience with this?

3) Are there any other options to solve my problem?

Sodamide answered 4/2, 2016 at 16:30 Comment(0)
S
30

The following pages in the elastic documentation talk about deep paging:

https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_fetch_phase.html

Depending on the size of your documents, the number of shards, and the hardware you are using, paging 10,000 to 50,000 results (1,000 to 5,000 pages) deep should be perfectly doable. But with big-enough from values, the sorting process can become very heavy indeed, using vast amounts of CPU, memory, and bandwidth. For this reason, we strongly advise against deep paging.
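
To illustrate, this is what a deep page request looks like (a sketch against a hypothetical my_index). Under the ES 2.1 default it would be rejected, because each shard has to build and sort a priority queue of from + size entries just to serve the 20 hits on that page:

GET my_index/_search
{
  "from": 10000,
  "size": 20,
  "query": { "match_all": {} }
}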

Sodamide answered 5/2, 2016 at 10:35 Comment(4)
So we should abandon deep pagination, right? There is basically no point in paging through 4,000 pages for a single viewer. Take Google search: we hardly ever scroll to page 8 or 9 to check results; we usually only look at the top 3-5 pages Google gives us.Vahe
Can we use the scroll API in case we need deep pagination?Install
But what about when we enable sorting, say on an eCommerce site, and the user wants to see the items with the highest price? The results will differ between sorting by highest price and sorting by lowest price and jumping to the last page, right? Since we limit the number of results that can be accessed. Any workaround for this?Microsurgery
Sometimes users navigate to the 'last' page of results; if that's page 4000, then so be it. I wish there were a better solution than scroll; it's really basic and not much use if you are grabbing a specific page in the middle of a set.Laceylach

If you need deep pagination, one possible solution is to increase the value of max_result_window. You can use curl to do this from your shell command line:

curl -XPUT "http://localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d '{ "index" : { "max_result_window" : 500000 } }'

I did not notice increased memory usage for values of ~100k.
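
To check that the new value is actually in place, you can read the settings back (same hypothetical index name as above):

curl -XGET "http://localhost:9200/my_index/_settings?pretty"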

Fara answered 5/2, 2016 at 11:29 Comment(4)
I had the same error: 'Result window is too large, from + size must be less than or equal to: [10000] but was [47190]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter.' It said it has 4,719 pages (10 results per page), and I think your suggestion works.Vahe
This is a good solution for a small number of documents, fewer than 500,000.Reynoso
I'm using ES v2.2.0 and I had to change the payload to { "max_result_window" : 500000 } for this to work. So the curl command became: curl -XPUT "http://localhost:9200/my_index/_settings" -d '{ "max_result_window" : 500000 }'Saltire
For those who get a header error with this command on newer versions of Elasticsearch, you need to pass the Content-Type header as well: curl -XPUT "localhost:9200/my_index/_settings" -H "Content-Type: application/json" -d '{"index":{"max_result_window":50000}}'Dissatisfied

The right solution would be to use scrolling.
However, if you want to extend the number of results a search returns beyond 10,000, you can do it easily with Kibana:

Go to Dev Tools and just post the following to your index (your_index_name), specifying what the new max result window should be:


PUT your_index_name/_settings
{ 
  "max_result_window" : 500000 
}

If all goes well, you should see the following success response:

{
  "acknowledged": true
}
Estrous answered 11/6, 2017 at 23:38 Comment(2)
I tried following the way of doing this through the Elasticsearch client code (put_settings, etc.) and ran into many errors. This saved me hours! Thank you!Statuesque
With GET your_index_name/_settings you'll get your current settings.Knowhow

Use the Scroll API to get more than 10000 results.

Scroll example in ElasticSearch NEST API

I have used it like this:

private static Customer[] GetCustomers(IElasticClient elasticClient)
{
    var customers = new List<Customer>();

    // Open the scroll context. With SearchType.Scan the initial response
    // returns no documents, only a scroll id; "1m" keeps the context alive
    // for one minute between calls, and Size applies per shard in scan mode.
    var searchResult = elasticClient.Search<Customer>(s => s.Index(IndexAlias.ForCustomers())
                          .Size(10000).SearchType(SearchType.Scan).Scroll("1m"));

    do
    {
        // Fetch the next batch using the scroll id from the previous
        // response, renewing the context for another minute each time.
        var result = searchResult;
        searchResult = elasticClient.Scroll<Customer>("1m", result.ScrollId);
        customers.AddRange(searchResult.Documents);
    } while (searchResult.IsValid && searchResult.Documents.Any());

    return customers.ToArray();
}
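
Note that scroll contexts hold resources on the cluster until they expire. If you finish early, you can release the context explicitly via the underlying REST call (a sketch; the scroll id is a placeholder for the value from the last response):

DELETE _search/scroll
{
  "scroll_id" : "<scroll id from the last response>"
}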
Appointee answered 11/1, 2017 at 12:51 Comment(0)

If you want more than 10,000 results, then memory usage on all the data nodes will be very high, because each query request has to return more results. If you have more data and more shards, merging those results will be inefficient. Elasticsearch also caches the filter context, which costs yet more memory. You have to find out by trial and error how much you are actually using. If you are handling many requests in a small window, you should issue multiple queries instead of one query for more than 10k results and merge them yourself in your code; that should take less application memory than increasing the window size.
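
One way to do that chunked querying without raising the window (a sketch, assuming your documents have a unique, sortable field such as a numeric id, and a hypothetical index my_index): sort on that field, keep from at 0, and make each follow-up query start after the last value you already received:

GET my_index/_search
{
  "size": 1000,
  "query": {
    "range": { "id": { "gt": 10473 } }
  },
  "sort": [ { "id": "asc" } ]
}

Here 10473 stands for the id of the last document in the previous batch; the first batch would use a match_all query instead of the range filter. Each individual request stays well below the 10,000 window, and the application stitches the batches together.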

Fiat answered 11/3, 2017 at 7:42 Comment(0)

2) It seems that the scrolling API is not an option for me; instead I have to increase "index.max_result_window". Does anyone have any experience with this?

--> You can define this value in index templates. The template will be applied to new indexes only, so you either have to delete the old indexes after creating the template or wait for new data to be ingested into Elasticsearch.

{ "order": 1, "template": "index_template*", "settings": { "index.number_of_replicas": "0", "index.number_of_shards": "1", "index.max_result_window": 2147483647 },

Intellectualism answered 25/4, 2017 at 21:19 Comment(0)

In my case, reducing the result set via the from and size parameters in the query removes the error, since we don't need all the results:

GET widgets_development/_search
{
  "from" : 0, 
  "size": 5,
  "query": {
    "bool": {}
  },
  "sort": {
    "col_one": "asc"
  }
}
Mouton answered 22/1, 2020 at 19:3 Comment(0)

Despite the answers above mentioning scrolling as the right solution, this is no longer the case in later versions:

We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).

Paginate search results > Scroll search results
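
A minimal sketch of that approach on Elasticsearch 7.10 or later, assuming a hypothetical index my_index (the pit id is a placeholder for the value returned by the first request):

POST my_index/_pit?keep_alive=1m

GET /_search
{
  "size": 20,
  "query": { "match_all": {} },
  "pit": { "id": "<pit id from the previous response>", "keep_alive": "1m" },
  "sort": [ { "_shard_doc": "asc" } ]
}

Every hit in the response carries a sort value; to get the next page, repeat the search with "search_after" set to the sort value of the last hit, leaving from unset.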

Experientialism answered 3/5, 2023 at 8:48 Comment(0)

Simple solution: the default limit is 10,000, and you can raise it per index:

PUT my_index/_settings
{
  "index": {
    "max_result_window": 20000
  }
}

Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html

Aleppo answered 28/9, 2023 at 13:12 Comment(0)
