Optimize API for reducing the segments and eliminating ES deleted docs not working

This is a follow-up to my previous question, "Does huge number of deleted doc count affects ES query performance", about the deleted docs in my ES index.

As suggested in the answer there, I used the optimize API, since I am on ES 1.x, where the force merge API is not available. After reading the GitHub link about the optimize API by Shay Banon, founder of Elastic (provided earlier, as I couldn't find it on the ES site), it looks like it does the same work.

I got a success message for my index after running the optimize API, but the total count of deleted docs is not decreasing. I am worried because, when I checked the segments of my index with the segments API, I saw more than 25 segments for each shard. Every shard is holding 250-1 GB of data in memory and almost 500k docs, while some shards have only a few deleted docs.

So my questions are:

  1. My index has multiple shards across multiple data nodes. When I run the optimize API against the URL of a single node, does it only merge the segments on that node?
  2. The segments API result shows the node ID, like "node": "f2hsqeamadnaskda", while I am using the KOPF plugin and have custom names for my data nodes. How can I relate this cryptic node ID to my human-readable node name, to check whether statement 1 is correct?
  3. As there is no documentation available on the optimize API, is it possible to merge segments on all shards across all nodes in a single shot? And do I need to make the index read-only before applying it?
Distort answered 13/2, 2020 at 9:34 Comment(0)

@Nirmal has answered your first two questions, so:

  3. As there is no documentation available on the optimize API, is it possible to merge segments on all shards across all nodes in a single shot? And do I need to make the index read-only before applying it?

There is documentation available for 1.x: https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-optimize.html. You are probably looking for calls like these:

  • GET _cat/segments/<index_pattern>: Lists all segments in all the shards (can be thousands), including the deleted doc count per segment.
  • POST <index_pattern>/_optimize?max_num_segments=1: Force merges all segments down to a single segment per shard. Do this when the index is no longer being written to. It helps to reduce the load on CPU/RAM on the data nodes.
  • POST <index_pattern>/_optimize?only_expunge_deletes=true: Only merges segments that contain deleted docs, which removes those docs.

Finally, you can use * as <index_pattern> to just do all indices on the whole cluster.
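
For anyone scripting this, here is a minimal Python sketch of the two optimize calls above. It is only an illustration: the base URL `http://localhost:9200`, the index name `my_index` and the helper names `optimize_url` and `run_optimize` are my own assumptions, not part of Elasticsearch. Only the `_optimize` endpoint and its `max_num_segments` and `only_expunge_deletes` parameters come from the 1.x docs linked above.

```python
import urllib.parse
import urllib.request

# Assumption: any node of the cluster is reachable here; the request is
# executed for the whole index no matter which node receives it.
ES_URL = "http://localhost:9200"


def optimize_url(index_pattern="*", max_num_segments=None,
                 only_expunge_deletes=False, base=ES_URL):
    """Build an _optimize URL for ES 1.x (hypothetical helper)."""
    params = {}
    if max_num_segments is not None:
        params["max_num_segments"] = str(max_num_segments)
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    # Keep "*" and "," unescaped so index patterns and lists stay readable.
    path = "/" + urllib.parse.quote(index_pattern, safe="*,") + "/_optimize"
    query = urllib.parse.urlencode(params)
    return base + path + ("?" + query if query else "")


def run_optimize(url):
    """POST to the given URL and return the raw response body as text."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example URLs (nothing is sent until run_optimize is called):
MERGE_TO_ONE = optimize_url("my_index", max_num_segments=1)
EXPUNGE_ALL = optimize_url("*", only_expunge_deletes=True)
```

Calling `run_optimize(MERGE_TO_ONE)` would send the request. A merge down to one segment can take a long time on large shards, so it is probably best tried on a single index first.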

Evoke answered 17/2, 2020 at 12:38 Comment(0)

A force merge or optimize call is applied to the entire index; you don't have to run it at the node level.

You can use the _cat API to find the node ID to IP mapping. In case your version does not support the _cat API (< 1.0), use the cluster state API.
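
If you want to do the lookup in a script, here is a rough Python sketch. Note that it uses the nodes info API (`GET /_nodes`) rather than `_cat`, because its JSON is keyed by the full node ID. It assumes the documented response shapes of `GET /_nodes` and `GET /<index>/_segments`. The helper names, the node name `data-node-1` and the sample payloads are made up for illustration.

```python
def node_names(nodes_info):
    """Map node ID -> node name from a parsed GET /_nodes response."""
    nodes = nodes_info.get("nodes", {})
    return {node_id: info.get("name", "") for node_id, info in nodes.items()}


def deleted_docs_by_node(segments, names):
    """Sum deleted docs per node from a parsed GET /<index>/_segments response.

    Nodes missing from `names` are reported under their raw ID.
    """
    totals = {}
    for index in segments.get("indices", {}).values():
        for copies in index.get("shards", {}).values():
            for copy in copies:  # one entry per shard copy (primary/replica)
                node_id = copy["routing"]["node"]
                label = names.get(node_id, node_id)
                deleted = sum(seg.get("deleted_docs", 0)
                              for seg in copy.get("segments", {}).values())
                totals[label] = totals.get(label, 0) + deleted
    return totals


# Made-up sample payloads, trimmed to the fields used above.
SAMPLE_NODES = {"nodes": {"f2hsqeamadnaskda": {"name": "data-node-1"}}}
SAMPLE_SEGMENTS = {
    "indices": {
        "my_index": {
            "shards": {
                "0": [{
                    "routing": {"state": "STARTED", "primary": True,
                                "node": "f2hsqeamadnaskda"},
                    "segments": {
                        "_0": {"num_docs": 100, "deleted_docs": 7},
                        "_1": {"num_docs": 50, "deleted_docs": 3},
                    },
                }]
            }
        }
    }
}

SUMMARY = deleted_docs_by_node(SAMPLE_SEGMENTS, node_names(SAMPLE_NODES))
```

Running this before and after the optimize call, with real responses in place of the samples, should show whether deleted docs dropped on every node or only on the one you sent the request to.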

Vespine answered 16/2, 2020 at 1:6 Comment(0)
