Does a huge number of deleted docs affect ES query performance?

I have a few read-heavy indices (I have started seeing performance issues on them) in my ES cluster, which has ~50 million docs, and I noticed that in most of them around 25% of the total documents are deleted. I know that the deleted document count decreases over time as background merge operations run, but in my case the count always stays at around 25% of the total documents. I have the following questions/concerns:

  1. Will this huge number of deleted documents affect search performance? They are still part of the immutable Lucene segments, and a search runs against all segments and returns the latest version of each document. So the immutable segments would be large because they contain a huge number of deleted docs, and there is then another operation to figure out the latest version of each doc.
  2. Will the periodic merge operation take a long time and be inefficient when there is a huge number of deleted documents?
  3. Is there any way to remove this huge number of deleted docs in one shot? It looks like the background merge operation is not able to keep up with them.

Thanks

Waterfowl answered 12/2, 2020 at 7:20 Comment(2)
A quick option could be to soft delete the records and then maybe run a nightly job which hard deletes them. - Partheniaparthenocarpy
@AshishModi can you explain what you mean by hard deleting them? Do you mean first doing a soft delete using a flag in the ES index and then doing the actual delete operation? - Waterfowl

Your deleted documents are still part of the index, so they do impact search performance (but I can't tell you whether it's a huge impact).

For the periodic merge, Lucene is "reluctant" to merge large segments, as doing so requires extra disk space and generates a lot of I/O.

You can get some valuable insight into your segments from the Index Segments API.
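As a rough sketch of what to look for in that output, the snippet below ranks segments by their share of deleted documents. It assumes the response has already been fetched from `GET /<index>/_segments` and parsed into a dict; the `sample` data, the index name `my-index`, and the helper name `deleted_ratio_per_segment` are made up for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, str, str, float, int]


def deleted_ratio_per_segment(segments_response: Dict[str, Any]) -> List[Row]:
    """Return (index, shard, segment, deleted_ratio, size_in_bytes) rows, worst first."""
    rows: List[Row] = []
    for index_name, index_data in segments_response.get("indices", {}).items():
        for shard_id, copies in index_data.get("shards", {}).items():
            for copy in copies:  # one entry per shard copy (primary or replica)
                for seg_name, seg in copy.get("segments", {}).items():
                    live = seg.get("num_docs", 0)         # num_docs excludes deleted docs
                    deleted = seg.get("deleted_docs", 0)
                    total = live + deleted
                    ratio = deleted / total if total else 0.0
                    rows.append(
                        (index_name, shard_id, seg_name, ratio, seg.get("size_in_bytes", 0))
                    )
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows


# Hypothetical, trimmed-down _segments response (illustrative values only).
sample = {
    "indices": {
        "my-index": {
            "shards": {
                "0": [
                    {
                        "segments": {
                            "_0": {"num_docs": 750, "deleted_docs": 250, "size_in_bytes": 4_000_000},
                            "_1": {"num_docs": 100, "deleted_docs": 0, "size_in_bytes": 500_000},
                        }
                    }
                ]
            }
        }
    }
}

for index_name, shard_id, seg_name, ratio, size in deleted_ratio_per_segment(sample):
    print(f"{index_name} shard {shard_id} {seg_name}: {ratio:.0%} deleted, {size} bytes")
```

Large segments with a high deleted ratio are the ones worth watching, since those are the segments the background merges tend to leave alone.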

If you have segments close to the 5GB limit, they probably won't be merged automatically until they consist mostly of deleted docs.

You can force a merge on your index with the force merge API.

Remember that a force merge can put some stress on a cluster for huge indices. An option exists to only expunge deleted documents, which should reduce the burden:

only_expunge_deletes (Optional, boolean) If true, only expunge segments containing document deletions. Defaults to false.
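A minimal sketch of how such a call could be built, using only the Python standard library. The base URL `http://localhost:9200`, the index name `my-index`, and the helper name `build_force_merge_request` are assumptions for illustration; the snippet only constructs the request and leaves sending it commented out.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_force_merge_request(
    base_url: str, index: str, only_expunge_deletes: bool = True
) -> Request:
    """Build (but do not send) a POST /<index>/_forcemerge request."""
    # The API expects lowercase booleans in the query string.
    params = urlencode({"only_expunge_deletes": str(only_expunge_deletes).lower()})
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{params}"
    return Request(url, method="POST")


# Assumed local cluster and index name, for illustration only.
req = build_force_merge_request("http://localhost:9200", "my-index")
print(req.get_method(), req.full_url)

# To actually send it (assumes a reachable cluster without authentication):
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     print(resp.read().decode())
```

The call can run for a long time on a large index, so it is probably best tried during a quiet period.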

In Lucene, a document is not removed from a segment when it is deleted; it is just marked as deleted. During a merge, a new segment is created that does not contain those deleted documents.
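The mark-then-merge behaviour can be illustrated with a toy model. This is not Lucene's actual data structure or API, just a simplified sketch; `ToySegment` and `merge` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToySegment:
    docs: Dict[int, str]                            # doc id -> content, never rewritten
    deleted: Set[int] = field(default_factory=set)  # ids that are only marked as deleted

    def delete(self, doc_id: int) -> None:
        if doc_id in self.docs:
            self.deleted.add(doc_id)  # mark only; the doc still takes up space

    def live_docs(self) -> Dict[int, str]:
        return {i: d for i, d in self.docs.items() if i not in self.deleted}


def merge(segments: List[ToySegment]) -> ToySegment:
    """Write a new segment containing only the live docs of the inputs."""
    merged: Dict[int, str] = {}
    for seg in segments:
        merged.update(seg.live_docs())
    return ToySegment(docs=merged)


a = ToySegment({1: "a", 2: "b", 3: "c"})
b = ToySegment({4: "d"})
a.delete(2)

print(len(a.docs), len(a.live_docs()))  # stored docs vs. live docs in segment a
merged = merge([a, b])
print(sorted(merged.docs), merged.deleted)
```

Until the merge runs, the deleted doc is still stored in segment `a` and has to be skipped at search time, which is the overhead the question is asking about.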

Regards

Disjointed answered 12/2, 2020 at 10:41 Comment(1)
Thanks for the useful information. I will check all these options and get back to you. - Waterfowl
