I want to know more about Elasticsearch delete, specifically the Java High Level REST Client delete API, and whether it's feasible to perform a bulk delete.
Here is my configuration:
- Java: 8
- Elastic Version: 7.1.1
Elastic dependencies added:
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.1.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.1.1</version>
</dependency>
In my case, around 10K records are added daily to the index dev-answer.
I want to trigger a delete operation (daily, weekly, or monthly) that deletes all documents from the above index that satisfy a specific condition (which I'll give in the DeleteByQueryRequest).
For deletion, the latest documentation, which I'm referring to, gives the following API:
DeleteByQueryRequest request = new DeleteByQueryRequest("source1", "source2");
While reading the documentation, I came across the following points that I'm unable to understand.
- The doc says: "It’s also possible to limit the number of processed documents by setting size."
  request.setSize(10);
  What does "processed documents" mean? Will it delete only 10 documents?
- What batch size should I set?
  request.setBatchSize(100);
  Does its performance depend on how many documents we are going to delete? Should I first make a call to get the number of documents and change setBatchSize based on that?
- request.setSlices(2);
  Should the number of slices depend on how many cores the executor machine has, or on the number of cores in the Elasticsearch cluster?
- The documentation shows the method setSlices(2), which I'm unable to find in the class org.elasticsearch.index.reindex.DeleteByQueryRequest. What am I missing here?
- Suppose I'm executing this delete query in async mode and it takes 0.5-1.0 sec. If I meanwhile make a get request on this index, will it throw an exception? Also, if I insert a new document at the same time and then retrieve it, will I get a response?
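To make the size and batch size questions concrete, here is a rough sketch in plain Java with no Elasticsearch dependency. It assumes the semantics the documentation describes: size caps the total number of documents the request processes, and the batch size is the number of documents handled per scroll batch. The class DeleteByQueryEstimate and its methods are hypothetical names made up for this illustration, not part of the client, and treating a non-positive limit as "no limit" is a convention of this sketch only.

```java
public class DeleteByQueryEstimate {

    // Documents the request would process: the matching documents,
    // capped by the size limit. A non-positive maxDocs means "no limit"
    // here (an assumption of this sketch, not client behaviour).
    static long processedDocs(long matchingDocs, long maxDocs) {
        return maxDocs > 0 ? Math.min(matchingDocs, maxDocs) : matchingDocs;
    }

    // Number of scroll batches needed: ceil(processed / batchSize).
    static long batches(long processed, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (processed + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long matching = 10_000; // roughly one day of documents in dev-answer

        // With a size limit of 10, only 10 documents are processed.
        long limited = processedDocs(matching, 10);
        System.out.println(limited + " docs in " + batches(limited, 100) + " batch(es)");

        // Without a limit, all matching documents are processed in batches of 100.
        long all = processedDocs(matching, -1);
        System.out.println(all + " docs in " + batches(all, 100) + " batch(es)");
    }
}
```

Under these assumptions, the first line printed is "10 docs in 1 batch(es)" and the second is "10000 docs in 100 batch(es)". In other words, the size setting decides how many documents are touched at all, while the batch size only decides how that work is split up.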
_delete_by_query endpoint, not the _bulk endpoint? If you are asking about the _delete_by_query endpoint, can you rename the question to avoid misunderstanding, because _bulk also allows deleting documents. – Cranio
public final void deleteByQueryAsync(DeleteByQueryRequest deleteByQueryRequest, RequestOptions options, ActionListener<BulkByScrollResponse> listener) from class org.elasticsearch.client.RestHighLevelClient. So is it going to make a bulk request or a delete by query? Also, when deleting more than 10K records, or in a few cases close to 1K, which is better: delete_by_query or _bulk? – Charlenacharlene
deleteByQuery which returns BulkByScrollResponse, so the confusion arose whether it's a _bulk delete or delete_by_query – Charlenacharlene
_delete_by_query endpoint internally performs bulk requests to delete documents efficiently, but they are definitely different endpoints. – Cranio
delete_by_query instead of delete in the title to avoid confusion and help other users find this question? delete is also another endpoint. I'm writing a complete answer to your question. – Cranio