How does MarkLogic's "xdmp:collection-delete" work?

2

5

I have a scenario where most of the documents I want to delete are in a collection called "expired". I do not want to overload my servers with a long-running process that iterates over the documents and deletes them one by one; I would rather delete them in batches using xdmp:document-delete.

So my question is: how does xdmp:collection-delete work?

Does it iterate over the documents and delete them?

or

Does it do something like DROP TABLE in SQL, so that it is "instantaneous"?

I want to know what happens behind the scenes in xdmp:collection-delete. Could anyone lay out the flow of how this function handles documents for deletion? I want to understand the process in more depth than an overview of what it does.

Demit answered 19/9, 2016 at 10:43 Comment(4)
Keep in mind that dropping a table is not quite the same thing as deleting a collection of documents. – Propitiate
Can you elaborate on what exactly you are after? It essentially comes down to iterating over the docs, locking them, and deleting them, all in one transaction. In certain circumstances it can take a few shortcuts, but it still needs to do all that, just as described below. – Propitiate
We are looking to delete millions of documents from our database without overloading the server, since other processes run on it as well and we have limited CPU and memory. That is why we built our own purger, which takes a batch size and purges in small chunks. I was wondering whether this function is a better approach than our custom purger. – Demit
One more detail: all those documents are in a single collection called "expired". – Demit
8

xdmp:collection-delete() will delete all documents in the collection in a single transaction. While it's not instantaneous, it should be fast, as it just needs to set the deletion timestamp of each document.
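For reference, the call itself is a one-liner: `xdmp:collection-delete("expired")`. If one large transaction is a concern, a batched alternative in the spirit of the asker's purger could look roughly like the sketch below. It is only an illustration, not a description of how xdmp:collection-delete works internally. It assumes the URI lexicon is enabled on the database (cts:uris needs it), and the batch size of 1000 is an arbitrary, hypothetical value.

```xquery
xquery version "1.0-ml";

(: Hypothetical batch size; tune it for your own cluster. :)
declare variable $batch-size as xs:integer := 1000;

(: Pick up to $batch-size URIs from the "expired" collection.
   Assumes the URI lexicon is enabled on the database. :)
let $uris := cts:uris((), ("limit=" || $batch-size), cts:collection-query("expired"))
return (
  (: Each delete takes a write lock on that document only. :)
  for $uri in $uris
  return xdmp:document-delete($uri),
  (: Return how many documents this batch removed. :)
  fn:count($uris)
)
```

Each run is its own transaction and deletes at most one batch, so a scheduled task or an external driver would rerun it until it returns 0.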

Betoken answered 19/9, 2016 at 11:26 Comment(6)
In the case of a million documents, does "fast" mean seconds? Will it hold locks on the documents for a long time? (That is basically what we are trying to avoid.) – Demit
There are a few criteria to meet before xdmp:collection-delete will be executed in so-called fast mode, but even in fast mode it has to take locks on what is being deleted for ACID compliance. – Propitiate
I am actually going to run this on a live server, so I want to know what happens behind the scenes in xdmp:collection-delete. Could anyone lay out the flow of how this function handles documents for deletion? I want to understand the process in more depth than an overview of what it does. – Demit
Another way to potentially improve overall performance with respect to updates is to set your app server to run in nonblocking MVCC mode instead of the default contemporaneous mode. Depending on the requirements of your application, however, this may not be feasible. Essentially, it means that an update in the commit phase will not block a read/query transaction, which will simply read from the most recent timestamp for which all transactions are known to have committed, instead of waiting for the commit to complete. – Usk
Curious: if these are in a collection called 'expired' and you write your queries to ignore items in that collection, e.g. cts:not-query(cts:collection-query('expired')), does it matter as much if they take a bit of time to delete, given that they are already isolated from your active queries? – Cotswold
I'd say yes, ML still needs to lock the docs for deletion, unless you disable locking as suggested by wst. – Propitiate
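To make the cts:not-query suggestion from the comments concrete, here is a minimal sketch of a search that skips anything in the "expired" collection. The word query on "invoice" is a hypothetical placeholder for whatever the application actually searches on; only the collection name comes from the question.

```xquery
xquery version "1.0-ml";

(: Return the first 10 matches that are NOT in the "expired" collection. :)
cts:search(
  fn:doc(),
  cts:and-query((
    (: Hypothetical application query; replace with your own. :)
    cts:word-query("invoice"),
    (: Exclude documents that are queued for purging. :)
    cts:not-query(cts:collection-query("expired"))
  ))
)[1 to 10]
```

This only hides expired documents from query results; as noted in the comments, the delete itself still has to lock the documents it removes.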
1

You could use CoRB to delete the documents one by one, and increase the thread count for parallel processing.
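As a rough sketch of what that could look like, a CoRB process module receives each selected URI in an external variable and deletes that one document. The module below assumes the default batch size of one URI per invocation. A separate URIs module (not shown) would select the "expired" collection, for example with cts:uris and cts:collection-query, and return the count followed by the URIs. How the modules are named and wired up is an assumption about your setup.

```xquery
xquery version "1.0-ml";

(: CoRB passes the URI to process in this external variable. :)
declare variable $URI as xs:string external;

(: Delete a single document; each invocation is its own transaction. :)
xdmp:document-delete($URI)
```

The degree of parallelism is then controlled on the CoRB side through its THREAD-COUNT option, which is what limits the load placed on the server.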

Derwon answered 19/9, 2016 at 12:18 Comment(0)
