Delete Documents from CosmosDB based on condition through Query Explorer
Asked Answered
R

6

35

What's the query or some other quick way to delete all the documents matching the where condition in a collection?
I want something like DELETE * FROM c WHERE c.DocumentType = 'EULA' but, apparently, it doesn't work.

Note: I'm not looking for any C# implementation for this.

Reiterant answered 27/6, 2017 at 7:46 Comment(5)
As Gaurav Mantri mentioned, it is not supported currently. There is a feedback under review.Abecedarian
Interestingly this is possible using the Graph APIs as you can do .drop() on the result of a traversal to remove all selected docs....Balderdash
@JesseCarter could u describe in a bit detailReiterant
@JerryGoyal Unfortunately unless you're fully embracing graph I don't think it will work for your usecase as Cosmos expects a specific document format. But for example you could do something like g.V().has('DocumentType', eq('EULA')).drop() and clear out all docs that matchedBalderdash
where do i run this queryReiterant
P
21

This is a bit old but just had the same requirement and found a concrete example of what @Gaurav Mantri wrote about.

The stored procedure script is here:

https://social.msdn.microsoft.com/Forums/azure/en-US/ec9aa862-0516-47af-badd-dad8a4789dd8/delete-multiple-docdb-documents-within-the-azure-portal?forum=AzureDocumentDB

Go to the Azure portal, grab the script from above and make a new stored procedure in the database->collection you need to delete from.

Then right at the bottom of the stored procedure pane, underneath the script textarea is a place to put in the parameter. In my case I just want to delete all so I used:

SELECT c._self FROM c

I guess yours would be:

SELECT c._self FROM c WHERE c.DocumentType = 'EULA'

Then hit 'Save and Execute'. Viola, some documents get deleted. After I got it working in the Azure Portal I switched over the Azure DocumentDB Studio and got a better view of what was happening. I.e. I could see I was throttled to deleting 18 a time (returned in the results). For some reason I couldn't see this in the Azure Portal.

Anyway, pretty handy even if limited to a certain amount of deletes per execution. Executing the sp is also throttled so you can't just mash the keyboard. I think I would just delete and recreate the Collection unless I had a manageable number of documents to delete (thinking <500).

Props to Mimi Gentz @Microsoft for sharing the script in the link above.

HTH

Potshot answered 29/12, 2017 at 11:46 Comment(1)
Thanks for this, it is helpful when you need to run it against a single partition. But where the query itself is cross partition the stored procedure option is not feasible as you can only run a store proc against a partition currently.Cinquefoil
B
19

I want something like DELETE * FROM c WHERE c.DocumentType = 'EULA' but, apparently, it doesn't work.

Deleting documents this way is not supported. You would need to first select the documents using a SELECT query and then delete them separately. If you want, you can write the code for fetching & deleting in a stored procedure and then execute that stored procedure.

Butterfish answered 27/6, 2017 at 7:50 Comment(1)
i have code for select now that selected data i want to delete from cosmos how to do that please guide with code snippetMeditation
R
7

I wrote a script to list all the documents and delete all the documents, it can be modified to delete the selected documents as well.

var docdb = require("documentdb");
var async = require("async");

var config = {
  host: "https://xxxx.documents.azure.com:443/",
  auth: {
    masterKey: "xxxx"
  }
};

var client = new docdb.DocumentClient(config.host, config.auth);

var messagesLink = docdb.UriFactory.createDocumentCollectionUri("xxxx", "xxxx");

var listAll = function(callback) {
  var spec = {
    query: "SELECT * FROM c",
    parameters: []
  };

  client.queryDocuments(messagesLink, spec).toArray((err, results) => {
    callback(err, results);
  });
};

var deleteAll = function() {
  listAll((err, results) => {
    if (err) {
      console.log(err);
    } else {
      async.forEach(results, (message, next) => {
        client.deleteDocument(message._self, err => {
          if (err) {
            console.log(err);
            next(err);
          } else {
            next();
          }
        });
      });
    }
  });
};

var task = process.argv[2];
switch (task) {
  case "listAll":
    listAll((err, results) => {
      if (err) {
        console.error(err);
      } else {
        console.log(results);
      }
    });
    break;
  case "deleteAll":
    deleteAll();
    break;

  default:
    console.log("Commands:");
    console.log("listAll deleteAll");
    break;
}
Rudolf answered 26/7, 2018 at 8:19 Comment(1)
Welcome to Stack Overflow! While links are great way of sharing knowledge, they won't really answer the question if they get broken in the future. Add to your answer the essential content of the link which answers the question. In case the content is too complex or too big to fit here, describe the general idea of the proposed solution. Remember to always keep a link reference to the original solution's website. See: How do I write a good answer?Renee
S
1

And if you want to do it in C#/Dotnet Core, this project may help: https://github.com/lokijota/CosmosDbDeleteDocumentsByQuery. It's a simple Visual Studio project where you specify a SELECT query, and all the matches will be a) backed up to file; b) deleted, based on a set of flags.

Sporty answered 11/3, 2021 at 14:23 Comment(0)
U
0

create stored procedure in collection and execute it by passing select query with condition to delete. The major reason to use this stored proc is because of continuation token which will reduce RUs to huge extent and will cost less.

Unclasp answered 10/11, 2021 at 9:50 Comment(0)
R
-1
##### Here is the python script which can be used to delete data from Partitioned Cosmos Collection #### This will delete documents Id by Id based on the result set data.

Identify the data that needs to be deleted before below step

res_list = "select id from id_del"
res_id = [{id:x["id"]} 
             for x in sqlContext.sql(res_list).rdd.collect()]
config = {
   "Endpoint" : "Use EndPoint"
  "Masterkey" : "UseKey", 
      "WritingBatchSize" : "5000",
    'DOCUMENTDB_DATABASE': 'Database',
    'DOCUMENTDB_COLLECTION': 'collection-core'
}; 

for row in res_id:
# Initialize the Python DocumentDB client
  client = document_client.DocumentClient(config['Endpoint'], {'masterKey': config['Masterkey']})

# use a SQL based query to get   documents

## Looping thru partition to delete

  query = { 'query': "SELECT c.id FROM c where c.id = "+ "'" +row[id]+"'"   }
  print(query)
  options = {}
  options['enableCrossPartitionQuery'] = True
  options['maxItemCount'] = 1000
  result_iterable = client.QueryDocuments('dbs/Database/colls/collection-core', query, options)
  results = list(result_iterable)
  print('DOCS TO BE DELETED : ' + str(len(results)))
  if len(results) > 0 :
      for i in range(0,len(results)):
      #  print(results[i]['id'])
          docID = results[i]['id']
          print("docID :" + docID)
          options = {}
          options['enableCrossPartitionQuery'] = True
          options['maxItemCount'] = 1000
          options['partitionKey'] = docID
          client.DeleteDocument('dbs/Database/colls/collection-core/docs/'+docID,options=options)
          print ('deleted Partition:' +  docID)
Retentivity answered 9/11, 2019 at 3:5 Comment(1)
Can you clarify what this is? This looks like Spark code (.rdd), not plain Python.Schlep

© 2022 - 2024 — McMap. All rights reserved.