Delete All Azure Table Records
Asked Answered
I

8

25

I have an Azure Storage Table and it has 3k+ records.

What is the most efficient way to delete all the rows in the table?

Issuable answered 12/10, 2014 at 14:53 Comment(0)
B
31

For 3000 records, easiest way would be to delete the table. However please note that when you delete the table, it is not deleted at that time but is put in some kind of queue to be deleted and is actually deleted some time later. This time depends on the load on the system + number of entities in the table. During this time, you will not be able to recreate this table or use this table.

If it is important for you to keep using the table, the only other option is to delete entities. For faster deletes, you can look at deleting entities using Entity Batch Transactions. But for deleting entities, you would need to first fetch the entities. You can speed up the fetching process by only fetching PartitionKey and RowKey attributes of the entities instead of fetching all attributes as only these two attributes are required for deleting an entity.

Bade answered 12/10, 2014 at 15:5 Comment(2)
Hi Gaurav - would you know how much time it takes before the table actually gets deleted ?Fractostratus
#15509017 - that post says that it takes atleast 40 seconds to delete a table - but also indicates that it might be much longer if the table is large :-/Fractostratus
T
10

For someone finding this later, the problem with the accepted answer "just deleted the table" is that while it works great in the storage emulator, it will fail randomly in production. If your app/service requires regenerating tables regularly then you'll find that you'll have failures due to either conflicts or deletion still in progress.

Instead, I found the fastest and most error proof EF friendly approach to be deleting all rows within a segmented query. Below is a simple drop-in example that I'm using. Pass in your client, table name, and a type that implements ITableEntity.

private async Task DeleteAllRows<T>(string table, CloudTableClient client) where T: ITableEntity, new()
    {
        // query all rows
        CloudTable tableref = client.GetTableReference(table);           
        var query = new TableQuery<T>();
        TableContinuationToken token = null;
                                         
        do
        {
            var result = await tableref.ExecuteQuerySegmentedAsync(query, token);  
            foreach (var row in result)
            {
                var op = TableOperation.Delete(row);
                tableref.ExecuteAsync(op);
            }
            token = result.ContinuationToken;
        } while (token != null);  
        
    }

Example Usage:

table = client.GetTableReference("TodayPerformanceSnapshot");
created = await table.CreateIfNotExistsAsync();

if(!created)
{ 
    // not created, table already existed, delete all content
   await DeleteAllRows<TodayPerformanceContainer>("TodayPerformanceSnapshot", client);
   log.Information("Azure Table:{Table} Purged", table);
}

A batched approach takes significantly more effort since you have to handle both the "only same partition keys in a batch" and "only 100 rows" limitations. The following version of DeleteAllRows does this.

private async Task DeleteAllRows<T>(string table, CloudTableClient client) where T: ITableEntity, new()
    {
        // query all rows
        CloudTable tableref = client.GetTableReference(table);           
        var query = new TableQuery<T>();
        TableContinuationToken token = null;            
        TableBatchOperation batchops = new TableBatchOperation();
        Dictionary<string, Stack<TableOperation>> pendingOperations = new Dictionary<string, Stack<TableOperation>>();
        
        do
        {
            var result = await tableref.ExecuteQuerySegmentedAsync(query, token);
            foreach (var row in result)
            {
               var op = TableOperation.Delete(row);
                if (pendingOperations.ContainsKey(row.PartitionKey))
                {
                    pendingOperations[row.PartitionKey].Push(op);
                }
                else
                {
                    pendingOperations.Add(row.PartitionKey, new Stack<TableOperation>() );
                    pendingOperations[row.PartitionKey].Push(op);
                }                                    
            }
            token = result.ContinuationToken;
        } while (token != null);

        // order by partition key            
        foreach (var key in pendingOperations.Keys)
        {                
            log.Information($"Deleting:{key}");                
            var rowStack = pendingOperations[key];
            int max = 100;
            int current = 0;

            while (rowStack.Count != 0)
            {
                // dequeue in groups of 100
                while (current < max && rowStack.Count > 0)
                {
                    var op = rowStack.Pop();
                    batchops.Add(op);
                    current++;
                }

                //execute and reset
                _ = await tableref.ExecuteBatchAsync(batchops);
                log.Information($"Deleted batch of size:{batchops.Count}");
                current = 0;
                batchops.Clear();
            }
        }                       
    }
Tempietempla answered 1/7, 2021 at 2:55 Comment(0)
J
2

I use something like this. We partition key by date, your case may be different:

async Task Main()
{
    var startDate = new DateTime(2011, 1, 1);
    var endDate = new DateTime(2012, 1, 1);

    var account = CloudStorageAccount.Parse("connString");
    var client = account.CreateCloudTableClient();
    var table = client.GetTableReference("TableName");

    var dates = Enumerable.Range(0, Math.Abs((startDate.Month - endDate.Month) + 12 * (startDate.Year - endDate.Year)))
        .Select(offset => startDate.AddMonths(offset))
        .ToList();

    foreach (var date in dates)
    {
        var key = $"{date.ToShortDateString()}";

        var query = $"(PartitionKey eq '{key}')";
        var rangeQuery = new TableQuery<TableEntity>().Where(query);

        var result = table.ExecuteQuery<TableEntity>(rangeQuery);
        $"Deleting data from {date.ToShortDateString()}, key {key}, has {result.Count()} records.".Dump();

        var allTasks = result.Select(async r =>
        {
            try
            {
                await table.ExecuteAsync(TableOperation.Delete(r));
            }
            catch (Exception e) { $"{r.RowKey} - {e.ToString()}".Dump(); }
        });
        await Task.WhenAll(allTasks);
    }
}
Jetton answered 14/9, 2018 at 3:47 Comment(0)
F
1

This depends on the structure of your data, but if you can compose a query for all records, you can add each to a TableBatchOperation and execute them all at once.

Here's an example that just gets all the results inside the same partition key, adapted from How to get started with Azure Table storage and Visual Studio connected services.

// query all rows
CloudTable peopleTable = tableClient.GetTableReference("myTableName");
var query = new TableQuery<MyTableEntity>();
var result = await remindersTable.ExecuteQuerySegmentedAsync(query, null);

// Create the batch operation.
TableBatchOperation batchDeleteOperation = new TableBatchOperation();

foreach (var row in result)
{
    batchDeleteOperation.Delete(row);
}

// Execute the batch operation.
await remindersTable.ExecuteBatchAsync(batchDeleteOperation);
Fard answered 14/11, 2018 at 5:18 Comment(1)
I use something similar as KyleMit, but TableBatchOperations can contain max 100 items, so at the end of the foreach-loop I would check the count of batchDeleteOperation and ExecuteBatchAsync for each batch of 100 items.Anatolic
T
1

I use the following function to first put all partitions keys in a queue and then loop through the key to delete all rows in batches of 100.

Queue queue = new Queue();
            queue.Enqueue("PartitionKeyTodelete1");
            queue.Enqueue("PartitionKeyTodelete2");
            queue.Enqueue("PartitionKeyTodelete3");

            while (queue.Count > 0)
            {
                string partitionToDelete = (string)queue.Dequeue();

                TableQuery<TableEntity> deleteQuery = new TableQuery<TableEntity>()
                  .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionToDelete))
                  .Select(new string[] { "PartitionKey", "RowKey" });

                TableContinuationToken continuationToken = null;

                do
                {
                    var tableQueryResult = await myTable.ExecuteQuerySegmentedAsync(deleteQuery, continuationToken);

                    continuationToken = tableQueryResult.ContinuationToken;

                    // Split into chunks of 100 for batching
                    List<List<TableEntity>> rowsChunked = tableQueryResult.Select((x, index) => new { Index = index, Value = x })
                        .Where(x => x.Value != null)
                        .GroupBy(x => x.Index / 100)
                        .Select(x => x.Select(v => v.Value).ToList())
                        .ToList();

                    // Delete each chunk of 100 in a batch
                    foreach (List<TableEntity> rows in rowsChunked)
                    {
                        TableBatchOperation tableBatchOperation = new TableBatchOperation();
                        rows.ForEach(x => tableBatchOperation.Add(TableOperation.Delete(x)));

                        await myTable.ExecuteBatchAsync(tableBatchOperation);
                    }
                }
                while (continuationToken != null);
            }
Toleration answered 26/1, 2020 at 9:14 Comment(0)
R
1

I recently wrote a library that can do exactly that.

Source/docs: https://github.com/pflajszer/AzureTablesLifecycleManager

for your use case, the code would look something like this:

// inject ITableManager in the constructor:

private readonly ITableManager _api;

public MyClass(ITableManager api)
{
    _api = api;
}
/// <summary>
/// Delete all data from a single table
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="tableName"></param>
/// <returns></returns>
public Task<DataTransferResponse<T>> DeleteTableDataAsync<T>(string tableName) where T : class, ITableEntity, new()
{
    // this query will return a single table with a given name:
    Expression<Func<TableItem, bool>> tableQuery = x => x.Name == tableName;

    // this query will return all the data from the table:
    Expression<Func<T, bool>> dataQuery = x => true;
             
    // ... but you can use LINQ to filter results too, like:
    // Expression<Func<T, bool>> anotherExampleOfdataQuery = x => x.Timestamp < DateTime.Now.AddYears(-1);

    return _api.DeleteDataFromTablesAsync<T>(tableQuery, dataQuery);
}

... or, as Gaurav Mantri suggested, you can just delete the table itself:

/// <summary>
/// Delete a single table
/// </summary>
/// <param name="tableName"></param>
/// <returns></returns>
public Task<DataTransferResponse<TableItem>> DeleteTableAsync(string tableName)
{
    // this query will return a single table with a given name:
    Expression<Func<TableItem, bool>> tableQuery = x => x.Name == tableName;

    return _api.DeleteTablesAsync(tableQuery);
}
Ricebird answered 30/10, 2021 at 11:58 Comment(2)
Holy shit I have been waiting for a year for a tool like this !!! Your delete data from table based on LINQ and drop table are muchhhhh needed functionality that Ive been missing.Teeter
Thanks for your kind words @shelbaz. I'm glad you found it useful. Feel free to flag any issues you encounter!Ricebird
B
1

Rather late to the show but this may be useful for people (like me) who come across this. There is a way to do this in Azure Storage Explorer.

Firstly open the table that you want to clear.

If needed run a query so that only the records you want to delete are returned.

Click on the double right arrows on the navigator under the data

Navigator

This will cache the data.

You can now use the Drop down on the Select all button to use the Select All Cached option

Select All button

Finally just click the Delete button.

Baghdad answered 29/3, 2023 at 16:17 Comment(0)
A
0

Here's my solution using the new(er) Azure.Data.Tables SDK with the following enhancements:

  • Getting 1000 rows per page
  • Getting only PartitonKey & RowKey for each row
  • Grouping rows to delete into batches of 100s max by PartitionKey
  • Written as extension methods to the TableClient so it's easily reusable

Note: I'm using the System.Linq.Async nuget package to make the code a bit more readable.

/// <summary>
/// Deletes all rows from the table
/// </summary>
/// <param name="tableClient">The authenticated TableClient</param>
/// <returns></returns>
public static async Task DeleteAllEntitiesAsync(this TableClient tableClient)
{
    // Only the PartitionKey & RowKey fields are required for deletion
    AsyncPageable<TableEntity> entities = tableClient
        .QueryAsync<TableEntity>(select: new List<string>() { "PartitionKey", "RowKey" }, maxPerPage: 1000);

    await entities.AsPages().ForEachAwaitAsync(async page => {
        // Since we don't know how many rows the table has and the results are ordered by PartitonKey+RowKey
        // we'll delete each page immediately and not cache the whole table in memory
        await BatchManipulateEntities(tableClient, page.Values, TableTransactionActionType.Delete).ConfigureAwait(false);
    });
}

/// <summary>
/// Groups entities by PartitionKey into batches of max 100 for valid transactions
/// </summary>
/// <returns>List of Azure Responses for Transactions</returns>
public static async Task<List<Response<IReadOnlyList<Response>>>> BatchManipulateEntities<T>(TableClient tableClient, IEnumerable<T> entities, TableTransactionActionType tableTransactionActionType) where T : class, ITableEntity, new()
{
    var groups = entities.GroupBy(x => x.PartitionKey);
    var responses = new List<Response<IReadOnlyList<Response>>>();
    foreach (var group in groups)
    {
        List<TableTransactionAction> actions;
        var items = group.AsEnumerable();
        while (items.Any())
        {
            var batch = items.Take(100);
            items = items.Skip(100);

            actions = new List<TableTransactionAction>();
            actions.AddRange(batch.Select(e => new TableTransactionAction(tableTransactionActionType, e)));
            var response = await tableClient.SubmitTransactionAsync(actions).ConfigureAwait(false);
            responses.Add(response);
        }
    }
    return responses;
}
Abroms answered 29/3, 2022 at 18:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.