Windows Azure - Cleaning Up The WADLogsTable
I've read conflicting information as to whether or not the WADLogsTable table used by the DiagnosticMonitor in Windows Azure will automatically prune old log entries.

I'm guessing it doesn't, and will instead grow forever - costing me money. :)

If that's the case, does anybody have a good code sample as to how to clear out old log entries from this table manually? Perhaps based on timestamp? I'd run this code from a worker role periodically.

Indohittite answered 1/8, 2011 at 22:27 Comment(0)

The data in tables created by Windows Azure Diagnostics isn't deleted automatically.

However, the Windows Azure PowerShell Cmdlets include a cmdlet specifically for this task.

PS D:\> help Clear-WindowsAzureLog

NAME
    Clear-WindowsAzureLog

SYNOPSIS
    Removes Windows Azure trace log data from a storage account.

SYNTAX
    Clear-WindowsAzureLog [-DeploymentId <String>] [-FromUtc <DateTime>] [-ToUtc <DateTime>]
    [-StorageAccountName <String>] [-StorageAccountKey <String>] [-UseDevelopmentStorage]
    [-StorageAccountCredentials <StorageCredentialsAccountAndKey>] [<CommonParameters>]

You need to specify the -ToUtc parameter; all logs before that date will be deleted.

If the cleanup task needs to be performed on Azure from within the worker role, the cmdlets' C# code can be reused. The PowerShell Cmdlets are published under the permissive MS Public License.

Basically, only three files are needed, with no other external dependencies: DiagnosticsOperationException.cs, WadTableExtensions.cs, WadTableServiceEntity.cs.

Vennieveno answered 2/2, 2012 at 9:49 Comment(4)
I can't seem to find this command. I've installed the CLI and PS commands from windowsazure.com/en-us/downloads/?fb=en-us but I always get a 'not recognized cmdlet' error when I try to run it. Even the help fails to return any info (and suggests I update help, which I have). HELP!Someplace
The codeplex link attached to the post does not seem to be working.Hinny
Azure Powershell cmdlets were moved to Github. Updated link accordingly.Aconcagua
I'm new to azure and powershell. Do I just grab the cs files from github and drop them in a folder somewhere? Whats the process to get this working in powershell?Jurdi

An updated version of Chriseyre2000's function. It performs much better when you need to delete many thousands of records: it searches by PartitionKey and processes the work step by step in chunks. And remember that the best option is to run it near the storage (in a cloud service).

public static void TruncateDiagnostics(CloudStorageAccount storageAccount,
    DateTime startDateTime, DateTime finishDateTime, Func<DateTime, DateTime> stepFunction)
{
    var cloudTable = storageAccount.CreateCloudTableClient().GetTableReference("WADLogsTable");

    var query = new TableQuery();
    var dt = startDateTime;
    while (true)
    {
        dt = stepFunction(dt);
        if (dt > finishDateTime)
            break;

        // WAD partition keys are "0" + tick count, so LessThan selects
        // everything older than the current step boundary.
        string partitionKey = "0" + dt.Ticks;
        query.FilterString = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, partitionKey);
        query.SelectColumns = new List<string>(); // keys only; no other properties needed

        var items = cloudTable.ExecuteQuery(query).ToList();

        // Chunks of at most 100 entities: a table batch allows at most
        // 100 operations, all within a single partition.
        const int chunkSize = 100;
        var chunkedList = new List<List<DynamicTableEntity>>();
        int index = 0;
        while (index < items.Count)
        {
            var count = Math.Min(chunkSize, items.Count - index);
            chunkedList.Add(items.GetRange(index, count));
            index += chunkSize;
        }

        foreach (var chunk in chunkedList)
        {
            // Group deletes by partition key, one batch per partition.
            var batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in chunk)
            {
                var tableOperation = TableOperation.Delete(entity);
                if (batches.ContainsKey(entity.PartitionKey))
                    batches[entity.PartitionKey].Add(tableOperation);
                else
                    batches.Add(entity.PartitionKey, new TableBatchOperation { tableOperation });
            }

            foreach (var batch in batches.Values)
                cloudTable.ExecuteBatch(batch);
        }
    }
}
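A side note on why the LessThan string comparison on PartitionKey selects exactly the older rows: WAD keys are "0" followed by the tick count, and for any modern date that count is 18 digits, so the keys are fixed-width and ordinal string order matches chronological order. A minimal sketch (the helper is illustrative, not part of the SDK):

```csharp
using System;

class WadKeyOrderSketch
{
    // Illustrative helper (not part of the SDK): WAD-style partition key,
    // "0" followed by the UTC tick count.
    static string ToPartitionKey(DateTime utc)
    {
        return "0" + utc.Ticks;
    }

    static void Main()
    {
        string older = ToPartitionKey(new DateTime(2013, 1, 1, 0, 0, 0, DateTimeKind.Utc));
        string newer = ToPartitionKey(new DateTime(2013, 6, 1, 0, 0, 0, DateTimeKind.Utc));

        // Both tick counts are 18 digits, so the strings are fixed-width and
        // ordinal comparison agrees with numeric (chronological) order.
        Console.WriteLine(string.CompareOrdinal(older, newer) < 0); // True
    }
}
```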
Brout answered 31/7, 2013 at 16:34 Comment(0)

You could just do it based on the Timestamp, but that would be very inefficient, since the whole table would need to be scanned. Here is a code sample that might help, where the partition key is generated to prevent a "full" table scan: http://blogs.msdn.com/b/avkashchauhan/archive/2011/06/24/linq-code-to-query-windows-azure-wadlogstable-to-get-rows-which-are-stored-after-a-specific-datetime.aspx
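A minimal sketch of that idea, assuming the "0" + ticks key convention the other answers here rely on (the helper name is mine): build the filter from the PartitionKey rather than the Timestamp, so Azure Table Storage can seek on the key range instead of scanning every row.

```csharp
using System;

class WadPartitionFilterSketch
{
    // Hypothetical helper: an OData filter selecting WAD rows older than the
    // cutoff by PartitionKey ("0" + ticks) instead of by Timestamp.
    static string OlderThanFilter(DateTime cutoffUtc)
    {
        return string.Format("PartitionKey lt '0{0}'", cutoffUtc.Ticks);
    }

    static void Main()
    {
        var cutoff = new DateTime(2012, 1, 18, 0, 0, 0, DateTimeKind.Utc);
        // Assign the result to query.FilterString before ExecuteQuery.
        Console.WriteLine(OlderThanFilter(cutoff));
    }
}
```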

Clinic answered 18/1, 2012 at 7:32 Comment(0)

Here is a solution that truncates based upon a timestamp. (Tested against SDK 2.0.)

It does use a table scan to get the data, but if run, say, once per day it would not be too painful:

    /// <summary>
    /// TruncateDiagnostics(storageAccount, DateTime.Now.AddHours(-1));
    /// </summary>
    /// <param name="storageAccount"></param>
    /// <param name="keepThreshold"></param>
    public void TruncateDiagnostics(CloudStorageAccount storageAccount, DateTime keepThreshold)
    {
        try
        {

            CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

            CloudTable cloudTable = tableClient.GetTableReference("WADLogsTable");

            TableQuery query = new TableQuery();
            query.FilterString = string.Format("Timestamp lt datetime'{0:yyyy-MM-ddTHH:mm:ss}'", keepThreshold);
            var items = cloudTable.ExecuteQuery(query).ToList();

            // Group deletes by partition key: a table batch may only target a
            // single partition and may contain at most 100 operations.
            Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
            foreach (var entity in items)
            {
                TableOperation tableOperation = TableOperation.Delete(entity);

                if (!batches.ContainsKey(entity.PartitionKey))
                {
                    batches.Add(entity.PartitionKey, new TableBatchOperation());
                }

                // Flush a batch that has reached the 100-operation limit.
                if (batches[entity.PartitionKey].Count == 100)
                {
                    cloudTable.ExecuteBatch(batches[entity.PartitionKey]);
                    batches[entity.PartitionKey] = new TableBatchOperation();
                }

                batches[entity.PartitionKey].Add(tableOperation);
            }

            foreach (var batch in batches.Values)
            {
                cloudTable.ExecuteBatch(batch);
            }

        }
        catch (Exception ex)
        {
            Trace.TraceError(string.Format("Truncate WADLogsTable exception {0}", ex), "Error");
        }
    }
Lasso answered 4/5, 2013 at 15:15 Comment(0)

Here's my slightly different version of @Chriseyre2000's solution, using asynchronous operations and PartitionKey querying. In my case it's designed to run continuously within a Worker Role. It may be a bit easier on memory if you have a lot of entries to clean up.

static class LogHelper
{
    /// <summary>
    /// Periodically run a cleanup task for log data, asynchronously
    /// </summary>
    public static async void TruncateDiagnosticsAsync()
    {
        while ( true )
        {
            try
            {
                // Retrieve storage account from connection-string
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
                    CloudConfigurationManager.GetSetting( "CloudStorageConnectionString" ) );

                CloudTableClient tableClient = storageAccount.CreateCloudTableClient();

                CloudTable cloudTable = tableClient.GetTableReference( "WADLogsTable" );

                // keep a weeks worth of logs
                DateTime keepThreshold = DateTime.UtcNow.AddDays( -7 );

                // do this until we run out of items
                while ( true )
                {
                    TableQuery query = new TableQuery();
                    query.FilterString = string.Format( "PartitionKey lt '0{0}'", keepThreshold.Ticks );
                    // Materialize the page: the deletes below would otherwise
                    // change what a second, deferred enumeration returns.
                    var items = cloudTable.ExecuteQuery( query ).Take( 1000 ).ToList();

                    if ( items.Count == 0 )
                        break;

                    Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
                    foreach ( var entity in items )
                    {
                        TableOperation tableOperation = TableOperation.Delete( entity );

                        // need a new batch?
                        if ( !batches.ContainsKey( entity.PartitionKey ) )
                            batches.Add( entity.PartitionKey, new TableBatchOperation() );

                        // can have only 100 per batch
                        if ( batches[entity.PartitionKey].Count < 100)
                            batches[entity.PartitionKey].Add( tableOperation );
                    }

                    // execute!
                    foreach ( var batch in batches.Values )
                        await cloudTable.ExecuteBatchAsync( batch );

                    Trace.TraceInformation( "WADLogsTable truncated: " + query.FilterString );
                }
            }
            catch ( Exception ex )
            {
                Trace.TraceError( "Truncate WADLogsTable exception {0}", ex.Message );
            }

            // run this once per day
            await Task.Delay( TimeSpan.FromDays( 1 ) );
        }
    }
}

To start the process, just call this from the OnStart method in your worker role.

// start the periodic cleanup
LogHelper.TruncateDiagnosticsAsync();
Toddler answered 6/11, 2013 at 23:35 Comment(0)

If you don't care about any of the contents, just delete the table. Azure Diagnostics will just recreate it.

Silin answered 3/12, 2014 at 2:47 Comment(2)
It isn't recreated after I deleted it.Unnecessarily
If you delete the table there will be a delay before Azure can fully recreate it under the same name.Stjohn

A slightly updated version of Chriseyre2000's code:

  • using ExecuteQuerySegmented instead of ExecuteQuery

  • observing TableBatchOperation limit of 100 operations

  • purging all Azure tables

    public static void TruncateAllAzureTables(CloudStorageAccount storageAccount, DateTime keepThreshold)
    {
       TruncateAzureTable(storageAccount, "WADLogsTable", keepThreshold);
       TruncateAzureTable(storageAccount, "WADCrashDump", keepThreshold);
       TruncateAzureTable(storageAccount, "WADDiagnosticInfrastructureLogsTable", keepThreshold);
       TruncateAzureTable(storageAccount, "WADPerformanceCountersTable", keepThreshold);
       TruncateAzureTable(storageAccount, "WADWindowsEventLogsTable", keepThreshold);
    }
    
    public static void TruncateAzureTable(CloudStorageAccount storageAccount, string aTableName, DateTime keepThreshold)
    {
       const int maxOperationsInBatch = 100;
       var tableClient = storageAccount.CreateCloudTableClient();
    
       var cloudTable = tableClient.GetTableReference(aTableName);
    
       var query = new TableQuery { FilterString = $"Timestamp lt datetime'{keepThreshold:yyyy-MM-ddTHH:mm:ss}'" };
       TableContinuationToken continuationToken = null;
       do
       {
          var queryResult = cloudTable.ExecuteQuerySegmented(query, continuationToken);
          continuationToken = queryResult.ContinuationToken;
    
          var items = queryResult.ToList();
          var batches = new Dictionary<string, List<TableBatchOperation>>();
          foreach (var entity in items)
          {
             var tableOperation = TableOperation.Delete(entity);
    
             if (!batches.TryGetValue(entity.PartitionKey, out var batchOperationList))
             {
                batchOperationList = new List<TableBatchOperation>();
                batches.Add(entity.PartitionKey, batchOperationList);
             }
    
             var batchOperation = batchOperationList.FirstOrDefault(bo => bo.Count < maxOperationsInBatch);
             if (batchOperation == null)
             {
                batchOperation = new TableBatchOperation();
                batchOperationList.Add(batchOperation);
             }
             batchOperation.Add(tableOperation);
          }
    
          foreach (var batch in batches.Values.SelectMany(l => l))
          {
             cloudTable.ExecuteBatch(batch);
          }
       } while (continuationToken != null);
    }
    
Armament answered 1/7, 2020 at 8:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.