Azure instances from 0 to 3 not writing diagnostics data in WadPerformanceCountersTable
I am trying to query data from Azure WadPerformanceCountersTable.

I am trying to get the last 5 minutes of data.

The problem is that I only get data from instances 4, 5, and 6, but not from instances 0, 1, 2, and 3.

The code I am using to pull the data is this:

// Connect to the storage account that holds the diagnostics tables.
Microsoft.WindowsAzure.CloudStorageAccount storageAccount = Microsoft.WindowsAzure.CloudStorageAccount.Parse(AppDefs.CloudStorageAccountConnectionString);
CloudTableClient cloudTableClient = storageAccount.CreateCloudTableClient();
TableServiceContext serviceContext = cloudTableClient.GetDataServiceContext();
IQueryable<PerformanceCountersEntity> traceLogsTable = serviceContext.CreateQuery<PerformanceCountersEntity>("WADPerformanceCountersTable");

// The PartitionKey is "0" followed by the UTC tick count, so this lower bound selects the last timespanInMinutes minutes.
var selection = from row in traceLogsTable
                where row.PartitionKey.CompareTo("0" + DateTime.UtcNow.AddMinutes(-timespanInMinutes).Ticks) >= 0
                && row.DeploymentId == deploymentId
                && row.CounterName == @"\Processor(_Total)\% Processor Time"
                select row;
CloudTableQuery<PerformanceCountersEntity> query = selection.AsTableServiceQuery<PerformanceCountersEntity>();
IEnumerable<PerformanceCountersEntity> result = query.Execute();
return result;
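
For anyone following along: the PerformanceCountersEntity class is not shown in the question. A minimal sketch, assuming it simply mirrors the columns that Azure Diagnostics writes to WADPerformanceCountersTable, might look like the class below. The exact property set is an assumption and not necessarily the class used above.

```csharp
using System;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical entity: the original class is not shown in the question.
// PartitionKey, RowKey and Timestamp are inherited from TableServiceEntity;
// the remaining properties mirror the columns of WADPerformanceCountersTable.
public class PerformanceCountersEntity : TableServiceEntity
{
    public long EventTickCount { get; set; }
    public string DeploymentId { get; set; }
    public string Role { get; set; }
    public string RoleInstance { get; set; }
    public string CounterName { get; set; }
    public double CounterValue { get; set; }
}
```

With a RoleInstance property available, grouping the query result by RoleInstance is a quick way to see which instances are actually reporting.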

My diagnostics.wadcfg file is this:

<?xml version="1.0" encoding="utf-8" ?>
<DiagnosticMonitorConfiguration xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration" configurationChangePollInterval="PT1M" overallQuotaInMB="4096">
  <PerformanceCounters bufferQuotaInMB="0" scheduledTransferPeriod="PT5M">
    <PerformanceCounterConfiguration counterSpecifier="\Memory\Available Bytes" sampleRate="PT60S" />
    <PerformanceCounterConfiguration counterSpecifier="\Processor(_Total)\% Processor Time" sampleRate="PT60S" />    
  </PerformanceCounters>
</DiagnosticMonitorConfiguration>

EDIT: I also have this code deployed on a test environment in Azure, and it works just fine there.

EDIT 2: Update to include Service Definitions XML:

<ServiceDefinition name="MyApp.Azure" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition" schemaVersion="2012-05.1.7">
  <WebRole name="MyApp.Website" vmsize="ExtraSmall">
    <Sites>
      <Site name="Web">
        <Bindings>
          <Binding name="Endpoint1" endpointName="Endpoint1" />
        </Bindings>
      </Site>
    </Sites>
    <Endpoints>
      <InputEndpoint name="Endpoint1" protocol="http" port="80" />
    </Endpoints>
    <Imports>
      <Import moduleName="Diagnostics" />
    </Imports>
  </WebRole>
  <WorkerRole name="MyApp.Cache" vmsize="ExtraSmall">
    <Imports>
      <Import moduleName="Diagnostics" />
      <Import moduleName="Caching" />
    </Imports>
    <LocalResources>
      <LocalStorage name="Microsoft.WindowsAzure.Plugins.Caching.FileStore" sizeInMB="1000" cleanOnRoleRecycle="false" />
    </LocalResources>
  </WorkerRole>
</ServiceDefinition>

After reading @Igorek's answer I have included my ServiceDefinition.csdef configuration XML. I still do not know how to configure the LocalResources > LocalStorage part of the configuration. The configuration must be set for "MyApp.Website".

EDIT 3: I have made these changes to the test Azure account.

I have set this in ServiceDefinition.csdef:

<LocalResources>
    <LocalStorage name="DiagnosticStore" sizeInMB="4096" cleanOnRoleRecycle="false"/>
</LocalResources>    

I have also lowered the OverallQuota and BufferQuota in diagnostics.wadcfg. As a result, in wad-control-container I have this configuration per instance: http://pastebin.com/aUywLUfE

I will have to put this on the live account to see the results.

FINAL EDIT: The overall quota appears to have been the problem, although I cannot confirm it.

After a new publish I noticed this:

  • one role instance had a configuration XML in wad-control-container with an overall quota of 1024 MB and a BufferQuotaInMB of 1024 MB. This was correct.
  • two other role instances had an overall quota of 4080 MB and a BufferQuotaInMB of 500 MB. This was incorrect, and they were not writing to WADPerformanceCountersTable.
  • the XML configuration files in wad-control-container belonging to the role instances had been deleted prior to the new publish.
  • the configuration file diagnostics.wadcfg was configured correctly: 1024 MB everywhere.

So I suspect the problem is in the publishing tool.

I tried two solutions:

  1. I deleted one incorrect XML from wad-control-container and rebooted the machine. The XML was rewritten and the role instance started writing to WADPerformanceCountersTable.

  2. I ran the script below against the other incorrect instance, and that role instance also started writing to WADPerformanceCountersTable.

            var storageAccount = CloudStorageAccount.Parse(AppDefs.CloudStorageAccountConnectionString);
    
            DeploymentDiagnosticManager diagManager = new DeploymentDiagnosticManager(storageAccount, deploymentId);
    
            IEnumerable<RoleInstanceDiagnosticManager> instanceManagers = diagManager.GetRoleInstanceDiagnosticManagersForRole(roleName);
    
            foreach (var roleInstance in instanceManagers)
            {
                DiagnosticMonitorConfiguration currentConfiguration = roleInstance.GetCurrentConfiguration();
                TimeSpan configurationChangePollInterval = TimeSpan.FromSeconds(60);
                if (!IsCurrentConfigurationCorrect(currentConfiguration, overallQuotaInMb, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1)))
                {
                    // Add a performance counter for processor time.
                    PerformanceCounterConfiguration pccCPU = new PerformanceCounterConfiguration();
                    pccCPU.CounterSpecifier = @"\Processor(_Total)\% Processor Time";
                    pccCPU.SampleRate = TimeSpan.FromSeconds(60);
    
                    // Add a performance counter for available memory.
                    PerformanceCounterConfiguration pccMemory = new PerformanceCounterConfiguration();
                    pccMemory.CounterSpecifier = @"\Memory\Available Bytes";
                    pccMemory.SampleRate = TimeSpan.FromSeconds(60);
    
                    currentConfiguration.ConfigurationChangePollInterval = TimeSpan.FromSeconds(60);
                    currentConfiguration.OverallQuotaInMB = overallQuotaInMb;
                    currentConfiguration.PerformanceCounters.BufferQuotaInMB = overallQuotaInMb;
                    currentConfiguration.PerformanceCounters.DataSources.Add(pccCPU);
                    currentConfiguration.PerformanceCounters.DataSources.Add(pccMemory);
                    roleInstance.SetCurrentConfiguration(currentConfiguration);
                }
    
            }
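
The IsCurrentConfigurationCorrect helper called above is not shown. A minimal sketch of what such a check could look like follows, assuming the two TimeSpan arguments are the expected configuration poll interval and the expected performance counter transfer period. The parameter meanings and the class name DiagnosticConfigChecks are assumptions made for illustration.

```csharp
using System;
using Microsoft.WindowsAzure.Diagnostics;

public static class DiagnosticConfigChecks
{
    // Hypothetical helper: the question calls it but does not show its body.
    // Assumption: the TimeSpan arguments are the expected poll interval and
    // the expected scheduled transfer period for performance counters.
    public static bool IsCurrentConfigurationCorrect(
        DiagnosticMonitorConfiguration config,
        int overallQuotaInMb,
        TimeSpan expectedPollInterval,
        TimeSpan expectedTransferPeriod)
    {
        if (config == null)
        {
            return false;
        }

        // The configuration is treated as correct only if the quotas match,
        // the intervals match, and at least one counter is configured.
        return config.OverallQuotaInMB == overallQuotaInMb
            && config.PerformanceCounters.BufferQuotaInMB <= overallQuotaInMb
            && config.ConfigurationChangePollInterval == expectedPollInterval
            && config.PerformanceCounters.ScheduledTransferPeriod == expectedTransferPeriod
            && config.PerformanceCounters.DataSources.Count > 0;
    }
}
```

Note that the script above never sets ScheduledTransferPeriod, so a check like this one would only pass if the stored configuration already had the expected transfer period.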
    

Also, I keep receiving this error from time to time: "The configuration file is missing a diagnostic connection string for one or more roles."

I will accept the current response as the answer, because it helped me find the problem. Unfortunately, I have not found the cause of the problem. At every publish I risk getting a changed configuration XML.

Latishalatitude answered 10/6, 2013 at 9:57 Comment(4)
We're experiencing a similar problem where two of our 5 instances don't write performance counters and the rest do. Hope someone can answer this! I think it's a problem with the diagnostics DLL. What version of the SDK are you using? – Drud
Hello, we are using 1.7. – Latishalatitude
Interesting - we're currently using 1.6 - I'm wondering how the SDK handles errors it encounters when writing the diagnostics and whether it "carries on trying" when it encounters an error. Are you in a position to update to the latest SDK version? – Drud
No, we cannot currently upgrade to 2.0 because we do not have the time. – Latishalatitude

Seeing how your first instances are not transferring data to diagnostics while the later instances do, one possible reason is as follows:

The local diagnostic store on your servers has filled up with diagnostic data, and Azure can no longer transfer data out of the local store to storage. Be sure that the space allocated to DiagnosticStore in the role configuration (under Local Storage) is bigger than the buffer quota allocated in diagnostics.wadcfg.

Detailed explanation: I have seen this first-hand with a number of customers, so the following is my own interpretation based on comments from Microsoft support. The Azure Diagnostics API does not clean up local storage until the BufferQuota is exceeded. DiagnosticStore in the cloud project defaults to the same size as the BufferQuota used in all of the examples (4096 MB). What happens is that buffer usage gets very close to 4096 MB without reaching the limit, so the Diagnostics API never starts a purge. At the same time, the capture of diagnostic data can no longer run properly because local storage is nearly full and the Azure host stops the app from writing to DiagnosticStore.

Your other servers should stop writing diagnostic data as soon as their local storage fills up as well.

Hope this makes sense.

Editing my reply to precisely point out the changes for anyone reading later:

The simplest approach is to lower the OverallQuotaInMB specified in diagnostics.wadcfg to something like 4000 (make sure that all other buffer quotas combined do not exceed this number).

Alternatively, or additionally, you can manually specify the space allocated to the diagnostic store on the VM using the LocalStorage setting in the .csdef file. This link shows how: http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.diagnostics.diagnosticmonitorconfiguration.overallquotainmb.aspx
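
To make the relationship between the two numbers concrete, here is a rough sketch of a check that could run at role startup. It assumes the DiagnosticStore local resource is visible to role code through RoleEnvironment.GetLocalResource. The class and method names (DiagnosticQuotaCheck, QuotaFitsInStore, QuotaFitsInDiagnosticStore) are made up for illustration.

```csharp
using System;
using Microsoft.WindowsAzure.ServiceRuntime;

// Hypothetical helper class; names are illustrative only.
public static class DiagnosticQuotaCheck
{
    // Pure comparison: the overall quota should stay strictly below the
    // size of the local store so that the purge can start before the
    // store fills up.
    public static bool QuotaFitsInStore(int overallQuotaInMb, int storeSizeInMb)
    {
        return overallQuotaInMb < storeSizeInMb;
    }

    // Looks up the size of the "DiagnosticStore" local resource at runtime.
    // Assumption: the resource is available to the role under that name.
    public static bool QuotaFitsInDiagnosticStore(int overallQuotaInMb)
    {
        LocalResource store = RoleEnvironment.GetLocalResource("DiagnosticStore");
        return QuotaFitsInStore(overallQuotaInMb, store.MaximumSizeInMegabytes);
    }
}
```

With the default store size of 4096 MB, QuotaFitsInStore(4000, 4096) is true, while QuotaFitsInStore(4096, 4096) is false, which corresponds to the situation described above.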

Disforest answered 10/6, 2013 at 15:15 Comment(4)
I have updated my question to include the ServiceDefinition.csdef XML. What name do I have to put inside LocalResources > LocalStorage in order for the configuration to be ok? I have not found anything relevant on the internet in order to configure the XML. – Latishalatitude
I have been looking around and I have found this link: msdn.microsoft.com/en-us/library/… . Is this the correct setting I have to put? – Latishalatitude
@Dragos, that last link is correct. You can manually configure local storage called "DiagnosticStore" to be larger than 4096 or drop down your OverallQuotaInMb to be something like 4000 to be on the safe side. – Disforest
I have accepted your answer as the solution. I have updated my question to reflect my findings. Please take a look and see if you have anything else to add. Thank you! – Latishalatitude
