Windows Azure Worker Role not getting past first line of code
Asked Answered
W

5

10

I have a worker role that works perfectly in development but doesn't work when deployed. "Doesn't work" is rather vague, but that's really all I have to go on as I'm not seeing any errors or anything (in the event log anyway - maybe there is somewhere else I can look). I have added some trace statements to my code and I'm seeing the first one come out, but none of the others.

WorkerRole Code:

public class WorkerRole : RoleEntryPoint
{
    #region Member variables

    private IWindsorContainer _container;

    private IJob[] _jobs;

    #endregion

    #region Methods

    public override bool OnStart()
    {
        ConfigureDiagnostics();

        Trace.WriteLine("WorkerRole.OnStart()");

        try
        {
            Initialize();

            Trace.WriteLine("Resolving jobs...");
            _jobs = _container.ResolveAll<IJob>();

            StartJobs();

            return base.OnStart();
        }
        catch (Exception ex)
        {
            TraceUtil.TraceException(ex);
            throw;
        }
        finally
        {
            Trace.WriteLine("WorkerRole.OnStart - Complete");
            Trace.Flush();
        }
    }

    /// <summary>
    /// Sets up diagnostics.
    /// </summary>
    private void ConfigureDiagnostics()
    {
        DiagnosticMonitorConfiguration dmc =
            DiagnosticMonitor.GetDefaultInitialConfiguration();

        dmc.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);
        dmc.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;

        DiagnosticMonitor.Start(Constants.DiagnosticsConnectionString, dmc);
    }

    /// <summary>
    /// Sets up the IoC container etc.
    /// </summary>
    private void Initialize()
    {
        Trace.WriteLine("WorkerRole.Initialize()");

        try
        {
            Trace.WriteLine("Configuring AutoMapper...");
            AutoMapperConfiguration.Configure();

            Trace.WriteLine("Configuring Windsor...");
            _container = new WindsorContainer();

            Trace.WriteLine(string.Format("Installing assemblies from directory...{0}", 
                Path.Combine(Environment.GetEnvironmentVariable(Constants.RoleRoot), Constants.AppRoot)));

            _container.Install(FromAssembly.InDirectory(
                new AssemblyFilter(Path.Combine(Environment.GetEnvironmentVariable(Constants.RoleRoot), Constants.AppRoot))));

            Trace.WriteLine(string.Format("Setting the default connection limit..."));
            ServicePointManager.DefaultConnectionLimit = 12;
        }
        finally
        {
            Trace.WriteLine("WorkerRole.Initialize - Complete");
        }
    }

    /// <summary>
    /// Starts all of the jobs.
    /// </summary>
    private void StartJobs()
    {
        Trace.WriteLine("WorkerRole.StartJobs()");

        try
        {
            foreach (IJob job in _jobs)
            {
                job.Start();
            }
        }
        finally
        {
            Trace.WriteLine("WorkerRole.StartJobs - Complete");
        }
    }

    public override void OnStop()
    {
        Trace.WriteLine("WorkerRole.OnStop()");

        try
        {
            foreach (IJob job in _jobs)
            {
                job.Stop();
            }
            _container.Dispose();
        }
        finally
        {
            Trace.WriteLine("WorkerRole.OnStop - Complete");
        }
    }

    #endregion

    #region Private util classes

    public static class AutoMapperConfiguration
    {
        public static void Configure()
        {
            Mapper.Initialize(x => x.AddProfile<ModelProfile>());
        }
    }

    #endregion
}

TraceUtil Code:

public static class TraceUtil
{
    public static void TraceException(Exception ex)
    {
        StringBuilder buffer = new StringBuilder();

        while (ex != null)
        {
            buffer.AppendFormat("{0} : ", ex.GetType());
            buffer.AppendLine(ex.Message);
            buffer.AppendLine(ex.StackTrace);

            ex = ex.InnerException;
        }
        Trace.TraceError(buffer.ToString());
    }
}

Config:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  ...
  <system.diagnostics>
    <trace autoflush="true">
      <listeners>
        <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"
             name="AzureDiagnostics">
          <filter type="" />
        </add>
      </listeners>
    </trace>
  </system.diagnostics>
</configuration>

Once the worker has started, if I look in the WADLogsTable, all I see is "WorkerRole.OnStart()" and nothing else!

Any ideas on what the problem could be or how to troubleshoot this would be appreciated.

Update: If I stop the role, I don't see any of the debug statements from the OnStop() method either.

Update: I must have something configured incorrectly with my diagnostics. I thought I was seeing my debug come out correctly when debugging locally, but it turns out I'm not. I see everything in the output window, but I don't see everything in the storage table. I am seeing the following entries in development:

WorkerRole.OnStart()
WorkerRole.Initialize()
Configuring AutoMapper...

I realize that the trace output is only periodically uploaded, but I've waited 5 minutes or so, so I think this should be long enough, since I have it set to 1 minute.

Update: As suggested by @kwill in the comments section I have tried adding a file trace listener as follows:

  <system.diagnostics>
    <trace autoflush="true">
      <listeners>
        <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener, Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"
             name="AzureDiagnostics">
        </add>
        <add name="File" type="System.Diagnostics.TextWriterTraceListener" initializeData="C:\TextWriterOutput.log" />
      </listeners>
    </trace>
  </system.diagnostics>

This works fine in my development environment and seems more reliable as well as I get all of the debug out that I would expect. When I deploy it to staging however, the TextWriterOutput.log file isn't even created!

I really need a reliable way of getting debug out of my worker role so that I can troubleshoot the ultimate problem, which is that my jobs aren't working - at this point I still have no idea what they are even trying to do as I can't get any debug out!

Update: I am pretty sure that the missing dll idea suggested by most people is not the problem. To hopefully prove this, I have overridden the run method as shown below and I am seeing the "Heartbeat..." debug come out. It seems to me that either the diagnostics functionality or at least the way I have it configured is unreliable, which is preventing me from investigating why on earth my jobs aren't running.

    public override void Run()
    {
        Trace.WriteLine("LearningMiles.JobProcessor.WorkerRole.Run()", "Information");

        try
        {
            while (true)
            {
                Thread.Sleep(10000);
                Trace.WriteLine("Heartbeat...", "Verbose");
            }
        }
        catch (Exception ex)
        {
            TraceUtil.TraceException(ex);
            throw;
        }
        finally
        {
            Trace.WriteLine("LearningMiles.JobProcessor.WorkerRole.Run() - Complete", "Information");
        }
    }

Update: I have now cross posted this problem on the Windows Azure MSDN forum.

Update: As suggested in the comments, I have now tried removing all 'useful' code. In development this resulted in all of the debug being output. I then tried just removing the call to AutomapperConfiguration.Configure() since previously I was seeing nothing come out after that call. This resulted in some of the trace statements not coming out again. Importantly however, I was seeing the trace statements that I have put in the "jobs". Since it's the jobs not running that I ultimately want to resolve, I deployed that version of the code to staging, but there I just see the OnStart() trace and the "heartbeat" trace. I don't think this really helps, but maybe it will give someone some ideas.

Waterproof answered 23/2, 2012 at 9:31 Comment(3)
Have you properly setup your storage account for diagnostics? Some folks forget to remove the LocalStorage settingRiggins
I believe so. I am getting one entry in the table which seems to prove that theory. Also, I did make that mistake originally, but VS wouldn't build the package with it left like that.Waterproof
What happens if you remove all 'useful' code from all methods and leave only Trace.WriteLine calls and deploy that to staging? Still no messages except the first one?Ancylostomiasis
W
1

Thanks to an answer on the MSDN forum, I have been able to troubleshoot and resolve my problem.

The reason why my jobs were not being executed was due to the line below:

_container.Install(FromAssembly.InDirectory(
                    new AssemblyFilter(Path.Combine(Environment.GetEnvironmentVariable(Constants.RoleRoot), Constants.AppRoot))));

The role root on staging is E:. Path.Combine() has an obscure implementation which you can read more about in this SO answer. What this meant was that Castle was searching for assemblies in E:approot rather than E:\approot as I expected. I am now constructing the approot path with the method below:

    private string GetAppRoot()
    {
        string root = Environment.GetEnvironmentVariable(Constants.RoleRoot);

        if (root.EndsWith(Path.VolumeSeparatorChar.ToString()))
            root += Path.DirectorySeparatorChar;

        return Path.Combine(root, Constants.AppRoot);
    }

That has resolved my main issue and I am now seeing the jobs execute as expected.

I was only able to troubleshoot this issue by running the worker role in an elevated execution context so that my trace data could be written to a file. I still don't know why, and would like to hear of any ideas why, the trace statements were not being transferred to storage correctly.

Waterproof answered 6/3, 2012 at 15:29 Comment(1)
It appears that now the worker is running in an elevated context, I am now seeing all of the debug appear in storage as well. I will aim to confirm this as soon as I have the time.Waterproof
T
5

Given the OnStart() trace is called, but not the Initialize(), my guess is that one of the assemblies referenced by the code in Initialize() is not being copied to the deployment. Remember that .Net JIT-compiles one method at a time, and because of that behavior it would make sense that the OnStart trace message shows up (as there's little other than the Windows Azure and standard .Net framework assemblies referenced up to that point). However, when the CLR goes to JIT the Initialize method, it then tries to load several third-party assemblies (AutoMapper and Windsor) that may not be packaged correctly, but could be GACced or otherwise available locally when the emulator runs.

A couple of things to try:

  1. Manually "Package" your deployment from Visual Studio and take a careful look at the build output. Many times, VS will catch your missing assemblies and tell you (unfortunately as a warning, not an error) that you're missing something.
  2. If you don't see anything in the output that looks obvious, take a look at the cspkg file itself (remember it's just a ZIP file with more ZIP files in it) and make sure any referenced assemblies your application/role needs are in there. Alternately, connect to the VM and check the approot for those assemblies.
  3. You may be able to find an entry in the VM's event log that shows that your application failed to load an assembly.
Toothache answered 28/2, 2012 at 14:4 Comment(3)
I think that this is the right answer. Make sure that the "Copy Local" options is set to true on the third part referenced assemblies like AutoMapper or Windows. Also check that the dinamically loaded types (from Windsor) are correctly deployed.Broca
This is not the correct answer - I really wish it was though. I have verified that all of the dll's required are deployed to the VM. Also, if this was the problem I would surely see the exception output to the logs. As @sami suggests I have also tried adding a sleep in my exception handler in case the debug from there wasn't being pushed to storage, but to no avail.Waterproof
I believe the latest version of the SDK is pretty good at picking problems like this up as well. When I first tried to create the deployment package I had a number of "reference" errors that I had to resolve which didn't occur during a normal build.Waterproof
P
3

More often than not, the root cause of problems like this is missing dependencies. There are already good suggestions on this front in previous answers.

The trace logs are transferred to Azure storage once per minute as per your configuration. If your worker process crashes, you might lose some of the last trace messages. To work around this, try adding a Thread.Sleep(TimeSpan.FromMinutes(2)) in your exception handlers to ensure that the exception log gets flushed to storage.

Finally, if everything else fails, I suggest you try debugging your role with WinDbg. Enable Remote Desktop for your role. Log in to the role and switch off IE safe browsing so that you can install stuff. Then download and install Debugging Tools for Windows from http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=8279 . This package contains the entire Windows SDK, but you can select to install the Debugging Tools for Windows only.

Then run WinDbg and attach to WaWorkerHost.exe. In WinDbg, do

.loadby sos clr   // load the SOS extension that allows you to do managed debugging
sxe clr           // break on CLR exceptions
g                 // continue

WinDbg should now break execution when there is an CLR exception. When it breaks, execute

!PrintException

to see exception details. You may want to add another Thread.Sleep call in your role startup to give you time to attach the debugger before the process exits.

Pytlik answered 3/3, 2012 at 19:38 Comment(4)
This was a great suggestion that I really thought would help. Unfortunately I don't see any exceptions. Will this break on exceptions on any thread or just the main thread?Waterproof
Should also have added that after issuing the g command, it seems to run normally and I now see the "Heatbeat..." debug that I have added (see update to original question).Waterproof
Using the !threads command prints the current threads. You could also try putting a breakpoint somewhere in your jobs code to see if it gets hit.Pytlik
Debugging with WinDbg is the last resort though. You really do need to get the logs. Have you tried adding CrashDumps.EnableCollection(true); to your trace initialization code? If the worker process exits due to an exception, this should log it.Pytlik
W
1

Thanks to an answer on the MSDN forum, I have been able to troubleshoot and resolve my problem.

The reason why my jobs were not being executed was due to the line below:

_container.Install(FromAssembly.InDirectory(
                    new AssemblyFilter(Path.Combine(Environment.GetEnvironmentVariable(Constants.RoleRoot), Constants.AppRoot))));

The role root on staging is E:. Path.Combine() has an obscure implementation which you can read more about in this SO answer. What this meant was that Castle was searching for assemblies in E:approot rather than E:\approot as I expected. I am now constructing the approot path with the method below:

    private string GetAppRoot()
    {
        string root = Environment.GetEnvironmentVariable(Constants.RoleRoot);

        if (root.EndsWith(Path.VolumeSeparatorChar.ToString()))
            root += Path.DirectorySeparatorChar;

        return Path.Combine(root, Constants.AppRoot);
    }

That has resolved my main issue and I am now seeing the jobs execute as expected.

I was only able to troubleshoot this issue by running the worker role in an elevated execution context so that my trace data could be written to a file. I still don't know why, and would like to hear of any ideas why, the trace statements were not being transferred to storage correctly.

Waterproof answered 6/3, 2012 at 15:29 Comment(1)
It appears that now the worker is running in an elevated context, I am now seeing all of the debug appear in storage as well. I will aim to confirm this as soon as I have the time.Waterproof
M
0

Are you saying that you get the trace

   "WorkerRole.OnStart()"

but not the trace

   "WorkerRole.Initialize()"

That seems highly unlikely since the two trace statements execute one right after the other.

Have you tried to RDP to the VM to see if the WaWorkerHost.exe process is crashing?

Metamerism answered 23/2, 2012 at 21:50 Comment(4)
That's exactly what I'm saying - I know it seems unlikely, but that is all I am seeing in the WADLogs table! I will RDP and check the WaWorkerHost process tomorrow.Waterproof
I have now checked and the WaWorkerHost.exe process seems to be up.Waterproof
A few things I would suggest: 1. Add a file trace listener in addition to the DiagnosticMonitorTraceListener. This will tell you if the problem is your app, or the problem is Azure diagnostics. 2. Use something like DebugView (blog.toddysm.com/2011/05/…).Metamerism
I've tried adding the file trace listener as you suggested, but that hasn't really helped I'm afraid. I have updated the question with more information.Waterproof
S
0

I believe that Doug Rohrer has the right answer. There is a high chance that you have missing DLLs in the project which can be verified by examining the package. Be aware that the package needs to be created unencrypted if you are using a version earlier that the 1.6 SDK.

I would like to add two points.

  1. Setting “Copy Local” to true, in some cases, only works if the project file is edited manually and explicitly given the full path of the assembly. (This occurs when the assembly is also present in the GAC of the local machine).

  2. If the dependencies referenced are in an assembly which, is in turned referenced by the Azure role assembly, the dependencies do not get copies over. In which case these dependencies also need to be added to the role assembly even though they are not used by it. (This is kind of difficult to believe but I did run into this issue).

Syndesis answered 1/3, 2012 at 8:31 Comment(3)
Unfortunately @Doug Rohrer does not have the right answer - see the comment I added in response to his answer.Waterproof
Thanks for the feedback. As a last question on the same line of thought, and maybe you have already verified this, but is it possible that you have a DLL in your local GAC than is not being copies over?Syndesis
The only references to the GAC that I can see are to standard .NET libraries such as System.dll. I have added another update to my original questions which I think proves this is not the issue, unfortunately.Waterproof

© 2022 - 2024 — McMap. All rights reserved.