Azure WorkerRole Stopping Immediately After Starting

I have an Azure WorkerRole that is stopping (no exceptions are being thrown) for no apparent reason. It stops in the same spot every time, but the code is simply executing a process that takes about 20 seconds to run. Can anyone postulate as to why this is happening? Is there a timeout on the OnStart() method that I'm not aware of?

Here's a breakdown of what is happening in my worker role:

OnStart() -> Diagnostics Configured

Run() ->

  1. A timer is set (60) to trigger the meat of the application
  2. A new thread is started to load some default settings (takes ~30 seconds)

The code never gets to the meat of #1.

For #1 above, I've tried it with and without a timer (no difference). For #2 above, I've tried it with and without starting a new thread (no difference).

Here's the Debug output for my worker role:

WaWorkerHost.exe Information: 0 : deployment(108).ApiAzure.Workers.0 - Workers.OnStart()
Microsoft.WindowsAzure.ServiceRuntime Information: 202 : Role entrypoint . COMPLETED OnStart()
The thread 'Role Initialization Thread' (0x29fc) has exited with code 0 (0x0).
Microsoft.WindowsAzure.ServiceRuntime Information: 203 : Role entrypoint . CALLING   Run()
'WaWorkerHost.exe' (Managed (v4.0.30319)): Loaded 'C:\Users\Jason A. Kiesel\Projects\FS_CITYSOURCED\WorkersAzure\bin\Stage\WorkersAzure.csx\roles\Workers\approot\FreedomSpeaks.Logging.dll', Symbols loaded.
Microsoft.WindowsAzure.ServiceRuntime Warning: 204 : Role entrypoint . COMPLETED Run() ==> ROLE RECYCLING INITIATED
Microsoft.WindowsAzure.ServiceRuntime Information: 503 : Role instance recycling is starting
The thread 'Role Start Thread' (0x1fa0) has exited with code 0 (0x0).
The thread '<No Name>' (0x1624) has exited with code 0 (0x0).
'WaWorkerHost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_64\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll'
'WaWorkerHost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_64\System.Transactions\v4.0_4.0.0.0__b77a5c561934e089\System.Transactions.dll'
'WaWorkerHost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_64\System.EnterpriseServices\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.EnterpriseServices.dll'
'WaWorkerHost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Numerics\v4.0_4.0.0.0__b77a5c561934e089\System.Numerics.dll', Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
Microsoft.WindowsAzure.ServiceRuntime Information: 205 : Role entrypoint . CALLING   OnStop()
WaWorkerHost.exe Information: 0 : deployment(108).ApiAzure.Workers.0 - Workers.OnStop()
Microsoft.WindowsAzure.ServiceRuntime Information: 206 : Role entrypoint . COMPLETED OnStop()
The thread 'Role Stop Thread' (0x2dac) has exited with code 0 (0x0).
The program '[12228] WaWorkerHost.exe: Managed (v4.0.30319)' has exited with code -66053 (0xfffefdfb).
Finedrawn answered 1/12, 2010 at 21:42 Comment(3)
I figured out why the application was crashing and put in a fix, but it still doesn't make sense to me as to why the worker role would crash in the first place. The "config" section of the app that was triggered on start had a method that took a considerable amount of time to run. I moved that section of the "config" code to be run on demand (lazy loaded). That seemed to fix the problem. - Finedrawn
How did you figure out why it was crashing? I seem to have the same problem, but without an exception it is quite hard to debug :/ - Tullius
Trial and error. Some of the startup methods took longer than others. I commented out all but the first and added in the others one by one. When it crashed on one that took a long time (roughly 30s), I moved those longer methods to a lazy-loading style. This fixed the issue. - Finedrawn
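
The lazy-loading approach described in these comments could look roughly like the sketch below, which defers the slow load until a setting is first read. This is only an illustration under assumptions: the DefaultSettings class, the "Region" key and its value, and the Thread.Sleep stand-in for the slow work are all hypothetical and not taken from the asker's code. Lazy<T> itself is part of .NET 4.0 and later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical settings holder; names and values are illustrative only.
public static class DefaultSettings
{
    // Counts how many times the slow load actually ran.
    public static int LoadCount;

    // The factory runs once, on first access to Value, even with concurrent callers.
    private static readonly Lazy<Dictionary<string, string>> Values =
        new Lazy<Dictionary<string, string>>(
            Load, LazyThreadSafetyMode.ExecutionAndPublication);

    public static bool IsLoaded
    {
        get { return Values.IsValueCreated; }
    }

    public static string Get(string key)
    {
        return Values.Value[key];
    }

    private static Dictionary<string, string> Load()
    {
        Interlocked.Increment(ref LoadCount);
        Thread.Sleep(100); // stand-in for the slow (~30 s) startup work
        return new Dictionary<string, string> { { "Region", "us-west" } };
    }
}
```

With this shape, role startup does not pay the loading cost; the first call to DefaultSettings.Get does, and later calls reuse the cached result.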

A loop in Run() is not required, at least in emulator version 1.6 or higher. However, I hit the same issue today. After a few hours of digging I found that my project referenced version 1.7 of the Microsoft.WindowsAzure assemblies while the emulator I use is from the October release (1.8). Web projects work just fine, but worker roles start and immediately stop as you describe; OnStart, Run and OnStop are simply never called. Once I pointed my worker role at the 1.8 assemblies it started working again. Another few hours wasted, thanks Microsoft...
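
One way to spot this kind of mismatch is to list the assembly versions a project was compiled against and compare them with the installed SDK. The sketch below uses only standard .NET reflection; the ReferenceReport class is a hypothetical helper, and comparing its output against your emulator version is a manual step.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical diagnostic helper, not part of the Azure SDK.
public static class ReferenceReport
{
    // Returns "Name, Version" for every assembly the given assembly
    // was compiled against.
    public static List<string> Describe(Assembly assembly)
    {
        var lines = new List<string>();
        foreach (AssemblyName reference in assembly.GetReferencedAssemblies())
        {
            lines.Add(reference.Name + ", " + reference.Version);
        }
        return lines;
    }
}
```

Calling ReferenceReport.Describe(Assembly.GetExecutingAssembly()) from inside the worker role project and writing out each line should show which Microsoft.WindowsAzure.ServiceRuntime version the role was built against.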

Sibyl answered 8/11, 2012 at 18:26 Comment(4)
You just saved me a couple of hours of investigation @Alexey. Thanks! :) #19253515 - Bunny
Welcome, it is a real headache with Azure SDK versions. - Sibyl
I observed this when running a downloaded MSDN sample locally. The Microsoft.WindowsAzure.ServiceRuntime reference was 2.5.0.0 while 2.7.0.0 is available. Upgrading the reference made the issue go away. I would further observe that before the fix, breakpoints in OnStart and Run were not hit. - Because
Same here today, wrong reference to Microsoft.WindowsAzure.ServiceRuntime after upgrading the SDK. - Socman

Without seeing the code, it sounds like your Run method is exiting. If the Run method ever returns, the role will stop. The default worker role that Visual Studio creates when you add one to a cloud project avoids this by putting an infinite loop at the end of the method. So your code might look similar to this:

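public override void Run()
{
    // Placeholder methods standing in for the timer and settings work from the question.
    StartMyTimer();
    LoadDefaultSettings();

    // Keep Run() from returning; the role recycles as soon as it does.
    while (true)
    {
        CheckToMakeSureSpawnedThreadsAreRunningOK();
        System.Threading.Thread.Sleep(10000); // 10 seconds between checks
    }
}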

As mentioned by smarx in the comments, it would also be possible to use System.Threading.Thread.Sleep(Timeout.Infinite) instead of the loop.
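
A variation on that idea, assuming you also want Run() to return cleanly when the role is asked to stop, is to block on a wait handle that OnStop() signals. The sketch below is illustrative only: WorkerSketch is a hypothetical stand-in rather than a real RoleEntryPoint subclass (so it compiles without the Azure SDK), the 50 ms interval is arbitrary, and the timer callback just counts ticks where the real work would go. In an actual role, Run() and OnStop() would be overrides.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a RoleEntryPoint subclass.
public class WorkerSketch
{
    private readonly ManualResetEvent _stopRequested = new ManualResetEvent(false);
    private Timer _timer;
    private int _ticks;

    // Number of timer callbacks that have run so far.
    public int Ticks
    {
        get { return Interlocked.CompareExchange(ref _ticks, 0, 0); }
    }

    public void Run()
    {
        // Start the periodic work; the callback runs on a thread-pool thread.
        _timer = new Timer(_ => Interlocked.Increment(ref _ticks), null,
                           TimeSpan.Zero, TimeSpan.FromMilliseconds(50));

        // Block here until a stop is requested, so Run() does not return early.
        _stopRequested.WaitOne();

        _timer.Dispose();
    }

    public void OnStop()
    {
        // Release the blocked Run() call.
        _stopRequested.Set();
    }
}
```

Compared with Thread.Sleep(Timeout.Infinite), this keeps Run() blocked in the same way but gives it a path to exit and release the timer once a stop is requested.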

Glosso answered 1/12, 2010 at 22:47 Comment(1)
Mainly because I didn't think of it. I used the infinite loop because that's what is in the basic role when you add one. Granted, it also has a trace message that gets written out every X seconds to let you know that the role is still running. When I was building one of my early worker roles I also looked at the loop, thought "what a waste of time" and deleted it, which caused a problem like the one being experienced here. - Glosso
