Home | About Me | Developer PFE Blog | Become a Developer PFE

Contact

Categories

On this page

Azure: Why did my role crash?

Archive

Blogroll

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Sign In

# Tuesday, 03 August 2010
Tuesday, 03 August 2010 01:48:17 (Central Daylight Time, UTC-05:00) ( .NET | Azure | Development | Logging )

One thing you might encounter when you start your development on windows-azure-logo_1-f2e19cWindows Azure is that there is an insane number of options available for number of options for logging.   You can view a quick primer here.  One of the things that I like about it is that you don’t necessarily need to learn a whole new API just to use it.  Instead, the logging facilities in Azure integrates really well with the existing Debug and Trace Logging API in .NET.  This is a really nice feature and is done very well in Azure.  In-fact to set-up and configure it is all of about 5 lines of code.  Actually it’s four lines of code with one line that wraps:

   1: public override bool OnStart() 
   2: { 
   3:     DiagnosticMonitorConfiguration dmc =          
   4:             DiagnosticMonitor.GetDefaultInitialConfiguration(); 
   5:     dmc.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1); 
   6:     dmc.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
   7:     DiagnosticMonitor.Start("DiagnosticsConnectionString", dmc); 
   8: } 

One specific item to note is the ScheduledTransferPeriod property of the Logs property.  The minimum value you can set for that property is the equivalent of 1 minute.  The only downside to this method of logging is that if your Azure role crashes within that minute, whatever data you have written to your in-built logging will be lost.  This also means that if you are writing exceptions to your logging, that will be lost as well.  That can cause problems if you’d like to know both when your role crashed and why (the exception details).

Before we talk about a way to get around it, let’s review why the Role might crash.  In the Windows Azure world, there are two primary roles that you will use, a Worker and Web role.  Main characteristics and reasons it would crash are below:

Role Type Analogous Why would it crash/restart?
Worker Role Console Application
  • Any unhandled exception.
Web Role ASP.NET Application hosted in IIS 7.0+
  • Any unhandled exception thrown on a background thread.
  • StackOverflowException
  • Unhandled exception on finalizer thread.

As you can see, the majority of the reasons why an Azure role would recycle/crash/restart are essentially the same as with any other application – essentially an unhandled exception.  Therefore, to mitigate this issue, we can subscribe to the AppDomain’s UnhandledException Event.  This event is fired when your application experiences an exception that is not caught and will fire RIGHT BEFORE the application crashes.  You can subscribe to this event in the Role OnStart() method:

   1: public override bool OnStart()
   2: {
   3:     AppDomain appDomain = AppDomain.CurrentDomain;
   4:     appDomain.UnhandledException += 
   5:         new UnhandledExceptionEventHandler(appDomain_UnhandledException);
   6:     ...
   7: }

You will now be notified right before your process crashes.  The last piece to this puzzle is logging the exception details.  Since you must log the details right when it happens, you can’t just use the normal Trace or Debug statements.  Instead, we will write to the Azure storage directly.  Steve Marx has a good blog entry about printf in the cloud.  While it works, it requires the connection string to be placed right into the logging call.  He mentions that you don’t want that in a production application.  in our case, we will do things a little bit differently.  First, we must add the requisite variables and initialize the storage objects:

   1: private static bool storageInitialized = false;
   2: private static object gate = new Object();
   3: private static CloudBlobClient blobStorage;
   4: private static CloudQueueClient queueStorage;
   5:  
   6: private void InitializeStorage()
   7: {
   8:    if (storageInitialized)
   9:    {
  10:        return;
  11:    }
  12:  
  13:    lock (gate)
  14:    {
  15:       if (storageInitialized)
  16:       {
  17:          return;
  18:       }
  19:  
  20:  
  21:       // read account configuration settings
  22:       var storageAccount = 
  23:         CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
  24:  
  25:       // create blob container for images
  26:       blobStorage = 
  27:           storageAccount.CreateCloudBlobClient();
  28:       CloudBlobContainer container = blobStorage.
  29:           GetContainerReference("webroleerrors");
  30:  
  31:       container.CreateIfNotExist();
  32:  
  33:       // configure container for public access
  34:       var permissions = container.GetPermissions();
  35:       permissions.PublicAccess = 
  36:            BlobContainerPublicAccessType.Container;
  37:       container.SetPermissions(permissions);
  38:  
  39:       storageInitialized = true;
  40:   }
  41: }

This will instantiate the requisite logging variables and then when the InitializeStorage() method is executed, we will set these variables to the appropriate initialized values.  Lastly, we must call this new method and then write to the storage.  We put this code in our UnhandledException event handler:

   1: void appDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
   2: {
   3:     // Initialize the storage variables.
   4:     InitializeStorage();
   5:  
   6:     // Get Reference to error container.
   7:     var container = blobStorage.
   8:                GetContainerReference("webroleerrors");
   9:  
  10:  
  11:     if (container != null)
  12:     {
  13:         // Retrieve last exception.
  14:         Exception ex = e.ExceptionObject as Exception;
  15:  
  16:         if (ex != null)
  17:         {
  18:             // Will create a new entry in the container
  19:             // and upload the text representing the 
  20:             // exception.
  21:             container.GetBlobReference(
  22:                String.Format(
  23:                   "<insert unique name for your application>-{0}-{1}",
  24:                   RoleEnvironment.CurrentRoleInstance.Id,
  25:                   DateTime.UtcNow.Ticks)
  26:                ).UploadText(ex.ToString());
  27:         }
  28:     }
  29:     
  30: }

Now, when your Azure role is about to crash, you’ll find an entry in your blog storage with the details of the exception that was thrown.  For example, one of the leading reasons why an Azure Worker Role crashes is because it can’t find a dependency it needs.  So, in that case, you’ll find an entry in your storage with the following details:

   1: System.IO.FileNotFoundException: Could not load file or assembly 
   2:     'GuestBook_Data, Version=1.0.0.0, Culture=neutral, 
   3:     PublicKeyToken=f8a5fcb6c395f621' or one of its dependencies. 
   4:     The system cannot find the file specified.
   5:  
   6: File name: 'GuestBook_Data, Version=1.0.0.0, Culture=neutral, 
   7:     PublicKeyToken=f8a5fcb6c395f621'

You can find some other common reasons why an Azure role might crash (especially when you first deploy it) can be found at Anton Staykov’s excellent blog:

Hope this helps.

Enjoy!