Failed Azure Web App Auto Restart Runbook

Let me start by painting a picture. You are using Azure. You have an App Service configured with a Web App that is hosting a website; this website, for example. The website could be single-instanced or it could be multi-instanced using Azure Load Balancer, Azure Traffic Manager, Azure Application Gateway, or any number of other load balancing and traffic distribution technologies. One day, your web application fails to respond and you get a dreaded HTTP 500 or another error code. As a dedicated Azure consumer, you use Azure Application Insights to monitor your website. Application Insights not only gives you user metrics akin to Google Analytics but also gives you performance and availability metrics.

The picture I painted just then explains my scenario. I use Azure App Service with an Azure Web App to host this blog. I use Azure Application Insights to provide me with all of the metrics and data I need to understand the site. The availability monitoring feature is quite excellent. It allows me to monitor the website availability from up to five locations around the world with performance data for each region so I can see how the site performs for each geography. If the site goes down for any reason, I get an email notification to warn me.

The trouble for me is that if the site does fail, it tends to be in the middle of the night. I have experienced a few outages of late. The site goes down in the middle of the night, presumably as a result of platform maintenance, and fails to come back up correctly. I believe the failure to recover is caused by the In-App MySQL Database that I use in Azure, but I have not been able to prove this. As I am one man, I don’t have a monitoring team on standby to investigate the issue when I am asleep. What I want is my own personal solution to kick the website if it fails, as the first defence against prolonged outages.

Azure Automation allows us to automate pretty much anything in Azure. Runbooks can be created to perform any number of tasks using the Azure PowerShell Modules. One such task is to perform a restart of an Azure Web App. Azure Automation also allows us to create Webhooks on our Runbooks. A Webhook is a means to invoke the runbook from another service using an accessible URL, and it just so happens that Azure Application Insights can call such a Webhook.

How Does it Work

As I have already described, the solution works using an Azure Web App, Azure Application Insights and Azure Automation. Here is the flow:

  1. Azure Web App encounters an issue that causes the site to appear down
  2. Azure Application Insights detects the failure with the Availability Test
  3. The Availability Test triggers a Webhook on the Azure Automation runbook
  4. The Azure Automation runbook starts

Once the Runbook starts, it does a few things aside from just restarting the Web App. The most critical of these is another test. Whilst in my testing, I have not encountered this, it is entirely possible that the site is working just fine but the issue is with Azure Application Insights. To work around this, I have leveraged the PowerShell command Invoke-WebRequest. The workflow performs a web request against the site and validates the HTTP Response Code. If the code returned is 200 (OK) then the workflow aborts. We don’t want to restart a site when there is nothing wrong. If the site returns anything other than a 200 code, the workflow goes ahead and performs the restart.
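To make the check-before-restart logic concrete, here is a minimal sketch. It is written in Python rather than the PowerShell Workflow used by the runbook itself, and it is not the script from the GitHub repository. The names get_status_code and restart_if_down are hypothetical, and the restart action is passed in as a callable because the real restart is performed by the Azure PowerShell Modules inside the runbook.

```python
import urllib.error
import urllib.request


def get_status_code(uri, timeout=30):
    """Return the HTTP status code for uri, or None if no response arrived."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 500.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar problems.
        return None


def restart_if_down(uri, restart, fetch_status=get_status_code):
    """Call restart() only when uri does not answer with HTTP 200.

    Returns True if a restart was triggered, False if the site was healthy.
    """
    status = fetch_status(uri)
    if status == 200:
        # The site is fine, so the alert was a false alarm: abort.
        return False
    restart()
    return True
```

Passing the status check in as fetch_status keeps the decision separate from the network call, so the abort and restart paths can both be exercised without touching a live site.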

Azure Automation Setup

The first step in this solution is to create an Azure Automation account. Everything in this post can be achieved using the free tier of Azure Automation, so cost is no barrier to using this code. When an Azure Automation account is created, it will automatically create an Azure Run As credential to allow your Runbooks to authenticate against your Azure AD tenant and gain access to your subscription. We will be using this account.

In Azure Automation, create a new runbook. I have posted the script I used to GitHub at https://github.com/richardjgreen/azure-automation-webapp-restart. Copy and paste the script into a new, blank, PowerShell Workflow type runbook in Azure Automation. After inserting the code, you are going to need to modify a few elements.

  • $subscriptionID needs to be set to the GUID of your Azure Subscription
  • $uri needs to be set to the URI/URL of your website
  • $resourceGroupName needs to be defined as the name of the Resource Group hosting the Azure Web App
  • $webAppName must be configured as the name of the Azure Web App
  • $azureConnectionAssetName will need to be set to the name of the Azure Automation Connection object *

* For the Azure Connection Asset Name, there is a very good chance you do not need to change this. The value I have used, AzureRunAsAccount, is the default name that Azure Automation will have given your connection asset too.

The reason we need to hard-code these values into the workflow is Azure Application Insights. Whilst it supports calling Webhooks, it does not support passing parameters to them. Compare this with Azure Job Scheduler, for example, where you can pass the values for these variables as inputs: Subscription ID, URI, Resource Group Name, and Web App Name. Because Azure Application Insights does not permit this, if you have multiple Web Apps that you want to be able to restart, you will need multiple instances of the runbook.

Once you have updated the Runbook with the details, Publish the Runbook. Once published, you need to add a Webhook to the Runbook as shown below.

As the screenshot illustrates, you must record the URI for your Webhook. Once you save the Webhook, the URI is no longer shown in the Azure Portal. This is your one and only chance to capture it. Once the Webhook is created, it is time to head over to the Azure Application Insights account used to monitor the site.
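Before handing the URI to Application Insights, it can be worth confirming that the Webhook actually starts the runbook. A Webhook is triggered by sending an HTTP POST to its URI. The sketch below, in Python, only builds that request; the function name build_webhook_request and the URI shown are placeholders, not values from my setup, and you would substitute the URI you recorded.

```python
import json
import urllib.request


def build_webhook_request(webhook_uri, payload=None):
    """Build (but do not send) the POST request that triggers a Webhook."""
    # An empty JSON object is sent when no payload is supplied.
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        webhook_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URI: replace it with the one captured when creating the Webhook.
request = build_webhook_request("https://example.invalid/webhooks?token=PLACEHOLDER")

# To actually trigger the runbook, send the request:
# urllib.request.urlopen(request)
```

Treat the Webhook URI like a password: anyone who holds it can start the runbook, which is why the send is left commented out here.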

Azure Application Insights Setup

Within the Azure Application Insights blade, select the Availability option. If you do not already have an Availability Test configured, now is the time to create one. If you have an existing test, you can reuse it. Select the properties of the Availability Test and then select the Edit Test option. This will present you with the configuration parameters for the Availability Test as shown below.

In the screenshot above, you should be able to see the Alerts option at the bottom of the Edit Test blade. Within Alerts, we can configure email alerts to be sent out to administrators, as I already had configured. My email address is no secret! The last option, which you can see I am editing in the image, is for the Webhook. Simply enter the URI that you copied from the Webhook creation in Azure Automation into this field. It really is that simple to configure.

Working with Multiple Web App Instances

Assume for a moment, you have a multi-instance Web App behind an Azure Traffic Manager or an Azure Load Balancer. In these scenarios, you have multiple Web Apps running and we need to determine which one of the two or more instances has the failure. Using Azure Application Insights, we can create multiple Availability Tests. My advice for this scenario would be as follows:

  1. Configure an Availability Test per Web App
  2. Configure the Availability Test to monitor the Web App instance name, not the custom domain applied to the site
  3. Create an Azure Automation runbook per Web App instance to execute the restart

For example, if you had a website at customdomain.com and the two Azure Web App instances were named customdomainapp1 and customdomainapp2, they would have default addresses of customdomainapp1.azurewebsites.net and customdomainapp2.azurewebsites.net. By targeting the Availability Test at these default names, you can test the health of each node individually. Each Availability Test can trigger a distinct runbook Webhook allowing you to control the Web App instances individually.
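The per-instance layout described above can be sketched as a small planning helper in Python. This is illustration only: the function names and the Restart- runbook naming scheme are hypothetical, and the only fact relied on is the default hostname pattern of appname.azurewebsites.net.

```python
def default_hostname(web_app_name):
    """Return the default hostname Azure assigns to a Web App."""
    return f"{web_app_name}.azurewebsites.net"


def build_test_plan(web_app_names):
    """Pair each Web App instance with its own test URL and runbook."""
    return [
        {
            "web_app": name,
            # Test the instance directly, not the shared custom domain.
            "test_url": f"https://{default_hostname(name)}/",
            # Hypothetical naming scheme: one runbook per instance.
            "runbook": f"Restart-{name}",
        }
        for name in web_app_names
    ]


plan = build_test_plan(["customdomainapp1", "customdomainapp2"])
for entry in plan:
    print(entry["web_app"], entry["test_url"], entry["runbook"])
```

Each entry corresponds to one Availability Test and one runbook Webhook, so a failure on customdomainapp2 restarts only that instance.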