Hey friends. Here's a tip from the field. Make use of the auto-healing capabilities in your Azure App Services to increase the reliability of your production workloads.
What is auto-heal?
Auto-heal is part of Azure App Service Diagnostics, which I previously wrote about.
If you haven't already, please read the article linked above to learn how to discover the Diagnostics area of your App Services. When you're there, you can head on over to the proactive tools. Here you will find the "Auto-Heal" capability.
In this tool, you have a few options. Let's very briefly check them out.
Custom auto-heal rules
If you want to customize how your app should recycle and heal, you can configure this based on different types of signals.
Your scenarios will likely differ a lot, depending on the applications you are operating. Here I am configuring this web app to recycle whenever there are slow requests piling up. If there are 100 slow requests (taking 20s or more) in the last 300 seconds, I will recycle the app service automatically.
Having worked a lot on both the development and infrastructure sides, I know this isn't an ideal long-term solution if it keeps happening. I would argue, though, that it is a great way to keep the application alive while you investigate why you get those slow requests. Most likely there are things you can improve in the code (optimizations) or in the infrastructure (scale configuration, etc.).
Your configuration of the auto-heal may, of course, differ quite a lot from the basic example I gave above. We can drill deeper into various use cases in another post, should it be a popular topic.
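To make the slow-request rule above a bit more concrete, here is a minimal Python sketch of how a "100 requests of 20s or more within 300 seconds" trigger could be evaluated. This is a toy model for reasoning about thresholds, not how Azure implements auto-heal. The class name SlowRequestRule and its parameters are made up for illustration, and the sketch assumes requests are recorded in time order.

```python
from collections import deque


class SlowRequestRule:
    """Toy model of a 'slow requests' trigger (hypothetical, not Azure's code)."""

    def __init__(self, count=100, slow_after_s=20.0, window_s=300.0):
        self.count = count                # slow requests needed to fire
        self.slow_after_s = slow_after_s  # a request this long (or longer) is "slow"
        self.window_s = window_s          # only look this far back in time
        self._slow = deque()              # finish times of recent slow requests

    def record(self, finished_at, duration_s):
        """Record one finished request; return True if the rule would fire.

        Assumes finished_at (in seconds) never goes backwards between calls.
        """
        if duration_s >= self.slow_after_s:
            self._slow.append(finished_at)
        # Forget slow requests that have fallen out of the time window.
        while self._slow and self._slow[0] <= finished_at - self.window_s:
            self._slow.popleft()
        if len(self._slow) >= self.count:
            # A recycle would start from a clean slate, so reset the counter.
            self._slow.clear()
            return True
        return False
```

Feeding this one 25-second request per second makes it fire on the 100th request, while one slow request every 4 seconds never reaches 100 inside the window. Playing with numbers like this can help you pick thresholds that catch a real pile-up without recycling on a short blip.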
Next up is a way to configure a proactive auto-heal, fully automated by the Azure service itself. Here's the description directly from the tool:
Proactive Auto-Heal is an extension to the auto healing feature of Azure App Service. It will only take corrective actions for the sites that we have deemed to be in a bad state for which the best way to recover is to simply restart them. Proactive Auto-Heal monitors for high memory and slow response situations and recycles the app when one of these conditions is met.
You can opt out of this either by an app service configuration entry, or by using the diagnostic tool and switching it to Off.
Historical auto-heal triggers
Thankfully, my apps have behaved well in recent times, and I don't have any historical events or recycles. However, if auto-heal has recycled your app, or taken other actions as per your custom rules, you'll see those events here.
What happens during a recycle?
To ensure that I didn't cause havoc in my production systems, I wanted to test the scenarios with the auto-healing capabilities, and figure out what happens to existing requests being served to users.
Someone beat me to the punch on the Microsoft Q&A area of Docs:
TLDR: The w3wp.exe process shuts down gracefully and finishes serving the requests it has already picked up.
I would, of course, make sure to test that this is to your satisfaction before blindly configuring it in production systems.
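If you want a starting point for that kind of test, here is a rough Python sketch that requests a URL over and over and tallies the results, so you can trigger a recycle in a test environment and see whether any requests fail. The function names, the defaults, and the URL you pass in are all placeholders of mine, and this only probes one request at a time, so treat it as a sketch rather than a proper load test.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_get_status(url, timeout_s=30.0):
    """Return the HTTP status code of a GET, or None if the request failed outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 503).
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # No usable answer at all: connection refused, reset, timed out, etc.
        return None


def probe(url, attempts, pause_s=1.0, get_status=http_get_status, sleep=time.sleep):
    """Request `url` `attempts` times and count successes and failures."""
    tally = {"ok": 0, "failed": 0}
    for i in range(attempts):
        status = get_status(url)
        if status is not None and 200 <= status < 400:
            tally["ok"] += 1
        else:
            tally["failed"] += 1
        if i < attempts - 1:
            sleep(pause_s)  # space the requests out a little
    return tally
```

You could, for example, call probe() with the address of a test app and attempts=120, trigger a recycle while it runs, and check whether "failed" stays at zero. The get_status and sleep parameters are only there so the loop can be tried out without touching the network.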
Thanks for tuning in.