Single VM SLA


By now you’ve probably heard the news:  Azure became the first public cloud to offer an SLA on a single VM.

This was announced on Monday, November 21st.

In this article, I’ll quickly explore what that means.

Multi-VM SLA

Before that announcement, in order to have an SLA on connectivity to compute, we needed 2 or more VMs in an Availability Set.

This was, and still is, the High Availability solution.  It gives an SLA of 99.95% availability, measured monthly.

There is no constraint on the storage used (Standard or Premium), and the SLA covers planned maintenance as well as any failures.  So basically, we put 2+ VMs in an availability set and we’re good all the time.

Single-VM SLA

The new SLA has a few constraints.

  • The SLA isn’t the same.  Single-VM SLA is only 99.9% (as opposed to 99.95%).
  • VMs must use Premium storage for both OS & data disks.  Presumably, Premium Storage has better reliability.  This is interesting since, in terms of SLA, there is no distinction between Premium & Standard.
  • Single-VM SLA doesn’t include planned maintenance.  This is important.  It means we are covered with 99.9% availability as long as there is no planned maintenance.  More on that below.
  • The SLA is calculated on a monthly basis, as if the VM were up the entire month.  This means that if our VM is up the entire month, it has an SLA of 99.9%.  If, on the contrary, we turn it off 12 hours a day, we won’t have a 99.9% SLA on the 12 hours a day we are using it.
  • Since this was announced on November 21st, we can expect it to take effect 30 days later; to be on the safe side, I tell customers January 1st, 2017.

So, it is quite important to state that it isn’t a simple extension of the existing SLA to a single VM.  But it is very useful nonetheless.
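One way to read the monthly-calculation constraint is sketched below in Python.  It assumes a 30-day month and the worst case where all the downtime the SLA tolerates falls inside the hours we actually use the VM; the function name and parameters are mine, not part of any Azure API.

```python
MINUTES_PER_DAY = 24 * 60


def effective_availability(sla_percent, hours_used_per_day, days_in_month=30):
    """Availability over the hours we use the VM, if the whole monthly
    downtime allowance lands inside those hours (worst-case assumption)."""
    if hours_used_per_day <= 0:
        raise ValueError("hours_used_per_day must be positive")
    # Downtime tolerated by the SLA, computed over the entire month.
    allowed_downtime = days_in_month * MINUTES_PER_DAY * (100.0 - sla_percent) / 100.0
    # Minutes during which the VM is actually meant to be running.
    minutes_used = days_in_month * hours_used_per_day * 60
    return 100.0 * (1.0 - allowed_downtime / minutes_used)


always_on = effective_availability(99.9, 24)  # VM up the entire month
half_day = effective_availability(99.9, 12)   # VM used 12 hours a day
print(f"always on: {always_on:.2f}%, 12 hours a day: {half_day:.2f}%")
```

Under those assumptions the same downtime allowance weighs twice as much on a VM that only runs half the day, which is why the figure drops below 99.9% for the hours in use.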

Planned maintenance

I just wanted to expand a bit on planned maintenance.

What is planned maintenance?  Once in a while, Azure needs maintenance that requires a shutdown of hosts.  Either the host itself gets updated (software / hardware) or it gets decommissioned altogether.  In those cases, the VMs running on the host are shut down, the host is rebooted (or decommissioned, in which case the VMs get relocated) and then the VMs are booted back up.

This is downtime for a VM.

With a Highly Available configuration, i.e. 2+ VMs in an Availability Set, the downtime of one VM doesn’t affect the availability of the availability set since there is a guarantee that there will always be one VM available.

Without a Highly Available configuration, there is no such guarantee.  For that reason, I suppose, this downtime isn’t covered by the SLA.  Remember, 99.9% on a monthly basis means about 43 minutes of downtime per month.  Planned maintenance would easily take a few minutes of downtime, taking into account the shutdown of the VMs (all of the VMs on the host), the host restart and the VM boot.  That isn’t negligible compared to the 43 minutes of margin the SLA gives.

This would leave very little room for manoeuvre for potential hardware / software failures during the month.
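The 43-minute figure can be checked with a quick calculation.  The Python sketch below assumes a 30-day month; the 10-minute maintenance duration is purely an illustrative number of mine, not an Azure figure.

```python
def downtime_budget_minutes(sla_percent, days_in_month=30):
    """Minutes of downtime per month that a given monthly SLA tolerates."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (100.0 - sla_percent) / 100.0


single_vm_budget = downtime_budget_minutes(99.9)   # single-VM SLA
multi_vm_budget = downtime_budget_minutes(99.95)   # availability set SLA

# Hypothetical duration of one planned maintenance (assumption for illustration).
maintenance_minutes = 10
share_used = maintenance_minutes / single_vm_budget

print(f"99.9%  -> {single_vm_budget:.1f} minutes per month")
print(f"99.95% -> {multi_vm_budget:.1f} minutes per month")
print(f"a {maintenance_minutes}-minute maintenance uses {share_used:.0%} of the 99.9% budget")
```

With those assumptions, a single maintenance event would eat up close to a quarter of the monthly margin on its own.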

Now, that isn’t the end of the world.  For quite a few months now, we have had the “redeploy me now” feature in Azure.  This feature redeploys the VM to a new host.  If planned maintenance is under way in the data center, the new host should already be an updated one, in which case our VM won’t need a reboot anymore.

Planned maintenance follows a workflow where a notification is sent a week in advance to the subscription owner (see https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-planned-maintenance & https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-linux-planned-maintenance).  We can then trigger a redeploy at our earliest convenience (maintenance window).

Alternatively, we can trigger a redeploy every week, during a maintenance window, and ignore the notification emails.

High Availability

The previous section should have convinced you that Single-VM SLA isn’t a replacement for a Highly Available (HA) configuration.

On top of the Azure planned maintenance being outside the SLA, our own solution maintenance will impact the SLA of the solution.

In an HA configuration, we can take an instance down, update it, put it back, then upgrade the next one.

With a single VM we cannot do that:  solution maintenance will incur downtime and should therefore be done inside a maintenance window (of the solution).

For those reasons, I still recommend that customers use an HA configuration if HA is a requirement.
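The difference between the two cases can be illustrated with a toy simulation in Python.  It is only a sketch:  the instance names are made up and no real deployment tooling is involved; it simply counts how many instances keep serving while each one is upgraded in turn.

```python
def rolling_upgrade(instances):
    """Upgrade instances one at a time and return the lowest number of
    instances still serving at any point during the upgrade."""
    state = {name: "up" for name in instances}
    min_serving = len(instances)
    for name in instances:
        state[name] = "down"  # take the instance out of rotation to update it
        serving = sum(1 for status in state.values() if status == "up")
        min_serving = min(min_serving, serving)
        state[name] = "up"    # updated instance goes back into rotation
    return min_serving


ha_min = rolling_upgrade(["vm-a", "vm-b"])  # 2 VMs in an availability set
single_min = rolling_upgrade(["vm-a"])      # single VM

print(f"HA configuration: at least {ha_min} instance(s) serving during the upgrade")
print(f"Single VM: at least {single_min} instance(s) serving during the upgrade")
```

With two instances there is always one left serving; with a single VM the count drops to zero, which is the downtime that has to fit inside the solution’s maintenance window.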

Enabled scenarios

What the single-VM SLA brings isn’t a cheaper HA configuration.  Instead, it enables non-HA configurations with an SLA in Azure.

Until now, there were two modes.  Either we took the HA route or we lived without an SLA.

Often, no SLA is OK.  For dev & test scenarios, for instance, an SLA is rarely required.

Often HA is required.  For most production scenarios I deal with, in the enterprise space & consumer facing space anyway, HA is a requirement.

Sometimes HA doesn’t make business sense, yet having no SLA isn’t acceptable.  HA might not make business sense when:

  • The solution doesn’t support HA; this is sadly the case for a lot of legacy applications
  • The solution supports HA with a premium license, which itself doesn’t make business sense
  • Having HA increases the number of VMs to a point where the management of the solution would be cost prohibitive

For those scenarios, the single-VM SLA might hit the sweet spot.
