Azure Managed Disk–Overview


Microsoft released Azure Managed Disks two weeks ago.  Let’s look at them!

What did we have until then?  The virtual hard disk (.vhd file) was stored as a page blob in an Azure Storage account.

That worked quite well, and Azure Managed Disks are little more than that:  a thin abstraction on top.  But at the same time, Azure now knows it’s a disk and can hence optimize for it.

Issues with unmanaged disks

That’s right, our good old page blob VHD is now an unmanaged disk.  This sounds like 2001 when Microsoft released .NET & managed code and you learned that all the code you’d been writing until then was unmanaged, unruly!

Let’s look at the different issues.

The first that comes to mind is Input / Output Operations per Second (IOPS).  A storage account tops out at 20000 IOPS.  An unmanaged standard disk can have 500 IOPS.  That means that beyond 40 disks in a storage account, even if we only have disks in there, we’ll start to get throttled.  This doesn’t sound too bad if we plan to run 2-3 VMs, but for larger deployments we need to be careful.  Of course, we could choose to put each VHD in a different storage account, but a subscription is limited to 100 storage accounts and it also adds to management (managing the domain names & access keys of 100 accounts, for instance).
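The arithmetic behind that 40-disk ceiling can be sketched in a few lines of Python.  This is only an illustration using the two figures quoted above; the function names are made up for this example, and real throttling depends on the actual workload rather than on peak ratings.

```python
# Limits quoted in this article (early 2017); check current Azure limits before relying on them.
STORAGE_ACCOUNT_MAX_IOPS = 20_000
STANDARD_DISK_MAX_IOPS = 500


def max_unthrottled_disks(account_iops: int, disk_iops: int) -> int:
    """Number of disks that can all run at peak IOPS without exceeding the account cap."""
    return account_iops // disk_iops


def is_throttled(disk_count: int,
                 account_iops: int = STORAGE_ACCOUNT_MAX_IOPS,
                 disk_iops: int = STANDARD_DISK_MAX_IOPS) -> bool:
    """True when the combined peak IOPS of the disks exceeds the account cap."""
    return disk_count * disk_iops > account_iops


print(max_unthrottled_disks(STORAGE_ACCOUNT_MAX_IOPS, STANDARD_DISK_MAX_IOPS))  # 40
```

With these numbers, `is_throttled(40)` is false and `is_throttled(41)` is true, which is the "after 40 disks" threshold mentioned above.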

Another one is access rights.  If we put more than one disk in a storage account, we can’t give different people different access to different disks:  anybody who is contributor on the storage account has access to all disks in the account.

A painful one is around custom images.  Say we customize a Windows or Linux image and have our generalized VHD ready to fire up VMs.  That VHD needs to be in the same storage account as the VHD of each VM created from it.  That means we can really only create 40 VMs, since they all share that account’s IOPS budget.  That’s where the limitation for VM scale sets with custom images comes from.

A side effect of being in a storage account is that the VHD is publicly addressable.  You still need a SAS token or an access key to get to it.  But that’s the thing.  For industries with strict regulations / compliance / audits, the idea that “if somebody walked out with your access key, even if they got fired and their logins do not work anymore, they can still download and even change your VHD” is a deal breaker.

Finally, one that few people are aware of:  reliability.  Storage accounts are highly available and have 3 synchronous copies.  They have an SLA of 99.9%.  The problem is when we match them with VMs.  We can set up high availability for a set of VMs by defining an availability set:  this gives some guarantees on how our VMs are affected during planned / unplanned downtime.  Now 2 VMs can be set to be in two different failure domains, i.e. they are deployed on different hosts and don’t share any critical hardware (e.g. power supply, network switch, etc.) but…  their VHDs might be on the same storage stamp (or cluster).  So if a storage stamp goes down for some reason, two VMs in different failure / update domains could go down at the same time.  If those are our only two VMs in the availability set, the set goes down.
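To make the failure scenario concrete, here is a toy model in Python.  It is a sketch only:  the `Vm` class, the stamp labels and the `surviving_vms` helper are hypothetical names invented for this illustration, not anything Azure exposes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Vm:
    name: str
    failure_domain: int   # compute-side failure domain within the availability set
    storage_stamp: str    # hypothetical label for the stamp holding the VM's VHD


def surviving_vms(vms: List[Vm], failed_stamp: str) -> List[str]:
    """Names of the VMs whose VHD is NOT on the failed storage stamp."""
    return [vm.name for vm in vms if vm.storage_stamp != failed_stamp]


# Two VMs in different failure domains, but both VHDs landed on the same stamp.
same_stamp = [Vm("vm1", 0, "stamp-a"), Vm("vm2", 1, "stamp-a")]
# Same VMs, with VHDs spread over two stamps.
spread_out = [Vm("vm1", 0, "stamp-a"), Vm("vm2", 1, "stamp-b")]

print(surviving_vms(same_stamp, "stamp-a"))  # [] : the whole set is down
print(surviving_vms(spread_out, "stamp-a"))  # ['vm2']
```

The failure domain field plays no role in the outcome here, which is exactly the point:  compute-side separation doesn’t help if the storage side is shared.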

Managed Disks

Managed disks are simply page blobs stored in a Microsoft-managed storage account.  On the surface, not much of a change, right?

Well…  let’s address each of the issues we’ve identified:

  • IOPS:  disks are assigned to different storage accounts in such a way that we’ll never get throttled because of the storage account.
  • Access Rights:  managed disks are first-class citizens in Azure.  That means they appear as Azure resources and can have RBAC permissions assigned to them.
  • Custom Image:  besides managed disks, we now have snapshots and images as first-class citizens.  An image no longer belongs to a storage account, and this removes the constraint we had before.
  • Public Access:  disks aren’t publicly accessible.  The only way to access them is via a SAS token.  This also means we do not need to invent a globally unique domain name.
  • Reliability:  when we associate a disk with a VM in an availability set, Azure makes sure that VMs in different failure domains aren’t on the same storage stamp.

Other differences

Besides the obvious advantages, here is a list of other differences from unmanaged disks:

  • Managed disks can be in both Premium & Standard storage, but only LRS (locally redundant storage)
  • Standard managed disks are priced by the closest pre-defined fixed size, not by the “currently used # of GBs”
  • Standard managed disks are still charged for transactions
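The fixed-size pricing can be pictured as a lookup that rounds a disk up to a size tier.  The sketch below is hedged on two assumptions that are not stated in this article:  that billing rounds up to the next tier, and that the tier sizes are the ones listed in `ASSUMED_TIERS_GB` (illustrative values only; check the pricing page for the real ones).  The function name is made up for this example.

```python
import bisect
from typing import Sequence

# ASSUMPTION: illustrative tier sizes in GB, sorted ascending. Not taken from the article.
ASSUMED_TIERS_GB = [32, 64, 128, 512, 1024]


def billed_size_gb(disk_size_gb: int, tiers: Sequence[int] = ASSUMED_TIERS_GB) -> int:
    """Smallest tier that can hold the disk (assumes billing rounds up to a tier)."""
    if disk_size_gb <= 0:
        raise ValueError("disk size must be positive")
    index = bisect.bisect_left(tiers, disk_size_gb)
    if index == len(tiers):
        raise ValueError("disk is larger than the biggest assumed tier")
    return tiers[index]


print(billed_size_gb(100))  # 128
print(billed_size_gb(129))  # 512
```

Under these assumptions, a 100 GB disk is billed as 128 GB regardless of how many GBs are actually written to it.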

Also, quite importantly, Managed Disks do not support Storage Service Encryption at the time of this writing (February 2017).  It is supposed to come very soon though and Managed Disks do support encrypted disks.

Summary

Managed Disks bring a couple of goodies with them.  The most significant one is reliability, but the other features will clearly make our lives easier.

In future articles, I’ll do a couple of hands-on walkthroughs with Azure Managed Disks.

7 thoughts on “Azure Managed Disk–Overview”

  6. Anonymous

    Hi Vincent, it’s really nice of you to publish an article with such a great deep dive. Really very helpful, as is the way you collate all the additional information in one place. Thanks!!
    I need a little more understanding of RTO and RPO in the context of ASR, as in what Azure offers, since that is always a deciding factor for customers when comparing with DR using traditional methods.

    1. Vincent-Philippe Lauzon Post author

      You can have a look at https://docs.microsoft.com/en-us/azure/site-recovery/site-recovery-overview for ASR. They mention replication as often as every 30 seconds, so the RPO could be that low.

      Basically ASR replicates on a continuous basis. An agent in the VM intercepts every IO call and feeds it to replication.

      RTO would basically be the time it takes your ops team to acknowledge a service outage in the primary region + the time to fail over. Failover time is linked to the time to create disks from the replication vault & boot the VMs. I would definitely test it to get a number.

      Have a look at https://vincentlauzon.com/2016/07/11/disaster-recovery-with-azure-virtual-machines/ to see different options you have for DR in Azure.
