This is especially important if you are trying to size / price VMs “in advance”, for instance if you are quoting some work in a “fixed bid” context, i.e. you need to provide the Azure cost before you have written a single line of code of your application.
If that isn’t your case, you can simply trial different VM sizes. The article would still be useful to see what variables you should be looking at if you do not obtain the right performance.
There are a few things to look for. We tend to focus on the CPU & RAM but that’s only part of the equation. The storage & performance target will often drive the choice of VM.
A VM has the following characteristics: # cores, RAM, Local Disk, # data disks, IOPs, # NICs & Network bandwidth. We need to consider all of those before choosing a VM.
For starters, we need to understand that Virtual Machines cannot be “hand crafted”, i.e. we cannot choose CPU speed, RAM & IOPS separately. They come in predefined packages with predefined specs: SKUs, e.g. D2.
Because of that, we might often have to oversize a characteristic (e.g. # cores) in order to get the right amount of another characteristic (e.g. RAM).
SKUs come in families called Series. At the time of this writing Azure has the following VM series:
- Av2 (A version 2)
- D & DS
- Dv2 & DSv2 (D version 2 & DS version 2)
- F & FS
- G & GS
- H & HS
- L & LS
Each series will optimize different ratios. For instance, the F Series will have a higher cores / RAM ratio than the D series. So if we are looking at a lot of cores and not much RAM, the F series is likely a better choice than D series and will not force us to oversize the RAM as much in order to have the right # of cores.
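The oversizing trade-off can be sketched in a few lines of Python. This is only an illustration: the SKU names, specs and prices in the catalog below are made-up placeholders, not Azure's real numbers, and `cheapest_fit` is a hypothetical helper. It shows how a compute-oriented series can win when we need many cores and little RAM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sku:
    name: str
    cores: int
    ram_gb: float
    price_per_hour: float  # hypothetical price, for illustration only


# Placeholder catalog: names and numbers are assumptions, not real Azure specs.
CATALOG = [
    Sku("general-2", cores=2, ram_gb=8, price_per_hour=0.10),
    Sku("general-4", cores=4, ram_gb=16, price_per_hour=0.20),
    Sku("general-8", cores=8, ram_gb=32, price_per_hour=0.40),
    Sku("compute-4", cores=4, ram_gb=8, price_per_hour=0.15),
    Sku("compute-8", cores=8, ram_gb=16, price_per_hour=0.30),
]


def cheapest_fit(catalog, min_cores, min_ram_gb) -> Optional[Sku]:
    """Return the cheapest SKU meeting both minimums, or None if none does."""
    candidates = [
        s for s in catalog
        if s.cores >= min_cores and s.ram_gb >= min_ram_gb
    ]
    return min(candidates, key=lambda s: s.price_per_hour, default=None)


# Many cores, little RAM: the compute-oriented SKU avoids paying for unused RAM.
many_cores = cheapest_fit(CATALOG, min_cores=8, min_ram_gb=10)

# Few cores, more RAM: we end up oversizing the cores to get the RAM.
more_ram = cheapest_fit(CATALOG, min_cores=2, min_ram_gb=12)
```

In the second call we only asked for 2 cores but the cheapest SKU with enough RAM has 4, which is exactly the oversizing described above.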
For pricing, the obvious starting point is the pricing page for VM: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/windows/.
Azure compute allocates virtual cores from the physical host to the VMs.
Azure cores are dedicated cores. As of the time of this writing, there is no shared core (except for the A0 VM) and there is no hyper-threading.
There are two components in the price of a VM:
- Compute (the raw underlying VM, i.e. the CPU + RAM + local disk)
- Licensed software running on it (e.g. Windows, SQL, RHEL, etc.)
The compute price corresponds to the CentOS Linux pricing since CentOS is open source and has no license fee.
Azure has different flavours of licensed software (as of the writing of this article, i.e. March 2017):
- Oracle Java
- SQL Server
- Open Source (no License)
- Licensed: Red Hat Enterprise Linux (RHEL), R Server, SUSE
Windows by itself comes with the same license fee regardless of Windows version (e.g. Windows 2012 & Windows 2016 have the same license fee).
Windows software (e.g. BizTalk) comes with a software license (e.g. BizTalk) plus an OS license. This is reflected in the pricing columns. For instance, for BizTalk Enterprise (https://azure.microsoft.com/en-us/pricing/details/virtual-machines/biztalk-enterprise/), here in Canadian dollars in Canada East region for the F Series:
In the OS column is the price of the compute + the Windows license while in the “Software” column is the price of the BizTalk Enterprise license. The total is what we pay per hour for the VM.
It is possible to “Bring Your Own License” (BYOL) for any software (including Windows or Linux) in Azure and therefore pay only for the bare compute (which, again, corresponds to CentOS Linux pricing).
UPDATE: Through Azure Hybrid Use Benefit, we can even “reuse” an on-premises Windows license for a new (unrelated) VM in Azure.
We can also run whatever licensed software we want on top of a VM. We can install SAP, get an SAP license and be 100% legal. The licensed software I enumerated comes with the option of being integrated in the “per minute” cost.
So one of the first decisions to make in pricing is: do we want to go with integrated pricing or external license based pricing? Quite easy to decide: simply look at the price of external licenses (e.g. volume licensing) we can get from the vendor and compare.
Typically if we run the VM sporadically, i.e. a few hours per day, it is cheaper to go with the integrated pricing. Also, I see a lot of customers starting with integrated pricing for POCs, running it for a while and optimizing pricing later.
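To make the comparison concrete, here is a minimal sketch of the arithmetic. All rates are hypothetical placeholders (not real Azure or vendor prices), and the model assumes the external license is a flat monthly fee, which may not match your licensing agreement.

```python
def monthly_cost_integrated(hours, compute_rate, license_rate):
    """Integrated pricing: compute and license are both billed per hour of use."""
    return hours * (compute_rate + license_rate)


def monthly_cost_byol(hours, compute_rate, monthly_license_fee):
    """BYOL: compute billed per hour, license paid as a flat fee (assumption)."""
    return hours * compute_rate + monthly_license_fee


def break_even_hours(license_rate, monthly_license_fee):
    """Hours per month above which BYOL becomes the cheaper option."""
    return monthly_license_fee / license_rate


# Hypothetical rates, for illustration only.
COMPUTE_RATE = 0.25   # per hour, bare compute
LICENSE_RATE = 0.50   # per hour, integrated license
LICENSE_FEE = 150.0   # per month, external license

threshold = break_even_hours(LICENSE_RATE, LICENSE_FEE)

# A VM running roughly 3 hours a day vs. one running all month (about 730 hours).
sporadic = (monthly_cost_integrated(100, COMPUTE_RATE, LICENSE_RATE),
            monthly_cost_byol(100, COMPUTE_RATE, LICENSE_FEE))
always_on = (monthly_cost_integrated(730, COMPUTE_RATE, LICENSE_RATE),
             monthly_cost_byol(730, COMPUTE_RATE, LICENSE_FEE))
```

With these made-up numbers, integrated pricing wins below the break-even threshold and BYOL wins above it, which matches the rule of thumb for sporadic versus always-on VMs.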
This is local storage. By local, we mean it’s local to the host itself, it isn’t an attached disk. For that reason it has lower latency than attached disks. It also has another very important characteristic: it is ephemeral. It isn’t persistent. Its content does not survive a reboot of the VM. The disk is empty after reboot.
We are insisting on this point because everybody gets confused by that column and for a good reason: the column title is misleading. It doesn’t lie, it is a disk and it does have the specified size. But it is a temporary disk.
Can we install the OS on that disk? No. Note, we didn’t say “we shouldn’t”, but “we can’t”.
What we typically put on that disk is:
- Page file
- Temporary files (e.g. tempdb for SQL Server running on VM)
- Caching files
Some VM series have quite large temporary disks. Take for instance the L series:
That VM series was specifically designed to work with Big Data workloads where data is replicated within a cluster (e.g. Hadoop, Cassandra, etc.). Disk latency is key but durability is not, since the data is replicated around.
Unless you run such a workload, don’t rely on the temporary disk too much.
The major consequence here is: add attached disks to your pricing. See https://azure.microsoft.com/en-us/pricing/details/managed-disks/.
The pricing page is nice but to have a deeper conversation we’ll need to look at more VM specs. We start our journey at https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-sizes. From there, depending on the “type” of VM we are interested in, we’re going to dive into one of the links, e.g. https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-sizes-general.
The documentation repeats the specs we see on the pricing page, i.e. # of cores, RAM & local disk size, but also gives other specs: max number of data disks, throughput, max number of NICs and network bandwidth. Here we’ll focus on the maximum number of data disks.
A VM comes with an OS disk, a local disk and a set of optional data disks. Depending on the VM SKU, the maximum number of data disks does vary.
At the time of this writing, the maximum size of a disk on a VM is 1TB. We can have bigger volumes on the VM by striping multiple disks together in the VM’s OS. But the biggest disk is 1TB.
For instance, a D1v2 (see https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-sizes-general#dv2-series) can have 2 data disks on top of the OS disk. That means, if we max out each of the 3 disks, 3 TB, including the space for the OS.
So what if the D1v2 really is enough for our need in terms of # of cores and RAM but we need 4 TB of storage space? Well, we’ll need to bump up to another VM SKU, a D2v2 for instance, which supports 4 data disks.
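The reasoning above can be sketched as a small check. The 1 TB disk limit and the data disk limits for D1v2 and D2v2 come from the text above (as of the time of writing); the helper names are hypothetical, and the sketch counts data disks only, keeping the OS disk for the OS, which is a simplification.

```python
import math

MAX_DISK_TB = 1  # maximum size of a single disk, per the text above

# Maximum number of data disks per SKU, as quoted above (may have changed since).
MAX_DATA_DISKS = {"D1v2": 2, "D2v2": 4}


def data_disks_needed(data_tb):
    """Number of maximum-size data disks required to hold data_tb terabytes."""
    return math.ceil(data_tb / MAX_DISK_TB)


def skus_that_fit(data_tb):
    """SKUs whose data disk limit is high enough for the requested storage."""
    needed = data_disks_needed(data_tb)
    return [name for name, limit in MAX_DATA_DISKS.items() if limit >= needed]


fits_2tb = skus_that_fit(2)  # both SKUs can attach 2 data disks
fits_4tb = skus_that_fit(4)  # only the SKU with 4 data disks remains
```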
Attached means they aren’t local to the VM’s host. They are attached to the VM and backed by Azure storage.
Azure storage means three synchronous replicas, i.e. high resilience and high durability.
Azure storage is its own complex topic with many variables, e.g. LRS / GRS / RA-GRS, Premium / Standard, Cool / Hot, etc.
Here we’ll discuss two dimensions: Premium vs Standard & Managed vs Unmanaged disks.
We’ve explained what managed disks are in contrast with unmanaged disks in this article. Going forward I recommend only managed disks.
Standard disks are backed by spinning physical disks while Premium disks are backed by Solid State Drive (SSD) disks. In general:
- Premium disk has higher IOPs than Standard disk
- Premium disk has more consistent IOPs than Standard disk (Standard disk IOPs will vary)
- Premium disk has higher availability (see Single VM SLA)
- Premium disk is more expensive than Standard disk
So really, only the price will stop us from only using Premium disk.
In general: IO intensive workloads (e.g. databases) should always be on Premium. A single VM needs to be on Premium in order to have an SLA (again, see Single VM SLA).
For the pricing of disks, see https://azure.microsoft.com/en-us/pricing/details/managed-disks/. Disks come in predefined sizes.
This is where Input / Output operations per second (IOPs) come into the picture.
An IO intensive workload (e.g. database) will consume IOPs from the VM disks.
Each disk comes with a number of IOPs. In the pricing page (https://azure.microsoft.com/en-us/pricing/details/managed-disks/), the Premium disks, i.e. P10, P20 & P30, have documented IOPs of 500, 2300 & 5000 respectively. Standard disks (at the time of this writing, March 2017) do not have documented IOPs, but it is easy to find out by creating disks in the portal; for instance an S4 disk with 32 GB will have 500 IOPs & 60 MB/s throughput.
In order to get the total number of IOPs we need, we’ll simply select a set of disks that has the right total of IOPs. For instance, for 20000 IOPs, we might choose 4 x P30, which we might expose to the OS as a single volume (by striping the disks) or not. Again, we might need to oversize here. For instance, we might need 20000 IOPs for a database of only 1TB but 4 x P30 will give us 4 TB of space.
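As a rough sketch of that calculation: the per-disk IOPs figures are the ones quoted above from the pricing page (March 2017), the 1 TB size for a P30 follows from the 4 x P30 example, and `disks_for_iops` is a hypothetical helper name.

```python
import math

# Documented IOPs per Premium disk, as quoted above (March 2017).
PREMIUM_IOPS = {"P10": 500, "P20": 2300, "P30": 5000}

P30_SIZE_TB = 1  # implied by the example above: 4 x P30 gives 4 TB


def disks_for_iops(target_iops, tier):
    """Number of disks of the given tier needed to reach target_iops in total."""
    return math.ceil(target_iops / PREMIUM_IOPS[tier])


p30_count = disks_for_iops(20000, "P30")
p30_capacity_tb = p30_count * P30_SIZE_TB  # space we pay for, needed or not

# The same target reached with smaller disks requires many more of them.
p20_count = disks_for_iops(20000, "P20")
```

The disk count also has to stay within the maximum number of data disks of the VM SKU, so fewer, larger disks are often the only practical option.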
Is that all? Well, no. Now that we have the IOPs we need, we have to make sure the VM can use those IOPs. Let’s take the DSv2 series as an example (see https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-sizes-general#dsv2-series). A DS2v2 can have 4 data disks and can therefore accommodate our 4 x P30 disks, but it can only pull 8000 IOPs. In order to get the full 20000 IOPs, we would need to oversize to a DS4v2.
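Put differently, the IOPs we actually get are capped by the VM as well as by the disks. Here is a minimal sketch, assuming the simple model that the effective figure is the smaller of the two; the 8000 IOPs limit for the DS2v2 is the one quoted above, while the 25000 cap is a hypothetical value used only to show the other case.

```python
def effective_iops(disk_iops, vm_iops_cap):
    """IOPs usable by the VM: the disks' total, limited by the VM's own cap."""
    return min(sum(disk_iops), vm_iops_cap)


four_p30 = [5000] * 4  # 4 x P30 at 5000 IOPs each

# DS2v2 limit as quoted above: the disks could deliver 20000, the VM cannot.
on_ds2v2 = effective_iops(four_p30, vm_iops_cap=8000)

# Hypothetical larger cap, for illustration: now the disks are the limit.
on_bigger_vm = effective_iops(four_p30, vm_iops_cap=25000)
```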
One last thing about IOPs: what is it with those two columns cached / uncached disks?
When we attach a disk, we can choose from different caching options: none, read-only & read-write. Caching uses part of the host’s resources to cache the disks’ content, which obviously accelerates operations.
A VM SKU also controls the network bandwidth of the VM.
There is no precisely documented bandwidth, nor any SLA. Instead, categories are used: Low, Moderate, High and Very High. The network bandwidth capacity increases along those categories.
Again, we might need to oversize a VM in order to access higher network throughput if required.
Network Interface Controller (NIC)
Finally, each VM SKU sports a different maximum number of Network Interface Controllers (NICs).
Typically a VM is fine with one NIC. Network appliances (e.g. virtual firewalls) will often require 2 NICs.
There are a few variables to consider when sizing a VM. The number of cores & RAM is a good starting point but you might need to oversize the VMs to satisfy other characteristics such as storage space, disk performance or network performance.