Azure Application Gateway Anatomy

Back in May, we talked about Azure Application Gateway.

In this article, we’re going to look at its anatomy, i.e. its internal components as exposed in the Azure Resource Manager (ARM) model.

Many Azure resources have an internal structure.  For instance, a Virtual Network has a collection of subnets.

Azure Application Gateway has a very rich internal model.  We will look at this model in order to understand how to configure it.

What is Azure Application Gateway

From the official documentation:

Application Gateway is a layer-7 load balancer.  It provides failover, performance-routing HTTP requests between different servers, whether they are on the cloud or on-premises. Application Gateway provides many Application Delivery Controller (ADC) features including HTTP load balancing, cookie-based session affinity, Secure Sockets Layer (SSL) offload, custom health probes, support for multi-site, and many others.

I like to say that it is at the same time a Reverse Proxy, a Web Application Firewall (WAF) and a layer 7 load balancer.

Internal Resource Model

Let’s start with a summary diagram.  Each box represents a sub resource (except Application Gateway, which represents the main resource) and each bullet point within a box represents a property of that sub resource.

image

We can now look at each sub resource.

Application Gateway

Key properties of the gateway itself are

  • The SKU, i.e. the tier (Small, Medium, Large) & the number of instances (1 to 10)
  • A list of SSL certificates (used by HTTP Listeners)

SSL Certificates are optional if the Gateway exposes only HTTP endpoints but are required for HTTPS endpoints.

The SKU can be anything, although in order to have an SLA, it must be of tier medium or large and have at least 2 instances.

Gateway IP Configuration

The Gateway IP Configuration has a 1:1 relationship with the Application Gateway (trying to register a second configuration results in an error) and can therefore be conceptually considered as properties of the Gateway directly.

It simply defines which subnet the Application Gateway lives in.

Frontend IP Configuration

This sub resource defines how the gateway is exposed.

There can be one or two of those configurations:  public, private or both.

The same configuration can be used by more than one HTTP listener, each using a different port.

Frontend Port

Frontend ports describe which ports on the Application Gateway are exposed.  It simply is a port number.

HTTP Listener

This is a key component.  It combines a frontend IP configuration and a port; it also includes a protocol (HTTP or HTTPS) and optionally an SSL certificate.

An HTTP listener is what the Application Gateway is listening to, e.g.:

  • Public IP X.Y.Z.W on port 443, HTTPS with a given SSL certificate
  • Private IP 10.0.0.1 on port 80, HTTP

Backend Address Pool

The real compute handling requests.

Typically it’s going to be stand-alone VMs or VMs from a VM Scale Set (VMSS) but technically only the addresses are registered.  It could therefore be some public web site out there.

Backend HTTP Setting

This component describes how to connect to a backend compute:  port number, protocol (HTTP / HTTPS) & cookie-based affinity.

The frontend & backend can be different:  we can have HTTPS on 443 on the frontend while routing it to HTTP on port 12345 in the backend.  This is actually a typical SSL termination scenario.

Probe

A probe, actually a custom probe, probes a backend for health.  It is described by a protocol, a URL, an interval, a timeout, etc.  Basically we can customize how a backend is determined to be healthy or not.

A custom probe is optional.  By default, a default probe is configured, probing the backend using the port and protocol specified in the backend HTTP setting.

Rule

A rule binds together an HTTP Listener, a backend address pool & a backend HTTP setting.  It basically binds an endpoint in the frontend with an endpoint in the backend.

There are two types of rules:  basic rules & path rules.  The former simply binds the aforementioned components together while the latter adds the concept of mapping a URL pattern to a given backend.
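To make the relationships between sub resources concrete, here is a minimal sketch in Python.  The field names are simplified and hypothetical (they mirror the sub resources described above, not the exact ARM JSON schema), and the helper only checks that every reference made by a listener or a rule points at something defined on the gateway.

```python
# Simplified, hypothetical model of an Application Gateway.
# Names mirror the sub resources discussed above, NOT the real ARM schema.
gateway = {
    "sku": {"tier": "Medium", "instances": 2},
    "ssl_certificates": ["my-cert"],
    "frontend_ip_configurations": ["public-ip"],
    "frontend_ports": {"port-443": 443},
    "http_listeners": {
        "https-listener": {
            "frontend_ip": "public-ip",
            "frontend_port": "port-443",
            "protocol": "Https",
            "ssl_certificate": "my-cert",
        }
    },
    "backend_address_pools": {"web-pool": ["10.0.1.4", "10.0.1.5"]},
    "backend_http_settings": {
        "http-12345": {"port": 12345, "protocol": "Http", "cookie_affinity": False}
    },
    "rules": {
        "basic-rule": {
            "type": "Basic",
            "listener": "https-listener",
            "backend_pool": "web-pool",
            "backend_http_setting": "http-12345",
        }
    },
}


def dangling_references(gw):
    """Return (owner, missing_name) pairs for references that resolve to nothing."""
    problems = []
    for name, listener in gw["http_listeners"].items():
        if listener["frontend_ip"] not in gw["frontend_ip_configurations"]:
            problems.append((name, listener["frontend_ip"]))
        if listener["frontend_port"] not in gw["frontend_ports"]:
            problems.append((name, listener["frontend_port"]))
        # An HTTPS listener needs one of the gateway's SSL certificates.
        cert = listener.get("ssl_certificate")
        if listener["protocol"] == "Https" and cert not in gw["ssl_certificates"]:
            problems.append((name, cert))
    for name, rule in gw["rules"].items():
        if rule["listener"] not in gw["http_listeners"]:
            problems.append((name, rule["listener"]))
        if rule["backend_pool"] not in gw["backend_address_pools"]:
            problems.append((name, rule["backend_pool"]))
        if rule["backend_http_setting"] not in gw["backend_http_settings"]:
            problems.append((name, rule["backend_http_setting"]))
    return problems


print(dangling_references(gateway))
```

The example gateway is the SSL termination scenario mentioned earlier:  HTTPS on 443 in the frontend, HTTP on port 12345 in the backend.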

Summary

We covered the anatomy of Application Gateway and articulated how different components relate to each other.

In future articles we will build on this anatomy in order to address specific scenarios.

Hypersphere Volume

In our last article we looked at how the dimension of data space impacts Machine Learning algorithms.  This is often referred to as the curse of dimensionality.

At the heart of the article we discussed the fact that a hypersphere’s hyper-volume tends to zero as the dimension increases.

Here I want to demonstrate how to find the hyper-volume of a hypersphere of dimension N.

The Math Reference Project gives a short & sweet demonstration.  I personally found it hard to follow.  Foundations of Data Science by John Hopcroft & Ravindran Kannan (chapter 2) starts a demonstration but cuts it short.

I wanted to contribute a complete demonstration because I just love that type of mathematical problem.  It’s one of my many flaws.

Approach

We’ll use Cartesian coordinates and the fact that the volume of a hypersphere of dimension N can be found by integrating the volume of a hypersphere of dimension N-1 with an infinitesimal thickness:

V_N(R) = \displaystyle\int_{-R}^R V_{N-1}(\sqrt{R^2-x^2}) dx

image

We’ll find the volume for a few dimensions then we’ll generalize the result.

N=1

Well, V_1(R) = 2 R:  it’s a line.

N=2

We already know the result should be V_2(R) = \pi R^2, but let’s demonstrate it.

\begin{array}{lcl} V_2(R) &=& \displaystyle\int_{-R}^R V_1(\sqrt{R^2-x^2}) dx\\ &=& \displaystyle\int_{-R}^R 2 \sqrt{R^2-x^2} dx\\&=& 2 R^2 \displaystyle\int_{-R}^R \sqrt{1- (\frac {x}{R})^2} d \frac{x}{R}\\&=& 2 R^2 \displaystyle\int_{-\pi/2}^{\pi/2} \sqrt{1- \sin^2 \theta} \, d (\sin \theta) \text{ where } \sin \theta = \frac{x}{R}\\&=& 2 R^2 \displaystyle\int_{-\pi/2}^{\pi/2} \cos^2 \theta \, d \theta\\&=& 2 R^2 \cdot \frac{1}{2} [ \theta + \frac{1}{2} \sin {2 \theta} ]_{-\pi/2}^{\pi/2}\\ &=& \pi R^2\end{array}

N=3

We know the result should be V_3(R) = \frac{4}{3} \pi R^3, but again, let’s demonstrate it.

\begin{array}{rcl}V_3(R) &=& \displaystyle\int_{-R}^R V_2(\sqrt{R^2-x^2}) dx\\&=& \displaystyle\int_{-R}^R \pi (\sqrt{R^2-x^2})^2 dx\\&=& \pi (2 R^3 - \displaystyle\int_{-R}^R x^2 dx)\\&=& \pi (2 R^3 - \frac{2 R^3}{3})\\&=& \frac{4}{3} \pi R^3\end{array}

N=4

Let’s find the hyper-volume of a hypersphere of dimension 4.

\begin{array}{rcl} V_4(R) &=& \displaystyle\int_{-R}^R V_3(\sqrt{R^2-x^2}) dx\\&=& \displaystyle\int_{-R}^R \frac{4}{3} \pi (\sqrt{R^2-x^2})^3 dx\\&=& \frac{4}{3} \pi R^4 \displaystyle\int_{-R}^R (1-(\frac{x}{R})^2)^\frac{3}{2} d(\frac{x}{R})\\&=& \frac{4}{3} \pi R^4 \displaystyle\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} (1-\sin^2 \theta)^\frac{3}{2} d(\sin \theta) \text{ where } \sin \theta = \frac{x}{R}\\&=& \frac{4}{3} \pi R^4 \displaystyle\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cos^3 \theta \cdot \cos \theta d \theta\\&=& \frac{4}{3} \pi R^4 \displaystyle\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cos^4 \theta d \theta\\&=& \frac{4}{3} \pi R^4 ([\frac{\cos^3 \theta \sin \theta}{4}]_{-\frac{\pi}{2}}^{\frac{\pi}{2}} + \frac{3}{4} \displaystyle\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cos^2 \theta d \theta)\\&=& \frac{4}{3} \pi R^4 (0 + \frac{3}{4} \cdot \frac{1}{2} [\theta + \frac{1}{2} \sin 2 \theta]_{-\frac{\pi}{2}}^{\frac{\pi}{2}})\\&=& \frac{\pi^2}{2} R^4\end{array}
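As a sanity check on the three results above, the slice recursion can also be evaluated numerically.  The sketch below is only an illustration (the function name, the midpoint rule and the grid size are my own choices, not part of the derivation):  it integrates the volume of dimension N-1 over the slices and compares the outcome with the closed forms we just found.

```python
import math


def volume(n, radius=1.0, steps=60):
    """V_n(radius) from the slice recursion, integrated with the midpoint rule."""
    if n == 1:
        return 2.0 * radius  # a segment of length 2R
    dx = 2.0 * radius / steps
    total = 0.0
    for k in range(steps):
        x = -radius + (k + 0.5) * dx  # midpoint of the k-th slice
        # Each slice is a hypersphere of dimension n-1 and radius sqrt(R^2 - x^2).
        total += volume(n - 1, math.sqrt(radius * radius - x * x), steps) * dx
    return total


expected = {2: math.pi, 3: 4 * math.pi / 3, 4: math.pi ** 2 / 2}
for n, exact in expected.items():
    print(n, round(volume(n), 3), round(exact, 3))
```

The grid is deliberately coarse so the nested loops stay fast; the numerical values should land close to the exact ones, with small discrepancies due to the discretization.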

Generalization

Now we have quite some practice.  Let’s try to generalize the hypersphere volume formula.

First let’s assume the volume formula has the following form:

V_N(R) = K_N R^N

Where K_N is a constant (independent of R).  We’ll see that we only need to assume that form for the volumes of N-1 and less.  Since we already know it to be true for N <= 4, it isn’t a strong assumption.

With that, let’s proceed:

\begin{array}{rcl} V_N(R) &=& \displaystyle\int_{-R}^R V_{N-1}(\sqrt{R^2-x^2}) dx\\&=& K_{N-1} \displaystyle\int_{-R}^R (R^2-x^2)^\frac{N-1}{2} dx\\&=& K_{N-1} R^N \displaystyle\int_{-R}^R (1-(\frac{x}{R})^2)^\frac{N-1}{2} d(\frac{x}{R})\\&=& K_{N-1} R^N \displaystyle\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cos^{N-1} \theta \cdot \cos \theta d \theta \text{ where } \sin \theta = \frac{x}{R}\\&=& K_{N-1} R^N \displaystyle\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cos^N \theta d \theta\end{array}

We’re dealing with a recursion here, so let’s rewrite this equation in terms of two sets of constants:

\begin{array}{rcl}V_N(R) &=& K_N R^N = C_N K_{N-1} R^N \text{ where } C_N = \displaystyle\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cos^N \theta d \theta\\&\implies& K_N = C_N K_{N-1}\\&\implies& K_N = (\displaystyle\prod_{i=2}^N C_i) K_1 = 2 \displaystyle\prod_{i=2}^N C_i \text{ (since }K_1=2 \text{)}\end{array}

Let’s work on the set of constants C.  We know the first couple of values:

\begin{array}{rcl} C_0 &=& \pi \\ C_1 &=& 2 \\ C_2 &=& \frac{\pi}{2} \end{array}

We can also obtain a recursive expression.

C_N = \displaystyle\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cos^N \theta d \theta = \frac{N-1}{N} \displaystyle\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cos^{N-2} \theta d \theta \implies C_N = \frac{N-1}{N} C_{N-2}

If we observe that

\begin{array}{rcl} C_N C_{N-1} &=& \frac{N-1}{N} C_{N-2} \frac{N-2}{N-1} C_{N-3}\\&=& \frac{N-2}{N} C_{N-2} C_{N-3}\\&=& \frac{N-2}{N} \frac{N-4}{N-2} C_{N-4} C_{N-5}\\&=& \frac{N-4}{N} C_{N-4} C_{N-5}\\&=&\begin{cases} \frac{2}{N} C_2 C_1 & \text{if N is even} \\ \frac{1}{N} C_1 C_0 & \text{if N is odd} \end{cases}\\&=&\begin{cases} \frac{2 \pi}{N} & \text{if N is even} \\ \frac{2 \pi}{N} & \text{if N is odd} \end{cases}\\&=&\frac{2 \pi}{N}\end{array}
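The recursion for C_N and the product identity above are easy to check numerically.  This is a small sketch with hypothetical helper names:  one function follows the recursion, the other integrates cos^N directly with a midpoint rule, and the loop asserts both the recursion and C_N C_{N-1} = 2 \pi / N for the first few values of N.

```python
import math


def c_recursive(n):
    """C_n from C_n = (n-1)/n * C_{n-2}, starting at C_0 = pi and C_1 = 2."""
    if n == 0:
        return math.pi
    if n == 1:
        return 2.0
    return (n - 1) / n * c_recursive(n - 2)


def c_numeric(n, steps=20000):
    """C_n by midpoint integration of cos^n(theta) over [-pi/2, pi/2]."""
    h = math.pi / steps
    return sum(math.cos(-math.pi / 2 + (k + 0.5) * h) ** n for k in range(steps)) * h


for n in range(2, 11):
    # The recursion agrees with the direct integral...
    assert math.isclose(c_recursive(n), c_numeric(n), rel_tol=1e-6)
    # ...and consecutive constants multiply to 2*pi/n.
    assert math.isclose(c_recursive(n) * c_recursive(n - 1), 2 * math.pi / n)
print("recursion and product identity hold for N = 2..10")
```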

Then we can write

\begin{array}{lcl} K_N &=& 2 \displaystyle\prod_{i=2}^N C_i \\ &=& \begin{cases} 2 \cdot \frac{2 \pi}{N} \frac{2 \pi}{N-2} \dots \frac{2 \pi}{4} C_2 & \text{if N is even} \\ 2 \cdot \frac{2 \pi}{N} \frac{2 \pi}{N-2} \dots \frac{2 \pi}{3} & \text{if N is odd} \end{cases}\\ &=& \begin{cases} \pi \cdot \frac{2 \pi}{N} \frac{2 \pi}{N-2} \dots \frac{2 \pi}{4} & \text{if N is even} \\ 2 \cdot \frac{2 \pi}{N} \frac{2 \pi}{N-2} \dots \frac{2 \pi}{3} & \text{if N is odd} \end{cases}\end{array}

Therefore we found that

\begin{array}{lcl} V_N (R) &=& \begin{cases} \pi \cdot \frac{2 \pi}{N} \frac{2 \pi}{N-2} \dots \frac{2 \pi}{4} \cdot R^N & \text{if N is even} \\ 2 \cdot \frac{2 \pi}{N} \frac{2 \pi}{N-2} \dots \frac{2 \pi}{3} \cdot R^N & \text{if N is odd} \end{cases}\end{array}

Which gives us an explicit formula for the volume of a hypersphere in N dimensions.

Limit

Given the formula for K_N (and that V_N(R) = K_N R^N), it is easy to see it is a product of smaller and smaller terms.

As soon as N becomes bigger than 2 \pi (i.e. from N=7), the new terms become smaller than 1 and therefore the product starts to shrink.  Even and odd dimensions form two separate products, which is why the overall maximum is already reached at N=5.

This is why the hyper volume vanishes as N grows towards infinity.

Values

We can then compute values (for R=1):

Dimension | Formula | Value
1 | 2 | 2
2 | \pi | 3.141592654
3 | \frac{4 \pi}{3} | 4.188790205
4 | \pi \cdot \frac{2 \pi}{4}=\frac{\pi^2}{2} | 4.934802201
5 | 2 \cdot \frac{2 \pi}{5} \frac{2 \pi}{3}=\frac{8 \pi^2}{15} | 5.263789014
6 | \pi \cdot \frac{2 \pi}{6} \frac{2 \pi}{4}= \frac{\pi^3}{6} | 5.16771278

which corresponds to what we gave in our last article.
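The explicit formula translates directly into a few lines of code.  The sketch below (function names are mine) builds K_N from the product formula and, as an independent cross-check, compares it with the standard closed form \pi^{N/2} / \Gamma(N/2 + 1), which is a well-known result that we did not derive in this article.

```python
import math


def unit_volume(n):
    """K_N from the explicit product formula (hypersphere of radius 1)."""
    # Even N starts from pi and multiplies by 2*pi/4, 2*pi/6, ..., 2*pi/N.
    # Odd N starts from 2 and multiplies by 2*pi/3, 2*pi/5, ..., 2*pi/N.
    k, start = (math.pi, 4) if n % 2 == 0 else (2.0, 3)
    for i in range(start, n + 1, 2):
        k *= 2 * math.pi / i
    return k


def unit_volume_gamma(n):
    """Standard closed form, used here only as a cross-check."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


for n in range(1, 29):
    assert math.isclose(unit_volume(n), unit_volume_gamma(n), rel_tol=1e-9)

for n in range(1, 7):
    print(n, round(unit_volume(n), 9))
```

Multiplying the result by R^N gives the volume for any other radius.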

Summary

We demonstrated how to find the hyper volume of a hypersphere of dimension N and could rigorously show that the hyper volume vanishes as the dimension grows.

That result is counterintuitive and this is why we thought a mathematical proof was warranted.

Hyperspheres & the curse of dimensionality

I previously talked about the curse of dimensionality (more than 2 years ago) related to Machine Learning.

Here I wanted to discuss it in more depth and dive into the mathematics of it.

High dimensions might sound like Physics’ string theory where our universe is made of more than 4 dimensions.  This isn’t what we are talking about here.

The curse of dimensionality is related to what happens when a model deals with a data space with dimensions in the hundreds or thousands.

As the title of this article suggests, we’re going to take the angle of the properties of Hyperspheres (spheres in N dimensions) to explore high dimension properties.

This article is inspired by Foundations of Data Science by John Hopcroft & Ravindran Kannan (chapter 2).

Why should I care about High Dimension?

When introducing Machine Learning concepts, we typically use few dimensions in order to help visualization.  For instance, when I introduced linear regression or polynomial regression in past articles, I used datasets in two dimensions and plotted them on a chart.

In the real world, typical data sets have many more dimensions.

A typical case of high dimension is image recognition (or character recognition as a sub category) where even a low resolution picture will have hundreds of pixels.  The corresponding model would take a gray-scale input vector of dimension 100+.

With fraud detection, transactions do not contain only the value of the transaction, but also the time of day, day of week, geo-location, type of commerce, type of products, etc.  This might or might not be a high dimension problem, depending on the available data.

In an e-commerce web site, a Product recommendation algorithm could be as simple as an N x N matrix of 0 to 1 values where N is the number of products.

With IoT, multiple sensors feed a prediction model.

In bioinformatics, DNA sequencing generates a huge amount of data which is often arranged in high dimensional models.

Basically, high dimensions crop up everywhere.

What happens as dimension increases?

For starters, a space with more dimensions simply is…  bigger.  In order to sample a space with 2 dimensions with a resolution of 10 units, we need to have 10^2 = 100 points.  Having the same sampling in a space of dimension 3 would require 10^3 = 1000 points.  Dimension 20?  It would require 10^20 = 100 000 000 000 000 000 000 points.

Right off the bat we can tell that sampling the space of dimension 2 & 3 is realistic while for a space of dimension 20, it’s unlikely.  Hence we are likely going to suffer from under-sampling.

Yoshua Bengio has a nice discussion about Curse of Dimensionality here.

Hypersphere in a cube

Beyond sampling problems, metrics & measures change behaviour at high dimensions.  Intuitively it makes sense since a measure takes a vector (or vectors) and squeezes it (them) into a numerical value; the higher the dimension, the more data we squeeze into one number & hence we should lose information.

We use metrics & measures heavily in Machine Learning.  For instance, a lot of cost (or loss) functions are based on the Euclidean distance:

dist(x,y) = \sqrt{\displaystyle\sum_{i=1}^N (x_i-y_i)^2}

Now if x and / or y are random variables (e.g. samples), the law of large numbers applies when N becomes large.  This implies the sum, once divided by N, tends to its expected value, with a spread that narrows relative to that value as N increases.  In turn, this means there is less and less information in the distance as the number of dimensions increases.
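A quick simulation illustrates this loss of information.  The sketch below makes assumptions of its own:  points are drawn uniformly in the unit cube (any other distribution with independent coordinates would do), and the distance is the Euclidean distance, i.e. the square root of the sum of squared differences.  It reports the standard deviation of the distance divided by its mean, for a few dimensions.

```python
import math
import random
import statistics


def relative_spread(dim, pairs=1000, seed=0):
    """Std / mean of the Euclidean distance between random points in [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    distances = []
    for _ in range(pairs):
        x = [rng.random() for _ in range(dim)]
        y = [rng.random() for _ in range(dim)]
        distances.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))))
    return statistics.pstdev(distances) / statistics.mean(distances)


for dim in (2, 20, 200):
    print(dim, round(relative_spread(dim), 3))
```

The ratio shrinks as the dimension grows:  distances between random points become more and more alike, so a single distance tells us less and less.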

This brings us to the hypersphere.  A hypersphere’s equation is

\displaystyle\sum_{i=1}^N x_i^2 = R^2

where x is a point of dimension N and R is the radius of the hypersphere.

A hypersphere of dimension 1 is a line, a hypersphere of dimension 2 is a circle, dimension 3 is a sphere, dimension 4 is an…  expanding universe?  and so on.

A theorem I’ll demonstrate in a future article is that the volume of a hypersphere of radius 1 tends to zero as the dimension increases.

UPDATE (12-07-2017):  Demonstration of hypersphere hyper volume is done in this article.

This is fairly unintuitive, so let me give real numerical values:

Dimension Hyper Volume
1 2
2 3.141592654
3 4.188790205
4 4.934802201
5 5.263789014
6 5.16771278
7 4.72476597
8 4.058712126
9 3.298508903
10 2.55016404
11 1.884103879
12 1.335262769
13 0.910628755
14 0.599264529
15 0.381443281
16 0.23533063
17 0.140981107
18 0.082145887
19 0.046621601
20 0.025806891
21 0.01394915
22 0.007370431
23 0.003810656
24 0.001929574
25 0.000957722
26 0.000466303
27 0.000222872
28 0.000104638

If we plot those values:

image

We see the hyper volume increases in the first couple of dimensions.  A circle of radius 1 has an area of pi (3.1416) while a sphere of radius 1 has a volume of 4.19.  It peaks at dimension 5 and then shrinks.

It is unintuitive because in 2 and 3 dimensions (the only dimensions in which we can visualize a hypersphere), the hypersphere pretty much fills its embedding cube.  A way to “visualize” what’s happening in higher dimension is to consider a “diagonal” into a hypersphere.

For a circle, the diagonal (i.e. at 45°) intersects with the unit circle at

(\frac {1} {\sqrt {2}}, \frac {1} {\sqrt {2}}) since (\frac {1} {\sqrt {2}})^2 + (\frac {1} {\sqrt {2}})^2 = 1^2

In general, at dimension N, the diagonal intersects at

x_i = \frac {1} {\sqrt {N}}

So, although the hypersphere of radius 1 touches the cube of side 2 centered at the origin on each of its walls, the surface of the hypersphere, in general, gets further and further away from the cube surface as the dimension increases:  along the diagonal, the cube’s corner sits at x_i = 1 while the hypersphere stops at x_i = \frac {1} {\sqrt {N}}.
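To put numbers on that diagonal argument, here is a small sketch (the function name is mine).  The corner of the cube, (1, 1, ..., 1), lies at distance \sqrt{N} from the origin while the hypersphere surface lies at distance 1, so the gap along the diagonal is \sqrt{N} - 1.

```python
import math


def diagonal_gap(dim):
    """Distance along the diagonal between the unit hypersphere and the cube corner."""
    corner_distance = math.sqrt(dim)  # norm of (1, 1, ..., 1)
    return corner_distance - 1.0      # the hypersphere surface sits at distance 1


for dim in (2, 3, 10, 100):
    # Coordinate of the intersection point, then the gap to the corner.
    print(dim, round(1 / math.sqrt(dim), 4), round(diagonal_gap(dim), 4))
```

At dimension 100, the corner is 9 units further along the diagonal than the surface of the hypersphere, which itself has a radius of only 1.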

Consequences of the hypersphere volume

A straightforward consequence of the hypersphere volume is sampling.  Randomly sampling a square of side 2 centered at the origin will land points within the unit circle with probability \frac{\pi}{4} \approx 79\%.  The same process with a hypersphere of dimension 8 would hit the inside of the hypersphere with a probability of 1.6%.
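Those two probabilities can be reproduced with a short sketch.  The exact fraction is the hypersphere volume divided by the volume 2^N of the enclosing cube; for the volume, the code uses the standard closed form \pi^{N/2} / \Gamma(N/2 + 1) (a known result, assumed here rather than derived).  The second function, with hypothetical defaults for the sample count and seed, estimates the same fraction by random sampling.

```python
import math
import random


def inside_fraction_exact(dim):
    """Unit hypersphere volume divided by the volume of the cube of side 2."""
    ball = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)
    return ball / 2 ** dim


def inside_fraction_sampled(dim, samples=20000, seed=0):
    """Monte Carlo estimate: share of uniform cube points landing in the hypersphere."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        point = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        if sum(c * c for c in point) <= 1.0:
            hits += 1
    return hits / samples


for dim in (2, 8):
    print(dim, round(inside_fraction_exact(dim), 4), inside_fraction_sampled(dim))
```

With so few hits at dimension 8, the sampled estimate becomes noisy, which is the under-sampling problem in miniature.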

A corollary to the hypersphere volume is that at higher dimension, the bulk of the volume of the hypersphere is concentrated in a thin annulus below its surface.  An obvious consequence of that is that optimizing a metric (i.e. a distance) in high dimension is difficult.

What should we do about it?

First step is to be aware of it.

A symptom of high dimensionality is under sampling:  the space covered is so large that the number of sample points required to learn the underlying model is likely to exceed the actual sample set’s size.

The simplest solution is to avoid high dimensionality with some pre-processing.  For instance, if we have a priori knowledge of the domain, we might be able to combine dimensions together.  For example, in an IoT field with 10 000 sensors, for many reasons, including the curse of dimensionality, it wouldn’t be a good idea to consider each sensor’s input as an independent input.  It would be worth trying to aggregate sensor inputs by analyzing the data.

Summary

Some Machine Learning algorithms will be more sensitive to higher dimensionality than others but the curse of dimensionality affects most algorithms.

It is a problem to be aware of and we should be ready to mitigate it with some good feature engineering.

URL Routing with Azure Application Gateway

Update (13-06-2017):  The POC of this article is available on GitHub here.

I have a scenario perfect for a Layer-7 Load Balancer / Reverse Proxy:

  • Multiple web server clusters to be routed under one URL hierarchy (one domain name)
  • Redirect HTTP traffic to the same URL on HTTPS
  • Have reverse proxy performing SSL termination (or SSL offloading), i.e. accepting HTTPS but routing to underlying servers using HTTP

On paper, Azure Application Gateway can do all of those.  Let’s find out in practice.

Azure Application Gateway Concepts

From the documentation:

Application Gateway is a layer-7 load balancer.  It provides failover, performance-routing HTTP requests between different servers, whether they are on the cloud or on-premises. Application Gateway provides many Application Delivery Controller (ADC) features including HTTP load balancing, cookie-based session affinity, Secure Sockets Layer (SSL) offload, custom health probes, support for multi-site, and many others.

Before we get into the meat of it, there are a bunch of concepts Application Gateway uses and we need to understand:

  • Back-end server pool: The list of IP addresses of the back-end servers. The IP addresses listed should either belong to the virtual network subnet or should be a public IP/VIP.
  • Back-end server pool settings: Every pool has settings like port, protocol, and cookie-based affinity. These settings are tied to a pool and are applied to all servers within the pool.
  • Front-end port: This port is the public port that is opened on the application gateway. Traffic hits this port, and then gets redirected to one of the back-end servers.
  • Listener: The listener has a front-end port, a protocol (Http or Https, these values are case-sensitive), and the SSL certificate name (if configuring SSL offload).
  • Rule: The rule binds the listener, the back-end server pool and defines which back-end server pool the traffic should be directed to when it hits a particular listener.

On top of those, we should probably add probes that are associated to a back-end pool to determine its health.

Proof of Concept

As a proof of concept, we’re going to implement the following:

image

We use Windows Virtual Machine Scale Sets (VMSS) for back-end servers.

In a production setup, we would go for exposing the port 443 on the web, but for a POC, this should be sufficient.

As of this writing, there is no feature to allow automatic redirection from port 80 to port 443.  Usually, for a public web site, we want to redirect users to HTTPS.  This could be achieved by having one of the VM scale sets implement the redirection and routing HTTP traffic to it.

ARM Template

We’ve published the ARM template on GitHub.

First, let’s look at the visualization.

image

The template is split into 4 files:

  • azuredeploy.json, the master ARM template.  It simply references the others and passes parameters around.
  • network.json, responsible for the virtual network and Network Security Groups
  • app-gateway.json, responsible for the Azure Application Gateway and its public IP
  • vmss.json, responsible for a VM scale set, a public IP and a public load balancer; this template is invoked 3 times with 3 different sets of parameters to create the 3 VM scale sets

We’ve configured the VMSS to have public IPs.  It is quite typical to want to connect directly to a back-end server while testing.  We also optionally open the VMSS to RDP traffic; this is controlled by the ARM template’s parameter RDP Rule (Allow, Deny).

Template parameters

Here are the ARM template parameters.

Parameter | Description
Public DNS Prefix | The DNS prefix for each VMSS public IP.  They are then suffixed by ‘a’, ‘b’ & ‘c’.
RDP Rule | Switch allowing or not allowing RDP network traffic to reach VMSS from public IPs.
Cookie Based Affinity | Switch enabling / disabling cookie based affinity on the Application Gateway.
VNET Name | Name of the Virtual Network (defaults to VNet).
VNET IP Prefix | Prefix of the IP range for the VNET (defaults to 10.0.0).
VM Admin Name | Local user account for administrator on all the VMs in all VMSS (defaults to vmssadmin).
VM Admin Password | Password for the VM Admin (same for all VMs of all VMSS).
Instance Count | Number of VMs in each VMSS.
VM Size | SKU of the VMs for the VMSS (defaults to Standard DS2-v2).

Routing

An important characteristic of URL-based routing is that requests are routed to back-end servers without alteration.

This is important.  It means that /a/ on the Application Gateway is mapped to /a/ on the Web Server.  It isn’t mapped to /, which might seem more intuitive as that would be the root of the ‘a’ web servers.  This is because URL-based routing can be more general than just defining a suffix.
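The behaviour can be pictured with a toy router.  This is only a sketch of the idea, not how Application Gateway is implemented:  the path patterns, pool names and default pool below are hypothetical.  What matters is that the path handed to the back-end is the incoming path, untouched.

```python
from fnmatch import fnmatchcase

# Hypothetical path map: URL pattern -> back-end pool name.
PATH_RULES = [("/a/*", "pool-a"), ("/b/*", "pool-b")]
DEFAULT_POOL = "pool-c"  # assumed fallback when no pattern matches


def route(path):
    """Return (pool, forwarded_path); the path is forwarded without alteration."""
    for pattern, pool in PATH_RULES:
        if fnmatchcase(path, pattern):
            return pool, path
    return DEFAULT_POOL, path


print(route("/a/index.html"))
print(route("/other"))
```

A request for /a/index.html reaches a server of the first pool as /a/index.html, so that server must serve its content under /a/.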

Summary

This proof of concept gives a fully functional example of Azure Application Gateway using URL-based routing.

This is a great showcase for Application Gateway as it can then reverse proxy all traffic while keeping user affinity using cookies.

Automating Role Assignment in Subscriptions & Resource Groups

Azure supports a Role Based Access Control (RBAC) system.  This system links identity (users & groups) to roles.

RBAC is enforced at the REST API access level, which is the fundamental access in Azure:  it can’t be bypassed.

In this article, we’ll look at how we can automate the role assignment procedure.

This is useful if you routinely create resource groups for different people, e.g. each time a department requests an Azure environment, or if you routinely create new subscriptions.

We’re going to do this in PowerShell.  So let’s prep a PowerShell environment with Azure SDK & execute the Add-AzureRmAccount (login) command.

Exploring roles

A role is an aggregation of actions.

Let’s look at the available roles.


Get-AzureRmRoleDefinition | select Name | sort -Property Name

This gives us the rather long list of roles:

  • API Management Service Contributor
  • API Management Service Operator Role
  • API Management Service Reader Role
  • Application Insights Component Contributor
  • Application Insights Snapshot Debugger
  • Automation Job Operator
  • Automation Operator
  • Automation Runbook Operator
  • Azure Service Deploy Release Management Contributor
  • Backup Contributor
  • Backup Operator
  • Backup Reader
  • Billing Reader
  • BizTalk Contributor
  • CDN Endpoint Contributor
  • CDN Endpoint Reader
  • CDN Profile Contributor
  • CDN Profile Reader
  • Classic Network Contributor
  • Classic Storage Account Contributor
  • Classic Storage Account Key Operator Service Role
  • Classic Virtual Machine Contributor
  • ClearDB MySQL DB Contributor
  • Contributor
  • Data Factory Contributor
  • Data Lake Analytics Developer
  • DevTest Labs User
  • DNS Zone Contributor
  • DocumentDB Account Contributor
  • GenevaWarmPathResourceContributor
  • Intelligent Systems Account Contributor
  • Key Vault Contributor
  • Log Analytics Contributor
  • Log Analytics Reader
  • Logic App Contributor
  • Logic App Operator
  • Monitoring Contributor Service Role
  • Monitoring Reader Service Role
  • Network Contributor
  • New Relic APM Account Contributor
  • Office DevOps
  • Owner
  • Reader
  • Redis Cache Contributor
  • Scheduler Job Collections Contributor
  • Search Service Contributor
  • Security Admin
  • Security Manager
  • Security Reader
  • SQL DB Contributor
  • SQL Security Manager
  • SQL Server Contributor
  • Storage Account Contributor
  • Storage Account Key Operator Service Role
  • Traffic Manager Contributor
  • User Access Administrator
  • Virtual Machine Contributor
  • Web Plan Contributor
  • Website Contributor

Some roles are specific, e.g. Virtual Machine Contributor, while others are much broader, e.g. Contributor.

Let’s look at a specific role:


Get-AzureRmRoleDefinition "Virtual Machine Contributor"

This gives us a role definition object:


Name             : Virtual Machine Contributor
Id               : 9980e02c-c2be-4d73-94e8-173b1dc7cf3c
IsCustom         : False
Description      : Lets you manage virtual machines, but not access to them, and not the virtual network or storage account
they're connected to.
Actions          : {Microsoft.Authorization/*/read, Microsoft.Compute/availabilitySets/*, Microsoft.Compute/locations/*,
Microsoft.Compute/virtualMachines/*...}
NotActions       : {}
AssignableScopes : {/}

Of particular interest are the actions allowed by that role:


(Get-AzureRmRoleDefinition "Virtual Machine Contributor").Actions

This returns the 34 actions (as of the time of this writing) the role enables, including:

  • Microsoft.Authorization/*/read
  • Microsoft.Compute/availabilitySets/*
  • Microsoft.Compute/locations/*
  • Microsoft.Compute/virtualMachines/*
  • Microsoft.Compute/virtualMachineScaleSets/*
  • Microsoft.Insights/alertRules/*

We see that wildcards are used to allow multiple actions.  Therefore there are actually many more than 34 actions allowed by this role.

Let’s look at a more generic role:


Get-AzureRmRoleDefinition "Contributor"

This role definition object is:


Name             : Contributor
Id               : b24988ac-6180-42a0-ab88-20f7382dd24c
IsCustom         : False
Description      : Lets you manage everything except access to resources.
Actions          : {*}
NotActions       : {Microsoft.Authorization/*/Delete, Microsoft.Authorization/*/Write,
Microsoft.Authorization/elevateAccess/Action}
AssignableScopes : {/}

We notice that all actions (*) are allowed but that some actions are explicitly disallowed via the NotActions property.


(Get-AzureRmRoleDefinition "Contributor").NotActions

  • Microsoft.Authorization/*/Delete
  • Microsoft.Authorization/*/Write
  • Microsoft.Authorization/elevateAccess/Action
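The interplay between Actions and NotActions can be modelled in a few lines.  The sketch below is a simplified illustration for a single role, not Azure’s actual evaluation engine:  the function name is hypothetical, wildcards are matched with Python’s fnmatch, and matching is assumed to be case-insensitive.  An operation is allowed when it matches at least one Actions pattern and no NotActions pattern.

```python
from fnmatch import fnmatchcase


def is_allowed(operation, actions, not_actions):
    """True if operation matches an Actions pattern and no NotActions pattern."""
    op = operation.lower()  # assumption: operation names compare case-insensitively
    allowed = any(fnmatchcase(op, pattern.lower()) for pattern in actions)
    denied = any(fnmatchcase(op, pattern.lower()) for pattern in not_actions)
    return allowed and not denied


# The Contributor role definition shown above.
contributor_actions = ["*"]
contributor_not_actions = [
    "Microsoft.Authorization/*/Delete",
    "Microsoft.Authorization/*/Write",
    "Microsoft.Authorization/elevateAccess/Action",
]

for operation in (
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Authorization/roleAssignments/write",
):
    print(operation, is_allowed(operation, contributor_actions, contributor_not_actions))
```

This is why a Contributor can manage resources but cannot grant access to them:  writing a role assignment falls under Microsoft.Authorization/*/Write.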

We could create custom roles aggregating arbitrary groups of actions together but we won’t cover that here.

Users & Groups

Now that we know about roles, let’s look at users & groups.

Users & groups will come from the Azure AD managing our Azure subscription.

We can grab a user with Get-AzureRmADUser.  This will list all the users in the tenant.  If you are part of a large organization, this is likely a long list.  We can grab a specific user with the following command:


Get-AzureRmADUser -UserPrincipalName john.smith@contoso.com

We need to specify the domain of the user since we could have users coming from different domains inside the same tenant.

Let’s grab the object ID of the user:


$userID = (Get-AzureRmADUser -UserPrincipalName john.smith@contoso.com).Id

Similarly, we could grab the object ID of a group:


$groupID = (Get-AzureRmADGroup -SearchString "Azure Team").Id

Scope

The next thing to determine is the scope where we want to apply a role.

The scope can be either a subscription, a resource group or a resource.

To use our subscription as the scope, let’s run:


$scope = "/subscriptions/" + (Get-AzureRmSubscription)[0].SubscriptionId

To use a resource group as the scope, let’s run:


$scope = (Get-AzureRmResourceGroup -Name MyGroup).ResourceId

Finally, to use a specific resource as the scope, let’s run:


$scope = (Get-AzureRmResource -ResourceGroupName MyGroup -ResourceName MyResource).ResourceId

Assigning a role

Ok, let’s do this:  let’s put it all together:


New-AzureRmRoleAssignment -ObjectId $userID -Scope $scope -RoleDefinitionName "Contributor"

We can double check in the portal that the assignment occurred.

Summary

We automated role assignment using PowerShell.

As with everything that can be done in PowerShell, it can also be done using the Azure Command Line Interface (CLI).  Commands are quite similar.

Also, like every automation, it can be bundled in an Azure Automation Runbook.  So if we have routine operations consisting of provisioning subscriptions or resource groups for groups of users, we could package them in a Runbook to ensure consistency.

Managing Azure AD Application members in Portal

One of Azure AD’s powerful concepts is the application.  It gives context to an authentication as we explained in this article.

An application can also be used as an authorization barrier since we can manage an application’s members.  This is optional:  by default, everyone in a tenant has access to its applications.  But if we opt in to control the members, only members have access to the application, hence only members can authenticate via the application.

In this article, we’ll look at how to manage members of an application in the Portal.  We’ll discuss how to automate this in a future article.

Application Creation

First, let’s create an application.

In the Azure Active Directory (Azure AD or AAD) blade, let’s select App Registrations, then Add.

image

Let’s type the following specifications:

image

Opt in to Manage members

If we now go into the application and select Managed Application in Local Directory:

image

We can select the properties tab and there we can require user assignment.

image

Assigning users

We can then assign users & groups (assigning groups requires the Azure AD Premium SKU).

image

Summary

Azure AD Application Membership, also called User Assignment, is a simple opt-in feature that allows us to control which users can use a given application.

It can be used as a simple (application-wide) authorization mechanism.

Sizing & Pricing Virtual Machines in Azure

Customers regularly ask me similar questions about sizing & pricing of Virtual Machines (VMs), storage, etc.  So I thought I would create a reusable asset in the form of this article.

This is especially important if you are trying to size / price VMs “in advance”.  For instance if you are quoting some work in a “fixed bid” context, i.e. you need to provide the Azure cost before you have written a single line of code of your application.

If that isn’t your case, you can simply trial different VM sizes.  The article would still be useful to see what variables you should be looking at if you do not obtain the right performance.

There are a few things to look for.  We tend to focus on the CPU & RAM but that’s only part of the equation.  The storage & performance target will often drive the choice of VM.

A VM has the following characteristics:  # cores, RAM, Local Disk, # data disks, IOPs, # NICs & Network bandwidth.  We need to consider all of those before choosing a VM.

For starters, we need to understand that Virtual Machines cannot be “hand crafted”, i.e. we cannot choose CPU speed, RAM & IOPS separately.  They come in predefined packages with predefined specs:  SKUs, e.g. D2.

Because of that, we might often have to oversize a characteristic (e.g. # cores) in order to get the right amount of another characteristic (e.g. RAM).

SKUs come in families called Series.  At the time of this writing Azure has the following VM series:

  • A
  • Av2 (A version 2)
  • D & DS
  • Dv2 & DSv2 (D version 2 & DS version 2)
  • F & FS
  • G & GS
  • H & HS
  • L & LS
  • NC
  • NV

Each series will optimize different ratios.  For instance, the F Series will have a higher cores / RAM ratio than the D series.  So if we are looking at a lot of cores and not much RAM, the F series is likely a better choice than D series and will not force us to oversize the RAM as much in order to have the right # of cores.
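To make the ratio argument concrete, here is a minimal sketch.  The small catalog below is an illustrative snapshot of a few D v2 and F sizes (cores and RAM in GB); it is an assumption to verify against the current Azure documentation, and the function names are hypothetical.

```python
# Illustrative (cores, RAM in GB) per SKU; verify against the Azure docs.
SKUS = {
    "D2v2": (2, 7.0),
    "D3v2": (4, 14.0),
    "D4v2": (8, 28.0),
    "F2": (2, 4.0),
    "F4": (4, 8.0),
    "F8": (8, 16.0),
}


def smallest_fit(prefix, min_cores, min_ram_gb):
    """Return (name, cores, ram) of the smallest SKU of a series
    (matched by name prefix) meeting both minimums, or None."""
    candidates = [
        (cores, ram, name)
        for name, (cores, ram) in SKUS.items()
        if name.startswith(prefix) and cores >= min_cores and ram >= min_ram_gb
    ]
    if not candidates:
        return None
    cores, ram, name = min(candidates)  # fewest cores first, then least RAM
    return name, cores, ram


def ram_surplus(prefix, min_cores, min_ram_gb):
    """RAM (GB) we pay for beyond what we asked for, or None if nothing fits."""
    fit = smallest_fit(prefix, min_cores, min_ram_gb)
    return None if fit is None else fit[2] - min_ram_gb
```

With these illustrative numbers, a need for 8 cores and 12 GB of RAM lands on an F8 with 4 GB of surplus RAM, versus a D4v2 with 16 GB of surplus.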

For pricing, the obvious starting point is the pricing page for VM:  https://azure.microsoft.com/en-us/pricing/details/virtual-machines/windows/.

Cores

Azure compute allocates virtual cores from the physical host to the VMs.

Azure cores are dedicated cores.  As of the time of this writing, there are no shared cores (except for the A0 VM) and there is no hyper-threading.

Operating System

There are two components in the price of a VM:

  1. Compute (the raw underlying VM, i.e. the CPU + RAM + local disk)
  2. Licensed software running on it (e.g. Windows, SQL, RHEL, etc.)

The compute price corresponds to the CentOS Linux pricing since CentOS is open source and has no license fee.

Azure has different flavours of licensed software (as of the writing of this article, i.e. March 2017):

  • Windows
    • BizTalk
    • Oracle Java
    • SharePoint
    • SQL Server
  • Linux
    • Open Source (no License)
    • Licensed:  Red Hat Enterprise Linux (RHEL), R Server, SUSE

Windows by itself comes with the same license fee regardless of Windows version (e.g. Windows 2012 & Windows 2016 have the same license fee).

Windows-based software (e.g. BizTalk) comes with the software license + the OS license.  This is reflected in the pricing columns.  For instance, for BizTalk Enterprise (https://azure.microsoft.com/en-us/pricing/details/virtual-machines/biztalk-enterprise/), here in Canadian dollars in Canada East region for the F Series:

image

In the OS column is the price of the compute + the Windows license while in the “Software” column is the price of the BizTalk Enterprise license.  The total is what we pay per hour for the VM.

It is possible to “Bring Your Own License” (BYOL) of any software (including Windows or Linux) in Azure and therefore pay only for the bare compute (which, again, corresponds to CentOS Linux pricing).

UPDATE:  Through Azure Hybrid Use Benefit, we can even “reuse” an on-premises Windows license for a new (unrelated) VM in Azure.

We can also run whatever licensed software we want on top of a VM.  We can install SAP, get an SAP license and be 100% legal.  The licensed software I enumerated comes with the option of being integrated in the “per minute” cost.

So one of the first decisions to make in pricing is:  do we want to go with integrated pricing or external license-based pricing?  Quite easy to decide:  simply look at the price of external licenses (e.g. volume licensing) we can get from the vendor and compare.

Typically if we run the VM sporadically, i.e. a few hours per day, it is cheaper to go with the integrated pricing.  Also, I see a lot of customers start with integrated pricing for POCs, run them for a while and optimize pricing later.
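The comparison can be sketched as a simple break-even calculation.  This is a minimal sketch:  the function names are hypothetical, it assumes the external license is a flat monthly amount, and any prices passed in are placeholders to be replaced with real figures from the pricing page and the vendor.

```python
def monthly_cost_integrated(compute_per_hour, license_per_hour, hours):
    """Integrated pricing: compute and license are both billed per hour of use."""
    return (compute_per_hour + license_per_hour) * hours


def monthly_cost_byol(compute_per_hour, external_license_per_month, hours):
    """BYOL: bare compute per hour plus a flat external license cost
    (assumed here to be a fixed monthly amount)."""
    return compute_per_hour * hours + external_license_per_month


def break_even_hours(license_per_hour, external_license_per_month):
    """Hours per month above which BYOL becomes cheaper than integrated pricing."""
    return external_license_per_month / license_per_hour
```

For example, with a hypothetical license fee of 0.50 per hour and an external license at 150 per month, the break-even point is 300 hours per month:  below that, integrated pricing is cheaper; above it, BYOL wins.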

Temporary Disk

Ok, here, let’s debunk what probably takes 2 hours from me every single week:  the “disk size” column in the pricing sheets.

image

This is local storage.  By local, we mean it’s local to the host itself, it isn’t an attached disk.  For that reason it has lower latency than attached disks.  It also has another very important characteristic:  it is ephemeral.  It isn’t persistent.  Its content does not survive a reboot of the VM.  The disk is empty after reboot.

We are insisting on this point because everybody gets confused by that column and for a good reason:  the column title is bonkers.  It doesn’t lie, it is a disk and it does have the specified size.  But it is a temporary disk.

Can we install the OS on that disk?  No.  Note, we didn’t say “we shouldn’t”, but “we can’t”.

What we typically put on that disk is:

  • Page file
  • Temporary files (e.g. tempdb for SQL Server running on VM)
  • Caching files

Some VM series have quite a large temporary disk.  Take for instance the L series:

image

That VM series was specifically designed to work with Big Data workloads where data is replicated within a cluster (e.g. Hadoop, Cassandra, etc.).  Disk latency is key but durability isn’t, since the data is replicated around.

Unless you run such a workload, don’t rely on the temporary disk too much.

The major consequence here is:  add attached disks to your pricing.  See https://azure.microsoft.com/en-us/pricing/details/managed-disks/.

Storage Space

The pricing page is nice but to have a deeper conversation we’ll need to look at more VM specs.  We start our journey at https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-sizes.  From there, depending on the “type” of VM we are interested in, we’re going to dive into one of the links, e.g. https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-sizes-general.

The documentation repeats the specs we see on the pricing page, i.e. # of cores, RAM & local disk size, but also gives other specs:  max number of data disks, throughput, max number of NICs and network bandwidth.  Here we’ll focus on the maximum number of data disks.

A VM comes with an OS disk, a local disk and a set of optional data disks.  The maximum number of data disks varies depending on the VM SKU.

At the time of this writing, the maximum size of a disk on a VM is 1TB.  We can have bigger volumes on the VM by striping multiple disks together on the VM’s OS.  But the biggest disk is 1TB.

For instance, a D1v2 (see https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-sizes-general#dv2-series) can have 2 data disks on top of the OS disk.  That means, if we max out each of the 3 disks, we get 3 TB, including the space for the OS.

So what if the D1v2 really is enough for our needs in terms of # of cores and RAM but we need 4 TB of storage space?  Well, we’ll need to bump up to another VM SKU, a D2v2 for instance, which supports 4 data disks.
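The reasoning above boils down to a ceiling division.  Here is a minimal sketch, assuming the data lives on data disks only (the OS disk is left out) and using the 1 TB disk limit and the data disk counts quoted above; the function names are hypothetical.

```python
import math

MAX_DISK_TB = 1  # maximum size of a single disk at the time of the article

# Maximum number of data disks per SKU, as quoted in the text above.
MAX_DATA_DISKS = {"D1v2": 2, "D2v2": 4}


def data_disks_needed(required_tb):
    """Number of maxed-out data disks needed to hold the required space."""
    return math.ceil(required_tb / MAX_DISK_TB)


def skus_that_fit(required_tb):
    """SKUs whose data disk limit can accommodate the required space."""
    needed = data_disks_needed(required_tb)
    return [name for name, max_disks in MAX_DATA_DISKS.items() if max_disks >= needed]
```

Asking for 4 TB requires 4 data disks, which rules out the D1v2 and leaves the D2v2.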

Attached Disks

Besides the temporary disk, all VM disks are attached disks.

Attached means they aren’t local to the VM’s host.  They are attached to the VM and backed by Azure storage.

Azure storage means three synchronous replicas, i.e. high resilience and high durability.

Azure storage is its own complex topic with many variables, e.g. LRS / GRS / RA-GRS, Premium / Standard, Cool / Hot, etc.

Here we’ll discuss two dimensions:  Premium vs Standard & Managed vs Unmanaged disks.

We’ve explained what managed disks are in contrast with unmanaged disks in this article.  Going forward I recommend only managed disks.

Standard disks are backed by spinning physical disks while Premium disks are backed by Solid State Drive (SSD) disks.  In general:

  • Premium disk has higher IOPs than Standard disk
  • Premium disk has more consistent IOPs than Standard disk (Standard disk IOPs will vary)
  • Premium disk has higher availability (see Single VM SLA)
  • Premium disk is more expensive than Standard disk

So really, only the price will stop us from only using Premium disk.

In general:  IO intensive workloads (e.g. databases) should always be on Premium.  A single VM needs to be on Premium in order to have an SLA (again, see Single VM SLA).

For the pricing of disks, see https://azure.microsoft.com/en-us/pricing/details/managed-disks/.  Disks come in predefined sizes.

IOPs

We have our VM, the OS on it, we have the storage space but are the disks going to perform?

This is where the Input / Output operations per second (IOPs) come into the picture.

An IO intensive workload (e.g. database) will consume IOPs from the VM disks.

Each disk comes with a number of IOPs.  In the pricing page (https://azure.microsoft.com/en-us/pricing/details/managed-disks/), the Premium disks, i.e. P10, P20 & P30, have documented IOPs of 500, 2300 & 5000 respectively.  Standard disks (at the time of this writing, March 2017) do not have documented IOPs, but they are easy to find out by creating disks in the portal; for instance an S4 disk with 32 GB will have 500 IOPs & 60 MB/s throughput.

In order to get the total number of IOPs we need, we’ll simply select a set of disks that has the right total of IOPs.  For instance, for 20000 IOPs, we might choose 4 x P30, which we might expose to the OS as a single volume (by striping the disks) or not.  Again, we might need to oversize here.  For instance, we might need 20000 IOPs for a database of only 1TB but 4 x P30 will give us 4 TB of space.

Is that all?  Well, no.  Now that we have the IOPs we need, we have to make sure the VM can use those IOPs.  Let’s take the DSv2 series as an example (see https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-sizes-general#dsv2-series).  A DS2v2 can have 4 data disks and can therefore accommodate our 4 x P30 disks, but it can only pull 8000 IOPs.  In order to get the full 20000 IOPs, we would need to oversize to a DS4v2.

image
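The two checks above (enough disks, then a VM able to drive them) can be sketched as follows.  This is a minimal sketch using the Premium disk IOPs quoted above; the function names are hypothetical and the VM cap is passed in as a parameter, to be read from the VM size documentation.

```python
import math

# Premium disk IOPs as quoted in the text above.
DISK_IOPS = {"P10": 500, "P20": 2300, "P30": 5000}


def disks_for_iops(target_iops, disk_type):
    """Number of disks of a given type needed to reach the target IOPs."""
    return math.ceil(target_iops / DISK_IOPS[disk_type])


def effective_iops(disk_count, disk_type, vm_max_iops):
    """IOPs actually available: the sum of the disks, capped by the VM limit."""
    return min(disk_count * DISK_IOPS[disk_type], vm_max_iops)
```

For 20000 IOPs we need 4 x P30, but on a VM capped at 8000 IOPs (the DS2v2 figure quoted above) `effective_iops(4, "P30", 8000)` stays at 8000, which is why a bigger SKU is required.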

One last thing about IOPs:  what is it with those two columns cached / uncached disks?

When we attach a disk, we can choose from different caching options:  none, read-only & read-write.  Caching uses a part of the host resources to cache the disks’ content, which accelerates operations.

Network bandwidth

A VM SKU also controls the network bandwidth of the VM.

There is no precisely documented bandwidth, nor are there SLAs.  Instead, categories are used:  Low, Moderate, High and Very High.  The network bandwidth capacity increases along those categories.

Again, we might need to oversize a VM in order to access higher network throughput if required.

Network Interface Controller (NIC)

Finally, each VM SKU sports a different maximum number of Network Interface Controllers (NICs).

Typically a VM is fine with one NIC.  Network appliances (e.g. virtual firewalls) will often require 2 NICs.

Summary

There are a few variables to consider when sizing a VM.  The number of cores & RAM is a good starting point but you might need to oversize the VMs to satisfy other characteristics such as storage space, disk performance or network performance.