Joining an ARM Linux VM to AAD Domain Services

Active Directory is one of the most popular domain controllers / LDAP servers around.

In Azure we have Azure Active Directory (AAD).  Despite the name, AAD isn’t just a multi-tenant AD.  It is built for the cloud.

Sometimes though, it is useful to have a traditional domain controller…  in the cloud.  Typically this is for legacy workloads built to work with Active Directory.  But another very common scenario is to join an Azure VM to a domain so that users authenticate on it with the same accounts they use for the Azure Portal.

The underlying directory could even be synced with a corporate network, in which case users could log into the VMs using their corporate accounts.  I won’t cover this here, but you can read about it in a previous article covering AD Connect.

The straightforward option is to build an Active Directory cluster on Azure VMs.  This works, but requires provisioning and maintaining (at least) two VMs.


An easier option is AAD Domain Services (AADDS).  AADDS exposes an AAD tenant as a managed domain service.  It does this by provisioning a managed Active Directory cluster variant, i.e. we do not see, or care about, the underlying VMs.

The cluster is synchronized one-way (from AAD to AADDS).  For this reason, AADDS is read-only through its LDAP interface, e.g. we can’t reset a password using LDAP.

The Azure documentation walks us through such an integration with classic (ASM) VMs.  Since ARM has been around for more than a year, I recommend always going with ARM VMs.  This article shows how to do that.

I’ll heavily leverage the existing documentation and detail only what differs from it.

Also, keep in mind this article is written in January 2017.  Azure AD will transition to ARM in the future and will likely render this article obsolete.

Dual Networks

The main challenge we face is that AAD is an ASM service and AAD Domain Services are exposed within an ASM Virtual Network (VNET), which is incompatible with our ARM VM.

Thankfully, we now have Virtual Network peering allowing us to peer an ASM and an ARM VNET together so they can act as if they were one.


Peering

As with all VNET peering, the two VNETs must have non-overlapping IP address spaces.

I created two VNETs in the portal (https://portal.azure.com).  I recommend creating them in the new portal explicitly; this way, even the classic one will be part of the desired resource group.

The classic one has 10.0.0.0/24 address range while the ARM one has 10.1.0.0/24.

The peering can be done from the portal too.  In the Virtual Network pane (say the ARM one), select Peerings:


We should see an empty list here, so let’s click Add.

We need to give the peering a name.  Let’s type PeeringToDomainServices.

We then select Classic in the Peer details since we want to peer with a classic VNET.

Finally, we’ll click on Choose a Virtual Network.


From there we should see the classic VNET we created.
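If you prefer scripting, the peering can also be created with PowerShell.  Here is a minimal sketch, assuming the AzureRM module, an authenticated session (Login-AzureRmAccount) and names of my own choosing (DomainDemo, ArmVNet & ClassicVNet); substitute yours and your subscription ID:

# Grab the ARM VNET we want to peer from
$armVnet = Get-AzureRmVirtualNetwork -ResourceGroupName "DomainDemo" -Name "ArmVNet"

# Classic VNETs live under the Microsoft.ClassicNetwork provider, so we
# reference the classic VNET by its full resource ID
$classicVnetId = "/subscriptions/<subscription id>/resourceGroups/DomainDemo" `
    + "/providers/Microsoft.ClassicNetwork/virtualNetworks/ClassicVNet"

# Create the peering from the ARM side; a peering to a classic VNET is
# configured on the ARM side only
Add-AzureRmVirtualNetworkPeering -Name "PeeringToDomainServices" `
    -VirtualNetwork $armVnet `
    -RemoteVirtualNetworkId $classicVnetId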

Configuring AADDS

The online documentation is quite good for this.

Just make sure you select the classic VNET we created.

You can give a domain name that is different from the AAD domain name (i.e. *.onmicrosoft.com).

Enabling AADDS takes up to 30 minutes.  Don’t hold your breath!

Joining a VM

We can now create a Linux VM, put it in the ARM VNET we created, and join it to the AADDS domain.

Again, the online documentation does a great job of walking us through the process.  The documentation is written for Red Hat.

When I tried it, I used a CentOS VM and ended up using different commands, namely realmd’s realm command (ignoring the SAMBA part of the article).

Conclusion

Enabling Domain Services in AAD is fairly straightforward and well documented.

A challenge we currently have is to join, or simply communicate from, an ARM VM to AADDS.  For this we need two networks, a classic (ASM) one and an ARM one, and we need to peer them together.

Troubleshooting NSGs using Diagnostic Logs

I’ve written about how to use Network Security Groups (NSG) before.

Chances are, once you get a complicated enough set of rules in an NSG, you’ll find yourself with NSGs that do not do what you think they should.

Troubleshooting NSGs isn’t trivial.

I’ll try to give some guidance here but to this date (January 2017), there is no tool where you can just say “please follow packet X and tell me against which wall it bumps”.  It’s more indirect than that.

 

Connectivity

First thing, make sure you can connect to your VNET.

If you are connecting to a VM via a public IP, make sure you have access to that IP (i.e. you’re not sitting behind an on-premises firewall blocking the outgoing port you are trying to use) and that the IP is connected to the VM, either directly or via a Load Balancer.

If you are connecting to a VM via a private IP through a VPN Gateway of some sort, make sure you can connect and that your packets are routed to the gateway and, from there, to the proper subnet.

An easy way to make sure of that is to remove all NSGs and replace them with a single “let everything in” rule.  Of course, that also opens your workloads to hackers, so I recommend doing it with a test VM that you destroy afterwards.
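To script that temporary opening, something like the following sketch works (the NSG & resource group names come from the sample configuration further down this article; priority 150 is assumed to be free):

# Requires the AzureRM module and an authenticated session (Login-AzureRmAccount)
$nsg = Get-AzureRmNetworkSecurityGroup -ResourceGroupName "NSG" -Name "vmNSG"

# Add a temporary rule letting everything in; pick a priority number lower
# (i.e. stronger) than any Deny rule and not already in use
Add-AzureRmNetworkSecurityRuleConfig -NetworkSecurityGroup $nsg `
    -Name "Temp-Allow-Everything" `
    -Access Allow -Protocol "*" -Direction Inbound -Priority 150 `
    -SourceAddressPrefix "*" -SourcePortRange "*" `
    -DestinationAddressPrefix "*" -DestinationPortRange "*"

# Push the modified rule set back to Azure
Set-AzureRmNetworkSecurityGroup -NetworkSecurityGroup $nsg

Remember to remove the rule (or the test VM) once done.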

Diagnose

Then I would recommend going through the official Azure guidelines to troubleshoot NSGs.  They walk you through the different diagnosis tools.

Diagnostic Logs

If you’ve reached this section and haven’t achieved greatness yet, well…  you need something else.

What we’re going to do here is use NSG Diagnostic Logs to understand a bit more what is going on.

By no means is this magic; especially in an environment already in use, where a lot of traffic is occurring, it might be difficult to make sense of what the logs give us.

Nevertheless, the logs give us a picture of what is really happening.  They are aggregated though, so we won’t see our PC’s IP address, for instance.  The aggregation is probably what limits the logs’ effectiveness the most.

Sample configuration

I provide here a sample configuration I’m going to use to walk through the troubleshooting process.

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "VM Admin User Name": {
      "defaultValue": "myadmin",
      "type": "string"
    },
    "VM Admin Password": {
      "defaultValue": null,
      "type": "securestring"
    },
    "Disk Storage Account Name": {
      "defaultValue": "<your prefix>vmpremium",
      "type": "string"
    },
    "Log Storage Account Name": {
      "defaultValue": "<your prefix>logstandard",
      "type": "string"
    },
    "VM Size": {
      "defaultValue": "Standard_DS2",
      "type": "string",
      "allowedValues": [
        "Standard_DS1",
        "Standard_DS2",
        "Standard_DS3"
      ],
      "metadata": {
        "description": "SKU of the VM."
      }
    },
    "Public Domain Label": {
      "type": "string"
    }
  },
  "variables": {
    "Vhds Container Name": "vhds",
    "VNet Name": "MyVNet",
    "Ip Range": "10.0.1.0/24",
    "Public IP Name": "MyPublicIP",
    "Public LB Name": "PublicLB",
    "Address Pool Name": "addressPool",
    "Subnet NSG Name": "subnetNSG",
    "VM NSG Name": "vmNSG",
    "RDP NAT Rule Name": "RDP",
    "NIC Name": "MyNic",
    "VM Name": "MyVM"
  },
  "resources": [
    {
      "type": "Microsoft.Network/publicIPAddresses",
      "name": "[variables('Public IP Name')]",
      "apiVersion": "2015-06-15",
      "location": "[resourceGroup().location]",
      "tags": {
        "displayName": "Public IP"
      },
      "properties": {
        "publicIPAllocationMethod": "Dynamic",
        "idleTimeoutInMinutes": 4,
        "dnsSettings": {
          "domainNameLabel": "[parameters('Public Domain Label')]"
        }
      }
    },
    {
      "type": "Microsoft.Network/loadBalancers",
      "name": "[variables('Public LB Name')]",
      "apiVersion": "2015-06-15",
      "location": "[resourceGroup().location]",
      "tags": {
        "displayName": "Public Load Balancer"
      },
      "properties": {
        "frontendIPConfigurations": [
          {
            "name": "LoadBalancerFrontEnd",
            "comments": "Front end of LB:  the IP address",
            "properties": {
              "publicIPAddress": {
                "id": "[resourceId('Microsoft.Network/publicIPAddresses/', variables('Public IP Name'))]"
              }
            }
          }
        ],
        "backendAddressPools": [
          {
            "name": "[variables('Address Pool Name')]"
          }
        ],
        "loadBalancingRules": [
          {
            "name": "Http",
            "properties": {
              "frontendIPConfiguration": {
                "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/frontendIPConfigurations/LoadBalancerFrontEnd')]"
              },
              "frontendPort": 80,
              "backendPort": 80,
              "enableFloatingIP": false,
              "idleTimeoutInMinutes": 4,
              "protocol": "Tcp",
              "loadDistribution": "Default",
              "backendAddressPool": {
                "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/backendAddressPools/', variables('Address Pool Name'))]"
              },
              "probe": {
                "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/probes/TCP-Probe')]"
              }
            }
          }
        ],
        "probes": [
          {
            "name": "TCP-Probe",
            "properties": {
              "protocol": "Tcp",
              "port": 80,
              "intervalInSeconds": 5,
              "numberOfProbes": 2
            }
          }
        ],
        "inboundNatRules": [
          {
            "name": "[variables('RDP NAT Rule Name')]",
            "properties": {
              "frontendIPConfiguration": {
                "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/frontendIPConfigurations/LoadBalancerFrontEnd')]"
              },
              "frontendPort": 3389,
              "backendPort": 3389,
              "protocol": "Tcp"
            }
          }
        ],
        "outboundNatRules": [],
        "inboundNatPools": []
      },
      "dependsOn": [
        "[resourceId('Microsoft.Network/publicIPAddresses', variables('Public IP Name'))]"
      ]
    },
    {
      "type": "Microsoft.Network/virtualNetworks",
      "name": "[variables('VNet Name')]",
      "apiVersion": "2016-03-30",
      "location": "[resourceGroup().location]",
      "properties": {
        "addressSpace": {
          "addressPrefixes": [
            "10.0.0.0/16"
          ]
        },
        "subnets": [
          {
            "name": "default",
            "properties": {
              "addressPrefix": "[variables('Ip Range')]",
              "networkSecurityGroup": {
                "id": "[resourceId('Microsoft.Network/networkSecurityGroups', variables('Subnet NSG Name'))]"
              }
            }
          }
        ]
      },
      "resources": [],
      "dependsOn": [
        "[resourceId('Microsoft.Network/networkSecurityGroups', variables('Subnet NSG Name'))]"
      ]
    },
    {
      "apiVersion": "2015-06-15",
      "name": "[variables('Subnet NSG Name')]",
      "type": "Microsoft.Network/networkSecurityGroups",
      "location": "[resourceGroup().location]",
      "tags": {},
      "properties": {
        "securityRules": [
          {
            "name": "Allow-HTTP-From-Internet",
            "properties": {
              "protocol": "Tcp",
              "sourcePortRange": "*",
              "destinationPortRange": "80",
              "sourceAddressPrefix": "Internet",
              "destinationAddressPrefix": "*",
              "access": "Allow",
              "priority": 100,
              "direction": "Inbound"
            }
          },
          {
            "name": "Allow-RDP-From-Everywhere",
            "properties": {
              "protocol": "Tcp",
              "sourcePortRange": "*",
              "destinationPortRange": "3389",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "*",
              "access": "Allow",
              "priority": 150,
              "direction": "Inbound"
            }
          },
          {
            "name": "Allow-Health-Monitoring",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "AzureLoadBalancer",
              "destinationAddressPrefix": "*",
              "access": "Allow",
              "priority": 200,
              "direction": "Inbound"
            }
          },
          {
            "name": "Disallow-everything-else-Inbound",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "*",
              "access": "Deny",
              "priority": 300,
              "direction": "Inbound"
            }
          },
          {
            "name": "Allow-to-VNet",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "VirtualNetwork",
              "access": "Allow",
              "priority": 100,
              "direction": "Outbound"
            }
          },
          {
            "name": "Disallow-everything-else-Outbound",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "*",
              "access": "Deny",
              "priority": 200,
              "direction": "Outbound"
            }
          }
        ],
        "subnets": []
      }
    },
    {
      "apiVersion": "2015-06-15",
      "name": "[variables('VM NSG Name')]",
      "type": "Microsoft.Network/networkSecurityGroups",
      "location": "[resourceGroup().location]",
      "tags": {},
      "properties": {
        "securityRules": [
          {
            "name": "Allow-HTTP-From-Internet",
            "properties": {
              "protocol": "Tcp",
              "sourcePortRange": "*",
              "destinationPortRange": "80",
              "sourceAddressPrefix": "Internet",
              "destinationAddressPrefix": "*",
              "access": "Allow",
              "priority": 100,
              "direction": "Inbound"
            }
          },
          {
            "name": "Allow-Health-Monitoring",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "AzureLoadBalancer",
              "destinationAddressPrefix": "*",
              "access": "Allow",
              "priority": 200,
              "direction": "Inbound"
            }
          },
          {
            "name": "Disallow-everything-else-Inbound",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "*",
              "access": "Deny",
              "priority": 300,
              "direction": "Inbound"
            }
          },
          {
            "name": "Allow-to-VNet",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "VirtualNetwork",
              "access": "Allow",
              "priority": 100,
              "direction": "Outbound"
            }
          },
          {
            "name": "Disallow-everything-else-Outbound",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "*",
              "access": "Deny",
              "priority": 200,
              "direction": "Outbound"
            }
          }
        ],
        "subnets": []
      }
    },
    {
      "type": "Microsoft.Network/networkInterfaces",
      "name": "[variables('NIC Name')]",
      "apiVersion": "2016-03-30",
      "location": "[resourceGroup().location]",
      "properties": {
        "ipConfigurations": [
          {
            "name": "ipconfig",
            "properties": {
              "privateIPAllocationMethod": "Dynamic",
              "subnet": {
                "id": "[concat(resourceId('Microsoft.Network/virtualNetworks', variables('VNet Name')), '/subnets/default')]"
              },
              "loadBalancerBackendAddressPools": [
                {
                  "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/backendAddressPools/', variables('Address Pool Name'))]"
                }
              ],
              "loadBalancerInboundNatRules": [
                {
                  "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/inboundNatRules/', variables('RDP NAT Rule Name'))]"
                }
              ]
            }
          }
        ],
        "dnsSettings": {
          "dnsServers": []
        },
        "enableIPForwarding": false,
        "networkSecurityGroup": {
          "id": "[resourceId('Microsoft.Network/networkSecurityGroups', variables('VM NSG Name'))]"
        }
      },
      "resources": [],
      "dependsOn": [
        "[resourceId('Microsoft.Network/virtualNetworks', variables('VNet Name'))]",
        "[resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name'))]"
      ]
    },
    {
      "type": "Microsoft.Compute/virtualMachines",
      "name": "[variables('VM Name')]",
      "apiVersion": "2015-06-15",
      "location": "[resourceGroup().location]",
      "properties": {
        "hardwareProfile": {
          "vmSize": "[parameters('VM Size')]"
        },
        "storageProfile": {
          "imageReference": {
            "publisher": "MicrosoftWindowsServer",
            "offer": "WindowsServer",
            "sku": "2012-R2-Datacenter",
            "version": "latest"
          },
          "osDisk": {
            "name": "[variables('VM Name')]",
            "createOption": "FromImage",
            "vhd": {
              "uri": "[concat('https', '://', parameters('Disk Storage Account Name'), '.blob.core.windows.net', concat('/', variables('Vhds Container Name'),'/', variables('VM Name'), '-os-disk.vhd'))]"
            },
            "caching": "ReadWrite"
          },
          "dataDisks": []
        },
        "osProfile": {
          "computerName": "[variables('VM Name')]",
          "adminUsername": "[parameters('VM Admin User Name')]",
          "windowsConfiguration": {
            "provisionVMAgent": true,
            "enableAutomaticUpdates": true
          },
          "secrets": [],
          "adminPassword": "[parameters('VM Admin Password')]"
        },
        "networkProfile": {
          "networkInterfaces": [
            {
              "id": "[resourceId('Microsoft.Network/networkInterfaces', concat(variables('NIC Name')))]"
            }
          ]
        }
      },
      "resources": [],
      "dependsOn": [
        "[resourceId('Microsoft.Storage/storageAccounts', parameters('Disk Storage Account Name'))]",
        "[resourceId('Microsoft.Network/networkInterfaces', variables('NIC Name'))]"
      ]
    },
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('Disk Storage Account Name')]",
      "sku": {
        "name": "Premium_LRS",
        "tier": "Premium"
      },
      "kind": "Storage",
      "apiVersion": "2016-01-01",
      "location": "[resourceGroup().location]",
      "properties": {},
      "resources": [],
      "dependsOn": []
    },
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('Log Storage Account Name')]",
      "sku": {
        "name": "Standard_LRS",
        "tier": "Standard"
      },
      "kind": "Storage",
      "apiVersion": "2016-01-01",
      "location": "[resourceGroup().location]",
      "properties": {},
      "resources": [],
      "dependsOn": []
    }
  ]
}

The sample has one VM sitting in a subnet protected by an NSG.  The VM’s NIC is also protected by an NSG, to make our life complicated (as we too often do).  The VM is exposed on a load-balanced Public IP, and RDP is enabled via a NAT rule on the Load Balancer.

The VM is running on a Premium Storage account but the sample also creates a standard storage account to store the logs.

The Problem

The problem we are going to find using Diagnostic Logs is that the subnet’s NSG lets RDP in via the “Allow-RDP-From-Everywhere” rule while the NIC’s NSG doesn’t:  that traffic gets blocked, like everything else, by the “Disallow-everything-else-Inbound” rule.

In practice, you’ll likely have something more complicated going on, maybe some IP filtering, etc.  But the principles remain the same.

Enabling Diagnostic Logs

I couldn’t enable the Diagnostic Logs via the ARM template, as it isn’t possible to do so yet.  We can do it via the Portal or PowerShell.

I’ll illustrate the Portal here:  since this is for troubleshooting, chances are you won’t automate it.
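That said, here is a sketch of the PowerShell route, using the AzureRM.Insights cmdlets and the resource names from the sample configuration (the storage account name keeps the template’s placeholder):

# Requires the AzureRM module and an authenticated session
$nsg = Get-AzureRmNetworkSecurityGroup -ResourceGroupName "NSG" -Name "subnetNSG"
$storage = Get-AzureRmStorageAccount -ResourceGroupName "NSG" -Name "<your prefix>logstandard"

# Enable the rule-counter logs, archived to the standard storage account
Set-AzureRmDiagnosticSetting -ResourceId $nsg.Id `
    -StorageAccountId $storage.Id `
    -Enabled $true `
    -Categories NetworkSecurityGroupRuleCounter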

I’ve covered Azure Monitor in a previous article.  We’ve seen that different providers expose different schemas.

NSGs expose two categories of Diagnostic Logs:  Event and Rule Counter.  We’re going to use Rule Counter only.

Rule Counter gives us a count of how many times a given rule was triggered for a given target (MAC address / IP).  Again, if we have lots of traffic flying around, that won’t be super useful.  This is why I recommend isolating the network (or recreating an isolated one) in order to troubleshoot.

We’ll start with the subnet NSG.


Scrolling all the way down on the NSG’s pane left menu, we select Diagnostics Logs.


The pane should look as follows, since no diagnostics are enabled yet.  Let’s click on Turn on diagnostics.


We then turn it on.


For simplicity here, we’re going to use the Archive to a storage account.


We will configure the storage account to send the logs to.


For that, we select the standard account created by the template, or whichever storage account you fancy.  Log Diagnostics will go and create a blob container for each category in the selected account.  The names are predefined (we can’t choose them).

We select the NetworkSecurityGroupRuleCounter category.


And finally we hit the save button on the pane.

We’ll do the same thing with the VM NSG.


Creating logs

Now we are going to try to get through to our VM.  I’ll describe how to do that with the sample I gave, but if you are troubleshooting something, just try the faulty connection.

We’re going to try to RDP to the public IP.  First we need the public IP’s domain name.  So, in the resource group:


At the top of the pane we’ll find the DNS name that we can copy.
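Alternatively, a PowerShell one-liner gets it (MyPublicIP being the public IP name from the sample template and NSG my resource group):

# The Fqdn is built from the 'Public Domain Label' template parameter
(Get-AzureRmPublicIpAddress -ResourceGroupName "NSG" -Name "MyPublicIP").DnsSettings.Fqdn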


We can then paste it in an RDP window.


Trying to connect should fail and it should leave traces in the logs for us to analyse.

Analysis

We’ll have to wait 5-10 minutes for the logs to get in the storage account as this is done asynchronously.

Actually, a way to make sure we get clean logs is to delete the blob container and then try the RDP connection.  The blob container should reappear after 5-10 minutes.

To get the logs out of the storage account we need a tool.  I use Microsoft Azure Storage Explorer.


The blob container is called insights-logs-networksecuritygrouprulecounter.

The logs are hidden inside a complicated hierarchy allowing us to send all our diagnostic logs from all our NSGs over time there.

Basically, under resourceId%3D / SUBSCRIPTIONS / <Your subscription ID> / RESOURCEGROUPS / NSG / PROVIDERS / MICROSOFT.NETWORK / NETWORKSECURITYGROUPS / we’ll see two folders:  SUBNETNSG & VMNSG.  Those are our two NSGs.

If we dig under those two folders, we should find one file (or more if you’ve waited for a while).

Let’s copy those files, with appropriate naming, somewhere to analyse them.
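We can also script the download with the storage cmdlets.  A sketch, assuming the storage account from the sample template sits in the NSG resource group:

# Requires the AzureRM & Azure.Storage modules and an authenticated session
$accountName = "<your prefix>logstandard"
$key = (Get-AzureRmStorageAccountKey -ResourceGroupName "NSG" -Name $accountName)[0].Value
$ctx = New-AzureStorageContext -StorageAccountName $accountName -StorageAccountKey $key

# Download every log blob locally; the blob names carry the deep folder hierarchy
New-Item -ItemType Directory -Force -Path "C:\NsgLogs" | Out-Null
Get-AzureStorageBlob -Container "insights-logs-networksecuritygrouprulecounter" -Context $ctx |
    Get-AzureStorageBlobContent -Destination "C:\NsgLogs" -Context $ctx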

Preferably, use a viewer / editor that understands JSON (I use Visual Studio).  If you use notepad…  you’re going to have fun.

If we look at the subnet NSG logs first and search for “RDP”, we’ll find this entry:

    {
      "time": "2017-01-09T11:46:44.9090000Z",
      "systemId": "...",
      "category": "NetworkSecurityGroupRuleCounter",
      "resourceId": ".../RESOURCEGROUPS/NSG/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/SUBNETNSG",
      "operationName": "NetworkSecurityGroupCounters",
      "properties": {
        "vnetResourceGuid": "{50C7B76A-4B8F-481A-8029-73569E5C7D87}",
        "subnetPrefix": "10.0.1.0/24",
        "macAddress": "00-0D-3A-00-B6-B5",
        "primaryIPv4Address": "10.0.1.4",
        "ruleName": "UserRule_Allow-RDP-From-Everywhere",
        "direction": "In",
        "type": "allow",
        "matchedConnections": 0
      }
    },

The most interesting part is the matchedConnections property, which is zero here because no connection was actually established.

If we look in the VM logs, we’ll find this:

    {
      "time": "2017-01-09T11:46:44.9110000Z",
      "systemId": "...",
      "category": "NetworkSecurityGroupRuleCounter",
      "resourceId": ".../RESOURCEGROUPS/NSG/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/VMNSG",
      "operationName": "NetworkSecurityGroupCounters",
      "properties": {
        "vnetResourceGuid": "{50C7B76A-4B8F-481A-8029-73569E5C7D87}",
        "subnetPrefix": "10.0.1.0/24",
        "macAddress": "00-0D-3A-00-B6-B5",
        "primaryIPv4Address": "10.0.1.4",
        "ruleName": "UserRule_Disallow-everything-else-Inbound",
        "direction": "In",
        "type": "block",
        "matchedConnections": 2
      }
    },

Where matchedConnections is 2 (because I tried twice).

So the logs tell us where the traffic went.

From here we could wonder why it hit that rule, look for a higher-priority rule that allows RDP in, find none and conclude that’s our problem.

Trial & Error

If the logs are not helping you, the last resort is to modify the NSG until you understand what is going on.

A way to do this is to create an “allow everything in from anywhere” rule and give it the highest priority (i.e. the lowest priority number).

If traffic still doesn’t go in, you have another problem than NSG, so go back to previous steps.

If traffic goes in, good.  Move that allow-everything rule down until you find which rule is blocking you.  You may have a lot of rules, in which case I would recommend a dichotomic (binary) search:  put your allow-everything rule in the middle of your “real rules”; if traffic passes, move the rule to the middle of the bottom half, otherwise to the middle of the top half, and so on.  This way, you’ll only need log(N) steps, where N is your number of rules.
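Re-prioritizing that rule at each step is quicker in PowerShell than in the portal.  A sketch, reusing the hypothetical Temp-Allow-Everything rule from the Connectivity section:

$nsg = Get-AzureRmNetworkSecurityGroup -ResourceGroupName "NSG" -Name "vmNSG"

# Move the allow-everything rule between two "real" rules by giving it a new
# priority; the cmdlet requires restating all of the rule's properties
Set-AzureRmNetworkSecurityRuleConfig -NetworkSecurityGroup $nsg `
    -Name "Temp-Allow-Everything" `
    -Access Allow -Protocol "*" -Direction Inbound -Priority 250 `
    -SourceAddressPrefix "*" -SourcePortRange "*" `
    -DestinationAddressPrefix "*" -DestinationPortRange "*"
Set-AzureRmNetworkSecurityGroup -NetworkSecurityGroup $nsg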

Summary

Troubleshooting NSGs can be difficult but here I highlighted a basic methodology to find your way around.

Diagnostic Logs help give us insight into what is really going on, although they can be tricky to work with.

In general, as with every debugging experience, just apply the principle of Sherlock Holmes:

Eliminate the impossible.  Whatever remains, however improbable, must be the truth.

In terms of debugging, that means removing all the noise, all the fat and then some meat, until what remains is so simple that the truth will hit you.

Azure SQL Elastic Pool – Moving databases across pools using PowerShell


I’ve written a bit about Azure SQL Elastic Pool lately:  an overview, ARM templates and database size.

One of the many great features of Azure SQL Elastic Pool is that like Azure SQL Database (standalone), we can change the eDTU capacity of the pool “on the fly”, without downtime.

Unlike its standalone cousin though, we can’t change the edition of the pool.  The edition is either Basic, Standard or Premium.  It is set at creation and is immutable after that.

If we want to change the edition of a pool, the obvious way is to create another pool, move the databases there, delete the original, recreate it with a different edition and move the databases back.

This article shows how to do that using PowerShell.

You might want to move databases around for other reasons, typically to optimize the density and performance of pools.  You would then use a very similar script.

Look at the pool

Let’s start with the pools we established with the sample ARM template of a previous article.

From there we can look at the pool Pool-A using the following PowerShell command:


$old = Get-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName Pool-A -ServerName pooldemoserver

$old

We can see the pool’s current edition is Standard while its Database Transaction Unit (DTU) count is 200.


Create a temporary pool

We’ll create a temporary pool, aptly named temporary, attached to the same server:


$temp = New-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName Temporary -ServerName pooldemoserver -Edition $old.Edition -Dtu $old.Dtu

$temp

It’s important to create a pool that the databases can be moved into.  The maximum size of a database depends on the edition and the number of DTUs of the elastic pool.  The easiest way is to create a pool with the same edition / DTUs, which is what we do here by referencing the $old variable.

Move databases across

First, let’s grab the databases in the original pool:


$dbs = Get-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver | where {$_.ElasticPoolName -eq $old.ElasticPoolName}

$dbs | select DatabaseName

ElasticPoolName is a property of a database.  We’ll simply change it by setting each database:


$dbs | foreach {Set-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver -DatabaseName $_.DatabaseName -ElasticPoolName $temp.ElasticPoolName}

That command takes longer to run as the databases have to move from one compute to another.

Delete / Recreate pool

We can now delete the original pool.  It’s important to note that we couldn’t have deleted a pool with databases still in it.


Remove-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $old.ElasticPoolName -ServerName pooldemoserver

$new = New-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $old.ElasticPoolName -ServerName pooldemoserver -Edition Premium -Dtu 250

The second line recreates it with Premium edition.  We could keep the original DTU, but it’s not always possible since different editions support different DTU values.  In this case, for instance, it wasn’t possible since 200 DTUs isn’t supported for Premium pools.

If you execute those two commands without pausing in between, you will likely receive an error.  It is one of those cases where the Azure REST API returns, and the resource you asked to be removed seems gone, but you can’t really recreate it yet.  An easy workaround consists of pausing or retrying.
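For instance, a minimal retry sketch:

# Retry the creation a few times, pausing in between, until the platform
# has fully released the old pool
$created = $false
for ($i = 0; $i -lt 10 -and -not $created; $i++)
{
    try
    {
        $new = New-AzureRmSqlElasticPool -ResourceGroupName DBs `
            -ElasticPoolName $old.ElasticPoolName -ServerName pooldemoserver `
            -Edition Premium -Dtu 250 -ErrorAction Stop
        $created = $true
    }
    catch
    {
        # Not ready yet:  wait half a minute before retrying
        Start-Sleep -Seconds 30
    }
}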

Move databases back

We can then move the databases back to the new pool:


$dbs | foreach {Set-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver -DatabaseName $_.DatabaseName -ElasticPoolName $new.ElasticPoolName}

Remove-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $temp.ElasticPoolName -ServerName pooldemoserver

In the second line we delete the temporary pool.  Again, the move takes a little longer to execute since databases must be moved from one compute to another.

Summary

We showed how to move databases from one pool to another.

The pretext was a change in elastic pool edition but we might want to move databases around for other reasons.

In practice, you might not want to move your databases twice (to avoid the duration of the two moves) and might be happy to keep a different pool name.  In the demo we did, the move took less than a minute because we had two empty databases.  With many databases totaling a lot of storage, it would take much longer.

Azure SQL Elastic Pool – Database Size

I mentioned in a past article, regarding database sizes within an elastic pool:

“No policies limit an individual database to take more storage although a database maximum size can be set on a per-database basis.”

I’m going to focus on that in this article.

An Azure SQL Database resource has a MaxSizeInBytes property.  We can set it either in an ARM template (see this ARM template and the property maxSizeBytes) or in PowerShell.

An interesting aspect of that property is that:

  • It takes only specific values
  • Not all values are permitted, depending on the elastic pool edition (i.e. Basic, Standard or Premium)

Valid values

One way to find the valid values is to navigate to the ARM schema.  That documented schema is likely slightly out of date since, as of December 2016, its largest value is 500 GB, which isn’t the largest possible database size (1 TB for a P15).

The online documentation of Set-AzureRmSqlDatabase isn’t faring much better, as the documentation for the MaxSizeBytes parameter refers to a MaxSizeGB parameter to know about the acceptable values.  Problem is, the MaxSizeGB parameter doesn’t exist.

But let’s start with the documented schema as it probably only lacks the most recent DB sizes.

Using that schema’s list of possible values, and comparing it with the standalone database sizes for given editions, we can conclude (after testing with ARM templates, of course) that a Basic pool can have databases up to 2 GB, a Standard pool up to 250 GB, and of course a Premium pool can take all values.

It is important to notice that the pool can have larger storage.  For instance, even the smallest Basic pool, with 50 eDTUs, can have a maximum storage of 5 GB.  But each DB within that pool can only grow up to 2 GB.

That gives us the following landscape:

Maximum Size (in bytes) | Maximum Size (in GB) | Available for (edition)
104857600               | 0.1                  | Premium, Standard, Basic
524288000               | 0.5                  | Premium, Standard, Basic
1073741824              | 1                    | Premium, Standard, Basic
2147483648              | 2                    | Premium, Standard, Basic
5368709120              | 5                    | Premium, Standard
10737418240             | 10                   | Premium, Standard
21474836480             | 20                   | Premium, Standard
32212254720             | 30                   | Premium, Standard
42949672960             | 40                   | Premium, Standard
53687091200             | 50                   | Premium, Standard
107374182400            | 100                  | Premium, Standard
161061273600            | 150                  | Premium, Standard
214748364800            | 200                  | Premium, Standard
268435456000            | 250                  | Premium, Standard
322122547200            | 300                  | Premium
429496729600            | 400                  | Premium
536870912000            | 500                  | Premium

Storage Policies

We can now use this maximum database size as a storage policy, i.e. a way to make sure a single database doesn’t take all the storage available in the pool.

Now, this isn’t as trivially useful as the eDTU min / max we’ve seen in a pool.  In the eDTU case, the policy controls how much compute is given to a database at all times.  In the case of a database maximum size, once the database reaches that size, it becomes read-only.  That will likely break the applications running on top of it, unless we planned for it.

A better approach would be to monitor the different databases and react to size changes, by moving a database to another pool for instance.

The maximum size could be a safeguard though.  For instance, let’s imagine we want each database in a pool to stay below 50 GB; we’ll monitor for that and raise alerts in case the threshold is reached (see Azure Monitor for monitoring and alerts).  Now, we might still put a maximum size of 100 GB on the databases.  This acts as a safeguard:  if we do not do anything about a database outgrowing its 50 GB target, it won’t be able to grow indefinitely, which could otherwise top the pool’s maximum size and make the entire pool read-only, affecting ALL the databases in the pool.
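Here is a sketch of such a safeguard, capping every database of a pool at 100 GB (server & pool names reused from the previous articles):

# Cap the maximum size of every database in the pool at 100 GB (107374182400 bytes)
$cap = 107374182400
Get-AzureRmSqlElasticPoolDatabase -ResourceGroupName DBs -ServerName pooldemoserver -ElasticPoolName Pool-A |
    where {$_.MaxSizeBytes -gt $cap} |
    foreach {Set-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver `
        -DatabaseName $_.DatabaseName -MaxSizeBytes $cap}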

In that sense, the maximum size still acts as a resource governor, preventing the noisy neighbour effect.

PowerShell example

We can’t change a database maximum size in the portal (as of December 2016).

Using an ARM template, it is easy to change the parameter.  Here, let’s simply show how we would change it for an existing database.

Building on the example we gave in a previous article, we can easily grab the Pool-A-Db0 database in resource group DBs and server pooldemoserver:


Get-AzureRmSqlDatabase -ServerName pooldemoserver -ResourceGroupName DBs -DatabaseName Pool-A-Db0


We can see the size is the one that was specified in the ARM template (the ARM parameter DB Max Size has a default value of 10 GB).  We can bump it to 50 GB, i.e. 53687091200 bytes:


Set-AzureRmSqlDatabase -ServerName pooldemoserver -ResourceGroupName DBs -DatabaseName Pool-A-Db0 -MaxSizeBytes 53687091200

We can confirm the change in the portal by looking at the properties.


Default Behaviour

If the MaxSizeBytes property is omitted, either in an ARM template or in the New-AzureRmSqlDatabase PowerShell cmdlet, the default behaviour is for the database to have the maximum capacity (e.g. 250 GB for Standard).

After creation, we can’t set the property value to null to obtain the same effect.  Omitting the parameter simply keeps the previously set value.
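For instance, we can verify that behaviour by creating a database without the parameter and reading it back (SizeProbe is a hypothetical database name; server & pool come from the previous articles):

# Create a database in the pool without specifying a maximum size...
New-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver `
    -DatabaseName SizeProbe -ElasticPoolName Pool-A

# ...then read it back:  MaxSizeBytes should show the edition's maximum,
# i.e. 268435456000 (250 GB) for a Standard pool
(Get-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver `
    -DatabaseName SizeProbe).MaxSizeBytes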

Summary

We’ve looked at the maximum size property of a database.

It can be used to control the growth of a database inside a pool and prevent one database’s growth from affecting the others.

Azure SQL Elastic Pool – ARM Templates

In my last article, I covered Azure SQL Elastic Pool.  In this one I cover how to provision it using ARM templates.

As of today (December 2016), the documentation about Azure SQL Elastic Pool provisioning via ARM templates is…  nonexistent.

Searching for it, I was able to gather hints via a few colleagues’ GitHub repos, but there are no examples in the ARM quickstart templates, nor is the elastic pool resource schema documented.  Also, the Automation Script feature in the portal doesn’t reverse engineer an ARM template for the elastic pool.

So I hope this article fills that gap and is easy to search for & consume.

ARM Template

Here we’re going to provision a Server with two pools, Pool-A & Pool-B (yeah, sounds a bit like Thing 1 & Thing 2), each having a few (configurable number of) databases in them.

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "Server Name": {
      "defaultValue": "pooldemoserver",
      "type": "string",
      "metadata": {
        "description": "Name of the SQL server:  needs to be unique among all servers in Azure"
      }
    },
    "Admin Login": {
      "defaultValue": "myadmin",
      "type": "string",
      "metadata": {
        "description": "SQL Server Admin login name"
      }
    },
    "Admin Password": {
      "type": "securestring",
      "metadata": {
        "description": "SQL Server Admin login password"
      }
    },
    "Pool A Edition": {
      "defaultValue": "Standard",
      "type": "string",
      "allowedValues": [
        "Basic",
        "Standard",
        "Premium"
      ],
      "metadata": {
        "description": "Pool A Edition"
      }
    },
    "Pool B Edition": {
      "defaultValue": "Standard",
      "type": "string",
      "allowedValues": [
        "Basic",
        "Standard",
        "Premium"
      ],
      "metadata": {
        "description": "Pool B Edition"
      }
    },
    "DB Max Size": {
      "defaultValue": "10737418240",
      "type": "string",
      "allowedValues": [
        "104857600",
        "524288000",
        "1073741824",
        "2147483648",
        "5368709120",
        "10737418240",
        "21474836480",
        "32212254720",
        "42949672960",
        "53687091200",
        "107374182400",
        "161061273600",
        "214748364800",
        "268435456000",
        "322122547200",
        "429496729600",
        "536870912000"
      ],
      "metadata": {
        "description": "DB Max Size, in bytes"
      }
    }
  },
  "variables": {
    "Pool A": "Pool-A",
    "Pool B": "Pool-B",
    "DB A Prefix": "Pool-A-Db",
    "DB B Prefix": "Pool-B-Db",
    "Count A": 2,
    "Count B": 4
  },
  "resources": [
    {
      "name": "[parameters('Server Name')]",
      "type": "Microsoft.Sql/servers",
      "apiVersion": "2014-04-01-preview",
      "location": "[resourceGroup().location]",
      "dependsOn": [],
      "properties": {
        "administratorLogin": "[parameters('Admin Login')]",
        "administratorLoginPassword": "[parameters('Admin Password')]",
        "version": "12.0"
      },
      "resources": [
        {
          "type": "firewallRules",
          "kind": "v12.0",
          "name": "AllowAllAzureIps",
          "apiVersion": "2014-04-01-preview",
          "location": "[resourceGroup().location]",
          "dependsOn": [
            "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]"
          ],
          "properties": {
            "startIpAddress": "0.0.0.0",
            "endIpAddress": "0.0.0.0"
          }
        },
        {
          "type": "elasticpools",
          "name": "[variables('Pool A')]",
          "apiVersion": "2014-04-01-preview",
          "location": "[resourceGroup().location]",
          "dependsOn": [
            "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]"
          ],
          "properties": {
            "edition": "[parameters('Pool A Edition')]",
            "dtu": "200",
            "databaseDtuMin": "10",
            "databaseDtuMax": "50"
          }
        },
        {
          "type": "elasticpools",
          "name": "[variables('Pool B')]",
          "apiVersion": "2014-04-01-preview",
          "location": "[resourceGroup().location]",
          "dependsOn": [
            "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]"
          ],
          "properties": {
            "edition": "[parameters('Pool B Edition')]",
            "dtu": "400",
            "databaseDtuMin": "0",
            "databaseDtuMax": null
          }
        }
      ]
    },
    {
      "type": "Microsoft.Sql/servers/databases",
      "copy": {
        "name": "DBs-A",
        "count": "[variables('Count A')]"
      },
      "name": "[concat(parameters('Server Name'), '/', variables('DB A Prefix'), copyIndex())]",
      "location": "[resourceGroup().location]",
      "dependsOn": [
        "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]",
        "[resourceId('Microsoft.Sql/servers/elasticpools', parameters('Server Name'), variables('Pool A'))]"
      ],
      "tags": {
        "displayName": "Pool-A DBs"
      },
      "apiVersion": "2014-04-01-preview",
      "properties": {
        "collation": "SQL_Latin1_General_CP1_CI_AS",
        "maxSizeBytes": "[parameters('DB Max Size')]",
        "requestedServiceObjectiveName": "ElasticPool",
        "elasticPoolName": "[variables('Pool A')]"
      }
    },
    {
      "type": "Microsoft.Sql/servers/databases",
      "copy": {
        "name": "DBs-B",
        "count": "[variables('Count B')]"
      },
      "name": "[concat(parameters('Server Name'), '/', variables('DB B Prefix'), copyIndex())]",
      "location": "[resourceGroup().location]",
      "dependsOn": [
        "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]",
        "[resourceId('Microsoft.Sql/servers/elasticpools', parameters('Server Name'), variables('Pool B'))]"
      ],
      "tags": {
        "displayName": "Pool-B DBs"
      },
      "apiVersion": "2014-04-01-preview",
      "properties": {
        "edition": "[parameters('Pool B Edition')]",
        "collation": "SQL_Latin1_General_CP1_CI_AS",
        "maxSizeBytes": "[parameters('DB Max Size')]",
        "requestedServiceObjectiveName": "ElasticPool",
        "elasticPoolName": "[variables('Pool B')]"
      }
    }
  ]
}

We can deploy the template as is.  We’ll need to enter at least an Admin password (for the Azure SQL server).

The “Server Name” parameter must be unique throughout Azure (not just your subscription).  So if it happens to be taken when you try to deploy the template (you would receive an error message along the lines of Server ‘pooldemoserver’ is busy with another operation), try a new, more original name.

Each parameter is documented in the metadata description.
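For reference, here is how the deployment could look from PowerShell (the template file name is mine; since the parameter names contain spaces, a hashtable is the easiest way to pass them):

# Deploy the template into a pre-created resource group;
# only the mandatory parameters are supplied, the others keep their defaults
$password = ConvertTo-SecureString "<strong password here>" -AsPlainText -Force

New-AzureRmResourceGroupDeployment -ResourceGroupName "DBs" `
    -TemplateFile ".\elastic-pool-demo.json" `
    -TemplateParameterObject @{
        "Server Name" = "pooldemoserver";
        "Admin Password" = $password
    }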

Results

Let’s look at the result.  Let’s first go in the resource group where we deployed the template.

In the resource list we should see the following:


We first have our server, with default name pooldemoserver, our two pools, Pool-A & Pool-B, and 6 databases.

Let’s select Pool-A.


We can see the pool is of Standard edition and has 200 eDTUs, with a minimum of 10 eDTUs and a maximum of 50 per database, which is faithful to its ARM definition (repeated below).

        {
          "type": "elasticpools",
          "name": "[variables('Pool A')]",
          "apiVersion": "2014-04-01-preview",
          "location": "[resourceGroup().location]",
          "dependsOn": [
            "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]"
          ],
          "properties": {
            "edition": "[parameters('Pool A Edition')]",
            "dtu": "200",
            "databaseDtuMin": "10",
            "databaseDtuMax": "50"
          }
        }

Similarly, Pool-B has a minimum of 0 and a maximum of 100.  The maximum was set to null in the template and hence defaults to the maximum allowed for a Standard pool of 400 DTUs.

Let’s select the databases in Pool-B.  Alternatively, we can select the Configure pool toolbar option.


The following pane shows us the eDTUs consumed in the last 14 days.  It also allows us to change the assigned eDTUs to the pool.

It is in this pane that we can add / remove databases from the pool.


In order to remove databases from the pool, they must first be selected in the lower right corner of the pane.  We then have to choose a standalone pricing tier for each DB and hit save.  As of today (December 2016), there is no way to move databases from one pool to another directly, i.e. they must first be converted to standalone databases.  It is possible to move databases from one pool to another using PowerShell though, as I’ll demonstrate in a future article.

If we go back to the resource group and select any of the database, we have a link to its parent pool.


Summary

Despite the current lack (as of December 2016) of documentation around it, it is quite possible to create databases within an elastic pool using ARM templates, as we’ve demonstrated here.

Azure SQL Elastic Pool Overview

What is Azure SQL Elastic Pool and what does it bring to Azure SQL Database, the SQL Azure Platform as a Service (PaaS)?

Traditional model

Let’s look at how Azure SQL works without elastic pools first.

Azure SQL Database comes with an Azure SQL Server.  This shouldn’t be confused with SQL Server installed on a VM:  it is a logical server holding everything that doesn’t belong to a database.  This model makes it compatible with SQL Server on-premises.

The important point here is that the compute sits with the database and not the server.  The edition (i.e. Basic, Standard & Premium) and pricing tier / DTUs are set at the database level, not the server level.  Actually, the server doesn’t even have a cost associated with it.

In some ways, this is the opposite of what SQL Server on-premises got us used to.  On-premises, we have a server sitting on an OS and the databases are constructed on top of it, borrowing compute from the server.  In Azure, the compute sits at the database level while the server is this pseudo-centralized thing with no compute associated with it.

In that sense, Azure SQL DB has a much better isolation model out of the box, although you can now do a similar thing with SQL Server on-premises using the Resource Governor.

Elastic Pool Conceptual model

Along came Elastic Pool.  Interestingly, Elastic Pools brought back the notion of a centralized compute shared across databases.  Unlike on-premises SQL Server though, that compute doesn’t sit with the server itself but with a new resource called an elastic pool.

This allows us to provision a certain compute capacity, i.e. DTUs, to a pool and share it across many databases.

A typical scenario where that is beneficial is having many small databases, which tends to be cost prohibitive with the traditional model.

That makes it an excellent solution for ISV / SaaS providers where different tenants have different spikes.

See this article for the different scenarios where elastic pools apply.

We could have “hybrid” scenarios where a server has “traditional databases” with their own pricing tier alongside databases attached to a pool.

DTU policy

The pool can define a policy regarding the minimum and maximum DTUs per database.  This allows for each database

  • to have a minimum amount of compute dedicated to it, avoiding compute starvation
  • to have a maximum amount of compute, avoiding the noisy neighbour effect, i.e. preventing one database from starving all the others
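As an illustration, the policy can be set, and adjusted later, with PowerShell (a sketch; the server & pool names are hypothetical):

# Give each database in the pool at least 10 DTUs and at most 50,
# while the pool itself has 200 DTUs to share
Set-AzureRmSqlElasticPool -ResourceGroupName DBs -ServerName pooldemoserver `
    -ElasticPoolName Pool-A -Dtu 200 -DatabaseDtuMin 10 -DatabaseDtuMax 50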

Storage

On the other hand, a pool has a maximum storage size shared across the pool.

No policies limit an individual database to take more storage although a database maximum size can be set on a per-database basis.

It is important to note that once the maximum pool size has been reached by the sum of the databases’ sizes, all databases become read-only.

Limits

I often find it useful to look at the different limits and quotas of Azure services to understand the structure of a service and inform design decisions.

http://aka.ms/azurelimits should never be too far in your links.

Looking at Azure SQL databases limits, we find those interesting facts:

  • The maximum number of databases per pool varies depending on the pool size, e.g. a Standard pool with 100 DTUs can have 200 databases
  • A server can have up to 5000 databases associated to it
  • A server can have up to 45000 DTUs associated to it, either via elastic pools, databases directly or even Azure Data Warehouses
  • There is no documented limit on the number of pools per server
  • The server, its pools & databases must be in the same Azure region under the same subscription

Let’s look at a few design questions now.

Why use more than one pool?

Why not using a pool with a huge number of DTUs?

  • Ultimately a pool cannot be of infinite size (4000 DTUs / 750 GB for Premium, 3000 DTUs / 2.9 TB for Standard) so we’ll use multiple pools to scale
  • Policies, i.e. min / max DTU, are set up at the pool level; if we have a bunch of tiny DBs with few transactions on them & a group of middle-sized DBs with more traffic, we might want multiple pools with different policies to handle those

Should we have one server per pool or multiple pools per server?

An Azure SQL Server does very little:

  • Holds an Admin account for the entire server
  • Holds the pools & databases
  • Exists in a region

Obviously, multiple regions, multiple servers.

Why would we choose multiple servers over one server multiple pools?  Security:  if we want to segregate access to different databases at the administration level, we wouldn’t want to share one admin account for all.

A lot can be argued around that point, e.g. we could have the one admin account for all DBs but different admins per DB, for instance.  In compliance scenarios, I could see this playing out, e.g. dev vs prod, banking vs federal customers, etc.

Why use a Premium elastic pool?

Standard pools have bigger storage and comparable parallelism specs, so why go Premium and pay a…  Premium?

The main spec where Premium shines is for min / max DTUs per DB:  Premium allows us to have bigger databases within a pool while Standard is geared to have smaller DBs.

More concretely, Standard pools allow up to 100 DTUs per database, while in Premium it goes up to 4000.

As a comparison, 100 DTUs is equivalent to a standalone S3 database.

Summary

We looked at the Azure SQL Database Elastic Pool feature.

Elastic Pool really is an economic feature, as it’s a way to increase the number of databases running on the same compute and hence reduce the cost.

In scenarios where we have lots of small databases, it can drastically reduce costs.

In a future post, I’ll cover how to provision an Elastic pool using ARM template.

Moving existing workloads to Azure


Applications born in the cloud can take full advantage of the cloud and the agility it brings.

But there are a lot of existing solutions out there that weren’t born in the cloud.

In this article I want to sketch a very high-level approach on how to take an existing on-premises solution and move it to Azure.

Let’s first talk about pure Lift & Shift.  Lift & Shift refers to the approach of taking on-premises workloads and deploying them as-is in Azure.

Despite its popularity, it receives a fair bit of bad press because performing a lift and shift doesn’t give you most of the advantage of the cloud, mainly the agility.

I agree with the assessment since a lift and shift basically brings you to the cloud with a pre-cloud paradigm.  That being said, I wouldn’t discard that approach wholesale.

For many organizations, it is one of the many paths to get to the cloud.  Do you move to the cloud and then modernize, or modernize in order to move to the cloud?  It’s really up to you, and each organization has different constraints.

It often makes sense especially for dev & test workloads.  Dev + Test usually:

  • Do not run 24 / 7
  • Do not have High Availability requirements
  • Do not have sensitive data ; unless you bring back your production data, without trimming the sensitive data, for your dev to fiddle with, in which case sensitive data probably isn’t a concern to you

The first point means potentially huge savings.  Azure tends to be cheaper than on-premises solutions, but if you only run it part time, it definitely is cheaper.

The last two points make Dev + Test workloads easier to move, i.e. there is less friction along the way.

Where I would be cautious is to make sure you do not need to do a lot of costly transformations in order to do a pure lift and shift; if that’s the case, I would consider modernizing first, otherwise there won’t be budget left in the bucket for the modernization later.

Address blockers

Will it run on Azure?  Most x86 stuff that runs on a VM will run in Azure, but not all.  Typically this boils down to unsupported network protocols and shared disks.  Azure supports most IP protocols, except Generic Routing Encapsulation (GRE), IP in IP & multicast; User Datagram Protocol is supported, but not multicast.  Shared disks are not supported in Azure:  every disk belongs to one and only one VM.  A shared drive can be mounted via Azure File Storage, but applications requiring a disk accessible by multiple VMs aren’t supported.  This often is the case with quorum-disk-based HA solutions, e.g. Oracle RAC.

If you hit one of those walls, the question to ask yourself is:  are there any mitigations?  This will vary greatly depending on your solution and the blockers you face.

Does it provide suitable High Availability (HA) feature support?  A lot of on-premises solutions rely on hardware for high availability, while cloud-based solutions rely on software, typically by having a cluster of identical workloads fronted by a load balancer.  In Azure, this is less of a blocker than it used to be, thanks to the new single-VM SLA, which isn’t a full-fledged HA solution but at least provides an SLA.

Will it be supported in Azure?  You can run it, but will you get support if you have problems?  This goes for both Microsoft support and other vendors’ support.  Some vendors won’t support you in the cloud, although the list of such vendors is shrinking every day.  A good example of support is Windows Server 2003 in Azure:  it isn’t supported out-of-the-box, although it will work.  You do need a Custom Support Agreement (CSA) with Microsoft, since Windows Server 2003 is no longer a supported product.

If not, does it matter and / or will the ISV work with you?  If you aren’t supported, it isn’t always the end of the road.  It might not matter for dev-test workloads.  Also, most ISVs are typically willing to work with you to make it possible.

Does it have a license that allows running in Azure?  Don’t forget the licenses!  Some vendors have funky licensing schemes for solutions running in the cloud.  One question I get all the time is about Oracle, so here is the answer:  yes, Oracle can be licensed under Azure and no, you don’t have to pay for all the cores of the physical server you’re running on; read about it here.

Address limitations


Time Window to transition should drive your strategy.  This might sound obvious but often people do not know where to start, so start with your destination:  when do you want to be done?

Authentication mechanism (Internal vs External).  Do you need to bring your Domain Controllers over or can you use Azure AD?  Do you have another authentication mechanism that isn’t easy to migrate?

VM Requirements:  cores, RAM, disk, IOPS, bandwidth.  Basically, do a sizing assessment and make sure you won’t face limitations in Azure.  The number of VM SKUs has grown significantly in the last year, but I still see on-premises workloads with “odd configurations” which are hard to migrate to Azure economically.  For instance, a VM with 64 GB of RAM and only one core will need to be migrated to a VM with many cores, and the price might not be compelling.  Disks are limited to 1 TB in Azure (as of this writing, December 2016), but you can stripe many disks to create a larger volume.  That being said, different VM SKUs have different disk-count limits.
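A quick query helps checking what a region offers; for instance, finding sizes with at least 64 GB of RAM (a sketch; the region is an example):

# List VM sizes in East US offering 64 GB of RAM or more,
# along with their core and data-disk counts
Get-AzureRmVMSize -Location "eastus" |
    where {$_.MemoryInMB -ge 65536} |
    select Name, NumberOfCores, MemoryInMB, MaxDataDiskCount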

Latency requirements (e.g. web-data tier).  Basically, no:  if you put your front end in East US and the back-end in South India, latency won’t be great.  But in general, if you have a low-latency requirement, make sure you can attain it with the right solution in Azure.

Solution SLA.  Azure offers great SLAs, but if you have very aggressive SLAs, in the 4-5 nines, you’ll need to push the envelope in Azure, which will affect the cost.

Recovery Time Objective (RTO) & Recovery Point Objective (RPO).  Again, this will influence your solution which will influence the cost.

Backup strategy / DR.  Similar to the previous point:  make sure you architect your solution accordingly.

Compliance standards.  Different services have different compliances.  See this for details.

Basically, for most of those points, the idea is to consider the point and architect the solution to address it.  This will alter the cost.  For instance, if you put 2 instances instead of 1, you’re going to pay for twice the compute.

Make it great


We have our solution.  We worked through the blockers & limitations, now let’s take it to the next level.

Storage:  check out Microsoft Azure Storage Performance and Scalability Checklist.

Scalability:  consult the best practices on scalability.

Availability:  make sure you’ve been through the availability checklist & the high availability checklist.

Express Route:  define your connectivity strategy & consider Express Route prerequisites

Guidance:  in general, consult the Patterns & Practices guidance.

Get it going

As with every initiative involving change, the temptation is to do a heavy analysis before migrating a single app.  People want to get the networking right, the backup strategy, the DR, etc.  This is how they do it on-premises when creating a data center, so this is how they want to do it in Azure.

For many reasons, this approach isn’t optimal in Azure:

  • The constraints aren’t the same in Azure
  • People often have little knowledge of Azure or the cloud in general and therefore spin their wheels for quite a while looking for issues, while being blind to the issues that will actually cause them problems (the usual unknown-unknowns problem)
  • The main advantage of the cloud is agility:  a long up-front analysis is hardly the straightest line to agility

This is why I always give the same advice:  start now, start small, start on something low-risk.  If you migrate 30 solutions and then realize you’ve busted a limit of Virtual Network and have to rebuild it over a weekend, that’s expensive.  But if you migrate one solution, experiment, and realize that the way you laid out the network won’t scale to 30, tearing it down and rebuilding it will be much cheaper.

I’m not advocating migrating all your environments freestyle, in a cowboy manner, quite the opposite:  experiment with something real and low-risk and build from there.  You will learn from the experiment and move forward instead of experimenting in a vacuum.  As you migrate more and more workloads, you’ll gain experience and expertise.  You’ll probably start with dev-test and in time you’ll feel confident enough to move production workloads.

Look at your application park and try to take a few solutions with little dependencies, so you can move them without carrying your entire park with it.

The diagram I’ve put here might look a bit simplistic.  To get there, you’ll probably have to do a few transformations.  For instance, you might want to consider replicating your domain controllers to replicas in Azure to break that dependency.  There might be a system everything depends on in a light way; could your sample solutions access it through a VPN connection?

Summary

I tried to summarize the general guidelines we give to customers when considering migration.

This is no X-step plan, but a bunch of considerations to remove risk from the endeavor.

Cloud brings agility, and agility should be your end goal.  The recipe for agility is simple:  small bites, quick turnaround, feedback, repeat.  This should be your guideline.