Automating Azure AD

https://pixabay.com/en/machine-factory-automation-1651014/

In the previous article, we explored how to interact (read / write) with an Azure AD tenant using the Microsoft Graph API.

In the article before that, we looked at how to authenticate a user without going through the Azure AD web flow.

Those were motivated by a specific scenario:  replacing an LDAP server with Azure AD while migrating a SaaS application to Azure.

Now, a SaaS application will typically have multiple tenants or instances.  Configuring Azure AD by hand, like any other Azure service, can be tedious and error prone.  Furthermore, once we’ve onboarded, say, 20 tenants, making a configuration change becomes even more tedious and error prone.

This is why we’ll look at automation in this article.

We’ll look at how to automate the creation of the Azure AD applications we used in the last two articles.  From there it’s pretty easy to generalize (a.k.a. an exercise for the reader!).

Azure AD Tenant creation

From the get-go, bad news:  we can’t create the tenant through automation.

No API is exposed for that; we need to go through the portal.

Sorry.

Which PowerShell commands to use?

Automating Azure AD is a little confusing.  Having too many options is like not having enough.

The first approach would probably be to use the Azure PowerShell package, as for the rest of the Azure services.  For instance, to create an application, we would use New-AzureRmADApplication.
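
For reference, creating an application with that package looks along these lines; a sketch (parameter values are made up):

#  AzureRM flavour (requires Add-AzureRmAccount, hence a subscription)
New-AzureRmADApplication -DisplayName "MyApp" `
    -IdentifierUris "uri://myapp.com"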

The problem with that package for our scenario is that the Azure AD tenant isn’t attached to a subscription.  This is typical of a SaaS model:  we have an Azure AD tenant to manage internal users on all subscriptions and then different tenants to manage external users.  Unfortunately, at the time of this writing, the Azure PowerShell package is built around the Add-AzureRmAccount command to authenticate the user; that command binds to a subscription (either directly or via Select-AzureRmSubscription).  But in our case we do not have a subscription:  our Azure AD tenant isn’t managing a subscription.

The second approach would then be to use the MSOnline module.  That one is centered on Azure AD, but it is slowly being deprecated in favour of…

…the third approach:  the Azure Active Directory V2 PowerShell module.  This is what we’re going to use.

I want to give a big shout-out to Chris Dituri for blazing the trail here.  His article Azure Active Directory: Creating Applications and SPNs with Powershell was instrumental in writing this one.  As we’ll see, there are a bunch of intricacies around application permissions that aren’t documented and that Chris unraveled.

The first thing we’ll need to do is to install the PowerShell package.  Easy:

Install-Module AzureADPreview

If you are reading this from the future, this might have changed, so check out the documentation page for install instructions.
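
A quick way to confirm the module landed properly (assuming the package is still named AzureADPreview):

#  List the installed module and load it into the current session
Get-Module -ListAvailable AzureADPreview
Import-Module AzureADPreview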

Connect-AzureAD

We need to connect to our tenant:

Connect-AzureAD -TenantId bc7d0032…

You can see the documentation on the Connect-AzureAD command here.

Where do we get our tenant ID?

image
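
The portal is one place to find it (as in the screenshot above).  Alternatively, if the account’s default tenant is the one we’re after, we can connect without specifying it and read the ID back; a sketch (Get-AzureADTenantDetail ships with the same module):

Connect-AzureAD
#  The ObjectId of the tenant detail is the tenant ID
(Get-AzureADTenantDetail).ObjectId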

Now we can go and create applications.

Service-Client

Here we’ll replicate the applications we built by hand in the Authenticating to Azure AD non-interactively article.

Remember, those are two applications:  a service one and a client one.  The client one has permission to access the service one & to let users sign in to it.  As we’ll see, granting those permissions is a little tricky.

Let’s start by the final PowerShell code:

#  Grab the Azure AD Service principal
$aad = (Get-AzureADServicePrincipal | `
    where {$_.ServicePrincipalNames.Contains("https://graph.windows.net")})[0]
#  Grab the User.Read permission
$userRead = $aad.Oauth2Permissions | ? {$_.Value -eq "User.Read"}

#  Resource Access User.Read + Sign in
$readUserAccess = [Microsoft.Open.AzureAD.Model.RequiredResourceAccess]@{
  ResourceAppId=$aad.AppId ;
  ResourceAccess=[Microsoft.Open.AzureAD.Model.ResourceAccess]@{
    Id = $userRead.Id ;
    Type = "Scope"}}

#  Create Service App
$svc = New-AzureADApplication -DisplayName "MyLegacyService" `
    -IdentifierUris "uri://mylegacyservice.com"
# Associate a Service Principal to the service Application 
$spSvc = New-AzureADServicePrincipal -AppId $svc.AppId
#  Grab the user-impersonation permission
$svcUserImpersonation = $spSvc.Oauth2Permissions | `
    ?{$_.Value -eq "user_impersonation"}
 
#  Resource Access 'Access' a service
$accessAccess = [Microsoft.Open.AzureAD.Model.RequiredResourceAccess]@{
  ResourceAppId=$svc.AppId ;
  ResourceAccess=[Microsoft.Open.AzureAD.Model.ResourceAccess]@{
    Id = $svcUserImpersonation.Id ;
    Type = "Scope"}}
#  Create Required Access 
$client = New-AzureADApplication -DisplayName "MyLegacyClient" `
  -PublicClient $true `
  -RequiredResourceAccess $readUserAccess, $accessAccess

As promised, there is an ample amount of ceremony.  Let’s go through it.

  • Line 1:  we grab the service principal that has https://graph.windows.net among its names; you can list all the service principals living in your tenant with Get-AzureADServicePrincipal; I have 15 in a clean tenant.  We’ll need the Graph one since we need to grant access to it.
  • Line 5:  we grab the specific user-read permission inside that service principal’s Oauth2Permissions collection.  Basically, service principals expose the permissions that other apps can request of them.  We’re going to need the ID.  Lots of GUIDs in Azure AD.
  • Line 8:  we then construct a user-read RequiredResourceAccess object with that permission.
  • Line 15:  we create our legacy service app.
  • Line 16:  we associate a service principal to that app.
  • Line 20:  we grab the user-impersonation permission of that service principal.  Same mechanism we used for the Graph API, just a different permission.
  • Line 24:  we build another RequiredResourceAccess object around that user-impersonation permission.
  • Line 30:  we create our legacy client app; we attach both the user-read & user-impersonation permissions to it (a quick verification sketch follows below).
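
To sanity-check the result, we can list what the script created; a sketch (the display names come from the script above):

#  Both applications should show up with their AppId
Get-AzureADApplication | `
    where {$_.DisplayName -in "MyLegacyService", "MyLegacyClient"} | `
    select DisplayName, AppId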

Grant Permissions

If we try to run the authentication piece of code we had in that article, we’ll first need to change the “clientID” value to $client.AppId (and make sure serviceUri has the value “uri://mylegacyservice.com”).
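
Those values can be pulled straight from the variables of the script above (assuming the same PowerShell session):

#  Value to put in clientID
$client.AppId
#  Value to put in serviceUri (should be uri://mylegacyservice.com)
$svc.IdentifierUris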

Now if we run that, we’ll get an error along the lines of

The user or administrator has not consented to use the application with ID ‘…’. Send an interactive authorization request for this user and resource.

What is that?

There is one manual step we have to take:  granting the permissions to the application.  In Azure AD, this must be performed by an admin and there is no API exposed for it.

We could sort of automate it via an authentication workflow (which is what the error message suggests), but I won’t do that here.

Basically, an administrator (of the Azure AD tenant) needs to approve the use of the app.

We can also do it, still manually, via the portal as we did in that article.  But first, let’s run the following command:

Get-AzureADOAuth2PermissionGrant

On an empty tenant, there should be nothing returned.  Unfortunately, there is no Add/New-AzureADOAuth2PermissionGrant at the time of this writing (this might have changed if you are from the future, so make sure you check the available commands).
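
A quick way to check whether such a cmdlet has appeared since this was written (a sketch):

#  Look for any permission-grant cmdlet in the module
Get-Command -Module AzureADPreview -Name "*OAuth2PermissionGrant*"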

So the manual step is, in the portal, to go to the MyLegacyClient app, select Required Permissions, then click the Grant Permissions button.

image

Once we’ve done this we can run the same PowerShell command, i.e.

Get-AzureADOAuth2PermissionGrant

and have two entries now.

image

We see the two permissions we attached to MyLegacyClient.

We should now be able to run the authentication code.

Graph API App

Here we’ll replicate the application we created by hand in the Using Microsoft Graph API to interact with Azure AD article.

This is going to be quite similar, except we’re going to attach a client secret to the application so that we can authenticate against it.

#  Grab the Azure AD Service principal
$aad = (Get-AzureADServicePrincipal | `
    where {$_.ServicePrincipalNames.Contains("https://graph.windows.net")})[0]
#  Grab the User.Read permission
$userRead = $aad.Oauth2Permissions | ? {$_.Value -eq "User.Read"}
#  Grab the Directory.ReadWrite.All permission
$directoryWrite = $aad.Oauth2Permissions | `
  ? {$_.Value -eq "Directory.ReadWrite.All"}


#  Resource Access User.Read + Sign in & Directory.ReadWrite.All
$readWriteAccess = [Microsoft.Open.AzureAD.Model.RequiredResourceAccess]@{
  ResourceAppId=$aad.AppId ;
  ResourceAccess=[Microsoft.Open.AzureAD.Model.ResourceAccess]@{
    Id = $userRead.Id ;
    Type = "Scope"}, [Microsoft.Open.AzureAD.Model.ResourceAccess]@{
    Id = $directoryWrite.Id ;
    Type = "Role"}}

#  Create querying App
$queryApp = New-AzureADApplication -DisplayName "QueryingApp" `
    -IdentifierUris "uri://myqueryingapp.com" `
    -RequiredResourceAccess $readWriteAccess

#  Associate a Service Principal so it can login
$spQuery = New-AzureADServicePrincipal -AppId $queryApp.AppId

#  Create a key credential for the app, valid from now
#  (-1 day, to accommodate client / service time differences)
#  until three months from now
$startDate = (Get-Date).AddDays(-1)
$endDate = $startDate.AddMonths(3)

$pwd = New-AzureADApplicationPasswordCredential -ObjectId $queryApp.ObjectId `
  -StartDate $startDate -EndDate $endDate `
  -CustomKeyIdentifier "MyCredentials"

You need to “grant permissions” for the new application before trying to authenticate against it.

Two big remarks on tiny bugs; they might be fixed by the time you read this and they aren’t critical, as they both have easy workarounds:

  1. The last command in the script, i.e. the password section, will fail with a “stream property was found in a JSON Light request payload.  Stream properties are only supported in responses” error if you execute the entire script in one go.  If you execute it separately, it doesn’t.  Beats me.
  2. This one took me 3 hours to realize, so benefit from my pain:  DO NOT TRY TO AUTHENTICATE THE APP BEFORE GRANTING PERMISSIONS.  There seems to be some caching on the authentication service, so if you try to authenticate while you don’t have the permissions, you’ll keep receiving the same claims even after you’ve granted the permissions.  Annoying, but easy to avoid once you know about it.

An interesting aspect of the automation is that we have much more fine-grained control over the duration of the password than in the Portal (1 year, 2 years, infinite).  That allows us to implement a more aggressive rotation of secrets.
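
For instance, a more aggressive rotation could use a one-week window; a sketch reusing the same cmdlet (the key identifier is made up):

$rotationStart = Get-Date
$rotationEnd = $rotationStart.AddDays(7)
New-AzureADApplicationPasswordCredential -ObjectId $queryApp.ObjectId `
  -StartDate $rotationStart -EndDate $rotationEnd `
  -CustomKeyIdentifier "WeeklyCredentials"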

Summary

Automation with Azure AD, as with other services, helps reduce the provisioning effort and human errors.

There are two big manual steps that can’t be automated in Azure AD:

  • Azure AD tenant creation
  • Granting permissions on an application

That might change in the future but, for now, it limits the amount of automation you can do without human interaction.

Using Microsoft Graph API to interact with Azure AD

In my last article, I showed how to authenticate against Azure AD using a user name / password without using the native web flow.

The underlying scenario was to migrate an application from an LDAP server to an Azure AD tenant.

The logical continuation of that scenario is to use the Microsoft Graph API to interact with the tenant the same way we would use LDAP queries to interact with the LDAP server.

Microsoft Graph API is a generalization of the Azure AD Graph API and should be used instead.  It consists of simple REST queries which are all documented.

In this scenario, I’ll consider three simple interactions:

  • Testing if a user exists
  • Returning the groups a user belongs to
  • Creating a new user

But first we need to set up the Azure AD tenant.

Azure AD setup

We’re going to rely on the last article to do the heavy lifting.

We are going to create a new application (here we’re going to use the name “QueryingApp”) of type Web App / API (although native should probably work).

The important part is to grab its Application ID & also to give it enough permissions to create users.

image

In this scenario, the application is going to authenticate itself (as opposed to a user) so we need to define a secret.

image

We’ll need to add a key, save & copy the key value.

Authenticating with ADAL

This sample is in C# / .NET but since the Active Directory Authentication Library (ADAL) is available on multiple platforms (e.g. Java), it should be easy to port.

We need to install the NuGet package Microsoft.IdentityModel.Clients.ActiveDirectory in our project.

        private static async Task<string> AppAuthenticationAsync()
        {
            //  Constants
            var tenant = "LdapVplDemo.onmicrosoft.com";
            var resource = "https://graph.microsoft.com/";
            var clientID = "9a9f5e70-5501-4e9c-bd00-d4114ebeb419";
            var secret = "Ou+KN1DYv8337hG8o8+qRZ1EPqBMWwER/zvgqvmEe74=";

            //  Ceremony
            var authority = $"https://login.microsoftonline.com/{tenant}";
            var authContext = new AuthenticationContext(authority);
            var credentials = new ClientCredential(clientID, secret);
            var authResult = await authContext.AcquireTokenAsync(resource, credentials);

            return authResult.AccessToken;
        }

Here the clientID is the application ID of the application we created and the secret is the secret key we created for it.

Here we return the access token as we’re going to use it later.

Authenticating with HTTP POST

If we do not want to integrate with ADAL, here’s the bare-bones HTTP POST version:

        private static async Task<string> HttpAppAuthenticationAsync()
        {
            //  Constants
            var tenant = "LdapVplDemo.onmicrosoft.com";
            var clientID = "9a9f5e70-5501-4e9c-bd00-d4114ebeb419";
            var resource = "https://graph.microsoft.com/";
            var secret = "Ou+KN1DYv8337hG8o8+qRZ1EPqBMWwER/zvgqvmEe74=";

            using (var webClient = new WebClient())
            {
                var requestParameters = new NameValueCollection();

                requestParameters.Add("resource", resource);
                requestParameters.Add("client_id", clientID);
                requestParameters.Add("grant_type", "client_credentials");
                requestParameters.Add("client_secret", secret);

                var url = $"https://login.microsoftonline.com/{tenant}/oauth2/token";
                var responsebytes = await webClient.UploadValuesTaskAsync(url, "POST", requestParameters);
                var responsebody = Encoding.UTF8.GetString(responsebytes);
                var obj = JsonConvert.DeserializeObject<JObject>(responsebody);
                var token = obj["access_token"].Value<string>();

                return token;
            }
        }

Here I use the popular Newtonsoft Json NuGet package to handle JSON.

By no means is the code here a masterpiece of robustness and style.  It is meant to be straightforward and easy to understand.  By all means, improve it 500% before calling it production ready!

Testing if a user exists

Here we’re going to use the Get User method of the Microsoft Graph API.

There is actually a NuGet package for the Microsoft Graph API and, in general, SDKs (at the time of this writing) for 9 platforms.

        private static async Task<bool> DoesUserExistsAsync(HttpClient client, string user)
        {
            try
            {
                var payload = await client.GetStringAsync($"https://graph.microsoft.com/v1.0/users/{user}");

                return true;
            }
            catch (HttpRequestException)
            {
                return false;
            }
        }

Again, the code is minimalist here.  The HTTP GET actually returns user information that could be used.

Returning the groups a user belongs to

Here we’re going to use the memberOf method of the Microsoft Graph API.

        private static async Task<string[]> GetUserGroupsAsync(HttpClient client, string user)
        {
            var payload = await client.GetStringAsync(
                $"https://graph.microsoft.com/v1.0/users/{user}/memberOf");
            var obj = JsonConvert.DeserializeObject<JObject>(payload);
            var groupDescription = from g in obj["value"]
                                   select g["displayName"].Value<string>();

            return groupDescription.ToArray();
        }

Here, we deserialize the returned payload to extract the group display names.  The information returned is richer and could be used.

Creating a new user

Finally we’re going to use the Create User method of Microsoft Graph API.

This is slightly more complicated as it is an HTTP POST with a JSON payload as input.

        private static async Task CreateUserAsync(HttpClient client, string user, string domain)
        {
            using (var stream = new MemoryStream())
            using (var writer = new StreamWriter(stream))
            {
                var payload = new
                {
                    accountEnabled = true,
                    displayName = user,
                    mailNickname = user,
                    userPrincipalName = $"{user}@{domain}",
                    passwordProfile = new
                    {
                        forceChangePasswordNextSignIn = true,
                        password = "tempPa$$w0rd"
                    }
                };
                var payloadText = JsonConvert.SerializeObject(payload);

                writer.Write(payloadText);
                writer.Flush();
                stream.Flush();
                stream.Position = 0;

                using (var content = new StreamContent(stream))
                {
                    content.Headers.Add("Content-Type", "application/json");

                    var response = await client.PostAsync("https://graph.microsoft.com/v1.0/users/", content);

                    if (!response.IsSuccessStatusCode)
                    {
                        throw new InvalidOperationException(response.ReasonPhrase);
                    }
                }
            }
        }

Calling Code

The calling code looks like this:


        private static async Task Test()
        {
            //var token = await AppAuthenticationAsync();
            var token = await HttpAppAuthenticationAsync();

            using (var client = new HttpClient())
            {
                client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token);

                var user = "test@LdapVplDemo.onmicrosoft.com";
                var userExist = await DoesUserExistsAsync(client, user);

                Console.WriteLine($"Does user exists?  {userExist}");

                if (userExist)
                {
                    var groups = await GetUserGroupsAsync(client, user);

                    foreach (var g in groups)
                    {
                        Console.WriteLine($"Group:  {g}");
                    }

                    await CreateUserAsync(client, "newuser", "LdapVplDemo.onmicrosoft.com");
                }
            }
        }

Summary

We can see that, using the Microsoft Graph API, most if not all LDAP queries can easily be converted.

Microsoft Graph API is aligned with OData v4, which makes it a great REST API where filtering queries are standardized.

Authenticating to Azure AD non-interactively

I want to use Azure AD as a user directory, but I do not want to use its native web authentication mechanism, which requires users to go via an Active Directory page to log in (a page that can be branded and customized to look like my own).

I just want to give a user name & password to an authentication API.

The reason is that I want to migrate an existing application currently using an LDAP server and want to change the code as little as possible.

You might have other reasons why you want to do that.

Let’s just say that the online literature HEAVILY leans towards using the web workflow.  Even the documentation on native apps recommends that your native app pop the Azure AD web page to authenticate the user.  There are good reasons for that:  this way your app never touches user credentials, which makes it more secure and more trustworthy.  So in general, I totally agree with those recommendations.

But in my case, my legacy app touches the user credentials all over the place.  Also, this approach is a stepping stone before considering a move to the web workflow.

After a weekend spent scanning the entire web (as a side note, I’ve learned there are little Scandinavian men living inside the Earth, which is flat by the way), I finally found Danny Strockis’ article about Authenticating to Azure AD non-interactively using a username & password.

That is basically it.  Except the article is a little quick on the setup, so I’m going to elaborate here.

I’m going to give a sample in C# using ADAL but, since at the end of the day the authentication is one HTTP POST, I’ll also give a more bare-bones sample using an HTTP POST directly, in case you don’t want to or cannot integrate with ADAL.  The samples are quite trivial, so you should be able to convert them to the language / platform of your choice.

Conceptually

We basically want our users to interact with our application only:  they punch in their credentials and the application checks with Azure AD whether the credentials are good.

image

Here the application is represented as a Web app, but it could also be a native application.

The way to do this in the world of Azure AD is to use two Azure AD apps:

  1. A client app, representing the agent authenticating the user
  2. A service app, representing the actual application the user is authenticating against

I know it looks weird and makes your brain hurt; that is in great part why I’m writing this article, because it isn’t straightforward.

image

In authentication parlance, the client app is the client (so far so good) while the service app is the resource, i.e. the thing the user is trying to access:  the client accesses the resource on the user’s behalf.

Setting up Azure AD

Yes, there will be some steps to set up Azure AD.

First we need a tenant.  We could use the tenant of our subscription, but typically for those types of scenarios we’ll want a separate tenant for our end users.  This article shows how to create an Azure AD tenant.

We then need a user to test.  We can create a user like this:

image

where, of course, ldapvpldemo is my Azure AD tenant.

We’ll also need to give that user a “permanent” password, so let’s show the password on creation (or reset it afterwards), then go to an InPrivate browsing window and navigate to https://login.microsoftonline.com/.  We can then log in as the user (e.g. test@ldapvpldemo.onmicrosoft.com) with the said password.  We’ll be prompted to change it (e.g. to My$uperComplexPassw0rd).
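
As an aside, the test user could also be created with the Azure AD PowerShell module from the automation article above; a sketch (the user name and password are the ones used here):

$pwdProfile = New-Object -TypeName Microsoft.Open.AzureAD.Model.PasswordProfile
$pwdProfile.Password = 'My$uperComplexPassw0rd'
$pwdProfile.ForceChangePasswordNextLogin = $false
New-AzureADUser -DisplayName "test" `
  -UserPrincipalName "test@ldapvpldemo.onmicrosoft.com" `
  -MailNickName "test" -AccountEnabled $true -PasswordProfile $pwdProfile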

Let’s create the client app.  In App Registrations, let’s add an app with:

image

It would probably work as a Web app / API, but a Native app seems more fitting.

We’ll need the Application ID of that application which we can find by looking at the properties of the application.

image

We then need to create the service app:

image

We’ll need the App ID URI of the service:

image

That URI can be changed; either way, we need the final value.

We will need to give the client app permission to access the service app.  For that we go back to the client app, open the Required Permissions menu and add a permission.  From there, in the search box, we can just start typing the name of the service app (e.g. MyLegacyService) and it should appear so we can select it.  We then check the Access MyLegacyService box.

Finally, we need to grant the permissions to users.

image

With all that, we’re ready to authenticate.

ADAL

This sample is in C# / .NET but since ADAL is available on multiple platforms (e.g. Java), it should be easy to port.

We’ll create a Console Application project.  We need the full .NET Framework, as the .NET Core version of ADAL doesn’t have the UserPasswordCredential class we’re going to be using.

We need to install the NuGet package Microsoft.IdentityModel.Clients.ActiveDirectory in the project.

        private static async Task AdalAuthenticationAsync()
        {
            //  Constants
            var tenant = "LdapVplDemo.onmicrosoft.com";
            var serviceUri = "https://LdapVplDemo.onmicrosoft.com/d0f883f6-1c32-4a14-a436-0a995a19c39b";
            var clientID = "b9faf13d-9258-4142-9a5a-bb9f2f335c2d";
            var userName = $"test@{tenant}";
            var password = "My$uperComplexPassw0rd1";

            //  Ceremony
            var authority = "https://login.microsoftonline.com/" + tenant;
            var authContext = new AuthenticationContext(authority);
            var credentials = new UserPasswordCredential(userName, password);
            var authResult = await authContext.AcquireTokenAsync(serviceUri, clientID, credentials);
        }

We have to make sure we’ve copied the constants into the constants section.

This should work and authResult should contain a valid access token that we could use as a bearer token in different scenarios.

If we pass a wrong password or wrong user name, we should obtain an error as expected.

HTTP POST

We can use Fiddler or another HTTP sniffing tool to see what ADAL did for us.  It is easy enough to replicate.

        private static async Task HttpAuthenticationAsync()
        {
            //  Constants
            var tenant = "LdapVplDemo.onmicrosoft.com";
            var serviceUri = "https://LdapVplDemo.onmicrosoft.com/d0f883f6-1c32-4a14-a436-0a995a19c39b";
            var clientID = "b9faf13d-9258-4142-9a5a-bb9f2f335c2d";
            var userName = $"test@{tenant}";
            var password = "My$uperComplexPassw0rd";

            using (var webClient = new WebClient())
            {
                var requestParameters = new NameValueCollection();

                requestParameters.Add("resource", serviceUri);
                requestParameters.Add("client_id", clientID);
                requestParameters.Add("grant_type", "password");
                requestParameters.Add("username", userName);
                requestParameters.Add("password", password);
                requestParameters.Add("scope", "openid");

                var url = $"https://login.microsoftonline.com/{tenant}/oauth2/token";
                var responsebytes = await webClient.UploadValuesTaskAsync(url, "POST", requestParameters);
                var responsebody = Encoding.UTF8.GetString(responsebytes);
            }
        }

Basically, we have an HTTP POST where all the previous arguments are passed in the POST body.

If we pass a wrong password or wrong user name, we should obtain an error as expected.  Interestingly, the HTTP code is 400 (i.e. bad request) instead of some unauthorized variant.

Summary

Authenticating this way on an Azure AD tenant isn’t the most recommended method, as it means your application handles credentials, whereas the preferred method delegates the handling of those credentials to an Azure AD hosted page, so your application only ever sees an access token.

But for a legacy migration for instance, it makes sense.  Azure AD definitely is more secure than an LDAP server sitting on a VM.

We’ve seen two ways to perform the authentication.  Under the hood they end up being the same.  One is using the ADAL library while the other uses bare bone HTTP POST.

Keep in mind that ADAL performs token caching.  If you plan to use it in production, you’ll want to configure the cache properly so you don’t get strange behaviours.

Joining an ARM Linux VM to AAD Domain Services

Active Directory is one of the most popular domain controller / LDAP server around.

In Azure we have Azure Active Directory (AAD).  Despite the name, AAD isn’t just a multi-tenant AD.  It is built for the cloud.

Sometimes though, it is useful to have a traditional domain controller…  in the cloud.  Typically this is for legacy workloads built to work with Active Directory.  But also, a very common scenario is to join an Azure VM to a domain so that users authenticate on it with the same accounts they use for the Azure Portal.

The underlying directory could even be synced with a corporate network, in which case users could log into the VMs using their corporate accounts.  I won’t cover this here, but you can read about the AD Connect part in a previous article.

The straightforward option is to build an Active Directory cluster on Azure VMs.  This will work but requires the maintenance of those two VMs.


An easier option is AAD Domain Services (AADDS).  AADDS exposes an AAD tenant as a managed domain service.  It does so by provisioning some variant of a managed Active Directory cluster, i.e. we do not see or care about the underlying VMs.

The cluster is synchronized one way (from AAD to AADDS).  For this reason, AADDS is read-only through its LDAP interface, e.g. we can’t reset a password using LDAP.

The Azure documentation walks us through such an integration with classic (ASM) VMs.  Since ARM has been around for more than a year, I recommend always going with ARM VMs.  This article aims at showing how to do that.

I’ll heavily leverage the existing documentation and detail only what differs from the official walkthrough.

Also, keep in mind this article is written in January 2017.  Azure AD will transition to ARM in the future and will likely render this article obsolete.

Dual Networks

The main challenge we face is that AAD is an ASM service and AAD Domain Services are exposed within an ASM Virtual Network (VNET), which is incompatible with our ARM VM.

Thankfully, we now have Virtual Network peering, allowing us to peer an ASM and an ARM VNET together so they can act as if they were one.

image

Peering

As with all VNET peering, the two VNETs must have mutually exclusive IP address spaces.

I created two VNETs in the portal (https://portal.azure.com).  I recommend creating them in the new portal explicitly; this way even the classic one will be part of the desired resource group.

The classic one has 10.0.0.0/24 address range while the ARM one has 10.1.0.0/24.

The peering can be done from the portal too.  In the Virtual Network pane (say the ARM one), select Peerings:

image

We should see an empty list here, so let’s click Add.

We need to give the peering a name.  Let’s type PeeringToDomainServices.

We then select Classic in the Peer details since we want to peer with a classic VNET.

Finally, we’ll click on Choose a Virtual Network.

image

From there we should see the classic VNET we created.
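
For reference, the same peering can also be scripted; a sketch (resource names are made up and the classic VNET resource ID is abbreviated):

$armVnet = Get-AzureRmVirtualNetwork -ResourceGroupName MyGroup -Name MyArmVNet
Add-AzureRmVirtualNetworkPeering -Name PeeringToDomainServices `
  -VirtualNetwork $armVnet `
  -RemoteVirtualNetworkId "/subscriptions/.../providers/Microsoft.ClassicNetwork/virtualNetworks/MyClassicVNet"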

Configuring AADDS

The online documentation is quite good for this.

Just make sure you select the classic VNET we created.

You can give a domain name that is different from the AAD domain name (i.e. *.onmicrosoft.com).

Enabling AADDS takes up to 30 minutes.  Don’t hold your breath!

Joining a VM

We can now create a Linux VM, put it in the ARM VNET we created, and join it to the AADDS domain.

Again, the online documentation does a great job of walking us through the process.  The documentation is written for Red Hat.

When I tried it, I used a CentOS VM and ended up using different commands, namely the realm command from the realmd package (ignoring the SAMBA part of the article).

Conclusion

It is fairly straightforward to enable Domain Services in AAD, and the process is well documented.

A challenge we currently have is to join, or simply communicate from, an ARM VM to AADDS.  For this we need two networks, a classic (ASM) one and an ARM one, and we need to peer them together.

Troubleshooting NSGs using Diagnostic Logs

I’ve written about how to use Network Security Groups (NSG) before.

Chances are, once you get a complicated enough set of rules in an NSG, you’ll find yourself with NSGs that do not do what you think they should do.

Troubleshooting NSGs isn’t trivial.

I’ll try to give some guidance here but to this date (January 2017), there is no tool where you can just say “please follow packet X and tell me against which wall it bumps”.  It’s more indirect than that.

 

Connectivity

First thing, make sure you can connect to your VNET.

If you are connecting to a VM via a public IP, make sure you have access to that IP (i.e. you’re not sitting behind an on-premise firewall blocking the outgoing port you are trying to use) and that the IP is connected to the VM, either directly or via a Load Balancer.

If you are connecting to a VM via a private IP through a VPN Gateway of some sort, make sure you can connect and that your packets are routed to the gateway and, from there, to the proper subnet.

An easy way to make sure of that is to remove all NSGs and replace them with a single “let everything in” rule.  Of course, that also opens your workloads to hackers, so I recommend doing it with a test VM that you destroy afterwards.

Diagnose

Then I would recommend going through the official Azure guidelines to troubleshoot NSGs.  They walk you through the different diagnosis tools.

Diagnostic Logs

If you’ve reached this section and haven’t achieved greatness yet, well…  you need something else.

What we’re going to do here is use NSG Diagnostic Logs to understand a bit more what is going on.

By no means is this magic; especially in an environment already in use, where a lot of traffic is occurring, it might be difficult to make sense of what the logs give us.

Nevertheless, the logs give us a picture of what is really happening.  They are aggregated though, so we won’t see your PC’s IP address, for instance.  The aggregation is probably what limits the logs’ effectiveness the most.

Sample configuration

I provide here a sample configuration I’m going to use to walk through the troubleshooting process.

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "VM Admin User Name": {
      "defaultValue": "myadmin",
      "type": "string"
    },
    "VM Admin Password": {
      "defaultValue": null,
      "type": "securestring"
    },
    "Disk Storage Account Name": {
      "defaultValue": "<your prefix>vmpremium",
      "type": "string"
    },
    "Log Storage Account Name": {
      "defaultValue": "<your prefix>logstandard",
      "type": "string"
    },
    "VM Size": {
      "defaultValue": "Standard_DS2",
      "type": "string",
      "allowedValues": [
        "Standard_DS1",
        "Standard_DS2",
        "Standard_DS3"
      ],
      "metadata": {
        "description": "SKU of the VM."
      }
    },
    "Public Domain Label": {
      "type": "string"
    }
  },
  "variables": {
    "Vhds Container Name": "vhds",
    "VNet Name": "MyVNet",
    "Ip Range": "10.0.1.0/24",
    "Public IP Name": "MyPublicIP",
    "Public LB Name": "PublicLB",
    "Address Pool Name": "addressPool",
    "Subnet NSG Name": "subnetNSG",
    "VM NSG Name": "vmNSG",
    "RDP NAT Rule Name": "RDP",
    "NIC Name": "MyNic",
    "VM Name": "MyVM"
  },
  "resources": [
    {
      "type": "Microsoft.Network/publicIPAddresses",
      "name": "[variables('Public IP Name')]",
      "apiVersion": "2015-06-15",
      "location": "[resourceGroup().location]",
      "tags": {
        "displayName": "Public IP"
      },
      "properties": {
        "publicIPAllocationMethod": "Dynamic",
        "idleTimeoutInMinutes": 4,
        "dnsSettings": {
          "domainNameLabel": "[parameters('Public Domain Label')]"
        }
      }
    },
    {
      "type": "Microsoft.Network/loadBalancers",
      "name": "[variables('Public LB Name')]",
      "apiVersion": "2015-06-15",
      "location": "[resourceGroup().location]",
      "tags": {
        "displayName": "Public Load Balancer"
      },
      "properties": {
        "frontendIPConfigurations": [
          {
            "name": "LoadBalancerFrontEnd",
            "comments": "Front end of LB:  the IP address",
            "properties": {
              "publicIPAddress": {
                "id": "[resourceId('Microsoft.Network/publicIPAddresses/', variables('Public IP Name'))]"
              }
            }
          }
        ],
        "backendAddressPools": [
          {
            "name": "[variables('Address Pool Name')]"
          }
        ],
        "loadBalancingRules": [
          {
            "name": "Http",
            "properties": {
              "frontendIPConfiguration": {
                "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/frontendIPConfigurations/LoadBalancerFrontEnd')]"
              },
              "frontendPort": 80,
              "backendPort": 80,
              "enableFloatingIP": false,
              "idleTimeoutInMinutes": 4,
              "protocol": "Tcp",
              "loadDistribution": "Default",
              "backendAddressPool": {
                "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/backendAddressPools/', variables('Address Pool Name'))]"
              },
              "probe": {
                "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/probes/TCP-Probe')]"
              }
            }
          }
        ],
        "probes": [
          {
            "name": "TCP-Probe",
            "properties": {
              "protocol": "Tcp",
              "port": 80,
              "intervalInSeconds": 5,
              "numberOfProbes": 2
            }
          }
        ],
        "inboundNatRules": [
          {
            "name": "[variables('RDP NAT Rule Name')]",
            "properties": {
              "frontendIPConfiguration": {
                "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/frontendIPConfigurations/LoadBalancerFrontEnd')]"
              },
              "frontendPort": 3389,
              "backendPort": 3389,
              "protocol": "Tcp"
            }
          }
        ],
        "outboundNatRules": [],
        "inboundNatPools": []
      },
      "dependsOn": [
        "[resourceId('Microsoft.Network/publicIPAddresses', variables('Public IP Name'))]"
      ]
    },
    {
      "type": "Microsoft.Network/virtualNetworks",
      "name": "[variables('VNet Name')]",
      "apiVersion": "2016-03-30",
      "location": "[resourceGroup().location]",
      "properties": {
        "addressSpace": {
          "addressPrefixes": [
            "10.0.0.0/16"
          ]
        },
        "subnets": [
          {
            "name": "default",
            "properties": {
              "addressPrefix": "[variables('Ip Range')]",
              "networkSecurityGroup": {
                "id": "[resourceId('Microsoft.Network/networkSecurityGroups', variables('Subnet NSG Name'))]"
              }
            }
          }
        ]
      },
      "resources": [],
      "dependsOn": [
        "[resourceId('Microsoft.Network/networkSecurityGroups', variables('Subnet NSG Name'))]"
      ]
    },
    {
      "apiVersion": "2015-06-15",
      "name": "[variables('Subnet NSG Name')]",
      "type": "Microsoft.Network/networkSecurityGroups",
      "location": "[resourceGroup().location]",
      "tags": {},
      "properties": {
        "securityRules": [
          {
            "name": "Allow-HTTP-From-Internet",
            "properties": {
              "protocol": "Tcp",
              "sourcePortRange": "*",
              "destinationPortRange": "80",
              "sourceAddressPrefix": "Internet",
              "destinationAddressPrefix": "*",
              "access": "Allow",
              "priority": 100,
              "direction": "Inbound"
            }
          },
          {
            "name": "Allow-RDP-From-Everywhere",
            "properties": {
              "protocol": "Tcp",
              "sourcePortRange": "*",
              "destinationPortRange": "3389",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "*",
              "access": "Allow",
              "priority": 150,
              "direction": "Inbound"
            }
          },
          {
            "name": "Allow-Health-Monitoring",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "AzureLoadBalancer",
              "destinationAddressPrefix": "*",
              "access": "Allow",
              "priority": 200,
              "direction": "Inbound"
            }
          },
          {
            "name": "Disallow-everything-else-Inbound",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "*",
              "access": "Deny",
              "priority": 300,
              "direction": "Inbound"
            }
          },
          {
            "name": "Allow-to-VNet",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "VirtualNetwork",
              "access": "Allow",
              "priority": 100,
              "direction": "Outbound"
            }
          },
          {
            "name": "Disallow-everything-else-Outbound",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "*",
              "access": "Deny",
              "priority": 200,
              "direction": "Outbound"
            }
          }
        ],
        "subnets": []
      }
    },
    {
      "apiVersion": "2015-06-15",
      "name": "[variables('VM NSG Name')]",
      "type": "Microsoft.Network/networkSecurityGroups",
      "location": "[resourceGroup().location]",
      "tags": {},
      "properties": {
        "securityRules": [
          {
            "name": "Allow-HTTP-From-Internet",
            "properties": {
              "protocol": "Tcp",
              "sourcePortRange": "*",
              "destinationPortRange": "80",
              "sourceAddressPrefix": "Internet",
              "destinationAddressPrefix": "*",
              "access": "Allow",
              "priority": 100,
              "direction": "Inbound"
            }
          },
          {
            "name": "Allow-Health-Monitoring",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "AzureLoadBalancer",
              "destinationAddressPrefix": "*",
              "access": "Allow",
              "priority": 200,
              "direction": "Inbound"
            }
          },
          {
            "name": "Disallow-everything-else-Inbound",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "*",
              "access": "Deny",
              "priority": 300,
              "direction": "Inbound"
            }
          },
          {
            "name": "Allow-to-VNet",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "VirtualNetwork",
              "access": "Allow",
              "priority": 100,
              "direction": "Outbound"
            }
          },
          {
            "name": "Disallow-everything-else-Outbound",
            "properties": {
              "protocol": "*",
              "sourcePortRange": "*",
              "destinationPortRange": "*",
              "sourceAddressPrefix": "*",
              "destinationAddressPrefix": "*",
              "access": "Deny",
              "priority": 200,
              "direction": "Outbound"
            }
          }
        ],
        "subnets": []
      }
    },
    {
      "type": "Microsoft.Network/networkInterfaces",
      "name": "[variables('NIC Name')]",
      "apiVersion": "2016-03-30",
      "location": "[resourceGroup().location]",
      "properties": {
        "ipConfigurations": [
          {
            "name": "ipconfig",
            "properties": {
              "privateIPAllocationMethod": "Dynamic",
              "subnet": {
                "id": "[concat(resourceId('Microsoft.Network/virtualNetworks', variables('VNet Name')), '/subnets/default')]"
              },
              "loadBalancerBackendAddressPools": [
                {
                  "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/backendAddressPools/', variables('Address Pool Name'))]"
                }
              ],
              "loadBalancerInboundNatRules": [
                {
                  "id": "[concat(resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name')), '/inboundNatRules/', variables('RDP NAT Rule Name'))]"
                }
              ]
            }
          }
        ],
        "dnsSettings": {
          "dnsServers": []
        },
        "enableIPForwarding": false,
        "networkSecurityGroup": {
          "id": "[resourceId('Microsoft.Network/networkSecurityGroups', variables('VM NSG Name'))]"
        }
      },
      "resources": [],
      "dependsOn": [
        "[resourceId('Microsoft.Network/virtualNetworks', variables('VNet Name'))]",
        "[resourceId('Microsoft.Network/loadBalancers', variables('Public LB Name'))]"
      ]
    },
    {
      "type": "Microsoft.Compute/virtualMachines",
      "name": "[variables('VM Name')]",
      "apiVersion": "2015-06-15",
      "location": "[resourceGroup().location]",
      "properties": {
        "hardwareProfile": {
          "vmSize": "[parameters('VM Size')]"
        },
        "storageProfile": {
          "imageReference": {
            "publisher": "MicrosoftWindowsServer",
            "offer": "WindowsServer",
            "sku": "2012-R2-Datacenter",
            "version": "latest"
          },
          "osDisk": {
            "name": "[variables('VM Name')]",
            "createOption": "FromImage",
            "vhd": {
              "uri": "[concat('https', '://', parameters('Disk Storage Account Name'), '.blob.core.windows.net', concat('/', variables('Vhds Container Name'),'/', variables('VM Name'), '-os-disk.vhd'))]"
            },
            "caching": "ReadWrite"
          },
          "dataDisks": []
        },
        "osProfile": {
          "computerName": "[variables('VM Name')]",
          "adminUsername": "[parameters('VM Admin User Name')]",
          "windowsConfiguration": {
            "provisionVMAgent": true,
            "enableAutomaticUpdates": true
          },
          "secrets": [],
          "adminPassword": "[parameters('VM Admin Password')]"
        },
        "networkProfile": {
          "networkInterfaces": [
            {
              "id": "[resourceId('Microsoft.Network/networkInterfaces', concat(variables('NIC Name')))]"
            }
          ]
        }
      },
      "resources": [],
      "dependsOn": [
        "[resourceId('Microsoft.Storage/storageAccounts', parameters('Disk Storage Account Name'))]",
        "[resourceId('Microsoft.Network/networkInterfaces', variables('NIC Name'))]"
      ]
    },
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('Disk Storage Account Name')]",
      "sku": {
        "name": "Premium_LRS",
        "tier": "Premium"
      },
      "kind": "Storage",
      "apiVersion": "2016-01-01",
      "location": "[resourceGroup().location]",
      "properties": {},
      "resources": [],
      "dependsOn": []
    },
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('Log Storage Account Name')]",
      "sku": {
        "name": "Standard_LRS",
        "tier": "standard"
      },
      "kind": "Storage",
      "apiVersion": "2016-01-01",
      "location": "[resourceGroup().location]",
      "properties": {},
      "resources": [],
      "dependsOn": []
    }
  ]
}

The sample has one VM sitting in a subnet protected by an NSG.  The VM’s NIC is also protected by an NSG, to make our life complicated (as we do too often).  The VM is exposed on a load-balanced public IP and RDP is enabled via NAT rules on the Load Balancer.

The VM is running on a Premium Storage account but the sample also creates a standard storage account to store the logs.

The Problem

The problem we are going to try to find using Diagnostic Logs is that the subnet’s NSG lets RDP in via the “Allow-RDP-From-Everywhere” rule while the NIC’s doesn’t, so that type of traffic gets blocked, like everything else, by the “Disallow-everything-else-Inbound” rule.

In practice, you’ll likely have something more complicated going on, maybe some IP filtering, etc.  But the principles remain.

Enabling Diagnostic Logs

I couldn’t enable the Diagnostic Logs via the ARM template, as it isn’t possible to do so yet.  We can do it via the Portal or PowerShell.

I’ll illustrate the Portal here:  since this is for troubleshooting, chances are you won’t automate it.
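
That said, if you do want to script it, something along these lines should work; a sketch assuming the AzureRM.Insights module and the names from the sample template:

$nsg = Get-AzureRmNetworkSecurityGroup -ResourceGroupName NSG -Name subnetNSG
$storage = Get-AzureRmStorageAccount -ResourceGroupName NSG -Name "<your prefix>logstandard"
Set-AzureRmDiagnosticSetting -ResourceId $nsg.Id -StorageAccountId $storage.Id `
  -Enabled $true -Categories NetworkSecurityGroupRuleCounter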

I’ve covered Azure Monitor in a previous article.  We’ve seen that different providers expose different schemas.

NSGs expose two categories of Diagnostic Logs:  Event and Rule Counter.  We’re going to use Rule Counter only.

Rule Counter will give us a count of how many times a given rule was triggered for a given target (MAC address / IP).  Again, if there is a lot of traffic flying around, that won’t be super useful.  This is why I recommend isolating the network (or recreating an isolated one) in order to troubleshoot.

We’ll start with the subnet NSG.

image

Scrolling all the way down in the left menu of the NSG’s pane, we select Diagnostics Logs.

image

The pane should look as follows, since no diagnostics are enabled.  Let’s click on Turn on diagnostics.

image

We then turn it on.

image

For simplicity here, we’re going to use the Archive to a storage account option.

image

We then configure the storage account to which the logs will be sent.

image

For that, we select the standard account created by the template, or whichever storage account you fancy.  Log Diagnostics will go and create a blob container for each category in the selected account.  The names are predefined (you can’t choose them).

We select the NetworkSecurityGroupRuleCounter category.

image

And finally we hit the save button on the pane.

We’ll do the same thing with the VM NSG.

image

Creating logs

Now we are going to try to get through to our VM.  I’ll describe how to do that with the sample I gave, but if you are troubleshooting your own setup, just try the faulty connection.

We’re going to try to RDP to the public IP.  First we need the public IP’s domain name.  So, in the resource group:

image

At the top of the pane we’ll find the DNS name that we can copy.

image

We can then paste it in an RDP window.

image

Trying to connect should fail and it should leave traces in the logs for us to analyse.

Analysis

We’ll have to wait 5-10 minutes for the logs to reach the storage account, as this is done asynchronously.

Actually, a way to make sure we get clean logs is to delete the blob container and then try the RDP connection.  The blob container should reappear after 5-10 minutes.
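
The deletion itself can be scripted; a sketch (account name and key are placeholders, the container name is the one given below):

$ctx = New-AzureStorageContext -StorageAccountName "<your prefix>logstandard" `
  -StorageAccountKey "<account key>"
Remove-AzureStorageContainer -Name "insights-logs-networksecuritygrouprulecounter" -Context $ctx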

To get the logs in the storage account we need some tool.  I use Microsoft Azure Storage Explorer.

image

The blob container is called insights-logs-networksecuritygrouprulecounter.

The logs are hidden inside a complicated hierarchy that allows us to send all the diagnostic logs from all our NSGs there over time.

Basically, under resourceId%3D / SUBSCRIPTIONS / <Your subscription ID> / RESOURCEGROUPS / NSG / PROVIDERS / MICROSOFT.NETWORK / NETWORKSECURITYGROUPS /, we’ll see two folders:  SUBNETNSG & VMNSG.  Those are our two NSGs.

If we dig under those two folders, we should find one file each (or more if you’ve waited a while).

Let’s copy those files, with appropriate naming, somewhere to analyse them.

Preferably, use a viewer / editor that understands JSON (I use Visual Studio).  If you use Notepad…  you’re going to have fun.

If we look at the subnet NSG logs first and search for “RDP”, we’ll find this entry:

    {
      "time": "2017-01-09T11:46:44.9090000Z",
      "systemId": "...",
      "category": "NetworkSecurityGroupRuleCounter",
      "resourceId": ".../RESOURCEGROUPS/NSG/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/SUBNETNSG",
      "operationName": "NetworkSecurityGroupCounters",
      "properties": {
        "vnetResourceGuid": "{50C7B76A-4B8F-481A-8029-73569E5C7D87}",
        "subnetPrefix": "10.0.1.0/24",
        "macAddress": "00-0D-3A-00-B6-B5",
        "primaryIPv4Address": "10.0.1.4",
        "ruleName": "UserRule_Allow-RDP-From-Everywhere",
        "direction": "In",
        "type": "allow",
        "matchedConnections": 0
      }
    },

The most interesting part is the matchedConnections property, which is zero because we didn’t achieve a connection.

If we look in the VM logs, we’ll find this:

    {
      "time": "2017-01-09T11:46:44.9110000Z",
      "systemId": "...",
      "category": "NetworkSecurityGroupRuleCounter",
      "resourceId": ".../RESOURCEGROUPS/NSG/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/VMNSG",
      "operationName": "NetworkSecurityGroupCounters",
      "properties": {
        "vnetResourceGuid": "{50C7B76A-4B8F-481A-8029-73569E5C7D87}",
        "subnetPrefix": "10.0.1.0/24",
        "macAddress": "00-0D-3A-00-B6-B5",
        "primaryIPv4Address": "10.0.1.4",
        "ruleName": "UserRule_Disallow-everything-else-Inbound",
        "direction": "In",
        "type": "block",
        "matchedConnections": 2
      }
    },

Where matchedConnections is 2 (because I tried twice).

So the logs tell us where the traffic went.

From here we could wonder why it hit that rule, look for a higher-priority rule that allows RDP in, find none and conclude that’s our problem.

Trial & Error

If the logs are not helping you, the last resort is to modify the NSG until you understand what is going on.

A way to do this is to create an “allow everything in from anywhere” rule and give it maximum priority.

If traffic still doesn’t get in, you have a problem other than NSGs, so go back to the previous steps.

If traffic goes in, good.  Move that allow-everything rule down in priority until you find which rule is blocking you.  You may have a lot of rules, in which case I recommend a dichotomic search:  put your allow-everything rule in the middle of your “real rules”; if traffic passes, move the rule to the middle of the bottom half, otherwise to the middle of the top half, and so on.  This way, you’ll only need log(N) steps, where N is your number of rules.

Summary

Troubleshooting NSGs can be difficult but here I highlighted a basic methodology to find your way around.

Diagnostic Logs help give us insight into what is really going on, although they can be tricky to work with.

In general, as with every debugging experience just apply the principle of Sherlock Holmes:

Eliminate the impossible.  Whatever remains, however improbable, must be the truth.

In terms of debugging, that means removing all the noise, all the fat and then some meat, until what remains is so simple that the truth will hit you.

Azure SQL Elastic Pool – Moving databases across pools using PowerShell


I’ve written a bit about Azure SQL Elastic Pool lately:  an overview, about ARM templates and about database size.

One of the many great features of Azure SQL Elastic Pool is that like Azure SQL Database (standalone), we can change the eDTU capacity of the pool “on the fly”, without downtime.

Unlike its standalone cousin though, we can’t change the edition of the pool.  The edition is either Basic, Standard or Premium.  It is set at creation and is immutable after that.

If we want to change the edition of a pool, the obvious way is to create another pool, move the databases there, delete the original, recreate it with a different edition and move the databases back.

This article shows how to do that using PowerShell.

You might want to move databases around for other reasons, typically to optimize the density and performance of pools.  You would then use a very similar script.

Look at the pool

Let’s start with the pools we established with the sample ARM template of a previous article.

From there we can look at the pool Pool-A using the following PowerShell command:


$old = Get-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName Pool-A -ServerName pooldemoserver

$old

We can see the pool’s current edition is Standard while its Database Transaction Unit (DTU) count is 200.

image

Create a temporary pool

We’ll create a temporary pool, aptly named Temporary, attached to the same server:


$temp = New-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName Temporary -ServerName pooldemoserver -Edition $old.Edition -Dtu $old.Dtu

$temp

It’s important to create a pool that the databases can actually be moved into.  The maximum size of a database depends on the edition and DTU count of its elastic pool.  The easiest way is to create a pool with the same edition / DTU, which is what we do here by referencing the $old variable.

Move databases across

First, let’s grab the databases in the original pool:


$dbs = Get-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver | where {$_.ElasticPoolName -eq $old.ElasticPoolName}

$dbs | select DatabaseName

ElasticPoolName is a property of a database.  We’ll simply change it by setting each database:


$dbs | foreach {Set-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver -DatabaseName $_.DatabaseName -ElasticPoolName $temp.ElasticPoolName}

That command takes longer to run as the databases have to move from one compute to another.

Delete / Recreate pool

We can now delete the original pool.  It’s important to note that we wouldn’t have been able to delete a pool with databases in it.


Remove-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $old.ElasticPoolName -ServerName pooldemoserver

$new = New-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $old.ElasticPoolName -ServerName pooldemoserver -Edition Premium -Dtu 250

The second line recreates it with the Premium edition.  We could have kept the original DTU count, but that isn’t always possible since different editions support different DTU values.  In this case, for instance, it wasn’t possible since 200 DTUs aren’t supported for Premium pools.

If you execute those two commands without pausing in between, you will likely receive an error.  It is one of those cases where the Azure REST API returns, and the resource you asked to remove appears to be gone, but you can’t quite recreate it yet.  An easy workaround consists of pausing or retrying.
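
For instance, a minimal retry sketch (an assumed approach, not part of the original commands) could look like this:

# Retry the recreation until the old pool name becomes available again
for ($i = 0; $i -lt 10; $i++) {
    try {
        $new = New-AzureRmSqlElasticPool -ResourceGroupName DBs `
            -ElasticPoolName $old.ElasticPoolName -ServerName pooldemoserver `
            -Edition Premium -Dtu 250 -ErrorAction Stop
        break
    }
    catch {
        Start-Sleep -Seconds 15    # wait for the deletion to propagate, then retry
    }
}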

Move databases back

We can then move the databases back to the new pool:


$dbs | foreach {Set-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver -DatabaseName $_.DatabaseName -ElasticPoolName $new.ElasticPoolName}

Remove-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $temp.ElasticPoolName -ServerName pooldemoserver

In the second line we delete the temporary pool.  Again, the move takes a little longer to execute since the databases must be moved from one compute to another.

Summary

We showed how to move databases from one pool to another.

The pretext was a change in elastic pool edition but we might want to move databases around for other reasons.

In practice, you might not want to move your databases twice, to avoid doubling the duration of the operation, and might be happy to live with a different pool name, as sketched below.  In our demo the move took less than a minute because we had two empty databases.  With many databases totalling a lot of storage, it would take much longer.
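
As a hypothetical one-move variant (Pool-A-Premium being an assumed name), we could create the target pool directly with the desired edition, move the databases only once, and delete the original pool:

# Create the target pool directly under a new, hypothetical name
$new = New-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName Pool-A-Premium `
    -ServerName pooldemoserver -Edition Premium -Dtu 250

# Move the databases a single time
$dbs | foreach {Set-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver `
    -DatabaseName $_.DatabaseName -ElasticPoolName $new.ElasticPoolName}

# Delete the original pool, now empty
Remove-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $old.ElasticPoolName -ServerName pooldemoserver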

Azure SQL Elastic Pool – Database Size

I mentioned in a past article, regarding database sizes within an elastic pool:

“No policies limit an individual database to take more storage although a database maximum size can be set on a per-database basis.”

I’m going to focus on that in this article.

An Azure SQL Database resource has a MaxSizeInBytes property.  We can set it either in an ARM template (see this ARM template and the property maxSizeBytes) or in PowerShell.

An interesting aspect of that property is that:

  • It takes only specific values
  • Not all values are permitted, depending on the elastic pool edition (i.e. Basic, Standard or Premium)

Valid values

One way to find the valid values is to navigate to the ARM schema.  That documented schema is likely slightly out of date since, as of December 2016, its largest value is 500 GB, which isn’t the largest possible database size (1 TB for a P15).

The online documentation of Set-AzureRmSqlDatabase isn’t faring much better:  the entry for the MaxSizeBytes parameter refers to a MaxSizeGB parameter for the acceptable values.  The problem is, the MaxSizeGB parameter doesn’t exist.

But let’s start with the documented schema as it probably only lacks the most recent DB sizes.

Using the schema’s list of possible values and comparing it with the standalone database sizes for given editions, we can conclude (after testing with ARM templates, of course) that a Basic pool can have databases of up to 2 GB, a Standard pool up to 250 GB, and a Premium pool can take all the values.

It is important to notice that the pool itself can have larger storage.  For instance, even the smallest Basic pool, with 50 eDTUs, can have a maximum storage of 5 GB.  But each database within that pool can only grow up to 2 GB.

That gives us the following landscape:

Maximum Size (in bytes) | Maximum Size (in GB) | Available for (edition)
104857600 | 0.1 | Premium, Standard, Basic
524288000 | 0.5 | Premium, Standard, Basic
1073741824 | 1 | Premium, Standard, Basic
2147483648 | 2 | Premium, Standard, Basic
5368709120 | 5 | Premium, Standard
10737418240 | 10 | Premium, Standard
21474836480 | 20 | Premium, Standard
32212254720 | 30 | Premium, Standard
42949672960 | 40 | Premium, Standard
53687091200 | 50 | Premium, Standard
107374182400 | 100 | Premium, Standard
161061273600 | 150 | Premium, Standard
214748364800 | 200 | Premium, Standard
268435456000 | 250 | Premium, Standard
322122547200 | 300 | Premium
429496729600 | 400 | Premium
536870912000 | 500 | Premium

Storage Policies

We can now use this maximum database size as a storage policy, i.e. a way to make sure a single database doesn’t take all the storage available in a pool.

Now, this isn’t as trivially useful as the eDTU min / max we’ve seen for pools.  In the eDTU case, those settings control how much compute is given to a database at all times.  In the case of a maximum database size, once the database reaches that size it becomes read-only, which will likely break the applications running on top of it unless we planned for it.

A better approach would be to monitor the different databases and react to size changes, for instance by moving a database to another pool.

The maximum size can still be useful as a safeguard though.  For instance, let’s imagine we want each database in a pool to stay below 50 GB; we monitor for that and raise alerts when the threshold is reached (see Azure Monitor for monitoring and alerts).  We might nevertheless set a maximum size of 100 GB on each database.  This acts as a safeguard:  if we do nothing about a database outgrowing its 50 GB target, it won’t be able to grow indefinitely, which could otherwise hit the pool’s maximum size and make the entire pool read-only, affecting ALL the databases in the pool.

In that sense the maximum size still acts as a resource governor, preventing the noisy neighbour effect.
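
As a minimal sketch of such a safeguard, assuming the Pool-A pool from the previous articles, we could cap every database in the pool at 100 GB:

# Cap every database currently in Pool-A at 100 GB (107374182400 bytes)
Get-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver |
    where {$_.ElasticPoolName -eq "Pool-A"} |
    foreach {Set-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver `
        -DatabaseName $_.DatabaseName -MaxSizeBytes 107374182400}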

PowerShell example

We can’t change a database maximum size in the portal (as of December 2016).

Using an ARM template, it is easy to change the parameter.  Here, let’s simply show how we would change it for an existing database using PowerShell.

Building on the example we gave in a previous article, we can easily grab the Pool-A-Db0 database in resource group DBs and server pooldemoserver:


Get-AzureRmSqlDatabase -ServerName pooldemoserver -ResourceGroupName DBs -DatabaseName Pool-A-Db0

image

We can see the size is the one specified in the ARM template (the default value of the DB Max Size parameter), i.e. 10 GB.  We can bump it to 50 GB, i.e. 53687091200 bytes:


Set-AzureRmSqlDatabase -ServerName pooldemoserver -ResourceGroupName DBs -DatabaseName Pool-A-Db0 -MaxSizeBytes 53687091200

We can confirm the change in the portal by looking at the properties.

image

Default Behaviour

If the MaxSizeBytes property is omitted, either in an ARM template or in the New-AzureRmSqlDatabase PowerShell cmdlet, the default behaviour is for the database to have the edition’s maximum capacity (e.g. 250 GB for Standard).

After creation, we can’t set the property value to null to obtain the same effect.  Omitting the parameter simply keeps the previously set value.
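
To see the default in action, a quick sketch (SizeCheckDb being a hypothetical database name) could create a database without the parameter and read back its maximum size:

# Create a database in the pool without specifying MaxSizeBytes
New-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver `
    -DatabaseName SizeCheckDb -ElasticPoolName Pool-A

# Expected for a Standard pool:  250 (GB), i.e. the edition's maximum
(Get-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver `
    -DatabaseName SizeCheckDb).MaxSizeBytes / 1GB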

Summary

We’ve looked at the maximum size property of a database.

It can be used to control the growth of a database inside a pool and to prevent one database’s growth from affecting the others.