
Hyperspheres & the curse of dimensionality

I previously talked about the curse of dimensionality (more than 2 years ago) related to Machine Learning.

Here I wanted to discuss it in more depth and dive into the mathematics of it.

High dimensions might sound like Physics’ string theory where our universe is made of more than 4 dimensions.  This isn’t what we are talking about here.

The curse of dimensionality is related to what happens when a model deals with a data space with dimensions in the hundreds or thousands.

As the title of this article suggests, we’re going to use the properties of hyperspheres (spheres in N dimensions) to explore what happens in high dimensions.

This article is inspired by Foundations of Data Science by John Hopcroft & Ravindran Kannan (chapter 2).

Why should I care about High Dimension?

When introducing Machine Learning concepts, we typically use a few dimensions to help with visualization.  For instance, when I introduced linear regression or polynomial regression in past articles, I used two-dimensional datasets and plotted them on a chart.

In the real world, typical data sets have many more dimensions.

A typical case of high dimension is image recognition (or character recognition as a sub-category), where even a low-resolution picture will have hundreds of pixels.  The corresponding model would take a gray-scale input vector of dimension 100+.

With fraud detection, transactions do not contain only the value of the transaction, but also the time of day, day of week, geo-location, type of commerce, type of products, etc.  This might or might not be a high-dimension problem, depending on the available data.

In an e-commerce web site, a Product recommendation algorithm could be as simple as an N x N matrix of 0 to 1 values where N is the number of products.

With IoT, multiple sensors feed a prediction model.

In bioinformatics, DNA sequencing generates a huge amount of data which is often arranged in a high-dimensional model.

Basically, high dimensions crop up everywhere.

What happens as dimension increases?

For starters, a space with more dimensions simply is…  bigger.  In order to sample a space of 2 dimensions with a resolution of 10 units per dimension, we need 10^2 = 100 points.  The same sampling in a space of dimension 3 requires 10^3 = 1 000 points.  Dimension 20?  That would require 10^20 = 100 000 000 000 000 000 000 points.

Right off the bat we can tell that sampling a space of dimension 2 or 3 is realistic, while for a space of dimension 20 it isn’t.  Hence we are likely going to suffer from under-sampling.

Yoshua Bengio has a nice discussion about Curse of Dimensionality here.

Hypersphere in a cube

Beyond sampling problems, metrics & measures change behaviour at high dimensions.  Intuitively this makes sense, since a measure takes a vector (or vectors) and squeezes it (them) into a single numerical value;  the higher the dimension, the more data we squeeze into one number and hence the more information we lose.

We use metrics & measures heavily in Machine Learning.  For instance, a lot of cost (or loss) functions are based on the squared Euclidean distance:

dist^2(x,y) = \displaystyle\sum_{i=1}^N (x_i-y_i)^2

Now if x and/or y are random variables (e.g. samples), the law of large numbers applies when N becomes large.  This implies the sum concentrates around its expected value, with a relative standard deviation that narrows as N increases.  In turn, this means there is less and less information in the distance as the number of dimensions increases.
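To see the effect numerically, here is a minimal PowerShell sketch (an illustration only, using nothing beyond built-in cmdlets) that samples pairs of random points in the unit cube and reports how the squared distance concentrates around its mean as the dimension grows:

# For each dimension, sample pairs of random points in [0,1]^N and
# measure how the squared distance concentrates around its mean.
foreach ($dimension in 2, 10, 100, 1000) {
    $distances = foreach ($trial in 1..100) {
        $sum = 0.0
        for ($i = 0; $i -lt $dimension; $i++) {
            $delta = (Get-Random -Minimum 0.0 -Maximum 1.0) - (Get-Random -Minimum 0.0 -Maximum 1.0)
            $sum += $delta * $delta
        }
        $sum
    }
    $mean = ($distances | Measure-Object -Average).Average
    $variance = ($distances | ForEach-Object { ($_ - $mean) * ($_ - $mean) } | Measure-Object -Average).Average
    "{0,4} dimensions:  relative spread of squared distance = {1:P1}" -f $dimension, ([Math]::Sqrt($variance) / $mean)
}

The relative spread (standard deviation over mean) shrinks roughly like 1/\sqrt{N}:  at 1 000 dimensions, all pairwise distances look nearly identical.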

This brings us to the hypersphere.  An hypersphere’s equation is

\displaystyle\sum_{i=1}^N x_i^2 = R^2

where x is a point of dimension N and R is the radius of the hypersphere.

An hypersphere of dimension 1 is a line segment, an hypersphere of dimension 2 is a circle, dimension 3 is a sphere, dimension 4 is an…  expanding universe?  and so on.

A theorem I’ll demonstrate in a future article is that the volume of an hypersphere of radius 1 tends to zero as the dimension increases.
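For reference (the demonstration is left for that future article), the closed-form volume of an hypersphere of radius R in dimension N is known to be

V_N(R) = \displaystyle\frac{\pi^{N/2}}{\Gamma(\frac{N}{2}+1)} R^N

The Gamma function in the denominator grows faster than the \pi^{N/2} in the numerator, which is why V_N(1) eventually collapses to zero.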

This is fairly unintuitive, so let me give real numerical values:

Dimension Hyper Volume
1 2
2 3.141592654
3 4.188790205
4 4.934802201
5 5.263789014
6 5.16771278
7 4.72476597
8 4.058712126
9 3.298508903
10 2.55016404
11 1.884103879
12 1.335262769
13 0.910628755
14 0.599264529
15 0.381443281
16 0.23533063
17 0.140981107
18 0.082145887
19 0.046621601
20 0.025806891
21 0.01394915
22 0.007370431
23 0.003810656
24 0.001929574
25 0.000957722
26 0.000466303
27 0.000222872
28 0.000104638
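Those values are easy to reproduce:  the closed form above implies the recurrence V_N = \frac{2 \pi}{N} V_{N-2}, with V_1 = 2 and V_2 = \pi.  A minimal PowerShell sketch:

# Unit hypersphere volumes via the recurrence V_N = (2 * pi / N) * V_(N-2)
$volumes = @{ 1 = 2.0; 2 = [Math]::PI }
for ($n = 3; $n -le 28; $n++) {
    $volumes[$n] = (2 * [Math]::PI / $n) * $volumes[$n - 2]
}
$volumes.Keys | Sort-Object | ForEach-Object { "{0,2}  {1:N9}" -f $_, $volumes[$_] }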

If we plot those values:

[Chart:  unit hypersphere volume as a function of dimension]

We see the hypervolume increases in the first few dimensions:  a circle of radius 1 has an area of π (3.1416) while a sphere of radius 1 has a volume of 4.19.  It peaks at dimension 5 and then shrinks.

It is unintuitive because in 2 and 3 dimensions (the only dimensions in which we can visualize an hypersphere), the hypersphere pretty much fills its embedding cube.  A way to “visualize” what’s happening in higher dimensions is to consider the “diagonal” of an hypersphere.

For a circle, the diagonal (i.e. the 45° line) intersects the unit circle at

(\frac {1} {\sqrt {2}}, \frac {1} {\sqrt {2}}) since (\frac {1} {\sqrt {2}})^2 + (\frac {1} {\sqrt {2}})^2 = 1^2

In general, at dimension N, the diagonal intersects at

x_i = \frac {1} {\sqrt {N}}

So, although the hypersphere of radius 1 touches the cube of side 2 centered at the origin on each of its walls, along the diagonals the surface of the hypersphere gets further and further away from the cube’s corners as the dimension increases.
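Indeed, a corner of that cube sits at distance

\displaystyle\sqrt{\sum_{i=1}^N 1^2} = \sqrt{N}

from the origin, while the hypersphere’s surface stays at distance 1 along the diagonal, so the gap grows like \sqrt{N} - 1.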

Consequences of the hypersphere volume

A straightforward consequence of the hypersphere volume is sampling.  Randomly sampling a square of side 2 centered at the origin will land points within the unit circle with probability \frac{\pi}{4} \approx 79\%.  The same process with an hypersphere of dimension 8 would hit the inside of the hypersphere with a probability of about 1.6%.
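In general, that probability is simply the ratio of the hypersphere volume to the volume of its embedding cube of side 2:

P_N = \displaystyle\frac{V_N(1)}{2^N}, \qquad P_8 = \frac{4.06}{2^8} \approx 1.6\%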

A corollary of the hypersphere volume is that at higher dimensions, the bulk of the volume of the hypersphere is concentrated in a thin annulus just below its surface.  An obvious consequence is that optimizing a metric (i.e. a distance) in high dimension is difficult.

What should we do about it?

The first step is to be aware of it.

A symptom of high dimensionality is under-sampling:  the space covered is so large that the number of sample points required to learn the underlying model is likely to exceed the size of the actual sample set.

The simplest solution is to avoid high dimensionality with some pre-processing.  For instance, if we have a priori knowledge of the domain, we might be able to combine dimensions together.  For example, in an IoT deployment with 10 000 sensors, for many reasons including the curse of dimensionality, it wouldn’t be a good idea to treat each sensor as an independent input.  It would be worth trying to aggregate sensor inputs by analyzing the data.

Summary

Some Machine Learning algorithms will be more sensitive to higher dimensionality than others but the curse of dimensionality affects most algorithms.

It is a problem to be aware of and we should be ready to mitigate it with some good feature engineering.

Extended Outage @ Instapaper – Resiliency example

I use Instapaper extensively to store the continuous flow of internet articles I want to read.  I created a bunch of tools integrating with it (e.g. monitoring Atom feeds and sending new articles to Instapaper).

Last week my tools didn’t work for a while so I finally logged in directly to the site.  The site was down, citing an extended outage, with a reference to a blog article to explain the outage.

It got back on its feet after around 48 hours.  This isn’t an article to call out Instapaper’s engineering:  that type of outage happens everywhere all the time.  But let’s learn from it.

The article cites the cause of the outage as being “we hit a system limit for our hosted database that’s preventing new articles from being saved”.  The article also cites that they had to spend time on the phone with their cloud provider (it didn’t mention which one) before diagnosing the problem.

Know the limits

They hit a limit.

What are the limits?

A clear piece of advice when working in Azure:  know where your sandbox ends.  We should always consult http://aka.ms/azurelimits to inform our architecture.

The nice thing about the cloud is that most limits are clearly defined and embedded in SLAs.

This comes as a surprise to a lot of people, especially when they see some of the restrictions.  “How come I can only put 2 disks on a Standard D1 v2 VM?”  This is because of the experience we have on premise, where there are few hard-wired limitations.  Instead, we typically get degradation:  sure, you can put a terabyte of storage on your old laptop, but you are likely going to saturate its IOs before you exhaust the storage.  In Azure, limits are more clearly defined because Azure is a public cloud, i.e. a multi-tenant environment.  The only way for Azure to guarantee other customers they can have capacity X is to make sure we do not use that capacity.  So there are sandboxes everywhere.

On the flip side, you have fewer surprises.  The reason you do not have so many sandboxes on premise is that everything is shared.  That is nice as long as we are the only ones cranking up the resource usage.  But when other systems start grinding that SAN, it isn’t so fast anymore.

So the first lesson we get from that Instapaper outage is to know the limits.  Did the Instapaper team know the limits of their database?  Did they hit some software limit, e.g. did they use up all 32 bits of an identifier column?

Monitor for those limits

Now that we know the limits, we know when our system is going to break.

So we need to monitor it so that we won’t have a bad surprise like the Instapaper team must have had (I’m sure some dinners were cancelled during those 48 hours).

In Azure, we can use Azure Monitor / Log Analytics to monitor different metrics on our resources.  We can set up alerts to be notified when some threshold has been reached.

We can then react.  If we set the threshold low enough, that will give us a window of time to react and sketch a plan.

In an article on SQL Database sizes in an elastic pool, we saw that we can fix a limit on database size to make sure it doesn’t consume the allowed size of the entire pool.  This is a safeguard mechanism in case our monitoring strategy fails.

Strategy to get beyond the limits

We know the limits of our architecture.  We monitor our system to get some heads up before those limits are reached.

When they are reached, what do we do?

Typically, if we’re aware of the limits, we will have given them some thought, even if it isn’t fully documented.  This is something that comes up in handover conversations:  “this system has this limit, but you know, if that ever becomes a problem, consider that alternative”.  Hopefully, the alternative doesn’t consist of re-writing the entire system.

Scale out / partitioning strategies

If we are lucky, the original architect(s) of the system have baked in a way to overcome its limits.

A typical way is to partition the system.  We can partition the system per user / group of users or other data segments.  This way, we can scale out the system to handle different partitions.

We can start with only one set of resources (e.g. Virtual Machines) handling all partitions and, the day that set of resources hits its limits, split the partitions into two groups and have another set of resources handle the second group.  And so on.
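As an illustration only (the function below is hypothetical, not how Instapaper or Azure route their data), a deterministic mapping from a partition key to one of those resource sets could be as simple as this PowerShell sketch:

# Map a partition key (e.g. a user id) to one of $PartitionCount resource sets,
# using a stable hash so the same key always lands on the same set.
function Get-PartitionIndex {
    param(
        [Parameter(Mandatory = $true)] [string] $PartitionKey,
        [Parameter(Mandatory = $true)] [int] $PartitionCount
    )
    $md5 = [System.Security.Cryptography.MD5]::Create()
    $hashBytes = $md5.ComputeHash([System.Text.Encoding]::UTF8.GetBytes($PartitionKey))
    $hash = [System.BitConverter]::ToUInt32($hashBytes, 0)
    return $hash % $PartitionCount
}

# Example:  route user "alice" among 4 sets of resources
Get-PartitionIndex -PartitionKey "alice" -PartitionCount 4

The catch, and the reason repartitioning eventually becomes necessary, is that changing the partition count reshuffles most keys, which is exactly the kind of migration discussed next.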

One day, that partition scheme might also hit its limits.  For instance, maybe each set of resources handles only one partition and most of those sets have hit their limits.  In that case, we’re back to figuring out how to go beyond the limits of our system.  Typically, that will consist of repartitioning it in a way that lets us scale out further.

Azure is a partitioned / scaled out system.  That strategy has allowed it to grow to its current size.  Along the way, repartitioning was needed.  For instance, the transition of ASM to ARM was partially that.  In ASM, there were only 2 regions handling the Azure Management APIs while in ARM, each region handles API requests.

Conclusion

Instapaper was victim of its own success.  That’s a good problem to have but a problem nevertheless.

Make sure you at least know the limits of your system and monitor for them.  This way if success curses you, you’ll be able to react during working hours instead of cancelling your valentine dinner and spending your days justifying yourself to your CEO.

Azure SQL Elastic Pool – Moving databases across pools using PowerShell


I’ve written a bit about Azure SQL Elastic Pool lately:  an overview, about ARM template and about database size.

One of the many great features of Azure SQL Elastic Pool is that like Azure SQL Database (standalone), we can change the eDTU capacity of the pool “on the fly”, without downtime.

Unlike its standalone cousin though, we can’t change the edition of the pool.  The edition is either Basic, Standard or Premium.  It is set at creation and is immutable after that.

If we want to change the edition of a pool, the obvious way is to create another pool, move the databases there, delete the original, recreate it with a different edition and move the databases back.

This article shows how to do that using PowerShell.

You might want to move databases around for other reasons, typically to optimize the density and performance of pools.  You would then use a very similar script.

Look at the pool

Let’s start with the pools we established with the sample ARM template of a previous article.

From there we can look at the pool Pool-A using the following PowerShell command:


$old = Get-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName Pool-A -ServerName pooldemoserver

$old

We can see the pool’s current edition is Standard while its Database Transaction Unit (DTU) count is 200.


Create a temporary pool

We’ll create a temporary pool, aptly named temporary, attached to the same server:


$temp = New-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName Temporary -ServerName pooldemoserver -Edition $old.Edition -Dtu $old.Dtu

$temp

It’s important to create a pool that the databases can be moved into.  The maximum size of a database depends on the edition and number of DTUs of the elastic pool.  The easiest way is to create a pool with the same edition / DTUs, which is what we do here by referencing the $old variable.

Move databases across

First, let’s grab the databases in the original pool:


$dbs = Get-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver | where {$_.ElasticPoolName -eq $old.ElasticPoolName}

$dbs | select DatabaseName

ElasticPoolName is a property of a database.  We’ll simply change it by setting each database:


$dbs | foreach {Set-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver -DatabaseName $_.DatabaseName -ElasticPoolName $temp.ElasticPoolName}

That command takes longer to run as the databases have to move from one compute to another.

Delete / Recreate pool

We can now delete the original pool.  It’s important to note that we wouldn’t have been able to delete a pool with databases in it.


Remove-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $old.ElasticPoolName -ServerName pooldemoserver

$new = New-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $old.ElasticPoolName -ServerName pooldemoserver -Edition Premium -Dtu 250

The second line recreates it with Premium edition.  We could keep the original DTU, but it’s not always possible since different editions support different DTU values.  In this case, for instance, it wasn’t possible since 200 DTUs isn’t supported for Premium pools.

If you execute those two commands without pausing in between, you will likely receive an error.  It is one of those cases where the Azure REST API call returns, the resource you asked to be removed seems to be gone, but you can’t quite recreate it yet.  An easy workaround consists in pausing or retrying.
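For instance, here is a minimal retry sketch (the retry count and delay are arbitrary choices):

# Retry the pool re-creation a few times, pausing between attempts,
# to ride out the delay between the delete returning and the name becoming available again.
$attempt = 0
while ($true) {
    try {
        $new = New-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $old.ElasticPoolName `
            -ServerName pooldemoserver -Edition Premium -Dtu 250 -ErrorAction Stop
        break
    }
    catch {
        $attempt++
        if ($attempt -ge 5) { throw }
        Start-Sleep -Seconds 30
    }
}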

Move databases back

We can then move the databases back to the new pool:


$dbs | foreach {Set-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver -DatabaseName $_.DatabaseName -ElasticPoolName $new.ElasticPoolName}

Remove-AzureRmSqlElasticPool -ResourceGroupName DBs -ElasticPoolName $temp.ElasticPoolName -ServerName pooldemoserver

In the second line we delete the temporary pool.  Again, this takes a little longer to execute since databases must be moved from one compute to another.

Summary

We showed how to move databases from one pool to another.

The pretext was a change in elastic pool edition but we might want to move databases around for other reasons.

In practice, you might not want to move your databases twice (to avoid the duration of the operation) and might be happy to live with a different pool name.  In the demo we did, each move took less than a minute because we had two empty databases.  With many databases totalling a lot of storage, the moves would take much longer.

Azure SQL Elastic Pool – Database Size

I mentioned in a past article, regarding database sizes within an elastic pool:

“No policies limit an individual database to take more storage although a database maximum size can be set on a per-database basis.”

I’m going to focus on that in this article.

An Azure SQL Database resource has a MaxSizeInBytes property.  We can set it either in an ARM template (see this ARM template and the property maxSizeBytes) or in PowerShell.

An interesting aspect of that property is that:

  • It takes only specific values
  • Not all values are permitted, depending on the elastic pool edition (i.e. Basic, Standard or Premium)

Valid values

One way to find the valid values is to navigate to the ARM schema.  That documented schema likely is slightly out of date since, as of December 2016, the largest value is 500GB, which isn’t the largest possible database size (1 TB for a P15).

The online documentation of Set-AzureRmSqlDatabase isn’t faring much better, as the documentation for the MaxSizeBytes parameter refers to a MaxSizeGB parameter for the acceptable values.  Problem is, the MaxSizeGB parameter doesn’t exist.

But let’s start with the documented schema as it probably only lacks the most recent DB sizes.

Using that schema’s list of possible values and comparing it with the standalone database sizes for the different editions, we can conclude (after testing with ARM templates, of course) that a Basic pool can have databases of up to 2 GB, a Standard pool up to 250 GB and, of course, a Premium pool can take all values.

It is important to notice that the pool itself can have more storage.  For instance, even the smallest Basic pool, with 50 eDTUs, can have a maximum storage of 5 GB.  But each DB within that pool can only grow up to 2 GB.

That gives us the following landscape:

Maximum Size (in bytes) | Maximum Size (in GB) | Available for (edition)
104857600 | 0.1 | Premium, Standard, Basic
524288000 | 0.5 | Premium, Standard, Basic
1073741824 | 1 | Premium, Standard, Basic
2147483648 | 2 | Premium, Standard, Basic
5368709120 | 5 | Premium, Standard
10737418240 | 10 | Premium, Standard
21474836480 | 20 | Premium, Standard
32212254720 | 30 | Premium, Standard
42949672960 | 40 | Premium, Standard
53687091200 | 50 | Premium, Standard
107374182400 | 100 | Premium, Standard
161061273600 | 150 | Premium, Standard
214748364800 | 200 | Premium, Standard
268435456000 | 250 | Premium, Standard
322122547200 | 300 | Premium
429496729600 | 400 | Premium
536870912000 | 500 | Premium

Storage Policies

We can now use this maximum database size as a storage policy, i.e. a way to make sure a single database doesn’t take all the storage available in a pool.

Now, this isn’t as trivially useful as the eDTU min / max we’ve seen in a pool.  In the eDTU case, that was controlling how much compute was given to a database at all times.  In the case of a database maximum size, once the database reaches that size, it becomes read-only.  That will likely break the applications running on top of it unless we planned for it.

A better approach would be to monitor the different databases and react to size changes, by moving a database to another pool for instance.

The maximum size can still be a safeguard though.  For instance, let’s imagine we want each database in a pool to stay below 50 GB;  we’ll monitor for that and raise alerts if the threshold is reached (see Azure Monitor for monitoring and alerts).  We might still set a maximum size of 100 GB for the databases.  This would act as a safeguard:  if we do not do anything about a database outgrowing its 50 GB target, it still won’t be able to grow indefinitely, which could otherwise top the pool maximum size and make the entire pool read-only, affecting ALL the databases in the pool.
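As a minimal sketch (reusing the pool and server names of this series, with the 100 GB safeguard from the example above), applying that safeguard to every database of a pool could look like this:

# Cap every database of Pool-A at the 100 GB safeguard (107374182400 bytes)
$safeguardBytes = 107374182400

Get-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver |
    Where-Object { $_.ElasticPoolName -eq "Pool-A" } |
    ForEach-Object {
        Set-AzureRmSqlDatabase -ResourceGroupName DBs -ServerName pooldemoserver `
            -DatabaseName $_.DatabaseName -MaxSizeBytes $safeguardBytes
    }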

In that sense the maximum size still act as a resource governor, preventing noisy neighbour effect.

PowerShell example

We can’t change a database maximum size in the portal (as of December 2016).

Using an ARM template, it is easy to change that value.  Here, let’s simply show how we would change it for an existing database.

Building on the example we gave in a previous article, we can easily grab the Pool-A-Db0 database in resource group DBs and server pooldemoserver:


Get-AzureRmSqlDatabase -ServerName pooldemoserver -ResourceGroupName DBs -DatabaseName Pool-A-Db0


We can see the size is the one that was specified in the ARM template (ARM parameter DB Max Size default value), i.e. 10 GB.  We can bump it to 50 GB, i.e. 53687091200 bytes:


Set-AzureRmSqlDatabase -ServerName pooldemoserver -ResourceGroupName DBs -DatabaseName Pool-A-Db0 -MaxSizeBytes 53687091200

We can confirm the change in the portal by looking at the properties.


Default Behaviour

If the MaxSizeBytes property is omitted, either in an ARM template or in the New-AzureRmSqlDatabase PowerShell cmdlet, the default behaviour is for the database to get the maximum capacity of its edition (e.g. 250 GB for Standard).

After creation, we can’t set the property value to null to obtain the same effect.  Omitting the parameter simply keeps the previously set value.

Summary

We’ve looked at the maximum size property of a database.

It can be used to control the growth of a database inside a pool and prevent one database’s growth from affecting the others.

Azure SQL Elastic Pool – ARM Templates

In my last article, I covered Azure SQL Elastic Pool.  In this one I cover how to provision it using ARM templates.

As of today (December 2016), the documentation about Azure SQL Elastic Pool provisioning via ARM templates is…  non-existent.

Searching for it, I was able to gather hints from a few colleagues’ GitHub repos, but there are no examples in the ARM quickstart templates, nor is the elastic pool resource schema documented.  Also, the Automation Script feature in the portal doesn’t reverse-engineer an ARM template for an elastic pool.

So I hope this article fills that gap and is easy to search for & consume.

ARM Template

Here we’re going to provision a Server with two pools, Pool-A & Pool-B (yeah, sounds a bit like Thing 1 & Thing 2), each having a few (configurable number of) databases in them.

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "Server Name": {
      "defaultValue": "pooldemoserver",
      "type": "string",
      "metadata": {
        "description": "Name of the SQL:  needs to be unique among all servers in Azure"
      }
    },
    "Admin Login": {
      "defaultValue": "myadmin",
      "type": "string",
      "metadata": {
        "description": "SQL Server Admin login name"
      }
    },
    "Admin Password": {
      "type": "securestring",
      "metadata": {
        "description": "SQL Server Admin login password"
      }
    },
    "Pool A Edition": {
      "defaultValue": "Standard",
      "type": "string",
      "allowedValues": [
        "Basic",
        "Standard",
        "Premium"
      ],
      "metadata": {
        "description": "Pool A Edition"
      }
    },
    "Pool B Edition": {
      "defaultValue": "Standard",
      "type": "string",
      "allowedValues": [
        "Basic",
        "Standard",
        "Premium"
      ],
      "metadata": {
        "description": "Pool B Edition"
      }
    },
    "DB Max Size": {
      "defaultValue": "10737418240",
      "type": "string",
      "allowedValues": [
        "104857600",
        "524288000",
        "1073741824",
        "2147483648",
        "5368709120",
        "10737418240",
        "21474836480",
        "32212254720",
        "42949672960",
        "53687091200",
        "107374182400",
        "161061273600",
        "214748364800",
        "268435456000",
        "322122547200",
        "429496729600",
        "536870912000"
      ],
      "metadata": {
        "description": "DB Max Size, in bytes"
      }
    }
  },
  "variables": {
    "Pool A": "Pool-A",
    "Pool B": "Pool-B",
    "DB A Prefix": "Pool-A-Db",
    "DB B Prefix": "Pool-B-Db",
    "Count A": 2,
    "Count B": 4
  },
  "resources": [
    {
      "name": "[parameters('Server Name')]",
      "type": "Microsoft.Sql/servers",
      "apiVersion": "2014-04-01-preview",
      "location": "[resourceGroup().location]",
      "dependsOn": [],
      "properties": {
        "administratorLogin": "[parameters('Admin Login')]",
        "administratorLoginPassword": "[parameters('Admin Password')]",
        "version": "12.0"
      },
      "resources": [
        {
          "type": "firewallRules",
          "kind": "v12.0",
          "name": "AllowAllAzureIps",
          "apiVersion": "2014-04-01-preview",
          "location": "[resourceGroup().location]",
          "dependsOn": [
            "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]"
          ],
          "properties": {
            "startIpAddress": "0.0.0.0",
            "endIpAddress": "0.0.0.0"
          }
        },
        {
          "type": "elasticpools",
          "name": "[variables('Pool A')]",
          "apiVersion": "2014-04-01-preview",
          "location": "[resourceGroup().location]",
          "dependsOn": [
            "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]"
          ],
          "properties": {
            "edition": "[parameters('Pool A Edition')]",
            "dtu": "200",
            "databaseDtuMin": "10",
            "databaseDtuMax": "50"
          }
        },
        {
          "type": "elasticpools",
          "name": "[variables('Pool B')]",
          "apiVersion": "2014-04-01-preview",
          "location": "[resourceGroup().location]",
          "dependsOn": [
            "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]"
          ],
          "properties": {
            "edition": "[parameters('Pool B Edition')]",
            "dtu": "400",
            "databaseDtuMin": "0",
            "databaseDtuMax": null
          }
        }
      ]
    },
    {
      "type": "Microsoft.Sql/servers/databases",
      "copy": {
        "name": "DBs-A",
        "count": "[variables('Count A')]"
      },
      "name": "[concat(parameters('Server Name'), '/', variables('DB A Prefix'), copyIndex())]",
      "location": "[resourceGroup().location]",
      "dependsOn": [
        "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]",
        "[resourceId('Microsoft.Sql/servers/elasticpools', parameters('Server Name'), variables('Pool A'))]"
      ],
      "tags": {
        "displayName": "Pool-A DBs"
      },
      "apiVersion": "2014-04-01-preview",
      "properties": {
        "collation": "SQL_Latin1_General_CP1_CI_AS",
        "maxSizeBytes": "[parameters('DB Max Size')]",
        "requestedServiceObjectiveName": "ElasticPool",
        "elasticPoolName": "[variables('Pool A')]"
      }
    },
    {
      "type": "Microsoft.Sql/servers/databases",
      "copy": {
        "name": "DBs-B",
        "count": "[variables('Count B')]"
      },
      "name": "[concat(parameters('Server Name'), '/', variables('DB B Prefix'), copyIndex())]",
      "location": "[resourceGroup().location]",
      "dependsOn": [
        "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]",
        "[resourceId('Microsoft.Sql/servers/elasticpools', parameters('Server Name'), variables('Pool B'))]"
      ],
      "tags": {
        "displayName": "Pool-B DBs"
      },
      "apiVersion": "2014-04-01-preview",
      "properties": {
        "edition": "[parameters('Pool B Edition')]",
        "collation": "SQL_Latin1_General_CP1_CI_AS",
        "maxSizeBytes": "[parameters('DB Max Size')]",
        "requestedServiceObjectiveName": "ElasticPool",
        "elasticPoolName": "[variables('Pool B')]"
      }
    }
  ]
}

We can deploy the template as is.  We’ll need to enter at least an Admin password (for the Azure SQL server).

The “Server Name” parameter must be unique throughout Azure (not just your subscription).  So if it happens to be taken when you try to deploy the template (in which case you would receive an error message around Server ‘pooldemoserver’ is busy with another operation), try a new, more original name.

Each parameter is documented in the metadata description.
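For reference, deploying the template from PowerShell could look like the following sketch (the template file name is hypothetical and the location is arbitrary):

# Create the target resource group, then deploy the template stored in a local file
$adminPassword = Read-Host -AsSecureString -Prompt "SQL admin password"

New-AzureRmResourceGroup -Name DBs -Location "East US" -Force

New-AzureRmResourceGroupDeployment `
    -ResourceGroupName DBs `
    -TemplateFile ".\elastic-pool-demo.json" `
    -TemplateParameterObject @{ "Admin Password" = $adminPassword }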

Results

Let’s look at the result.  Let’s first go to the resource group where we deployed the template.

In the resource list we should see the following:

[Screenshot:  resource group content after deployment]

We first have our server, with default name pooldemoserver, our two pools, Pool-A & Pool-B, and 6 databases.

Let’s select Pool-A.


We can see the pool is of Standard edition and has 200 eDTUs, with a minimum of 10 eDTUs and a maximum of 50 per database, which is faithful to its ARM definition (repeated below).

        {
          "type": "elasticpools",
          "name": "[variables('Pool A')]",
          "apiVersion": "2014-04-01-preview",
          "location": "[resourceGroup().location]",
          "dependsOn": [
            "[resourceId('Microsoft.Sql/servers', parameters('Server Name'))]"
          ],
          "properties": {
            "edition": "[parameters('Pool A Edition')]",
            "dtu": "200",
            "databaseDtuMin": "10",
            "databaseDtuMax": "50"
          }
        }

Similarly, Pool-B has a minimum of 0 and a maximum of 100 eDTUs per database.  The maximum was set to null in the template and hence defaults to the maximum allowed for a Standard pool of 400 DTUs.

Let’s select the databases in Pool-B.  Alternatively, we can select the Configure pool tool bar option.


The following pane shows us the eDTUs consumed in the last 14 days.  It also allows us to change the eDTUs assigned to the pool.

It is in this pane that we can add / remove databases from the pool.


In order to remove databases from the pool, they must first be selected in the lower-right corner of the pane.  We will have to choose a standalone pricing tier for each DB and hit Save.  As of today (December 2016), there is no way to move databases from one pool to another directly in the portal, i.e. they must first be converted to standalone databases.  It is possible to move databases from one pool to another using PowerShell though, as I’ll demonstrate in a future article.

If we go back to the resource group and select any of the databases, we have a link to its parent pool.


Summary

Despite the current lack (as of December 2016) of documentation around it, it is quite possible to create databases within an elastic pool using ARM templates as we’ve demonstrated here.

Azure SQL Elastic Pool Overview

What is Azure SQL Elastic Pool and what does it bring to Azure SQL Database, the SQL Azure Platform as a Service (PaaS)?

Traditional model

Let’s look at how Azure SQL works without elastic pools first.

Azure SQL Database comes with an Azure SQL Server.  This shouldn’t be confused with SQL Server installed on a VM:  it is a logical server holding everything that doesn’t belong to a database.  This model makes it compatible with SQL Server “on premise”.

The important point here is that the compute sits with the database and not the server.  The edition (i.e. Basic, Standard & Premium) & Pricing Tier / DTUs are set at the database level, not the server level.  Actually, the server doesn’t even have a cost associated to it.

In some ways, this is the opposite of what SQL Server on premise got us used to.  On premise, we have a server sitting on an OS and the databases are constructs on top of it, borrowing compute from the server.  In Azure, the compute sits at the database level while the server is a pseudo-centralized thing with no compute associated with it.

In that sense, Azure SQL DB has a much better isolation model out of the box although you can now do the same thing with SQL Server on premise using the Resource Governor.

Elastic Pool Conceptual model

Along came Elastic Pool.  Interestingly, Elastic Pools brought back the notion of a centralized compute shared across databases.  Unlike on-premise SQL Server though, that compute doesn’t sit with the server itself but with a new resource called an elastic pool.

This allows us to provision certain compute, i.e. DTUs, to a pool and share it across many databases.

A typical scenario where that is beneficial is having a lot of small databases, which tends to be cost-prohibitive with the traditional model.

That makes it an excellent solution for ISV / SaaS providers where different tenants have different spikes.

See this article for the different scenarios where elastic pools apply.

We could have “hybrid” scenarios where a server has “traditional databases” with their own pricing tier as well as databases attached to a pool.

DTU policy

The pool can define a policy regarding the minimum and maximum DTUs per database.  This allows for each database

  • to have a minimum amount of compute dedicated to it, avoiding compute starvation
  • to have a maximum amount of compute, avoiding the noisy neighbour effect where one database starves all the others

Storage

On the other hand, a pool has a maximum storage size shared across the pool.

No policies limit an individual database to take more storage although a database maximum size can be set on a per-database basis.

It is important to note that once the maximum pool size has been reached by the sum of the databases’ size, all databases become read only.

Limits

I often find it useful to look at the different limits and quotas of Azure services to understand the structure of a service and inform design decisions.

http://aka.ms/azurelimits should never be too far in your links.

Looking at Azure SQL databases limits, we find those interesting facts:

  • Maximum number of databases per pool vary depending on the pool size, e.g. a Standard 100 DTUs can have 200 databases
  • A server can have up to 5000 databases associated to it
  • A server can have up to 45000 DTUs associated to it, either via elastic pools, databases directly or even Azure Data Warehouses
  • There is no documented limit on the number of pools per server
  • The server, its pools & databases must be in the same Azure region under the same subscription

Let’s look at a few design questions now.

Why use more than one pool?

Why not using a pool with a huge number of DTUs?

  • Ultimately a pool cannot be of infinite size (4000 DTUs / 750 GB for Premium, 3000 DTUs / 2.9 TB for Standard) so we’ll use multiple pools to scale
  • Policies, i.e. min / max DTUs, are set up at the pool level;  if we have a bunch of tiny DBs with few transactions and a group of mid-sized DBs with more traffic, we might want to have multiple pools with different policies to handle them

Should we have one server per pool or multiple pools per server?

An Azure SQL Server does very little:

  • Holds an Admin account for the entire server
  • Holds the pools & databases
  • Exists in a region

Obviously, multiple regions, multiple servers.

Why would we choose multiple servers over one server with multiple pools?  Security:  if we want to segregate access to different databases at the administration level, we wouldn’t want to share one admin account for all of them.

A lot can be argued around that point, e.g. we could have one admin account for all DBs but different admins per DB.  In compliance scenarios, I could see this playing out, e.g. dev vs prod, banking vs federal customers, etc.

Why use a Premium elastic pool?

Standard pools have bigger storage and comparable parallelism specs, so why go Premium and pay a…  Premium?

The main spec where Premium shines is the min / max DTUs per DB:  Premium allows us to have more powerful databases within a pool while Standard is geared towards smaller DBs.

More concretely, Standard pools allow up to 100 DTUs per database while in Premium it goes up to 4000.

As a comparison, 100 DTUs is equivalent to a standalone S3 database.

Summary

We looked at the Azure SQL Database Elastic Pool feature.

Elastic Pool really is an economic feature:  it’s a way to increase the number of databases running on the same compute and hence reduce cost.

In scenarios where we have lots of small databases, it can drastically reduce costs.

In a future post, I’ll cover how to provision an Elastic pool using ARM template.

Primer on Azure Monitor


Azure Monitor is the latest evolution of a set of technologies allowing Azure resources monitoring.

I’ve written about going the extra mile to be able to analyze logs in the past.

The thing is that once our stuff is in production with tons of users hitting it, it might very well start behaving in unpredictable ways.  If we do not have a monitoring strategy, we’re going to be blind to problems and only see unrelated symptoms.

Azure Monitor is a great set of tools.  It doesn’t try to be the be-all and end-all solution.  On the contrary, although it offers analytics out of the box, it lets us export the logs wherever we want to go further.

I found the documentation of Azure Monitor (as of November 2016) a tad confusing, so I thought I would give a summary overview here.  Hopefully it will get you started.

Three types of sources

First thing we come across in Azure Monitor’s literature is the three types of sources:  Activity Logs, Diagnostic Logs & Metrics.

There is a bit of confusion between Diagnostic Logs & Metrics, some references hinting towards the fact that metrics are generated by Azure while diagnostics are generated by the resource itself.  That is confusing & beside the point.  Let’s review those sources here.

Activity logs capture all operations performed on Azure resources.  They used to be called Audit Logs & Operational Logs.  They come directly from the Azure APIs:  any operation done through an Azure API (except HTTP GET operations) traces an activity log.  Activity logs are in JSON and contain the following information:  action, caller, status & time stamp.  We’ll want to keep track of those to understand the changes done in our Azure environments.

Metrics are emitted by most Azure resources.  They are akin to performance counters:  something that has a value (e.g. % CPU, IOPS, # of messages in a queue, etc.) over time;  hence Azure Monitor, in the portal, allows us to plot them against time.  Metrics typically come in JSON and tend to be emitted at a regular interval (e.g. every minute);  see this article for available metrics.  We’ll want to check those to make sure our resources operate within expected bounds.

Diagnostic logs are logs emitted by a resource that provide detailed data about the operation of that particular resource.  This one is specific to the resource in terms of content:  each resource will have different logs.  The format will also vary (e.g. JSON, CSV, etc.);  see this article for the different schemas.  They also tend to be much more voluminous for an active resource.

That’s it.  That’s all there is to it.  Avoid the confusion and re-read the last three paragraphs.  It’s a time saver.  Promised.

We’ll discuss the export mechanisms & alerts below, but for now, here’s a summary of the capabilities (as of November 2016) of each source:

Source | Export to | Supports Alerts
Activity Logs | Storage Account & Event Hub | Yes
Metrics | Storage Account, Event Hub & Log Analytics | Yes
Diagnostics Logs | Storage Account, Event Hub & Log Analytics | No

Activity Log example

We can see the activity log of our favorite subscription by opening the Monitor blade, which should be on the left-hand side in https://portal.azure.com.


If you do not find it there, hit the More Services and search for Monitor.

Selecting the Activity Logs, we should have a search form and some results.


ListKeys is a popular one.  Despite being conceptually a read operation, the List Key action, on a storage account, is done through a POST in the Azure REST API, specifically to trigger an audit trail.

We can select one of those ListKeys and, in the tray below, select the JSON format:


{
  "relatedEvents": [],
  "authorization": {
    "action": "Microsoft.Storage/storageAccounts/listKeys/action",
    "condition": null,
    "role": null,
    "scope": "/subscriptions/<MY SUB GUID>/resourceGroups/securitydata/providers/Microsoft.Storage/storageAccounts/a92430canadaeast"
  },
  "caller": null,
  "category": {
    "localizedValue": "Administrative",
    "value": "Administrative"
  },
  "claims": {},
  "correlationId": "6c619af4-453e-4b24-8a4c-508af47f2b26",
  "description": "",
  "eventChannels": 2,
  "eventDataId": "09d35196-1cae-4eca-903d-6e9b1fc71a78",
  "eventName": {
    "localizedValue": "End request",
    "value": "EndRequest"
  },
  "eventTimestamp": "2016-11-26T21:07:41.5355248Z",
  "httpRequest": {
    "clientIpAddress": "104.208.33.166",
    "clientRequestId": "ba51469e-9339-4329-b957-de5d3071d719",
    "method": "POST",
    "uri": null
  },

I truncated the JSON here.  Basically, it is an activity event with all the details.

Metrics example

Metrics can be accessed from the “global” Monitor blade or from any Azure resource’s monitor blade.

Here I look at the CPU usage of an Azure Data Warehouse resource (which hasn’t run for months, hence flat lining).


Diagnostic Logs example

For diagnostics, let’s create a storage account and activate diagnostics on it.  For this, under the Monitoring section, let’s select Diagnostics, make sure the status is On and then select Blob logs.


We’ll notice that all metrics were already selected.  We also notice that the retention is controlled there, in this case 7 days.

Let’s create a blob container, copy a file into it and try to access it via its URL.  Then let’s wait a few minutes for the diagnostics to be published.

We should see a special $logs container in the storage account.  This container will contain log files, stored by date & time.  For instance, here are the first few lines of the first file:


1.0;2016-11-26T20:48:00.5433672Z;GetContainerACL;Success;200;3;3;authenticated;monitorvpl;monitorvpl;blob;"https://monitorvpl.blob.core.windows.net:443/$logs?restype=container&comp=acl";"/monitorvpl/$logs";295a75a6-0001-0021-7b26-48c117000000;0;184.161.153.48:51484;2015-12-11;537;0;217;62;0;;;""0x8D4163D73154695"";Saturday, 26-Nov-16 20:47:34 GMT;;"Microsoft Azure Storage Explorer, 0.8.5, win32, Azure-Storage/1.2.0 (NODE-VERSION v4.1.1; Windows_NT 10.0.14393)";;"9e78fc90-b419-11e6-a392-8b41713d952c"
1.0;2016-11-26T20:48:01.0383516Z;GetContainerACL;Success;200;3;3;authenticated;monitorvpl;monitorvpl;blob;"https://monitorvpl.blob.core.windows.net:443/$logs?restype=container&comp=acl";"/monitorvpl/$logs";06be52d9-0001-0093-7426-483a6d000000;0;184.161.153.48:51488;2015-12-11;537;0;217;62;0;;;""0x8D4163D73154695"";Saturday, 26-Nov-16 20:47:34 GMT;;"Microsoft Azure Storage Explorer, 0.8.5, win32, Azure-Storage/1.2.0 (NODE-VERSION v4.1.1; Windows_NT 10.0.14393)";;"9e9c6311-b419-11e6-a392-8b41713d952c"
1.0;2016-11-26T20:48:33.4973667Z;PutBlob;Success;201;6;6;authenticated;monitorvpl;monitorvpl;blob;"https://monitorvpl.blob.core.windows.net:443/sample/A.txt";"/monitorvpl/sample/A.txt";965cb819-0001-0000-2a26-48ac26000000;0;184.161.153.48:51622;2015-12-11;655;7;258;0;7;"Tj4nPz2/Vt7I1KEM2G8o4A==";"Tj4nPz2/Vt7I1KEM2G8o4A==";""0x8D4163D961A76BE"";Saturday, 26-Nov-16 20:48:33 GMT;;"Microsoft Azure Storage Explorer, 0.8.5, win32, Azure-Storage/1.2.0 (NODE-VERSION v4.1.1; Windows_NT 10.0.14393)";;"b2006050-b419-11e6-a392-8b41713d952c"

Storage Account diagnostics obviously log in semicolon-delimited values (a variant of CSV), which isn’t trivial to read the way I pasted it here.  But basically we can see the logs contain details:  each operation done on the blobs is logged with lots of detail.

Querying

As seen in the examples, Azure Monitor allows us to query the logs.  This can be done in the portal but also using the Azure Monitor REST API, cross-platform Command-Line Interface (CLI) commands, PowerShell cmdlets or the .NET SDK.
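For instance, with PowerShell the activity log can be queried with the Get-AzureRmLog cmdlet;  here is a minimal sketch (the resource group name matches the JSON sample above):

# Pull the last day of activity log entries for a resource group
# and keep only the ListKeys calls
Get-AzureRmLog -ResourceGroup securitydata -StartTime (Get-Date).AddDays(-1) |
    Where-Object { $_.Authorization.Action -like "*listKeys*" } |
    Select-Object EventTimestamp, Caller, @{ n = "Action"; e = { $_.Authorization.Action } }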

Export

We can export the sources to a Storage Account and specify a retention period in days.  We can also export them to Azure Event Hubs & Azure Log Analytics.  As specified in the table above, Activity logs can’t be sent to Log Analytics.  Also, Activity logs can be analyzed using Power BI.

There are a few reasons why we would export the logs:

  • Archiving scenario:  Azure Monitor keeps content for 30 days.  If we need more retention, we need to archive it ourselves.  We can do that by exporting the content to a storage account ; this also enables big data scenario where we keep the logs for future data mining.
  • Analytics:  Log Analytics offers more capacity for analyzing content.  It also offers 30 days of retention by default but can be extended to one year.  Basically, this would upgrade us to Log Analytics.
  • Alternatively, we could export the logs to a storage account where they could be ingested by another SIEM (e.g. HP Arcsight).  See this article for details about SIEM integration.
  • Near real time analysis:  Azure Event Hubs allow us to send the content to many different places, but also we could analyze it on the fly using Azure Stream Analytics.

Alerts (Notifications)

Both Activity Logs & Metrics can trigger alerts.  Currently (as of November 2016), only Metric alerts can be set in the portal;  Activity Log alerts must be set via PowerShell, CLI or the REST API.

Alerts are a powerful way to automatically react to the behaviour of our Azure resources;  when certain conditions are met (e.g. for a metric, when a value exceeds a threshold for a given period of time), the alert can send an email to a specified list of email addresses, but it can also invoke a Web Hook.

Again, the ability to invoke a web hook opens up the platform.  We could, for instance, expose an Azure Automation runbook as a Web Hook;  that means an alert could trigger whatever a runbook is able to do.
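As a sketch only, using the classic AzureRM.Insights alert cmdlets (exact parameter names may vary between module versions, and the target resource id below is a placeholder to adapt to your environment):

# Alert when average DTU consumption of an elastic pool exceeds 80% over 15 minutes,
# emailing the service owners (placeholder resource id)
$action = New-AzureRmAlertRuleEmail -SendToServiceOwners

Add-AzureRmMetricAlertRule `
    -Name "pool-a-high-dtu" `
    -Location "East US" `
    -ResourceGroup DBs `
    -TargetResourceId "/subscriptions/<SUB GUID>/resourceGroups/DBs/providers/Microsoft.Sql/servers/pooldemoserver/elasticPools/Pool-A" `
    -MetricName "dtu_consumption_percent" `
    -Operator GreaterThan `
    -Threshold 80 `
    -WindowSize (New-TimeSpan -Minutes 15) `
    -TimeAggregationOperator Average `
    -Actions $action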

Security

There are two RBAC roles around monitoring:  Monitoring Reader & Monitoring Contributor.

There are also some security considerations around monitoring:

  • Use a dedicated storage account (or multiple dedicated storage accounts) for monitoring data.  Basically, avoid mixing monitoring and “other” data, so that people do not gain access to monitoring data inadvertently and, vice versa, that people needing access to monitoring data do not gain access to “other” data (e.g. sensitive business data)
  • For the same reasons, use a dedicated namespace with Event Hubs
  • Limit access to monitoring data by using RBAC, e.g. by putting them in a separate resource group
  • Never grant ListKeys permission across a subscription as users could then gain access to reading monitoring data
  • If you need to give access to monitoring data, consider using a SAS token (for either Storage Account or Event Hubs)

Summary

Azure Monitor brings together a suite of tools to monitor our Azure resources.  It is an open platform in the sense it integrates easily with solutions that can complement it.