Author Archives: Vincent-Philippe Lauzon

Invoking a Stored Procedure from a partitioned CosmosDB collection from Logic Apps

I struggled a little to make this work, so I thought I would share what I learned in order to accelerate your future endeavours.

I was looking at a way to populate a CosmosDB quickly with random data.

Stored Procedures came to mind since they run server-side and skip client-server latency.  A single stored procedure call can create hundreds of documents with random data.

Each Stored Procedure runs within a partition, so we need something external to the stored procedure to loop and decide on the partition key.

Enter Logic Apps:  cheap to run and quick to set up.

Stored Procedure

Something important to realize is that some portal features aren’t supported when we deal with a partitioned collection.

One of them is updating the content of a stored procedure (the same applies to triggers).  We therefore need to delete it and re-create it.

Here is the stored procedure we used:

function createRecords(recordCount) {
    var context = getContext();
    var collection = context.getCollection();
    var createdIds = [];

    for (var i = 0; i < recordCount; i++) {
        var documentToCreate = { part: "abc", name: "sample" + i };
        // createDocument returns false when the request is no longer accepted
        // (e.g. the stored procedure is about to exceed its execution budget)
        var accepted = collection.createDocument(
            collection.getSelfLink(),
            documentToCreate,
            function (err, documentCreated) {
                if (err) {
                    throw new Error('Error: ' + err.message);
                }
                else {
                    createdIds.push(documentCreated.id);
                }
            });

        if (!accepted)
            return;
    }

    context.getResponse().setBody(createdIds);
}

We take the number of documents to create as a parameter, loop and create the documents.  We return the list of created document IDs in the output.

The documents we create are trivial:  no random data.

Logic App

On the canvas, let’s type Cosmos in the search box for actions.


Let’s choose Execute stored procedure.

We are prompted to create a new Cosmos DB connection.  We need to:

  • Type a name for the connection (purely for readability, can be anything)
  • Select an existing Cosmos DB collection

We can then pick the database ID, the collection ID & the stored procedure ID.


Stored Procedure parameters are expressed as a JSON array.  For instance, here we want to pass 1000 as the recordCount parameter, so we type [1000]:  no parameter name and always square brackets.

If we ran the app now, we would get an error stating that the operation requires the partition key.

In order to set the partition key, we need to Show advanced options.


In order to specify the partition key value, we simply type its value:  no square brackets, no quotes.

Now we can run the Logic App; it executes the stored procedure and surfaces the result in the action’s output.

Summary

Invoking a Cosmos DB stored procedure from a Logic App isn’t rocket science, but there are a few items to get straight in order for it to work properly.


Renaming Virtual Machine Disks

Let’s say we would like to rename disks on a Virtual Machine (VM).  Here we mean renaming the Azure Resource Name of the managed disk.  How would we go about that?

Why would we want to?  Primarily to get our internal nomenclature right.  A typical example is when we migrate from unmanaged to managed disks (see article here) using the ConvertTo-AzureRmVMManagedDisk command.  This command converts all disks from page blobs to managed disks; it gives each managed disk the name of its page blob, prepended with the name of the VM.  That might not be our nomenclature, and there is no way to override the names.
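
For reference, here is roughly what that conversion looks like (a sketch; the resource group and VM names are the demo’s, adjust to your environment):

# The VM must be deallocated before its disks can be converted
Stop-AzureRmVM -ResourceGroupName 'ren' -Name 'Demo-VM' -Force

# Converts all of the VM's disks from page blobs to managed disks
ConvertTo-AzureRmVMManagedDisk -ResourceGroupName 'ren' -VMName 'Demo-VM'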

Nomenclature / naming convention is important if only to ensure clarity for human operators.

The Challenge

Our first challenge is that disks, like most Azure resources, can’t be renamed.  There is no such command.  For instance, if we look at Update-AzureRmDisk, it takes a disk object, and the disk name on that object is read-only.

So we’ll need to actually copy the disks to change their names:  the good old copy-then-delete-the-original scheme.

Our second challenge is that, as we’ve seen with the Virtual Machine anatomy, although data disks can be added and removed on the fly, the OS disk (i.e. the primary disk) cannot.  That means we cannot swap the OS disk for another disk.

We’ll need to recreate the VM to make it point to the disk copy with a new name.

So much for renaming, right?

The Solution

The solution we lay out here is based on ARM templates.  We could accomplish something similar using PowerShell or Command Line Interface (CLI) scripts.

A demo of the solution is available on GitHub.  It deploys a Linux VM behind a public load balancer with SSH being routed to the VM.  In order to fully explore the demo, we need to initialize the data disks.

In general, the solution follows five steps:

  1. Determine the Virtual Machine ARM template
  2. Delete the Virtual Machine
  3. Copy disks with new names
  4. Re-create the Virtual Machine and attach to disk copies
  5. Delete original disks

Determine the Virtual Machine ARM template

Since we’ll recreate the VM using ARM template, we need to determine the ARM Template of the VM.

If we already have it because we work with ARM templates in general, we are done.  Otherwise, we need to work a little bit.

The best approach usually is to use the Automation Script option on the left hand side menu of either the VM or its resource group.


From there we can find the node for our VM and mechanically clean up the template.

We do not need the template for the entire resource group.  We only need the template for the VM itself (not its NICs or VNET, etc.).

Delete the Virtual Machine

Let’s delete the Virtual Machine to better recreate it.

We will use a PowerShell command.  Using the Azure Portal would yield the same result.


$rgName = 'ren' # or the Resource Group name we use
$vmName = 'Demo-VM'    # or the name of the VM we use
Remove-AzureRmVM -Force -ResourceGroupName $rgName -Name $vmName

Of course, we need to replace the variable with values corresponding to our case at hand.

This deletes the VM but leaves all its artefacts behind:  VNET, Public IP, NIC, Disks, etc.  We’ll be able to attach back to those.

Copy disks with new names

We’re going to use a new ARM template to copy disks.  Here is our demo solution’s template.

Basically, the ARM template creates new disks using the creationOption value copy, pointing to the original disks of the VM.

For the demo solution, we use a fancy trick where we map the old and new disk name in a variable:


"disks": [
  {
    "oldName": "Demo-VM-OS",
    "newName": "Clone-Demo-OS"
  },
  {
    "oldName": "Demo-VM-data2",
    "newName": "Clone-Demo-data2"
  },
  {
    "oldName": "Demo-VM-data3",
    "newName": "Clone-Demo-data3"
  }
]

and then we use a copy construct to loop over the JSON array:


    {
      "comments": "Copy existing disks in order to change their names",
      "apiVersion": "2017-03-30",
      "copy": {
        "name": "snapshot-loop",
        "count": "[length(variables('disks'))]"
      },
      "type": "Microsoft.Compute/disks",
      "name": "[variables('disks')[copyIndex()].newName]",
      "location": "[resourceGroup().location]",
      "sku": {
        "name": "Premium_LRS"
      },
      "properties": {
        "creationData": {
          "createOption": "copy",
          "sourceUri": "[resourceId('Microsoft.Compute/disks', variables('disks')[copyIndex()].oldName)]"
        }
      }
    },


One of the advantages of using ARM templates to copy the disks is that the copies are parallelized:  in the case of our demo solution, we have 3 disks and they are copied in parallel instead of one after the other.  This is of course faster.

Re-create the Virtual Machine and attach to disk copies

In the same ARM template, we can recreate the VM.  This is what we do in our demo solution’s template by adding a dependency on the disks.

The VM is recreated by attaching to the disk copies.  Similarly, it links back to its NIC.
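
Deploying such a template is a standard ARM deployment.  For instance (a sketch; the template file name is hypothetical):

New-AzureRmResourceGroupDeployment -ResourceGroupName $rgName `
    -TemplateFile .\copy-and-recreate.json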

Delete original disks

At this point we did “rename the disks”.  We just have some cleanup to do with the original disks.

We simply delete them:


$rgName = 'ren' # or the Resource Group name you used
$oldDisks = 'Demo-VM-OS', 'Demo-VM-data2', 'Demo-VM-data3'

$oldDisks | foreach {Remove-AzureRmDisk -ResourceGroupName $rgName -Force -DiskName $_}

Again, we replace the first two variables with what makes sense in our use case.

Summary

We did come up with a recipe to rename managed disks by copying them and attaching the copies to a recreated VM.

Our demo example had a lot of specifics:

  • It’s a Linux VM (Windows would be very similar)
  • It’s exposed through a load balancer on a public IP (this doesn’t matter; only its NIC matters, as the NIC is what is being load balanced)
  • It had 2 data disks

The solution would change depending on the specifics of the VM but the same steps would apply.

Azure Virtual Machines Anatomy

Virtual Machines can be pretty complex little beasts.  They can have multiple disks, multiple NICs in different subnets, can be exposed on the public internet either directly or through a load balancer, etc.

In this article, we’ll look at the anatomy of a Virtual Machine (VM):  what are the components it relates to.

We look at the Azure Resource Manager (ARM) version of Virtual Machines, as opposed to the Classic version.  In ARM, Virtual Machines have a very granular model.  Most components that relate to a VM are often assimilated into the VM itself when we conceptualize them (e.g. the NIC).

Internal Resource Model

Here is a component diagram.  It shows the different components, their relationships and the cardinality of those relationships.


Virtual Machine

Of course, the Virtual Machine is at the center of this diagram.  We look at the other resources in relationship to a Virtual Machine.

Availability Set

A Virtual Machine can optionally be part of an availability set.

Availability Set is a reliability construct.  We discuss it here.

Disk

A Virtual Machine has at least one disk:  the Operating System (OS) disk.  It can optionally have more disks, also called data disks, as many as the Virtual Machine SKU allows.

Network Interface Controller (NIC)

The NIC is the networking bridge for the Virtual Machine.

A Virtual Machine has at least one (and typical VMs have only one) but can have more.  Network Virtual Appliances (NVAs) are typical cases where multiple NICs are warranted.

We often say that a Virtual Machine is in a subnet / virtual network, and we typically represent it that way in a diagram:  a VM box within a subnet box.  Strictly speaking though, the NIC is part of a subnet, not the VM.  This way a Virtual Machine with multiple NICs can be part of multiple subnets, which might be in different Virtual Networks.
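
We can observe those relationships with PowerShell (a sketch; the resource names are hypothetical):

# The VM references its NICs by resource ID
$vm = Get-AzureRmVM -ResourceGroupName 'demo' -Name 'Demo-VM'
$vm.NetworkProfile.NetworkInterfaces | foreach { $_.Id }

# The NIC, not the VM, belongs to a subnet
$nic = Get-AzureRmNetworkInterface -ResourceGroupName 'demo' -Name 'Demo-VM-nic'
$nic.IpConfigurations[0].Subnet.Id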

A NIC can be load balanced (in either a private or public load balancer) or can also be exposed directly on a Public IP.

Subnet / Virtual Network

Azure Virtual Networks are the networking isolation construct in Azure.

A Virtual Network can have multiple subnets.

A NIC is part of a subnet and therefore has a private IP address from that subnet.  The private IP address can be either static (fixed) or dynamic.

Public Azure Load Balancer

On the diagram we distinguish between Public & Private Load Balancers, but they are the same Azure resource per se, simply used differently.

A Public Load Balancer is associated with a Public IP.  It is also associated with multiple NICs, to which it forwards traffic.

Public IP

A public IP is exposed on the public internet.  The actual IP address can be either static or dynamic.

A public IP routes traffic to NICs either through a public load balancer or directly to a NIC (when the NIC exposes a public IP directly).

Private Azure Load Balancer

A private load balancer forwards traffic to multiple NICs like a public load balancer.

A private load balancer isn’t associated with a public IP though.  It has a private IP address instead and is therefore part of a subnet.

Cast in stone

We looked at VM components.  That gives us a static view of what a VM is.

Another interesting aspect is the dynamic nature of a VM.  What can change and what cannot?

For better or worse, we can’t change everything about a VM once it’s created.  So let’s mention the aspects we can’t change after a VM is created.

The primary NIC of a VM is permanent.  We can add, remove or change secondary NICs but the primary must stay there.

Similarly, the primary disk, or OS disk, can’t be changed after creation while secondary disks, or data disks, can be changed.
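
For instance, detaching a data disk from a live VM is a routine operation (a sketch; the resource names are hypothetical):

# Remove the data disk from the VM model, then push the update
$vm = Get-AzureRmVM -ResourceGroupName 'demo' -Name 'Demo-VM'
$vm = Remove-AzureRmVMDataDisk -VM $vm -DataDiskNames 'Demo-VM-data2'
Update-AzureRmVM -ResourceGroupName 'demo' -VM $vm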

The availability set of a VM is set at creation time and can’t be changed afterwards.

Summary

We did a quick lap around the different resources associated to a Virtual Machine.

It is useful to keep that mental picture when we contemplate different scenarios.

Virtual Network Service Endpoint – Hello World

In our last post we discussed the new feature Virtual Network Service Endpoint.

In this post we’re going to show how to use that feature.

We’re going to use it on a storage account.

We won’t go through the micro steps of setting up each service but we’ll focus on the Service Endpoint configuration.

Resource Group

As usual for demo / feature trial, let’s create a Resource Group for this so we can wipe it out at the end.

Storage Account

Let’s create a storage account in the resource group we’ve just created.

Let’s create a blob container named test.  Let’s configure the blob container to have a public access level of Blob (i.e. anonymous read access for blobs only).

Let’s create a text file named A.txt with the proverbial Hello World sentence in it so we can recognize it, and copy it into the blob container.
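
For those who prefer scripting the setup, here is a sketch (the account name vplsto comes from the example below; the resource group name and location are assumptions):

$rgName = 'demo-se'
$accountName = 'vplsto'

New-AzureRmStorageAccount -ResourceGroupName $rgName -Name $accountName `
    -Location 'eastus' -SkuName Standard_LRS

$key = (Get-AzureRmStorageAccountKey -ResourceGroupName $rgName -Name $accountName)[0].Value
$ctx = New-AzureStorageContext -StorageAccountName $accountName -StorageAccountKey $key

# Public access level "Blob" means anonymous read access on blobs only
New-AzureStorageContainer -Name 'test' -Permission Blob -Context $ctx

'Hello World' | Out-File -FilePath A.txt
Set-AzureStorageBlobContent -File A.txt -Container 'test' -Blob 'A.txt' -Context $ctx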

We should be able to access the file via its public URL.  For instance, given a storage account named vplsto we can find the URL by browsing the blobs.


Then selecting the container we can select the blob.


And there we should have access to the blob URL.

We should be able to open it in a browser.


Virtual Machine

Let’s create a Virtual Machine within the same resource group.

Here we’re going to use a Linux distribution in order to use the CURL command line later on but obviously something quite similar could be done with a Windows Server.

Once the deployment is done, let’s select the Virtual Network.


Let’s select the Subnet tab and then the subnet where we deployed the VM (in our case the subnet is named VMs).


At the bottom of the page, let’s select the Services drop-down under the Service Endpoints section.  Let’s pick Microsoft.Storage.


Let’s hit save.
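
The same configuration can be scripted.  Here is a sketch, assuming a VNET named demo-vnet with a 10.0.0.0/24 subnet named VMs:

$vnet = Get-AzureRmVirtualNetwork -ResourceGroupName $rgName -Name 'demo-vnet'

# Tag the subnet with the Microsoft.Storage service endpoint
Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name 'VMs' `
    -AddressPrefix '10.0.0.0/24' -ServiceEndpoint 'Microsoft.Storage'

Set-AzureRmVirtualNetwork -VirtualNetwork $vnet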

Separation of concerns

This is the Virtual Network configuration part we had to do.  Next we’ll need to tell the storage account to accept connections only from our subnet.

By design the configuration is split between two areas:  the Virtual Network and the PaaS Service (Storage in our case).

The aim of this design is to have potentially two individuals with two different permission sets configuring the services.  The network admin configures the Virtual Network, while the DBA would configure the database, the storage admin would configure the storage account, etc.

Configuring Storage Account

In the Storage Account, main screen, let’s select Firewalls and virtual networks.


From there, let’s select the Selected Networks radio button.

Then let’s click on Add existing virtual network and select the VNET & subnet where the VM was deployed.

Let’s leave the Exceptions unchanged.


Let’s hit save.
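
Again, a rough PowerShell equivalent (a sketch; these cmdlets ship with the preview Storage module at the time of this writing):

# Deny by default, then allow our subnet
Update-AzureRmStorageAccountNetworkRuleSet -ResourceGroupName $rgName `
    -Name $accountName -DefaultAction Deny

$subnet = Get-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name 'VMs'

Add-AzureRmStorageAccountNetworkRule -ResourceGroupName $rgName `
    -Name $accountName -VirtualNetworkResourceId $subnet.Id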

If we refresh our web page pointing to the blob, we should get an authorization error page.


This is because our desktop computer isn’t on the VNET we configured.

Let’s SSH to the VM and try the following command line:

curl https://vplsto.blob.core.windows.net/test/A.txt

(replacing the URL with the blob URL we captured previously).

This should return us our Hello World.  This is because the VM is within the subnet we configured within the storage account.

Summary

We’ve done a simple implementation of Azure Virtual Network Service Endpoints.

It is worth noting that filtering is done at the subnet level.  It is therefore important to design our Virtual Networks with the right level of granularity for the subnets.

VNET Service Endpoints for Azure SQL & Storage

It’s finally here, it has arrived:  Azure Virtual Network Service Endpoints.

This was a long requested “Enterprise feature”.

Let’s look at what this is and how to use it.

Please note that at the time of this writing (end of September 2017) this feature is available only in a few regions, in Public Preview:

  • Azure Storage: WestCentralUS, WestUS2, EastUS, WestUS, AustraliaEast, and AustraliaSouthEast
  • Azure SQL Database: WestCentralUS, WestUS2, and EastUS


The problem

The first (historically) Azure Services, e.g. Azure Storage & Azure SQL, were built with a public cloud philosophy:

  • They are accessible through public IPs
  • They are multi-tenant, e.g. public IPs are shared between many domain names
  • They live on shared infrastructures (e.g. VMs)
  • etc.

Many more recent services share many of those characteristics, for instance Data Factory, Event Hub, Cosmos DB, etc.

Those are all Platform as a Service (PaaS) services.

Then came the IaaS wave, offering more control and being less opinionated about how we should expose & manage cloud assets.  With it, we could replicate, in large part, an on-premises environment.  First came Virtual Machines, then Virtual Networks (akin to on-premises VLANs), then Network Security Groups (akin to on-premises firewall rules), then Network Virtual Appliances (literally a software version of an on-premises firewall), etc.

Enterprises love IaaS as it allows them to migrate assets to the cloud more quickly, since they can more easily adapt their governance models.

But Enterprises, like all Cloud users, realize that the best TCO is in PaaS services.

This is where the two models collided.

After we spent all this effort stonewalling our VMs within a Virtual Network, implementing access rules (e.g. inbound port 80 connections can only come from on-premises users through the VPN Gateway), were we going to access the Azure SQL Database through a public endpoint?

That didn’t go down easily.

Azure SQL DB specifically has an integrated firewall.  We can block all access, allow only given IP ranges (again, good for connections over the internet) or allow “Azure connections”.  The last one looks more secure, as no one from a hotel room (or a bed in New Jersey) could access the database.  But anyone within Azure, anyone, could still access it.

The typical default architecture was something like this:


This put a lot of friction on the adoption of PaaS services by Enterprise customers.

The solution until now

The previous diagram is a somewhat naïve deployment and we could do better.  A lot of production deployments are like this though.

We could do better by controlling access via incoming IP addresses.  Outbound connections from a VM go out through a public IP.  We could filter access on that IP within the PaaS service’s integrated firewall.

In order to do that, we needed a static public IP though.  Dynamic IPs preserve their domain name, but they aren’t guaranteed to preserve their underlying IP value.


This solution had several disadvantages:

  • It requires a different paradigm (public IP filtering vs VNET / NSGs) to secure access
  • It requires static IPs
  • If the VMs were not meant to be exposed on the internet, it adds configuration and triggers security questions during reviews
  • A lot of deployments include “force tunneling” to the on-premises firewall for internet access; since Azure SQL DB technically is on the internet, traffic was routed on premises, increasing latency substantially

And this is where we were until this week, when VNET Service Endpoints were announced at Microsoft Ignite.

The solution

The ideal solution would be to instantiate the PaaS service within a VNET.  For a lot of PaaS services, given their multi-tenant nature, that is impossible.

That is the approach taken by a lot of single-tenant PaaS services though, e.g. HDInsight, Application Gateway, Redis Cache, etc.

For multi-tenant PaaS where the communication is always outbound to the Azure service (i.e. the service doesn’t initiate a connection to our VMs), the solution going forward is VNET Service Endpoints.

At the time of this writing, only Azure Storage, Azure SQL DB & Azure SQL Data Warehouse support that mechanism.  Other PaaS services are planned to support it in the future.

VNET Service Endpoints do the next best thing to instantiating the PaaS service in our VNET.  They allow us to filter connections according to the VNET / subnet of the source.

This is made possible by a fundamental change in the Azure Network Stack.  VNETs now have identities that can be carried with a connection.

So we are back to where we wanted to be:


The solution isn’t perfect.  For instance, it doesn’t allow filtering for connections coming from on-premises computers via a VPN Gateway:  the connection needs to be initiated from the VNET itself for VNET Service Endpoints to work.

Also, access to PaaS resources is still done via the PaaS public IP, so the VNET must allow connections to the internet.  This is mitigated by new tags allowing traffic to target specific PaaS services; for instance, we could allow traffic going only to Azure SQL DBs (although not to our Azure SQL DB instance only).
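
For instance, an outbound NSG rule using such a tag might look like this (a sketch; the rule name and priority are arbitrary, and the tag may need a regional form such as Sql.EastUS while in preview):

# Allow outbound SQL traffic from the VNET using the Sql service tag
New-AzureRmNetworkSecurityRuleConfig -Name 'allow-azure-sql' `
    -Access Allow -Direction Outbound -Priority 100 -Protocol Tcp `
    -SourceAddressPrefix 'VirtualNetwork' -SourcePortRange '*' `
    -DestinationAddressPrefix 'Sql' -DestinationPortRange '1433'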

The connection does bypass “force tunneling” though, so the traffic stays on the Microsoft network, improving latency.

Summary

VNET Service Endpoints allow us to secure access to PaaS services such as Azure SQL DB, Azure SQL Data Warehouse & Azure Storage (and soon more).

It offers something close to bringing the PaaS services to our VNET.

Moving from Standard to Premium disks and back

Azure Managed Disks (introduced in February 2017) simplified the way Virtual Machine disks are managed in Azure.

A little known advantage of that resource is that it exposes its storage type, i.e. Standard vs Premium, as a simple property that can easily be changed.

Why would we do that?  Typically, we’ll move from standard to premium storage to improve disk latency but also resilience (for instance, only VMs with Premium disks can have a single-VM SLA).  We might want to move from Premium to Standard in order to drive the cost of a solution down, although storage rarely dominates the cost of a solution.

In general, it can be interesting to test performance on both.  As with many things in Azure, you can quickly do it, so why not?

Managed Disks

For this procedure to work, we need managed disks.

If our Virtual Machine has unmanaged disks (i.e. .vhd files in a blob storage container), we need to convert them to managed disks first.

Fortunately, there is a simple procedure to migrate to managed disks.

Portal Experience

Let’s start with the portal experience.

First, let’s open a Resource Group where I know I have some VMs.


There are two resources that should interest us in there.

The first one is a Virtual Machine.  We’ll need to make sure Virtual Machines are shut down from the portal’s perspective, i.e. they aren’t provisioned anymore (as opposed to shut down from within the VMs).

The second resource is a disk.  Let’s click on that one.


Changing the account type is right on the overview tab of the disk resource.  We can simply change it, hit save, and within seconds the disk is marked as changed.

What really happens is that a copy is triggered in the background.  The disk can be used right away thanks to a mechanism called “copy on read”:  if the VM tries to read a page of the disk which hasn’t been copied yet, that page is copied first before the read occurs.

For this reason, we might experience a little more latency at first, so for performance tests it is better to wait.  There is no mechanism to know when the background copy is completed, so it is best to assume the worst for performance tests.

PowerShell Script

The Portal Experience is quite straightforward, but as usual, automation via PowerShell scripts often is desirable if we have more than a handful of migrations to do, for instance, if we have 10 disks to migrate.

As with the Portal Experience, we need to shut down the impacted Virtual Machines first.  This can also be done with a PowerShell script, but I won’t cover it here.

The main cmdlets to know here are Get-AzureRmDisk & Update-AzureRmDisk.

We first do a GET in order to fetch the disk metadata object, we then change the AccountType property and do an UPDATE to push back the change.

In the following example, I zoom in on a Resource Group and convert all its disks to Premium storage:


$rg = "Docker"

Get-AzureRmDisk -ResourceGroupName $rg | foreach {
    $disk = $_
    $disk.AccountType = "PremiumLRS"
    Update-AzureRmDisk -ResourceGroupName $disk.ResourceGroupName -DiskName $disk.Name -Disk $disk
}

The property AccountType can take the following values:

  • StandardLRS
  • PremiumLRS
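
Moving back is symmetric.  For instance, converting a single disk (here a hypothetical Demo-VM-OS) back to Standard:

$disk = Get-AzureRmDisk -ResourceGroupName $rg -DiskName 'Demo-VM-OS'
$disk.AccountType = "StandardLRS"
Update-AzureRmDisk -ResourceGroupName $rg -DiskName $disk.Name -Disk $disk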

Summary

We’ve seen how to easily migrate from one type of storage to another with Azure Virtual Machine Managed Disks.

This allows us to quickly change the property of an environment either permanently or in order to test those parameters (e.g. performance, stability, etc.).

How to know where a Service is Available in Azure

Azure has a Global Footprint of 40 regions at the time of this writing (mid-September 2017).

Not all services are available in every region.  Most aren’t, in fact.  Only foundational services (e.g. storage) are available everywhere.

In order to know where a service is available, we can look at:

https://azure.microsoft.com/en-us/regions/services/

This is handy when we’re building an architecture or a quote.

What if we want to build some automation around the availability of a service or simply check it via PowerShell because opening a browser is too hard today?

There are really two ways to get there.  Either we look at a specific region and query which services are available there, or we look at a service and query where it’s available.

Provider Model

Services aren’t “first class citizens” in Azure.  Resource Providers are.

Each resource provider offers a set of resources and operations for working with an Azure service.

Where is my service available?

Let’s start by finding the regions where a given service is available.

The key PowerShell cmdlet is Get-AzureRmResourceProvider.

Let’s start by finding the service we’re interested in.


Get-AzureRmResourceProvider | select ProviderNamespace

This returns the name of all the Azure provider namespaces (around 40 at the time of this writing).

Let’s say we are interested in Microsoft.DataLakeStore.


Get-AzureRmResourceProvider -ProviderNamespace Microsoft.DataLakeStore

This returns the resource providers associated with the given namespace.

We now need to pick the one with the resource types that interest us.  In this case, let’s say we are interested in Azure Data Lake Store accounts (the core resource of the service).  We can see it’s available in three regions:


ProviderNamespace : Microsoft.DataLakeStore
RegistrationState : Registered
ResourceTypes     : {accounts}
Locations         : {East US 2, North Europe, Central US}

Which services are available in my region?

Now, let’s take the opposite approach.  Let’s start with a region and see what services are available in there.

Here the key cmdlet is Get-AzureRmLocation.


Get-AzureRmLocation | select Location

This lists the regions we have access to.  A user rarely has access to all regions, which is why the list we see is likely smaller than 40 items at the time of this writing.

Let’s look at what’s available close to my place, canadaeast.


Get-AzureRmLocation | where {$_.Location -eq "canadaeast"} | select -ExpandProperty Providers

This gives us a quick view of what’s available in a region.

Summary

We saw how to query the Azure REST API using PowerShell in order to know where a service is available or which services are available in a region.

This could be especially useful if we want to automate such checks or do more sophisticated queries, e.g. which regions have both services X & Y available?
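
Here is a sketch of such a query, intersecting the regions of two resource types (Microsoft.Sql servers are an assumed second service):

# Regions where Data Lake Store accounts are available
$lake = Get-AzureRmResourceProvider -ProviderNamespace Microsoft.DataLakeStore |
    where { $_.ResourceTypes.ResourceTypeName -eq 'accounts' }

# Regions where SQL servers are available
$sql = Get-AzureRmResourceProvider -ProviderNamespace Microsoft.Sql |
    where { $_.ResourceTypes.ResourceTypeName -eq 'servers' }

# Intersection of the two location lists
$lake.Locations | where { $sql.Locations -contains $_ }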