Recreating VMs in Azure

In this article I’m going to explain how to destroy VMs, keep their disks on the back burner and recreate them later.

Why would you do that?

After all, you can shut down VMs and not be charged for them.  You can later restart them and incur the compute cost only once they’re running.  So why destroy / recreate VMs?

In general, because you get an allocation failure.  I did touch upon that in a past article.

Azure Data Centers are packaged in stamps (groups of physical servers).  A dirty secret of Azure is that your VM stays in its stamp even when you shut it down.  So if you need to move your VM, you need to destroy it and recreate it.

Two typical use cases:

  • You want to change the size of your VM to a size unsupported by its current stamp (e.g. D-series stamps often do not support G-series VMs)
  • You want to restart your VM after a while and the stamp is too full to bring your VM back in

Frankly, this is all sad and I hope it will be addressed in a future service update.  In the meantime, we do need a way to easily destroy / recreate VMs.

Before I dive into it, there is of course no magic:  you have to make sure the Azure region you deploy in supports the services you’re trying to deploy.  Many services aren’t available everywhere (e.g. G-series VMs, Cortana Analytics services, etc.).  Consult the list of services per region to validate your choice.

The technique I show here is based on my colleague’s article.  Alexandre published that article before the Export Template feature was made available.  This article will therefore use ARM templates generated by Export Template in order to resuscitate VMs.

An alternative approach would be to use PowerShell scripts to recreate the VMs.

Also, I based this article on the deployment of a SQL Server AlwaysOn cluster (5 VMs).

Exporting current state

I assume you have a running environment within an Azure Resource Group.

The first step is to export your Azure Resource Group as an ARM Template.  Refer to this article for details.

Save the JSON template somewhere.

Deleting VMs

You can now delete your VMs.  You might want to hook that to a Runbook as I did in Shutting down VMs on schedule in Azure (e.g. to shut down the VMs frequently) or just do it once (e.g. to resize a VM).

One way to do it, with PowerShell, is to use the following command:

Get-AzureRmVM |
    Where-Object {$_.ResourceGroupName -eq "NAME OF YOUR RESOURCE GROUP"} |
    Select-Object Name, ResourceGroupName |
    ForEach-Object {Remove-AzureRmVM -ResourceGroupName $_.ResourceGroupName -Name $_.Name -Force}

Here I basically list the VMs within one resource group and delete them.

This will not delete the disks associated with them.  It simply deletes the VMs and de-allocates the compute resources.  Everything else, e.g. networks, subnets, availability sets, etc., stays unchanged.

If you’re afraid of deleting VMs you didn’t intend to, you can instead call Remove-AzureRmVM explicitly on each VM.

Adapting ARM Template

Export Template does the heavy lifting.  But it can’t be used as is.

We have to make several changes to the template before being able to use it to recreate the VMs.

  1. Remove the SecureString password parameters.  We will remove all references to them, so you shouldn’t need them.  The reason is that the admin password of your VM is stored on its disks and will be restored with those disks.  This isn’t essential, but it avoids being prompted for a password when you run the template.
  2. Change all the createOption attributes to Attach.  This tells Azure to simply take the disks in storage as opposed to creating a disk from a generalized image.
  3. Just next to the createOption for the osDisk, add an osType attribute with value either Windows or Linux.
  4. Remove (or comment out) a few properties:
    • imageReference:  that is under storageProfile
    • osProfile:  that is after storageProfile
    • diskSizeGB:  that is under each dataDisks

Here’s how the osDisk property should look after the modifications:

"osDisk": {
  "name": "osdisk",
  "osType": "Windows",
  "createOption": "Attach",
  "vhd": {
    ...
  }
}
After this little massage of the ARM template, you should be able to run it and recreate your VMs as they were.
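Since those edits are mechanical, they can also be scripted.  Here’s a minimal Python sketch (the function name and file paths are my own; the property paths follow the ARM schema shown above) that applies steps 2 to 4 to an exported template:

```python
import json

def adapt_template(path_in, path_out):
    """Adapt an exported ARM template so disks are attached, not recreated."""
    with open(path_in) as f:
        template = json.load(f)

    for resource in template.get("resources", []):
        if resource.get("type") != "Microsoft.Compute/virtualMachines":
            continue
        props = resource["properties"]
        # Step 4: osProfile comes back with the disks, not from the template
        props.pop("osProfile", None)
        storage = props["storageProfile"]
        # Step 4: no image reference needed since we attach existing disks
        storage.pop("imageReference", None)
        os_disk = storage["osDisk"]
        # Steps 2 & 3: attach the existing OS disk and declare its OS type
        os_disk["createOption"] = "Attach"
        os_disk["osType"] = "Windows"  # or "Linux"
        for disk in storage.get("dataDisks", []):
            disk["createOption"] = "Attach"  # Step 2
            disk.pop("diskSizeGB", None)     # Step 4

    with open(path_out, "w") as f:
        json.dump(template, f, indent=2)
```

Step 1 (removing the SecureString parameters) is left out, since how the parameters are referenced varies from template to template.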

Making your own modifications

In some cases you might want to modify the ARM template some more.  For instance, by changing the size of VMs.  You can do this now.

Caveat around availability set

There are some funny behaviours around availability sets and VM size I found while writing this article.

One thing is that you can’t change the availability set of a VM (as of this writing).  So you need to get it right the first time.

Another is that a load balancer needs its VMs to be under the same availability set.  You can’t have two or more VMs behind it without an availability set.

The best one is that you can’t have an availability set with VMs of different size families.  In the case I used, i.e. the SQL AlwaysOn cluster, the SQL availability set has two SQL nodes and one witness in it.  The original template only lets you configure those as D-series.  You can’t change them to G-series later on, and this is one of the reasons you’ll want to use the technique laid out here.  But…  even then, you can’t have your witness as a D-series and the SQL nodes as G-series.  So you need at least a GS-1 as the witness (which is a bit ridiculous considering what a witness does in a SQL cluster).

That last one cost me a few hours so I hope reading this you can avoid wasting as much on your side!

Running the template

You can then run the template.

My favorite tool for that is Visual Studio but you can do it directly in the portal (see this article for guidance).


Destroying / recreating your VMs gives you a more robust restart experience.

It also allows you to get around other problems, such as reconfiguring multiple VMs at once, e.g. changing the size of all VMs in an availability set.

Thanks to the Export Template feature, it isn’t as much work as it used to be a few months ago.

Azure Export Template – Your new best friend

Azure Resource Manager (ARM) basically is Azure Infrastructure version 2.0.  It has been released for about a year now, although not all Azure services have caught up yet.

With ARM comes ARM templates.  An ARM template is a description of a group of resources (and their dependencies) in JSON.  It’s a powerful mechanism to deploy resources in Azure and replicate environments (e.g. ensuring your pre-prod & prod are semi-identical).

Up until a few months ago, the only way to create an ARM template was to either build it from scratch or modify an existing one.  You can see examples of ARM templates in the Azure Quickstart Templates.

Enter Export Template.

Your new best friend

If you have authored ARM templates, you know this can be a laborious process.  The JSON dialect is pretty verbose with limited documentation, and each iteration you try involves deploying Azure resources, which isn’t as fast as testing HTML (to put it lightly).

A more natural workflow is to create some resources in an Azure Resource Group, either via the portal or PowerShell scripts, and then have Azure author a template corresponding to those resources for you.

This is what Export Template does for you.

Open your favorite Resource Group and check at the bottom of the settings:


When you click that option, you’ll get a JSON template.  That template, running on an empty Resource Group would recreate the same resources.

One nice touch of that tool is that it infers parameters for you.  That is, knowing the resources you have, it figures out which attributes would make sense as parameters.  For example, if you have a storage account, since its name needs to be globally unique (across all subscriptions in Azure, yours & others), it would make sense not to hardcode the name, so the tool puts it as a parameter with the current value as a default.



As usual, there is no magic; for services not yet supported in ARM, this tool won’t give you a template.  It will warn you about it though.

For all supported scenarios, this is a huge time saver.  Even if you just use it as a starting point and modify it, it’s much faster than starting from scratch.

Remember that the aim of an ARM template is to describe an entire Resource Group.  Therefore the Export Template is a Resource Group tool (i.e. it’s in the menu of your Resource Group) and it wraps all the resources in your group.

Training a model to predict failures

Today a quick entry to talk about a twist on Machine Learning for the predictive maintenance problem.

The Microsoft Cortana Intelligence team wrote an interesting blog the other day:  Evaluating Failure Prediction Models for Predictive Maintenance.

When you listen to all the buzz around Machine Learning, it sometimes feels as if we’ve solved all the ML problems and you just need to point your engine in the general direction of your data for it to spew some insights.  Not yet.

That article highlights a challenge with predictive maintenance.  Predictive maintenance is about predicting when a failure (of a device, hardware, process, etc.) will occur.  But…  you typically have very few failures:  your data set is completely imbalanced between success samples and failure samples.

If you go head on and train a model with your data set, a likely outcome is that your model will predict success 100% of the time.  This is because it doesn’t get penalized that much by ignoring the few failures and gets rewarded by not missing any success.
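A toy illustration of that trap (the numbers are made up for the example):  a degenerate “model” that always predicts success looks great on accuracy while catching zero failures.

```python
# 98 successes (0) and 2 failures (1): a typically imbalanced data set
labels = [0] * 98 + [1] * 2

# A degenerate model that always predicts "success"
predictions = [0] * len(labels)

# Accuracy looks excellent...
accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.98

# ...but recall on failures (the class we actually care about) is zero
failures_caught = sum(p == 1 and l == 1 for p, l in zip(predictions, labels))
recall = failures_caught / sum(labels)
print(recall)  # 0.0
```

An accuracy of 98% hides the fact that the model is useless for the very question it was built to answer.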

Remember that Machine Learning is about optimizing a cost function over a sample set (training set).  I like to draw a comparison with humans in a social system with metrics (e.g. bonuses in a job, laws in society, etc.):  humans will find any loophole in a metric in order to maximize it and reap the reward.  So does Machine Learning.  You therefore have to be on the lookout for those loopholes.

With predictive maintenance, failures often cost a lot of money, sometimes more than a false positive (i.e. a non-failure identified as a failure).  For this reason, you don’t want to miss failures:  you want to compensate for the imbalance in your data set.

The article, which I suggest you read in full when you need to apply it, suggests 3 ways to compensate for the lack of failures in your data set:

  1. You can resample the failures to increase their occurrence ; for instance by using the Synthetic Minority Oversampling Technique (SMOTE) readily available in Azure ML.
  2. Tune the hyper-parameters of your model (e.g. by using a parameter sweep, also readily available in Azure ML) to optimize for recall.
  3. Change the metric to penalize false negatives more than false positives.
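Option 3 can be as simple as weighting the errors.  A sketch of such a cost-sensitive metric (the function name is mine, and the default weight of 10 is an arbitrary assumption; tune it to the real cost of a missed failure in your business):

```python
def weighted_cost(predictions, labels, fn_weight=10.0, fp_weight=1.0):
    """Cost function penalizing false negatives (missed failures) more
    heavily than false positives (false alarms).  Label 1 = failure."""
    cost = 0.0
    for p, l in zip(predictions, labels):
        if l == 1 and p == 0:    # missed a real failure
            cost += fn_weight
        elif l == 0 and p == 1:  # false alarm
            cost += fp_weight
    return cost
```

Under a metric like this, a model that ignores the minority class pays 10 per missed failure, so the optimizer can no longer get away with predicting success all the time.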

This is just one of the many twists you have to think of when creating a predictive model.

My basic advice:  make sure you not only ask the right question but with the right carrot & stick.  If a failure, to you, costs more than a success identified as a failure (i.e. a false positive), then factor this into your model.

How does criticism hit your brain?

Everybody loves a critic, right?  How do you give feedback to somebody and be effective, i.e. without your message getting deflected on their “shield”?


An area where I find it especially hard to deliver constructive feedback is presentation / public speaking skills.  Criticizing the way somebody speaks or organizes his / her thoughts often hits very close to home.

My experience is that getting feedback on presentations I give is tough too.  Believe me:  it’s not for lack of material, as I’m very far from perfection!  Some people can’t articulate what you do wrong; most do not dare, for fear of offending you.

Last week I witnessed a colleague giving feedback on somebody’s presentation.  I found him especially effective…  but I didn’t quite understand why.  He was positive and uplifting while still suggesting ways to improve.  Beyond that I couldn’t see how I could replicate.  The following day I read an article from Fast Company discussing criticism that explained exactly what my colleague was doing right.  So I thought I would share it here:  Why Criticism Is So Tough To Swallow (And How To Make It Go Down Easier) by Caroline Webb.

Form & Content

There are two parts about delivering criticism:

  • The form, which I’m going to discuss here
  • The content:  what do you see in somebody’s presentation that could be improved?

The second part requires experience in order to go beyond trivialities (e.g. we couldn’t hear you).  Unless you have a job where you are presenting & seeing colleagues present all the time, you won’t get experience quickly.  If that’s your case, I would suggest joining public speaking clubs (e.g. Toastmasters).

The Sandwich


The author mentions the good old praise sandwich.  If you aren’t familiar with the meal, let me get you two introduced.

Basically, it’s a ham sandwich where you replace the bread by praise and the ham by improvement suggestions (criticism).

I’ve been handed my fair share of that sandwich and I did cook a few myself.

Nothing could go wrong, right?  You manage the feelings of the other person by throwing praises for intro & conclusion, keeping the negative in the middle.

It is actually a good way to structure feedback.  The problem she points out is that many people keep the praise generic and the criticism specific.

The example she gave hit me because I’ve done a very similar one recently:  “your speech was great, I would do X differently, would improve Y this way and would consider doing Z, otherwise, fantastic speech!”

Why is this so ineffective?

Criticisms = Threat

Your ancestors were hunted by big animals.  They survived.

Why?  Because they were very alert about threats around them.  So alert that those threats were registering at an unconscious level, even when they were not looking for it.

They passed their genes to you, so your brain is constantly on the lookout for threats.  There are no wolves around anymore so it settles for other threats…  like criticism.

This is why we are very sensitive to criticism.  Criticism, socially, is dangerous.

By making the criticism specific, the feedback raises the stress level of the person you are delivering it to.  They become defensive.  They stop listening.

Praises need to be concrete to be effective

We love praises.  Praises are social rewards.  That’s good.

What we love even better is specific praises.

Our brain responds better to concrete ideas than abstract ones, despite all the linear algebra we did in college.

So when we say something like “you’re great but your vocabulary is repetitive”, the threat takes over the generic praise.

A better way to give feedback

So the better way the author suggests is:

  • Give specific praises:  give examples of what the person did well and expand on why you liked it.  “I really liked the way you started with a joke, you paused to let people laugh, you smiled, so you connected with the crowd, then you kicked in the topic”.
  • Build on the praises to suggest improvements:  “What could make it even more effective is if you would transition between sections by taking a pause of a few seconds to let the last section sink in and allow people to breathe and be ready for more”.

Basically, you amplify your praise and you downplay your criticism, but in form, not in content.

Actually, you don’t criticize, you suggest improvement on what’s already there.  It isn’t political correctness like calling a massive layoff “right sizing”.  The goal is different:  you don’t underline what’s missing, you suggest what could be done to make the whole better.


I really encourage you to read the original article.  It is a good read.

Now, something I would add is to go easy on criticism.  The truth of the matter is that people change very slowly.  So if you hit them with 8 criticisms, besides having them crying in the fetal position in their shower, you won’t accomplish much.  It’s a long-term game.  You give a few pointers at a time for them to improve a little at a time.

I took a badminton class when I was at university and one day the teacher came in with a video camera.  He said something very wise, along the lines of:

“A camera is a powerful tool to analyze the style of a player and it is used more and more in sports.  You have to be careful though.  A camera sees everything and you can destroy an athlete with it.”

When you look at somebody’s performance, be it a presentation or any kind of work that they do, you are the camera.  Be lenient.

But do give feedback.  Because you likely see something the other person doesn’t see.  Delivered appropriately, feedback is a gift!

A first look at Azure Functions

Back in summer 2010 I called for a notification mechanism within Azure, something that would call customer-defined code to take action when something happens within your subscription, e.g. a file added to blob storage, a message added to a queue, etc.

Back then I called it an Event Bus, because buses were cool in 2010.

Finally, I got what I wanted in Azure Functions!

What it is

An Azure Function is a piece of code which executes when a trigger (timer, web hook, queue message, etc.) kicks in.

We can see Azure Functions as a micro-execution engine, or micro-compute on demand.

But really, an Azure Function is the Cloud equivalent to a C# event handler.  This is pretty powerful.

The model also includes Inputs & Outputs.  These integrate into the function code.  For instance, in C#, inputs are passed as input parameters to the function.

You can hook the function to source control to quickly deploy new versions.

What it brings to the table

Why is this interesting?  Can’t you do that today by having a Web Job (or even a Worker Role) monitoring queues, blob storage and all?

Yes, you can.  In that sense, Azure Functions doesn’t bring new capabilities; it makes them easier to use.

Instead of having a Web Job containing many handlers for disparate tasks, you have a lightweight, script-like component that reacts to one trigger.

Azure Functions really are an evolution of Web Jobs.  They use the same SDK, hook to the same triggers and…  run in the same compute:  App Service.

Another great thing about Azure Functions is that it comes with the possibility of using a Dynamic App Service Plan.  With a Dynamic plan, you pay only for the compute time you use (metered to the nearest 100ms), as opposed to standing up instances waiting for triggers to fire.  This brings a lot of cost agility to the table, although 10 instances still is the upper limit when many instances are required to run many functions at the same time.

Where to find more

As usual, I recommend the official documentation, more specifically, start with the overview and work your way down.


Azure Functions brings a new way to create cloud event handlers.  It enables elastic computing by having those scripts run only when triggers are activated.

How to do Data Science

These days, it’s all about Data Science.

What is Data Science?

Last month Brandon Rohrer, from the Cortana Intelligence and Machine Learning Blog, came up with an excellent post.

How to do Data Science?

The post basically goes over the workflow I reproduced at the right here.

I found this article both complete and succinct:  a very good read.

It goes through all the motions you need to go through while analysing data, from fixing an objective (asking a question), formatting the data to manipulating the data and finally throw that to some Machine Learning.

It is something of a badly kept secret that 95% of Machine Learning is yak shaving around massaging data.  This is why I like the terminology Data Science, because it does include all the data manipulation you need to do to feed the beast.

So I strongly encourage you to read the post.

What is Statistics and why should you care?

Unless you graduated in art, chances are you did a course in Statistics.

Chances are you hated it.

Most people I know postponed that course until the end of their degree, didn’t understand much about it and hated it dearly.

I didn’t like it either and understood very little.

A few years later when I studied Machine Learning, I had to review Statistics on my own.  This is when I had the epiphany:  Wow!  This is actually not so complicated and can even be quite interesting!

There are two main reasons I hated my undergraduate course:

  • Examples were all around surveys:  I was studying physics at the time, I didn’t care about those
  • It was really geared towards a collection of recipes:  I love mathematics, elegant theories and understanding what I do, cheat sheets didn’t do it for me

I would like to share my epiphany from back then with you today.  Hopefully it will shed some light on the poorly understood topic of statistics.

This won’t be a deep dive in the science of statistics.  I want to explain what statistics is by capturing where it comes from and giving very simple examples.  I won’t make statisticians out of you today.  Sorry.

Layer Cake

I see statistics as a layer cake.  At the foundation we have combinatorics, then probability and finally at the top, we have statistics.


Therein lies one of the mistakes, in my opinion, of most statistics courses:  they try to get statistics into your head without explaining the two colossi of science it is based on.

Try to explain calculus to somebody who has never seen f(x) = mx + b in 5 minutes and you won’t enlighten them either.

So let’s walk the layer cake from the bottom to the top.


Combinatorics:  “branch of mathematics studying finite or countable discrete structures” (Wikipedia).

Combinatorics is about counting stuff.  Counting elements in sets (cardinality) and then combining sets together.

You’ve done combinatorics, but you don’t remember, do you?  Let me jog your memory.

Ok, let’s say I have the set A = {1, 2, 3}.  How many pairs can I do with elements of A?  How many trios?  What if order is irrelevant and I just want to know the possible distinct pairs?

Yes, you’ve done that type of problems.  This is where you’ve learned a new meaning for the exclamation mark, i.e. the factorial:  n! = n x (n-1)! (with 0! = 1).
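Those counting questions are easy to check in code.  A quick sketch with Python’s itertools, using the set A from above:

```python
from itertools import combinations, permutations
from math import factorial

A = [1, 2, 3]

# Ordered pairs from A: 3! / (3-2)! = 6
print(len(list(permutations(A, 2))))  # 6

# Distinct pairs when order is irrelevant: 3! / (2! * 1!) = 3
print(len(list(combinations(A, 2))))  # 3

# Trios (ordered): 3! = 6
print(len(list(permutations(A, 3))))  # 6

# The factorial itself: n! = n * (n-1)!, with 0! = 1
print(factorial(0), factorial(3))  # 1 6
```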

Let’s start an example I’ll carry over in the rest of the article.  Let’s say I have a die with six faces.  The set of possible outcome if I throw it is D = {1, 2, 3, 4, 5, 6}.

How many elements in D?  Six.  Not too hard?  Well, the point isn’t to be hard here.

You can get into quite complicated problems in combinatorics.  I remember an exam question where we had drawers filled with an infinite number of marbles of different colours and we had to mix them together…  that was quite fun.

A good example is the Rubik’s Cube (from the Hungarian inventor Ernő Rubik).  A Rubik’s Cube has 6 faces, each having 9 squares:  6 colours, with 9 squares of each colour, 54 squares in total.  What is the number of possible configurations?  Are some configurations impossible given the physical constraints of the cube?


Probability:  “measure of the likelihood that an event will occur.  Probability is quantified as a number between 0 and 1 where 0 indicates impossibility and 1 indicates certainty.” (Wikipedia)

The canonical example is the toss of a coin.  Heads is 1/2; tails also is 1/2.

What is an event?  An event is an element of the set of all possible events.  An event occurrence is random.

A special category of events is of great interest:  equipossible events.  Those are events which all have the same chance of occurrence.  My coin toss is like that.  So is my 6-faced die…  if it hasn’t been tampered with.

For those, we have a direct link with combinatorics:

P(event) = \frac{1}{\#events}

The probability of an event is one over the number of possible events.  Let’s come back to my die example:

P(1) =\frac{1}{\#D}=\frac{1}{6}

The probability to get a 1 is 1/6 since #D, the cardinality of D (the number of elements in D), is 6.  Same for all the other events, i.e. 2, 3, 4, 5, 6.

So you see the link with combinatorics?  You always compare things that you’ve counted.
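This counting view of probability is mechanical enough to compute directly.  A small sketch with exact fractions, reusing the die set D:

```python
from fractions import Fraction

# All equipossible outcomes of a fair six-faced die
D = {1, 2, 3, 4, 5, 6}

# P(event) = 1 / #events for a single outcome
p_one = Fraction(1, len(D))
print(p_one)  # 1/6

# Compound events are still just counting: P(even) = #evens / #D
evens = {d for d in D if d % 2 == 0}
p_even = Fraction(len(evens), len(D))
print(p_even)  # 1/2
```

Using Fraction rather than floats keeps the link with combinatorics visible:  every probability here literally is a ratio of two counts.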

If events aren’t equipossible, you transform your problem until they are.  This is where all the fun resides.

Now, as with combinatorics, you can go to town with probability when you start combining events, conditioning events and…  if you start getting into the objective versus subjective (Bayesian) interpretations.  But again, my goal isn’t to deep dive but just to illustrate what probability is.


Statistics:  “study of the collection, analysis, interpretation, presentation, and organization of data” (Wikipedia).

I find that definition quite broad, as plotting a graph becomes statistics.

I prefer to think of statistics as a special class of probability.  In probability we define models by defining sets of events and the likelihood of events to occur ; in statistics we take samples and check how likely they are according to the model.

A sample is simply a real-world trial:  throwing a die, tossing a coin, picking an individual from a population, etc.

For instance, given my 6-faced die, let’s say I throw it once and I get a 2.  How confident are we that my die is equipossible, with 1/6 probability for each face?

Well…  it’s hard to tell, isn’t it?  A lot of probability models could have given that outcome.  Anything where 2 has a non-zero probability, really.

What if I throw it twice and get a 2 each time?  Well…  that is possible.  What about three 2s?  What about 10?  You’ll start to get skeptical about my die being equipossible as we go, won’t you?

Statistics allow you to quantify that.

When you hear about confidence interval around a survey, that’s exactly what that is about.

If I take my sequence of ‘2’s in my die-throwing experiment, we can quantify it this way.  Throwing a ‘2’ has a probability of 1/6, which is the same probability as any result.  Now, having the same result n times has probability P_{same} = (\frac{1}{6})^{n-1}, while having a sequence where each result differs from the previous one has probability P_{different} = (\frac{5}{6})^{n-1}.  So as n increases, it gets less and less likely to have a sequence of the same result.  You can set a threshold and decide whether you believe the underlying model or not; in this case, whether my die is a fair one.
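Those two quantities are easy to tabulate.  A short sketch (the 1% threshold at the end is an arbitrary choice for the example, in the spirit of a significance level):

```python
from fractions import Fraction

def p_same(n):
    """Probability that n throws of a fair die all show the same result."""
    return Fraction(1, 6) ** (n - 1)

def p_different(n):
    """Probability that each of n throws differs from the previous one."""
    return Fraction(5, 6) ** (n - 1)

for n in [1, 2, 3, 10]:
    print(n, p_same(n), float(p_different(n)))

# With a 1% threshold, four identical throws already look suspicious:
print(p_same(4) < Fraction(1, 100))  # True
```

This is the essence of a statistical test:  compute how likely the observed sample is under the assumed model, and reject the model when that likelihood falls below your threshold.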

Why should you care?

Statistics is at the core of many empirical sciences.  We use statistics to test theories, to test models.

Also, those three areas of mathematics (i.e. combinatorics, probability & statistics) spawn off into other theories.

For example, information theory is based on probability.  That theory in turn helps us understand signal processing & data compression.

Machine Learning can be interpreted as statistics.  In Machine Learning, we define a class of statistical model and then look at samples (training set) to find the best model fitting that sample set.

Moreover, those areas of mathematics allow us to quantify observations.  This is key.  In Data Science, you take a volume of data and you try to make it talk.  Statistics helps you do that.

When you take a data set and observe some characteristics (e.g. correlation, dependency, etc.), one of the first things you’ll want to validate is the good old “is it statistically significant?”.  This is basically figuring out whether you could observe those characteristics by chance or whether they are a real characteristic of the data.  For instance, if I look at cars on the freeway and I observe a blue car then a red car a few times, is that chance or are there enough occurrences to think there is a real pattern?

So if you are an executive, you should care about statistics to go beyond just looking at (visualizing) the data of your business, and to understand, at least at a high level, what type of models and assumptions your data scientists are making on your data.  Are you training models to learn trends in your business?  If so, what do the models look like and how do they perform in terms of prediction?

If you are a developer / architect, you should care about statistics for two big reasons.  First, you are probably instrumental in deciding what type of data you collect from the application and at what frequency (e.g. telemetry).  If you log the number of users logged in only once a day, your data scientists will have a hard time extracting information from that data.  The second reason is that you are likely going to use data for display and, more and more, to have your application take decisions.  Try to understand the data and the models used for decision making.

We live in a world of data abundance.  Data is spewing from every device & every server.  It is easy to see features in data that are simply noise, or to miss features because they aren’t visible when you visualize the data.  Statistics is the key to your data vault.


I hope my article was more insightful than the statistics classes you remember.

Basically, combinatorics studies countable sets.  Probability uses combinatorics to assign probabilities (values between 0 & 1) to events.  Statistics takes samples and compares them to probability models.

Those fields of study have massive influence in many other fields.  They are key in Machine Learning and Data Science in general.