In this blog post, I will assume you know how to set up your workbench.
In general, there are quite a few great resources for Azure ML:
On the agenda: I will take the sample set used in my previous post, perform a linear regression on it and validate that all I said about linear regression and Machine Learning is true.
This blog was done using Azure ML in mid-July 2015. Azure ML is a product in evolution and the interface will certainly change in the future.
Create Data Source
First let’s get the data.
I will work with the same data set as in the previous articles, which can be found here. It is about the relationship between heights and weights in humans. We will try to predict the weight of different individuals given their height.
For the purposes of this blog post, 200 data points is plenty and will ease the manipulation. Let’s cut and paste their table into an Excel spreadsheet and save it in CSV format.
Yes, believe it or not, Azure ML cannot load native Excel files! So you need to format it in CSV.
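If you would rather skip Excel for this step, here is a minimal Python sketch of the same conversion. It assumes the pasted table has one row per line with whitespace-separated values; the `Index` header name and the three sample rows are placeholders of mine, not the real data.

```python
import csv
import io

# Assumed shape of the pasted table: one row per line, whitespace-separated.
# These rows are illustrative placeholders, not the real data set.
PASTED = """\
1 65.0 115.0
2 70.0 135.0
3 68.0 128.0
"""

# "Index" is a hypothetical name for the first column.
HEADER = ["Index", "Height(Inches)", "Weight(Pounds)"]


def table_to_csv(text):
    """Convert whitespace-separated rows into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            writer.writerow(fields)
    return out.getvalue()


csv_text = table_to_csv(PASTED)
print(csv_text)
```

Writing `csv_text` to a file gives you something you can upload just like the file saved from Excel.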
We will create an Azure ML Data Set from it. In the Azure ML workbench, select the big plus button at the bottom left of the screen, then select Data Set, then select From Local File.
Select the file you just saved in Excel, and OK that.
Click the big plus button again (at the bottom left of the screen), then select Experiment:
and select Blank Experiment:
The Workbench will present you with a sort of experiment template. Don’t get too emotionally attached: it will disappear once you drop the first shape in (which will be in 30 seconds).
Right off the bat, you can change the name of the experiment in order to make it easier to find. Let’s call it Height-Weight. Simply type that in the canvas:
You should see your data set under “My Data Set” in the left pane:
Your data set will have the same name you gave it, or by default the CSV file name you uploaded.
Let’s drag the data set onto the canvas.
In the “search experiment items” box, let’s type ‘project’.
Then we can select Project Columns and drop it on the canvas.
We then have to link the two shapes. In a rather counter-intuitive way, you have to drag the link from the Project Columns shape towards the data set shape.
Why do I want to project columns? Because I do not want to include the index column. I actually left it in while preparing this blog entry: the index column was used by the regression and gave bizarre results.
Let’s select the Project Columns shape (or module). In the properties pane, on the right, you should be able to see a Launch column selector. Well, let’s select the selector.
We then simply select the two columns that have meaning: height and weight.
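For readers who think better in code, projecting columns amounts to keeping a subset of each row. The sketch below is only an analogy, not what Azure ML runs internally; the `Index` column name and the sample values are assumptions.

```python
import csv
import io

# Hypothetical stand-in for the uploaded CSV; values are illustrative only.
SAMPLE_CSV = """Index,Height(Inches),Weight(Pounds)
1,65.0,115.0
2,70.0,135.0
3,68.0,128.0
"""


def project_columns(rows, keep):
    """Return new rows containing only the columns named in `keep`."""
    return [{name: row[name] for name in keep} for row in rows]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
projected = project_columns(rows, ["Height(Inches)", "Weight(Pounds)"])
print(projected[0])
```

With the index gone, nothing downstream can mistake it for a feature.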
We are now going to train a model so let’s drop a Train Model module in there. If you have followed so far, yes, type “train model” in the module search box, select Train Model and drop it under Project Columns.
Our model will be a linear regression so let’s find that module too. There are a few types of linear regression modules; today we’ll use the one named “Linear Regression”, found under “Regression”.
Let’s link the module this way:
We need to tell the model what variable (column) we want to predict. Let’s select the train model module which should allow us to launch the column selector (from the properties pane).
You should only have the choice between weight and height. Choose weight.
You should notice that you do not see the index column. That’s because we projected it out basically.
Now you can run the whole thing: simply click the Run button at the bottom of the screen.
It takes a little while. You’ll see a clock icon on each module until it turns into a green check mark as they all get run.
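To give an idea of what training computes for a single input variable, here is a minimal sketch of an ordinary least squares fit in plain Python. This is a textbook closed-form solution, not necessarily the solver the Linear Regression module uses, and the data points are made up so that they sit exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and matching lengths")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    bias = mean_y - slope * mean_x
    return slope, bias


# Made-up points lying exactly on weight = 3.4 * height - 104.
heights = [60.0, 65.0, 70.0, 75.0]
weights = [100.0, 117.0, 134.0, 151.0]
slope, bias = fit_line(heights, weights)
print(slope, bias)
```

Real data does not sit on a line, of course; the same formulas then return the line that minimizes the squared error.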
Wow, we have our first linear regression. What should we do with it?
Let’s plot its prediction against the data in Excel. First, let’s find the computed parameters of the model.
Let’s right click on the bottom dot of the train model module and select View Results.
At the bottom of the screen you should have those values:
Remember the formula f(x) = m*x + b? Well, the bias is b while the other one (oddly named, I should add) is the slope m.
In Excel, we can punch in the following formula: =-105.377+3.42311*B2 and copy it for every row. Here I assume the A column is the index, B is the “Height(Inches)”, C is the “Weight(Pounds)” and D is the column where you’ll enter that formula. You can give that column the title “Computed”.
Your spreadsheet should look like:
You can see that the computed value isn’t quite the value of the weight but is in the same range. If you plot all of that you should get something like:
The blue dots are the data while the orange ones are the predictions.
You can see that the line passes through the cloud of data, predicting the data as well as a line can.
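The Excel formula used for the “Computed” column translates directly to code. Here is a small sketch using the bias and slope quoted above; the two heights fed to it are arbitrary examples, not rows from the data set.

```python
# Parameters read from the trained model, as used in the Excel formula.
BIAS = -105.377
SLOPE = 3.42311


def predict_weight(height_inches):
    """Apply f(x) = m*x + b to one height, like =-105.377+3.42311*B2."""
    return BIAS + SLOPE * height_inches


def computed_column(heights):
    """Return the predicted weight for every height, like filling column D."""
    return [predict_weight(h) for h in heights]


# Arbitrary example heights, in inches.
heights = [65.0, 70.0]
for h, w in zip(heights, computed_column(heights)):
    print(f"{h:.1f} in -> {w:.2f} lb")
```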
Maybe you found it a bit funny that I extracted the model parameters only to enter an equation in Excel. We could actually ask AzureML to compute the predictions for the data.
For that, let’s drop a Score Model module on the canvas and link it this way:
Basically we are using the model on the same data that was used to train it (output from the Project Columns module).
Let’s run the experiment one more time (run button at the bottom of the screen). We can then right click on the bottom dot of the Score Model module and select View Results.
You can then compare the values in the Scored Labels column to the ones in the Computed column in Excel and see they are the same.
We could have exported the results using a Writer module, but it requires quite a bit of configuration.
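If you do export both columns, eyeballing them gets tedious. Here is a small sketch of an automated comparison; the values below are placeholders I typed in, not real output from the experiment, and a small tolerance is assumed since the Excel formula uses rounded parameters.

```python
def max_abs_difference(scored_labels, computed):
    """Largest absolute gap between two columns of predictions."""
    if len(scored_labels) != len(computed):
        raise ValueError("columns must have the same number of rows")
    return max(abs(s - c) for s, c in zip(scored_labels, computed))


# Placeholder values, not real output from the experiment.
scored_labels = [117.1252, 134.2407]
computed = [117.12515, 134.2407]

# Assumed tolerance: the Excel formula uses rounded parameters.
print(max_abs_difference(scored_labels, computed) < 0.001)
```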
I assumed very little knowledge of the tool so this blog post was a bit verbose and heavy in images.
My main goal was to show you a concrete representation of the concepts we discussed before:
- Independent / Dependent variable in a data set
- Predictive model
- Linear Regression
- Optimal parameters through training
The cost function was implicit here as an option in the Linear Regression module. You can see that by clicking on the module.
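As a reminder of what such a cost function looks like, here is a minimal sketch of the mean squared error of a line over a data set. I am assuming the usual squared-error cost for linear regression; the exact option exposed by the module may differ, and the points are made up.

```python
def mean_squared_error(xs, ys, slope, bias):
    """Average squared gap between the line's predictions and the data."""
    if not xs or len(xs) != len(ys):
        raise ValueError("need non-empty columns of equal length")
    errors = [(bias + slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Made-up points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

print(mean_squared_error(xs, ys, slope=2.0, bias=1.0))  # the true line
print(mean_squared_error(xs, ys, slope=2.0, bias=0.0))  # a worse guess
```

Training is the search for the slope and bias that make this number as small as possible.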
Another good introduction to AzureML is the Microsoft Virtual Academy training course on Machine Learning.
UPDATE: See a slightly more advanced example in AzureML – Polynomial Regression with SQL Transformation.