Machine Learning - An introduction - Part 1

As I mentioned before, I specialized in Machine Learning during my graduate studies, only to drop the field after a few years of trying it in the marketplace.  I felt the field wasn’t ready for prime-time industrial applications.

Years have passed, the field has matured, and now is an exciting time to be working in Machine Learning!  The possibilities have far outgrown the labs where they were born.

Yet it remains a complex field, sitting at the intersection of statistics, data analysis and, if you want to do it right, Big Data.

Before diving into Azure Machine Learning, I want to first give an overview of what Machine Learning is.  My favourite 10-minute story is an example simple enough to grasp without prior ML knowledge:  few dimensions, few data points and a simple ML algorithm.

In ML parlance, this is a linear regression example, but you don’t need to know what that is to follow it.

The example

Machine learning is all about building models.  What is a model?  A model is a simplified version of reality:  it “embodies a set of assumptions concerning the generation of the observed data, and similar data from a larger population” (Wikipedia).

In my example, we are going to predict the weight of a person given their height.

We are going to build a model that can predict the weight you would be expected to have if you are 6 feet tall.

We have already made quite a few assumptions.  We assumed the weight of a person depends on their height.  In mathematical notation:

weight = f(height)

That might remind you of the typical formulation:

y = f(x)

But we’ll go further and assume that the weight has a linear relationship (y = m*x + b) with the height:

weight = m*height + b
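In code, the model is just a function of the height, parameterized by m and b.  Here is a minimal Python sketch; the name `predict`, the use of inches for height, and the parameter values in the usage lines are illustrative assumptions, not values derived from any data.

```python
def predict(height: float, m: float, b: float) -> float:
    """Linear model: weight = m*height + b."""
    return m * height + b


# Two different (m, b) pairs are two different models, i.e. two different lines.
# These parameter values are made up for illustration only.
line_a = (2.0, -10.0)
line_b = (3.0, -50.0)
print(predict(72.0, *line_a))  # 134.0
print(predict(72.0, *line_b))  # 166.0
```

Choosing a model therefore boils down to choosing the pair (m, b).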

Now of course, this is a very simplified model, a very naïve one.  There are many reasons why you might think this model is incomplete.  For one, it leaves out many variables, for instance age, gender, nationality and so on.  That’s alright:  it’s a model.  Our goal is to make the best of it.

Let’s look at some sample data I found on the web.  I’ve entered the data into my #1 analysis tool, Excel, and plotted it:


We can see a nice cloud of data, and we could guess that there is roughly a linear relationship between height and weight; that is, there is a line that more or less runs through the middle of this cloud.  Now the question I’ll ask you is:  where would you put the line?


I’ve hand-drawn four lines.  The two bottom ones aren’t very convincing, but what about the two top ones?  Which one would you choose, and why?  What criteria would you use?

The mathematical problem

y = m*x + b:  given an x, we can compute a y.  We have an independent variable, x, the height, and a dependent variable, y, the weight.

We also have parameters:  the slope, m, and the origin (the y-intercept, where the line crosses the y-axis), b.

The model is described by its parameters.  Guessing what the line should be is guessing its slope and origin.

We also have sample data:  a data set of sample x’s & y’s.  We want to use that sample to deduce the parameters:

parameters = F(sample data)

This is the learning in Machine Learning.  We show examples of correct predictions to an algorithm (the machine), and we want the algorithm to figure out the model that predicts them all with the least error, while also being able to predict new data it has never seen.

Simple enough?  What is the recipe?

Basically, we are going to consider many models (many values of m & b) and compare them using a cost function, a measure of how badly a model fits the data, to select the best one.

The cost of a given model on a data set is the sum of its costs on each point in the data set:

Cost(model, data set) = Sum of Cost(model, point) over all points
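Written as code, the data-set cost is just a loop that adds up a per-point cost.  A minimal Python sketch: `dataset_cost`, `vertical_gap` and the three points are hypothetical, made up for illustration, and the per-point cost is passed in as a function so that any of the choices discussed below can be plugged in.

```python
from typing import Callable, Iterable, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (height, weight)
Model = Tuple[float, float]  # (m, b) for the line y = m*x + b


def dataset_cost(
    model: Model,
    points: Iterable[Point],
    point_cost: Callable[[Model, Point], float],
) -> float:
    """Cost(model, data set) = sum of Cost(model, point) over all points."""
    return sum(point_cost(model, p) for p in points)


def vertical_gap(model: Model, point: Point) -> float:
    # Example per-point cost: absolute vertical gap between the point and the line
    m, b = model
    x, y = point
    return abs(m * x + b - y)


# Line y = x against three made-up points: gaps of 1, 0 and 2
print(dataset_cost((1.0, 0.0), [(1.0, 2.0), (2.0, 2.0), (3.0, 1.0)], vertical_gap))  # 3.0
```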

In our example, an intuitive cost function would be the distance from a sample point to the line.  After all, we want the line to be “in the thick” of the cloud; ideally (impossible here), the points would all lie on the line.  The distances are represented by the green lines in the following graph (I’ve plotted only the first 3 data points for clarity).


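The “distance of a point to the line” is the usual perpendicular distance.  Rewriting the line y = m*x + b as m*x - y + b = 0, the distance from a point (x0, y0) is |m*x0 - y0 + b| / sqrt(m^2 + 1).  A small Python sketch of that formula; `perpendicular_distance` is a hypothetical helper name.

```python
import math


def perpendicular_distance(m: float, b: float, x0: float, y0: float) -> float:
    """Shortest (perpendicular) distance from (x0, y0) to the line y = m*x + b."""
    # Line rewritten as m*x - y + b = 0, so (m, -1) is a normal vector
    return abs(m * x0 - y0 + b) / math.sqrt(m * m + 1.0)


# The point (0, 1) lies 1/sqrt(2) away from the line y = x,
# while its vertical gap to that line is 1
print(perpendicular_distance(1.0, 0.0, 0.0, 1.0))
```

The square root and the dependence on the slope in the denominator are part of what makes this distance awkward to optimize, compared with the vertical gap m*x0 + b - y0.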
To make a long story short, a more tractable cost function is the square of the distance measured along the y-axis:

Cost(model, {x, y}) = (predicted(x) - y)^2
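A minimal Python sketch of this per-point cost, with the model given by its two parameters m and b (the function name `squared_error` is illustrative).  Note that the prediction is compared with the observed y, not with x.

```python
def squared_error(m: float, b: float, x: float, y: float) -> float:
    """Cost(model, {x, y}) = (predicted(x) - y)^2 for the line y = m*x + b."""
    predicted = m * x + b
    return (predicted - y) ** 2


# The line y = 2x + 1 predicts 7 at x = 3; the observed y is 5, so the cost is (7 - 5)^2
print(squared_error(2.0, 1.0, 3.0, 5.0))  # 4.0
```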

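If you are curious why we use squares:  they are easier to handle mathematically than absolute values, because the squared error is smooth (differentiable everywhere), which is what makes an exact solution easy to derive.  Be aware that it does not always give the same line as minimizing absolute distances:  squaring penalizes large errors more heavily, so points far from the line pull harder on it.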
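A quick way to see that squared and absolute costs can disagree:  restrict ourselves to horizontal lines y = b (m fixed at 0, a simplification for illustration) and fit three made-up points, one of them far from the others.  The squared cost picks (approximately, on this grid) the mean of the y values, while the absolute cost picks the median.  The data and the grid of candidate b values are assumptions chosen only for this demonstration.

```python
# Three made-up y values; the last one is far from the other two
ys = [1.0, 2.0, 10.0]

# Candidate horizontal lines y = b, for b from 0.00 to 10.00 in steps of 0.01
candidates = [i / 100 for i in range(1001)]


def squared_cost(b: float) -> float:
    return sum((b - y) ** 2 for y in ys)


def absolute_cost(b: float) -> float:
    return sum(abs(b - y) for y in ys)


best_squared = min(candidates, key=squared_cost)
best_absolute = min(candidates, key=absolute_cost)
print(best_squared)   # 4.33, the grid point closest to the mean 13/3
print(best_absolute)  # 2.0, the median
```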

Putting it all together, the machine learning problem is:

Find m & b that minimize the sum of (m*x + b - y)^2 over all {x, y} in the data set
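The recipe described earlier, consider many values of m & b and keep the one with the lowest cost, can be sketched directly as a brute-force grid search.  This is a minimal Python sketch, not how the problem is normally solved:  the four points are made up (they lie exactly on the line y = 2x + 1, so the answer is easy to check), and the grid bounds and step are arbitrary assumptions.

```python
# Made-up sample points lying exactly on y = 2x + 1
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def total_squared_error(m: float, b: float) -> float:
    """Sum of (m*x + b - y)^2 over all {x, y} in the data set."""
    return sum((m * x + b - y) ** 2 for x, y in data)


# Candidate parameters: m and b each from -5.0 to 5.0 in steps of 0.1
grid = [i / 10 for i in range(-50, 51)]

# Evaluate every (m, b) pair and keep the one with the smallest cost
best_m, best_b = min(
    ((m, b) for m in grid for b in grid),
    key=lambda params: total_squared_error(*params),
)
print(best_m, best_b)  # 2.0 1.0
```

Brute force only finds the best model among the candidates it tries, and the number of candidates explodes as parameters are added, which is why an exact or iterative method is preferred in practice.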

I formulated the problem in terms of our example, but in general, a machine learning problem is an optimization problem:  minimizing or maximizing some function of the sample set.

It happens that the problem as stated here can be solved analytically with linear algebra, i.e. we can find the exact solution without approximation.

I won’t give the solution here, since its details aren’t the point of this article.
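For readers who are curious anyway:  with a single input variable, the exact solution is the well-known ordinary least squares estimate, m = sum((x - mean_x)*(y - mean_y)) / sum((x - mean_x)^2) and b = mean_y - m*mean_x, where mean_x and mean_y are the sample means.  A minimal Python sketch; `fit_line` is a hypothetical name, and it assumes at least two distinct x values (otherwise the slope is undefined).

```python
from typing import List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = m*x + b; returns (m, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope: covariance of x and y divided by variance of x (the 1/n factors cancel)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    b = mean_y - m * mean_x
    return m, b


print(fit_line([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]))  # (2.0, 1.0)
```

In practice a library routine does this for you; for example, NumPy’s `numpy.polyfit(x, y, 1)` returns the slope and intercept of the same least-squares line.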


Let’s recapitulate what we did:

  1. We chose a prediction we wanted to make:  predict the weight of a person
  2. We chose the independent variable (only one: the height) and the dependent variable (only one: the weight)
  3. We found a sample set to learn from
  4. We posited a class of models:  linear regressions with slope and origin as parameters
  5. We chose a cost function:  the square of the difference between sample values and predictions
  6. We optimized (minimized, in this case) the sum of the cost function to find the optimal parameters

We end up with the optimal values for m and b.  We therefore have our model, f(x) = m*x + b, and we can make predictions:  for any x we can predict f(x), i.e. for any height we can predict the weight.

We used examples to infer rules.  This is Machine Learning in a nutshell.  We let the Machine extract information from a sample set, as opposed to trying to understand the field (in this case biology, I suppose) and deriving the rules from that understanding.


I hope this gave you an overview of what Machine Learning is, what type of problems it aims at solving and how those problems are solved.

In the next entry, I’ll try to give you an idea of what more realistic Machine Learning problems look like by adding different elements to the example we have here (e.g. number of variables, complexity of the model, splitting sample data into training and test sets, etc.).

UPDATE:  See part 2 of this article.
