# Tag Archives: Machine Learning

Machine Learning theory & application, Azure ML & Cognitive Services.

# Hyperspheres & the curse of dimensionality

I previously talked about the curse of dimensionality (more than 2 years ago) related to Machine Learning.

Here I wanted to discuss it in more depth and dive into the mathematics of it.

High dimensions might sound like Physics’ string theory where our universe is made of more than 4 dimensions.  This isn’t what we are talking about here.

The curse of dimensionality is related to what happens when a model deals with a data space with dimensions in the hundreds or thousands.

As the title of this article suggests, we’re going to take the angle of the properties of Hyperspheres (spheres in N dimensions) to explore high dimension properties.

## Why should I care about High Dimension?

When introducing Machine Learning concepts, we typically use few dimensions in order to help visualization.  For instance, when I introduced linear regression or polynomial regression in past articles, I used datasets in two dimensions and plot them on a chart.

In the real world, typical data sets have much more dimensions.

A typical case of high dimension is image recognition (or character recognition as a sub category) where even a low resolution pictures will have hundreds of pixels.  The corresponding model would take gray-scale input vector of dimension 100+.

With fraud detection, transactions do not contain only the value of the transaction, but the time of day, day of week, geo-location, type of commerce, type of products, etc.  .  This might or might not be a high dimension problem, depending on the available data.

In an e-commerce web site, a Product recommendation algorithm could be as simple as an N x N matrix of 0 to 1 values where N is the number of products.

With IoT, multiple sensors feed a prediction model.

In bioinformatics, DNA sequencing generates a huge amount of data which often is arranged in high dimensional model.

Basically, high dimensions crop up everywhere.

## What happens as dimension increases?

For starter a space with more dimensions simply is…  bigger.  In order to sample a space with 2 dimensions with a resolution of 10 units, we need to have 10^2 = 100 points.  Having the same sampling in a space of dimension 3 would require 10^3 = 1000 points.  Dimension 20?  20 would require 10^20 = 100 000 000 000 000 000 000 points.

Right off the bat we can tell that sampling the space of dimension 2 & 3 is realistic while for a space of dimension 20, it’s unlikely.  Hence we are likely going to suffer from under-sampling.

Yoshua Bengio has a nice discussion about Curse of Dimensionality here.

## Hypersphere in a cube

Beyond sampling problems, metrics & measures change behaviour at high dimensions.  Intuitively it makes sense since a measure takes a vector (vectors) and squeeze it (them) into a numerical value ; the higher the dimension, the more data we squeeze into one number & hence we should lose information.

We use metrics & measures heavily in Machine Learning.  For instance, a lot of cost (or loss) functions are based on Euclidean’s distance:

$dist(x,y) = \displaystyle\sum_{i=1}^N (x_i-y_i)^2$

Now if x and / or y are random variables (e.g. samples), the law of large numbers applies when N becomes large.  This implies the sum will trend to the expected value with a narrower standard deviation as N increases.  In turns, this means there is less and less information in the distance as the number of dimensions increases.

This brings us to the hypersphere.  An hypersphere’s equation is

$\displaystyle\sum_{i=1}^N x_i^2 = R^2$

where x is a point of dimension N and R is the radius of the hypersphere.

An hypersphere of dimension 1 is a line, an hypersphere of dimension 2 is a circle, dimension 3 is a sphere, dimension 4 is an…  expending universe?  and so on.

A theorem I’ll demonstrate in a future article is that the volume of an hypersphere of radius 1 tends to zero as the dimension increases.

UPDATE (12-07-2017):  Demonstration of hypersphere hyper volume is done in this article.

This is fairly unintuitive, so let me give real numerical values:

Dimension Hyper Volume
1 2
2 3.141592654
3 4.188790205
4 4.934802201
5 5.263789014
6 5.16771278
7 4.72476597
8 4.058712126
9 3.298508903
10 2.55016404
11 1.884103879
12 1.335262769
13 0.910628755
14 0.599264529
15 0.381443281
16 0.23533063
17 0.140981107
18 0.082145887
19 0.046621601
20 0.025806891
21 0.01394915
22 0.007370431
23 0.003810656
24 0.001929574
25 0.000957722
26 0.000466303
27 0.000222872
28 0.000104638

If we plot those values:

We see the hyper volume increases in the first couple of dimensions.  A circle of radius 1 has an area of pi (3.1416) while a sphere of radius 1 has a volume of 4.19.  It peaks at dimension 5 and then shrinks.

It is unintuitive because in 2 and 3 dimensions (the only dimensions in which we can visualize an hypersphere), the hypersphere pretty much fills its embedding cube.  A way to “visualize” what’s happening in higher dimension is to consider a “diagonal” into an hypersphere.

For a circle, the diagonal (i.e. 45’) intersects with the unit circle at

$(\frac {1} {\sqrt {2}}, \frac {1} {\sqrt {2}})$ since $(\frac {1} {\sqrt {2}})^2 + (\frac {1} {\sqrt {2}})^2 = 1^2$

In general, at dimension N, the diagonal intersects at

$x_i = \frac {1} {\sqrt {N}}$

So, despite the hypersphere of radius 1 touches the cube of side 2 centered at the origin on each of its walls, the surface of the hypersphere, in general, gets further and further away from the cube surface as the dimension increases.

## Consequences of the hypersphere volume

A straightforward consequence of the hypersphere volume is sampling.  Randomly sampling a square of side 2 centered at the origin will land points within the unit circle with probability $\frac{\pi}{4} = \%79$.  The same process with an hypersphere of dimension 8 would hit the inside of the hypersphere with a probability of %1.6.

A corollary to the hypersphere volume is that at higher dimension, the bulk of the volume of the hypersphere is concentrated in a thin annulus below its surface.  An obvious consequence of that is that optimizing a metric (i.e. a distance) in high dimension is difficult.

## What should we do about it?

First step is to be aware of it.

A symptom of high dimensionality is under sampling:  the space covered is so large the number of sample points required to learn the underlying model are likely to be over the actual sample set’s size.

The simplest solution is to avoid high dimensionality with some pre-processing.  For instance, if we have a priori knowledge of the domain, we might be able to combine dimensions together.  For example, in an IoT field with 10 000 sensors, for many reasons, including curse of dimensionality, it wouldn’t be a good idea to consider each sensor inputs as an independent input.  It would be worth trying to aggregate out sensor inputs by analyzing the data.

## Summary

Some Machine Learning algorithms will be more sensitive to higher dimensionality than others but the curse of dimensionality affects most algorithms.

It is a problem to be aware of and we should be ready to mitigate it with some good feature engineering.

# Training a model to predict failures

Today a quick entry to talk about a twist on Machine Learning for the predictive maintenance problem.

The Microsoft Cortana Intelligence team wrote an interesting blog the other day:  Evaluating Failure Prediction Models for Predictive Maintenance.

When you listen to all the buzz around Machine Learning, it sometimes feels as if we’ve solved all the ML problems and you just need to point your engine in the general direction of your data for it to spew some insights.  Not yet.

That article highlights a challenge with predictive maintenance.  Predictive maintenance is about predicting when a failure (of a device, hardware, process, etc.) will occur.  But…  you typically have very few failures:  your data set is completely imbalanced between success samples and failure samples.

If you go heads on and train a model with your data set, a likely outcome is that your model will predict success %100 of the time.  This is because it doesn’t get penalized that much by ignoring the few failures and gets rewarded by not missing any success.

Remember that Machine Learning is about optimizing a cost function over a sample set (training set).  I like to draw the comparison with humans on a social system with metrics (e.g. bonus in a job, laws in society, etc.):  humans will find any loophole in a metric in order to maximize it and rip the reward.  So does Machine Learning.  You therefore have to be on the lookout for those loopholes.

With predictive maintenance, failures often cost a lot of money, sometimes more than a false positive (i.e. a non-failure identified as a failure).  For this reason, you don’t want to miss failures:  you want to compensate for the imbalance in your data set.

The article, which I suggest you read in full when you’ll need to apply it, suggests 3 ways to compensate for the lack of failures in your data set:

1. You can resample the failures to increase their occurrence ; for instance by using the Synthetic Minority Oversampling Technique (SMOTE) readily available in Azure ML.
2. Tune the hyper-parameters of your model (e.g. by using a parameter sweep, also readily available in Azure ML) to optimize for recall.
3. Change the metric to penalize false negative more than false positive.

This is just one of the many twists you have to think of when creating a predictive model.

My basic advice:  make sure you not only ask the right question but with the right carrot & stick   If a failure, to you, costs more than a success identified as a failure (i.e. a false positive), then factor this in to your model.

# How to do Data Science

These days, it’s all about Data Science.

What is Data Science?

Last month Brandon Rohrer, from the Cortana Intelligence and Machine Learning Blog, came up with an excellent post.

The post basically goes over the workflow I reproduced at the right here.

It goes through all the motions you need to go through while analysing data, from fixing an objective (asking a question), formatting the data to manipulating the data and finally throw that to some Machine Learning.

It is something of a badly kept secret that %95 of Machine Learning is Yak Shaving around massaging data.  This is why I like the terminology Data Science because it does includes all the data manipulation you need to do to feed the beast.

So I strongly encourage you to read the post on https://blogs.technet.microsoft.com/machinelearning/2016/03/28/how-to-do-data-science/

# What is Statistics and why should you care?

Unless you graduated in art, chances are you did a course in Statistics.

Chances are you hated it.

Most people I know postponed that course until the end of their degree, didn’t understand much about it and hated it dearly.

I didn’t like it either and understood very little.

A few years later when I studied Machine Learning, I had to review Statistics on my own.  This is when I had the epiphany:  Wow!  This is actually not so complicated and can even be quite interesting!

There are two main reasons I hated my undergraduate course:

• Examples were all around surveys:  I was studying physics at the time, I didn’t care about those
• It was really geared towards a collection of recipes:  I love mathematics, elegant theories and understanding what I do, cheat sheets didn’t do it for me

I would like to share my epiphany from back then with you today.  Hopefully it will shade some light on the poorly understood topic of statistics.

This won’t be a deep dive in the science of statistics.  I want to explain what statistics is by capturing where it comes from and giving very simple examples.  I won’t make statisticians out of you today.  Sorry.

## Layer Cake

I see statistics as a layer cake.  At the foundation we have combinatorics, then probability and finally at the top, we have statistics.

There lies one of the mistake, in my own opinion, of most statistics course:  they try to get statistics into your head without explaining the two colossus of science it is based on.

Try to explain calculus to somebody who has never seen f(x) = mx + b in 5 minutes and you won’t enlighten them either.

So let’s walk the layer cake from the bottom to the top.

## Combinatorics

Combinatoricsbranch of mathematics studying finite or countable discrete structures (Wikipedia).

Combinatorics is about counting stuff.  Counting elements in sets (cardinality) and then combining sets together.

You’ve done combinatorics, but you don’t remember, do you?  Let me jog your memory.

Ok, let’s say I have the set A = {1, 2, 3}.  How many pairs can I do with elements of A?  How many trios?  What if order is irrelevant and I just want to know the possible distinct pairs?

Yes, you’ve done that type of problems.  This is where you’ve learned a new meaning for the exclamation mark, i.e. the factorial:  n! = n x (n-1)! (with 0! = 1).

Let’s start an example I’ll carry over in the rest of the article.  Let’s say I have a die with six faces.  The set of possible outcome if I throw it is D = {1, 2, 3, 4, 5, 6}.

How many elements in D?  Six.  Not too hard?  Well, the point isn’t to be hard here.

You can get into quite complicated problems in combinatorics.  I remember an exam question where we had drawers filled with infinite amount of marbles having different colours and we had to mix them together…  that was quite fun.

A good example is the Rubik’s cube (from the Hungarian mathematician Ernő Rubik).  A Rubik cube has 6 faces, each having 9 squares.  6 colours, with 9 squares of each colours, 36 squares in total.  What is the number of possible configurations?  Are some configuration impossible given the physical constraints of the cube?

## Probability

Probabilitymeasure of the likelihood that an event will occur.  Probability is quantified as a number between 0 and 1 where 0 indicates impossibility and 1 indicates certainty. (Wikipedia)

The canonical example is the toss of a coin.  Head is 1/2 ; tail also is 1/2.

What is an event?  An event is an element of the set of all possible event.  An event occurrence is random.

A special category of event is of great interest:  equipossible events.  Those are events which all have the same chance of occurence.  My coin tossing is like that.  So is my 6-faced die…  if it hasn’t been tempered with.

For those, we have a direct link with combinatorics:

$P(event) = \frac{1}{\#events}$

The probability of an event is one over the number of possible events.  Let’s come back to my die example:

$P(1) =\frac{1}{\#D}=\frac{1}{6}$

The probability to get a 1 is 1/6 since #D, the cardinality of D (the number of elements in D), is 6.  Same for all the other events, i.e. 2, 3, 4, 5, 6.

So you see the link with combinatorics?  You always compare things that you’ve counted.

If events aren’t equipossible, you transform your problem until they are.  This is where all the fun resides.

Now, as with combinatorics, you can go to town with probability when you start combining events, conditioning events and…  if you start getting into the objective versus subjective (Bayesian) interpretations.  But again, my goal isn’t to deep dive but just to illustrate what probability is.

## Statistics

Statisticsstudy of the collection, analysis, interpretation, presentation, and organization of data (Wikipedia).

I find that definition quite broad as plotting graph becomes statistics.

I prefer to think of statistics as a special class of probability.  In probability we define models by defining sets of events and the likelihood of events to occur ; in statistics we take samples and check how likely they are according to the model.

A sample is simply a real world trial:  throwing a die, tossing a coin, picking an individual from a population, etc.  .

For instance, given my 6 faces die, let’s say I throw it once and I get a 2.  How confident are we that my die is equipossible with 1/6 probability for each face?

Well…  it’s hard to tell, isn’t it?  A lot of probability model could have given that outcome.  Anything where 2 has a non-zero probability really.

What if I throw twice and get a 2 each time?  Well…  that is possible.  What about three 2?  What about 10?  You’ll start to get skeptical about my die being equipossible as we go, won’t you?

Statistics allow you to quantify that.

When you hear about confidence interval around a survey, that’s exactly what that is about.

If I take my sequence of ‘2’ in my die throwing experiment, we can quantify it this way.  Throwing a ‘2’ has a probability of 1/6, which is the same probability for any result.  Now having the same result n times has a probability $P_{same} = (\frac{1}{6})^{n-1}$ while having a sequence of non-repeating results has probability of $P_{different} = (\frac{5}{6})^{n-1}$.  So as n increases, it gets less and less likely to have a sequence of the same result.  You can set a threshold and take a decision if you believe the underlying model or not ; in this case, if my die is or not a fair one.

## Why should you care?

Statistics is at the core of many empirical sciences.  We use statistics to test theories, to test models.

Also, those three areas of mathematics (i.e. combinatorics, probability & statistics) spawn off into other theories.

For example, information theory is based on probability.  That theory in turn helps us understand signal processing & data compression.

Machine Learning can be interpreted as statistics.  In Machine Learning, we define a class of statistical model and then look at samples (training set) to find the best model fitting that sample set.

Moreover, those area of mathematics allow us to quantify observations.  This is key.  In Data Science, you take a volume of data and you try to make it talk.  Statistics help you do that.

When you take a data set and observe some characteristics (e.g. correlation, dependency, etc.), one of the first thing you’ll want to validate is the good old “is it statistically significant”?  This is basically figuring out if you could observe those characteristics by chance or are they a real characteristic of the data?  For instance, if I look at cars on the freeway and I observe a blue car then a red car a few times, is that chance or is there enough occurrence to think there is a real pattern?

So if you are an executive, you should care about statistics to go beyond just looking (visualizing) the data of your business and understanding, at least at a high level, what type of models and assumptions your data scientists are making on your data.  Are you training models to learn trend in your business?  If so, what are the models look like and how do they perform in terms of prediction?

If you are a developer / architect, you should care about statistics for two big reasons.  First, you are probably instrumental in taking decision on what type of data you collect from the application and at which frequency (e.g. telemetry).  If you log the number of users logged in once a day, your data scientists will have a hard time extracting information from that data.  The second reason is that you are likely going to use data to display and, more and more, to have your application take decision.  Try to understand the data and the models used for decision making.

We live in a world of data abundancy.  Data is spewing from every device & every server.  It is easy to see features in data that are simply noise or do not see feature because they aren’t visible when you visualize data.  Statistics is the key to your data vault.

## Summary

I hope my article was more insightful than the statistic classes you remember.

Basically, combinatorics studies countable sets.  Probability uses combinatorics to assign probability (value between 0 & 1) to events.  Statistics takes sample and compare them to probability models.

Those fields of study have massive influence in many other fields.  They are key in Machine Learning and Data Science in general.

# Where is the statistics in Machine Learning?

I often try to explain what Machine Learning is to people outside the field.  I’m not always good at it but I am getting better.

One of the confusion I often get when I start to elaborate the details is the presence of statistics in Machine Learning.  For people outside the field, statistics are the stuff of survey or their mandatory science class (everyone I know outside of the Mathematics field who had a mandatory Statistic class in their first Uni year ended up postponing it until the end of their undergraduate studies!).

It feels a bit surprising that statistics would get involved while a computer is learning.

So let’s talk about it here.

Back in the days there was two big tracks of Artificial Intelligence.  One track was about codifying our knowledge and then run the code on a Machine.  The other track was about letting the machine learn from the data ; that is Machine Learning in a nutshell.

The first approach has its merit and had a long string of successes such as Expert Systems based on logic and symbolic paradigms.

The best example of those I always give is in vaccine medicine.  The first time I went abroad in tropical countries, in 2000, I went to a vaccine clinic and met with a doctor.  The guy asked me where I was going, consulted a big reference manual, probably used quite a bit of his knowledge and prescribed 3 vaccines his nurse administered to me.  In 2003, when I came back to Canada, I had a similar experience.  In 2009…  nope.  Then I met with a nurse.  She sat behind a computer and asked me similar questions, punching the answer in the software.  The software then prescribed me 7 vaccines to take (yes, it was a more dangerous place to go and I got sick anyway, but that’s another story).

That was an expert system.  Somebody sat with experts and asked them how they proceeded when making a vaccine prescriptions.  They took a lot of notes and codify this into a complex software program.

Expert systems are great at systematizing repetitive tasks requiring a lot of knowledge that can ultimately be described.

If I would ask you to describe me how you differentiate a male from a female face while looking at a photograph, it might get quite hard to codify.  Actually, if you would answer me at all, you would probably give a bad recipe since you do that unconsciously and you aren’t aware of %20 of what your brain does to take the decision.

For those types of problems, Machine Learning tends to do better.

A great example of this is how much better Machine Learning is at solving translation problems.  A turning point was the use of Canadian Hansards (transcripts of Parliamentary Debates in both French & English) to train machines how to translate between the two languages.

Some people, e.g. Chomsky, opposes Machine Learning as it actually hides the codification of knowledge.  Actually there is an interesting field developing to use machines to build explanation out of complicated mathematical proofs.

Nevertheless, why is there statistics in Machine Learning?

When I wrote posts about Machine Learning to explain the basics, I gave an example of linear regression on a 2D data sets.

I glossed over the fact that the data, although vaguely linear, wasn’t a perfect line, far from it.  Nevertheless, the model we used was a line.

This is where a statistical interpretation can come into play.

Basically, we can interpret the data in many ways.  We could say that the data is linear but there were errors in measurements.  We could say that there are much more variables involves that aren’t part of the dataset but we’ll consider those as random since we do not know them.

Both explanation links to statistics and probability.  In the first one we suppose the errors are randomly distributed and explain deviation from linear distribution.  In the second we assume missing data (variables) that if present would make the dataset deterministic but in its absence we model as a random distribution.

Actually, there is a long held debate in physics around the interpretation of Quantum Mechanics that oppose those two explanations.

In our case, the important point is that when we build a machine learning model out of a data set, we assume our model will predict the expectation (or average if you will) of an underlying distribution.

Machine Learning & Statistics are two sides of the same coin.  Unfortunately, they were born from quite different scientific disciplines, have different culture and vocabularies that entertain confusion to this day.  A good article explaining those differences can be found here.

# Strong AI & Existential Risks

There has been a recrudescence of hysterical talks about Strong Artificial Intelligence (AI) lately.

Strong AI is artificial intelligence matching and eventually going beyond the full human cognitive capacity.  Weak AI, by opposition, is the replication of some facets of human cognition:  face recognition, voice recognition, pattern matching, etc.  .

The ultimate goal of AI is to get to what researchers call Artificial General Intelligence, i.e. an AI that could duplicate human cognitive capacity in every aspects, including creativity and social behaviour.

Today, what we have is weak AI.  For some tasks, weak AI perform better than humans already though.  For instance, the Bing Image team has observed that for classifying images in multiple categories (thousands), AI performed better (did less errors) than their human counterparts.  For a low amount of categories, humans are still better.  IBM Deep Blue was able to beat the world chess champion, IBM Watson is able to win at Jeopardy, etc.  .

But we are still far from Artificial General Intelligence.  Machines lack social aptitudes, creativity, etc.  .  Nevertheless they are evolving at a rapid pace.

You might have read about the triumvirate of Elon Musk, Stephen Hawking and Bill Gates warning the world of the danger of strong AI.

Those declarations made quite some noise as those three gentlemen are bright and sober people you do not see wearing tin foil hats on national TV.  They also are super stars of their field.

Although their arguments are valid, I found the entire line of thoughts to be a bit vague and unfocussed to be compelling.

I compare this to when Hawking has been warning us about toning down the volume of our space probe messages in order not to attract malevolent aliens.  There was some merits to the argument I suppose, but it wasn’t well developed in my own opinion.

For strong AI, I found a much more solid argument at .

Ah, TED…  What did I do before I could get my 20 minutes fixes with you?  Thank you Chris Anderson!

Anyway, there was Nick Bostrom, a Swedish philosopher which came to discuss What happens when our computers get smarter than we are?

Right off the bat, Bostrom makes you think.  He shows a photo of Milton from Office Space and mentions “this is the normal ways things are”, you can’t help but reflect on a few things.

Current office work / landscape is transitional.  There has been office workers since the late 19th century / early 20th century, as Kafka has depicted so vividly.  But the only constant in office work has been change.

In the time of Kafka, computers meant people carrying out computations!

Early in the banking career of my father, he spent the last Thursday night of every month at the office along a bunch of other workers computing the interests on the bank accounts!  They were using Abacus and paper since they didn’t have calculators yet.  You can imagine the constraints that put on the number of products the banks were able to sell back in those days.

I’ve been in IT for 20 years and about every 3 years my day to day job is redefined.  I’m still a solution architect, but the way I do my work, I spend my time, is different.  IT is of course extreme in that respect but the point is, office work has been changing since it started.

This is what you think about when you look at the photo Bostrom showed us.  This is what office work looked like in the 1990’s and already is an obsolete model.

The point of Bostrom was also that we can’t take today’s office reality as an anchor on how it will be in the future.  He then goes on to define existential risks (a risk that, if realised, would cause humanity extinction or drastic drop in the quality of life of human beings) and how Strong AI could pose one.  He spends a little time debunking typical arguments about how we could easily contain a strong AI.  You can tell it is a simplification of his philosophical work for a big audience.

All of this is done in a very structured and compelling way.

Parts of his argument resides in the observation that in the evolution of AI from Weak AI to Artificial General Intelligence to Strong AI, there is no train stop.  Once AI reaches our level it will be able to evolve on its own, without the help of human researchers.  It will continue to a Superintelligence, an AI more capable than humans in most ways.

His conclusion isn’t that we should burn computers and go back to compute on paper but that we should prepare for a strong AI and work hard on how we will instruct it.  One can’t help to think back to Isaac Asimov 3 laws of robotics which were created, in the fictional world, for the exact same purpose.

He casts Strong AI as an optimization machine, which is how we defined Machine Learning at the end of a previous post.  His argument is that since AI will optimize whatever problem we give it, we should think hard about how we defined the problem so that we don’t end up being optimized out!

Nick Bostrom actually joined Elon Musk, Stephen Hawking and others in signing the open letter of the Future of Life Institute.  That institution is doing exactly what he proposed in the Ted Talk:  research on how to mitigate the existential risk posed by Strong AI.

If you are interested by his line of thoughts, you can read his book Superintelligence: Paths, Dangers, Strategies, where those arguments are developed more fully.

As mentionned above, I found this talk gives a much better shape to the existential risk posed by super intelligence argument we hear these days.

Personally, I remain optimistic regarding an existential risk posed by Strong AI, probably because I remain pessimistic for its realisation in a near future.  I think our lives are going to be transformed by Artificial Intelligence.  Super Intelligence might not occur in my life time, but pervasive AI definitely will.  It will assist us in many tasks and will gradually take off tasks.  For instance, you can see with the like of Skype Translator that translation will, in time, be done largely by machines.

Projections predict a huge shortage of software engineer.  For me this is an illusion.  What it means is that tasks currently requiring software engineering skills will become more an more common in the near future.  Since there aren’t enough software engineer around to do them, we will adapt the technology so that people with a moderate computer literacy background can perform them.  Look at the like of Power BI:  it empowers people with moderate computer skills to do analysis previously requiring BI experts.  Cortana Analytics Suite promises to take it even further.

I think we’ll integrate computers and AI more and more in our life & our work.  It will become invisible and part of us.

I think there are 3 outcomes of Strong AI:

1. We are doom, Terminator 2 type of outcome
2. We follow Nick Bostrom and make sure Strong AI won’t cause our downfall
3. We integrate with AI in such a way there won’t be a revolution but just a transition

Intuitively I would favor the third option as the more likely.  It’s a sort of transhumanism, where humanity will gradually integrate different technologies to enhance our capacities and condition.  Humanity would transition to a sort of new species.

All depends on how fast the Strong AI comes I suppose.

Ok, enough science-fiction speculation for today!  AI today is weak AI, getting better than us at very specific tasks, otherwise, we still rule!

Watch the TED talk, it’s worth it!

# Free ebook: Azure Machine Learning

You’re into Machine Learning, got into Azure ML, looked at my couple of blogs about it and want to take it to the next level?

Microsoft released an eBook for that exact purpose:

Free ebook: Azure Machine Learning (Microsoft Azure Essentials)

That book is targeted at people who want to get better with Azure ML tool:  developer, data scientist, data analyst, etc.  .  It goes into a bit of conceptual (Machine Learning in general) and quickly dive into practical examples with step by step and screen shots.

The book is available in ePub (ideal for commuting!), PDF & Mobi.

Here are the chapters:

• Chapter 1, “Introduction to the science of data,” shows how Azure Machine Learning represents a critical step forward in democratizing data science by making available a fully-managed cloud service for building predictive analytics solutions.
• Chapter 2, “Getting started with Azure Machine Learning,” covers the basic concepts behind the science and methodology of predictive analytics.
• Chapter 3, “Using Azure ML Studio,” explores the basic fundamentals of Azure Machine Learning Studio and helps you get started on your path towards data science greatness.
• Chapter 4, “Creating Azure ML client and server applications.” expands on a working Azure Machine Learning predictive model and explores the types of client and server applications that you can create to consume Azure Machine Learning web services.
• Chapter 5, “Regression analytics,” takes a deeper look at some of the more advanced machine learning algorithms that are exposed in Azure ML Studio.
• Chapter 6, “Cluster analytics,” explores scenarios where the machine conducts its own analysis on the dataset, determines relationships, infers logical groupings, and generally attempts to make sense of chaos by literally determining the forests from the trees.
• Chapter 7, “The Azure ML Matchbox recommender,” explains one of the most powerful and pervasive implementations of predictive analytics in use today on the web today and how it is crucial to success in many consumer industries.
• Chapter 8, “Retraining Azure ML models,” explores the mechanisms for incorporating “continuous learning” into the workflow for our predictive models.

Enjoy!